Retriable and non-retriable Pod failures for Jobs #3329
Comments
/sig apps
/assign
/assign
/sig scheduling
Hello @alculquicondor 👋, 1.25 Enhancements team here. Just checking in as we approach enhancements freeze on 18:00 PT on Thursday June 23, 2022, which is just over 2 days from now. For note, this enhancement is targeting stage Here's where this enhancement currently stands:
The open PR #3374 addresses all the criteria listed above. We just need it merged by the Enhancements Freeze. For note, the status of this enhancement is marked as
With KEP PR #3374 merged, the enhancement is ready for the 1.25 Enhancements Freeze. For note, the status is now marked as
Hello @alculquicondor 👋, 1.25 Release Docs Lead here. Please follow the steps detailed in the documentation to open a PR against
@Atharva-Shinde @alculquicondor there is one more PR that should be included before the code freeze: kubernetes/kubernetes#111475
With all PRs to k/k merged, this KEP is now
@mimowo Looks like kubernetes/kubernetes#126169 has tests related to this KEP. Please make sure to get it merged before the test freeze deadline (01:00 UTC Wednesday 31st July 2024 / 19:00 PDT Tuesday 30th July 2024).
@sreeram-venkitesh thanks for reaching out. I have updated the PR and will try to merge it, but OTOH I don't think it is required for this release cycle, because we will not promote these tests in this cycle anyway (per kubernetes/kubernetes#125482 (comment)). We have already promoted 2 tests which didn't have the flakiness issue: kubernetes/kubernetes#125482
@edithturn I was busy preparing the code updates for the release cycle and missed this message, but I have already opened the placeholder PR for the blog post (see #3329 (comment)) and I'm working on the content. Can we still include it?
@mimowo @alculquicondor Can we close this issue since the feature is stable?
There are some post-GA tasks to complete: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures#deprecation
@alculquicondor in preparation for the next release, could you give me an ETA for when you expect the post-GA tasks to be completed? Do we need to track this KEP in the next cycle?
The only pending work is to remove the feature gate in 1.33. I think we can close the issue; other issues for GA features that are pending feature-gate removal are already closed, for example: AdmissionWebhookMatchConditions, AggregatedDiscoveryEndpoint, APIListChunking.
It might be confusing that the Deprecation section still lists the task "Modify the code to ignore the PodDisruptionConditions and JobPodFailurePolicy feature gates", but it was already done. I have created a KEP update to reflect the actual state: #4835. @alculquicondor please add these two PRs to the implementation list in the issue: kubernetes/kubernetes#125994 and kubernetes/kubernetes#126102
Added
Closing the issue seems ok, given the precedent of other KEPs that already graduated.
/close
@mimowo: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/remove-label lead-opted-in
Enhancement Description
One-line enhancement description (can be used as a release note): An API to influence retries based on exit codes and/or pod deletion reasons.
Kubernetes Enhancement Proposal: https://git.k8s.io/enhancements/keps/sig-apps/3329-retriable-and-non-retriable-failures
Discussion Link: RFE: ability to define special exit code to terminate existing job kubernetes#17244
Primary contact (assignee): @alculquicondor
Responsible SIGs: apps, api-machinery, scheduling
Enhancement target (which target equals to which milestone):
Alpha
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- Docs (k/website) update PR(s): Add docs for KEP-3329 Retriable and non-retriable Pod failures for Jobs website#35219
Beta
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- Docs (k/website) update(s):
Stable
- KEP (k/enhancements) update PR(s): Graduate Job Pod Failure Policy to stable #4661
- Code (k/k) update PR(s):
- Docs (k/website) update(s):
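The one-line description above summarizes the feature as an API to influence retries based on exit codes and/or pod deletion reasons. As a rough illustration of that idea, here is a simplified Python model of how an ordered list of pod failure policy rules might map one failed Pod to an action. It is a sketch under stated assumptions, not the Job controller's code: the class names, the `evaluate` helper, and the choice to skip containers that exited with code 0 are hypothetical. Only the action names (`FailJob`, `Ignore`, `Count`) and the `DisruptionTarget` condition type are taken from the Kubernetes Job API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of a Job's pod failure policy. The shapes loosely
# follow the Job API (action, onExitCodes, onPodConditions), but these classes
# are illustrative only, not Kubernetes client types.


@dataclass
class OnExitCodes:
    operator: str                         # "In" or "NotIn"
    values: list[int]
    container_name: Optional[str] = None  # None: consider every container


@dataclass
class OnPodCondition:
    type: str                             # e.g. "DisruptionTarget"
    status: str = "True"


@dataclass
class Rule:
    action: str                           # e.g. "FailJob", "Ignore", "Count"
    on_exit_codes: Optional[OnExitCodes] = None
    on_pod_conditions: list[OnPodCondition] = field(default_factory=list)


@dataclass
class FailedPod:
    exit_codes: dict[str, int]            # container name -> exit code
    conditions: dict[str, str] = field(default_factory=dict)  # type -> status


def exit_codes_match(req: OnExitCodes, pod: FailedPod) -> bool:
    """True if at least one considered container satisfies the operator."""
    for name, code in pod.exit_codes.items():
        if req.container_name is not None and name != req.container_name:
            continue
        if code == 0:
            continue  # assumption: containers that succeeded are skipped
        if (code in req.values) == (req.operator == "In"):
            return True
    return False


def conditions_match(reqs: list[OnPodCondition], pod: FailedPod) -> bool:
    """True if the Pod carries any listed condition with the given status."""
    return any(pod.conditions.get(r.type) == r.status for r in reqs)


def evaluate(rules: list[Rule], pod: FailedPod) -> str:
    """Return the action of the first matching rule, else the default "Count"."""
    for rule in rules:
        if rule.on_exit_codes is not None and exit_codes_match(rule.on_exit_codes, pod):
            return rule.action
        if rule.on_pod_conditions and conditions_match(rule.on_pod_conditions, pod):
            return rule.action
    return "Count"  # no rule matched: the failure counts toward backoffLimit


if __name__ == "__main__":
    policy = [
        # Exit code 42 from the "main" container is treated as non-retriable.
        Rule("FailJob", on_exit_codes=OnExitCodes("In", [42], container_name="main")),
        # Disruptions (e.g. preemption, eviction) should not consume retries.
        Rule("Ignore", on_pod_conditions=[OnPodCondition("DisruptionTarget")]),
    ]
    print(evaluate(policy, FailedPod({"main": 42})))
    print(evaluate(policy, FailedPod({"main": 137}, {"DisruptionTarget": "True"})))
    print(evaluate(policy, FailedPod({"main": 1})))
```

Rules are checked in order and the first match wins, so a narrow `FailJob` rule belongs before broader ones. A Pod that matches no rule falls back to the default behavior of counting toward `backoffLimit`.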