Retriable and non-retriable Pod failures for Jobs #3329
/sig apps
/assign
/assign
/sig scheduling
Hello @alculquicondor 👋, 1.25 Enhancements team here. Just checking in as we approach the enhancements freeze at 18:00 PT on Thursday, June 23, 2022, which is just over 2 days from now. For note, this enhancement is targeting for stage Here's where this enhancement currently stands:
The open PR #3374 addresses all the criteria listed above. It just needs to be merged before the Enhancements Freeze. For note, the status of this enhancement is marked as
With KEP PR #3374 merged, the enhancement is ready for the 1.25 Enhancements Freeze. For note, the status is now marked as
Hello @alculquicondor 👋, 1.25 Release Docs Lead here. Please follow the steps detailed in the documentation to open a PR against
@Atharva-Shinde @alculquicondor there is one more PR that should be included before the code freeze: kubernetes/kubernetes#111475
Thank you @mimowo,
Yes, that is my proposal
@kerthcet already has an open WIP KEP :)
The max-restarts KEP isn't the same as restart-rules, though, right? They all seem complementary but not the same.
On Thu, Jun 1, 2023, 5:19 PM Aldo Culquicondor wrote:
> Yes, that is my proposal
> > I don't want to stand in the way of solving real problems, but I worry that this becomes conceptual debt that we will never pay off (or even remember!)
> @kerthcet <https://github.com/kerthcet> already has an open WIP KEP :)
> And we have already received good user feedback about the failure policy at the job level.
That's the point I think Job API can leverage the
Yes, currently the max-restarts KEP only accounts for the
Yes, these are the pieces of work that need to be done to fully support restart policy
Note that:
IMO, doing (1.)-(4.) under this KEP would prolong its graduation substantially, and the different points have different priorities, so it is useful to decouple them. Recently we got this Slack ask to support
Is this planned to go to beta in 1.28? It's on the board for PRR, but I don't see an open KEP PR linked here.
@johnbelamaric the KEP is already in Beta. There are two updates for the third iteration of Beta:
Do you think (2.) should be split into a dedicated PR for this KEP (KEP-3329), or can it stay as is?
Also asking @wojtek-t, who is the PRR reviewer of the other KEP (3939).
The other KEP is close to merging, so it seems safe to keep the update coupled there.
I discussed this with @deads2k and he was OK with having a single PR for the changes in both KEPs, because the changes in 3329 are minimal and highly related to the other KEP.
Sure, that's fine.
I also carefully reviewed that integration part from the PRR point of view.
Hi @mimowo and @alculquicondor, just checking in again before the enhancement freeze coming up on 01:00 UTC Friday, 16th June 2023. This enhancement is marked as
#3940 just merged 😄
With all the requirements fulfilled this enhancement is now marked as
1.28 Docs Shadow here. Does the enhancement work planned for 1.28 require any new docs or modifications to existing docs? If so, please follow the steps here to open a PR against the dev-1.28 branch in the k/website repo. This PR can be just a placeholder for now and must be created before Thursday 20th July 2023. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release. Thank you!
We will also have an improvement to this KEP in 1.28 as part of kubernetes/kubernetes#117015.
Hey @alculquicondor, could you please create a docs PR against the dev-1.28 branch in the k/website repo, even if it is a draft with no content yet? The deadline to create this draft PR is Thursday 20th July 2023.
Thanks @Rishit-dagli, that's gonna be kubernetes/website#41745 (yes, the enhancements are tightly coupled). I added the necessary PRs to the issue description above.
Hey @alculquicondor 👋, Enhancements Lead here. With all the implementation (code-related) PRs merged as per the issue description, this enhancement is now marked as
removed.
/remove-label lead-opted-in
@alculquicondor please update the list of code changes for Beta with this PR: kubernetes/kubernetes#121103.
Hello 👋, 1.30 Enhancements Lead here. I'm closing milestone 1.28 now.
/milestone clear
Enhancement Description
One-line enhancement description (can be used as a release note): An API to influence retries based on exit codes and/or pod deletion reasons.
Kubernetes Enhancement Proposal: https://git.k8s.io/enhancements/keps/sig-apps/3329-retriable-and-non-retriable-failures
Discussion Link: RFE: ability to define special exit code to terminate existing job kubernetes#17244
Primary contact (assignee): @alculquicondor
Responsible SIGs: apps, api-machinery, scheduling
Enhancement target (which target equals to which milestone):
Alpha
k/enhancements update PR(s):
k/k update PR(s):
k/website update PR(s): Add docs for KEP-3329 Retriable and non-retriable Pod failures for Jobs website#35219
Beta
k/enhancements update PR(s):
k/k update PR(s):
k/website update(s):
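The enhancement description summarizes the feature as an API to influence retries based on exit codes and/or pod deletion reasons. As an illustration only, the Go sketch below models how such rules could be evaluated against a failed Pod. The types and functions (`Rule`, `ExitCodeRequirement`, `FailedPod`, `Evaluate`) are hypothetical simplifications, not the real `k8s.io/api/batch/v1` types or the Job controller's code. The action names (`FailJob`, `Ignore`, `Count`) and the `DisruptionTarget` condition follow the Job `podFailurePolicy` API introduced by this KEP. The sketch assumes first-matching-rule-wins ordering and treats exit code 0 as a success that never matches a rule.

```go
package main

import "fmt"

// Action is what the (hypothetical) controller does when a rule matches.
type Action string

const (
	ActionFailJob Action = "FailJob" // fail the whole Job, no further retries
	ActionIgnore  Action = "Ignore"  // do not count the failure against backoffLimit
	ActionCount   Action = "Count"   // count the failure as usual
)

// Operator says whether the exit code must be in or not in Values.
type Operator string

const (
	OpIn    Operator = "In"
	OpNotIn Operator = "NotIn"
)

// ExitCodeRequirement is a simplified, hypothetical exit-code matcher.
type ExitCodeRequirement struct {
	ContainerName string // empty means "any container"
	Operator      Operator
	Values        []int32
}

// Rule matches either on exit codes or on Pod condition types (status True).
type Rule struct {
	Action          Action
	OnExitCodes     *ExitCodeRequirement
	OnPodConditions []string
}

type ContainerStatus struct {
	Name     string
	ExitCode int32
}

// FailedPod maps condition type -> whether its status is True.
type FailedPod struct {
	Containers []ContainerStatus
	Conditions map[string]bool
}

func containsCode(values []int32, code int32) bool {
	for _, v := range values {
		if v == code {
			return true
		}
	}
	return false
}

func matchesExitCodes(req *ExitCodeRequirement, pod FailedPod) bool {
	for _, c := range pod.Containers {
		if req.ContainerName != "" && c.Name != req.ContainerName {
			continue
		}
		if c.ExitCode == 0 { // assumption: a successful container never matches
			continue
		}
		in := containsCode(req.Values, c.ExitCode)
		if (req.Operator == OpIn && in) || (req.Operator == OpNotIn && !in) {
			return true
		}
	}
	return false
}

// matchesConditions is satisfied if at least one listed condition is True.
func matchesConditions(types []string, pod FailedPod) bool {
	for _, t := range types {
		if pod.Conditions[t] {
			return true
		}
	}
	return false
}

// Evaluate returns the action of the first matching rule, or Count if none match.
func Evaluate(rules []Rule, pod FailedPod) Action {
	for _, r := range rules {
		if r.OnExitCodes != nil && matchesExitCodes(r.OnExitCodes, pod) {
			return r.Action
		}
		if r.OnExitCodes == nil && matchesConditions(r.OnPodConditions, pod) {
			return r.Action
		}
	}
	return ActionCount
}

func main() {
	rules := []Rule{
		{Action: ActionFailJob, OnExitCodes: &ExitCodeRequirement{ContainerName: "main", Operator: OpIn, Values: []int32{42}}},
		{Action: ActionIgnore, OnPodConditions: []string{"DisruptionTarget"}},
	}
	bug := FailedPod{Containers: []ContainerStatus{{Name: "main", ExitCode: 42}}}
	evicted := FailedPod{Containers: []ContainerStatus{{Name: "main", ExitCode: 137}}, Conditions: map[string]bool{"DisruptionTarget": true}}
	flaky := FailedPod{Containers: []ContainerStatus{{Name: "main", ExitCode: 1}}}
	fmt.Println(Evaluate(rules, bug), Evaluate(rules, evicted), Evaluate(rules, flaky))
}
```

Run as written, this prints `FailJob Ignore Count`: a known bug (exit code 42) stops retries, a disruption is not charged against the retry budget, and any other failure is counted. Falling back to `Count` when no rule matches keeps the default behaviour, where every failure counts against `backoffLimit`.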