Non-graceful node shutdown #2268

Closed
xing-yang opened this issue Jan 14, 2021 · 70 comments · Fixed by #3320
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status

Comments

@xing-yang
Contributor

xing-yang commented Jan 14, 2021

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 14, 2021
@xing-yang
Contributor Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2021
@xing-yang
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 14, 2021
@annajung annajung added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Jan 15, 2021
@annajung annajung added this to the v1.21 milestone Jan 15, 2021
@xing-yang xing-yang self-assigned this Jan 17, 2021
@annajung
Contributor

Hi @xing-yang, 1.21 enhancements lead here.
I see that you’ve opted this enhancement into 1.21, but I also see that it is tagged with participation from SIG Node. Is that accurate? If so, is there work that SIG Node must deliver in 1.21 as well?

@yastij
Member

yastij commented Jan 25, 2021

Hi @annajung, we're still working out which changes are needed from SIG Node.

@jrsapi

jrsapi commented Feb 5, 2021

Greetings @xing-yang ,
This is Joseph, v1.21 Enhancements shadow, following up. For the enhancement to be included in the 1.21 milestone, it must meet the following criteria:

The KEP must be merged in an implementable state
The KEP must have test plans
The KEP must have graduation criteria
The KEP must have a production readiness review

Starting with v1.21, all KEPs must include a production readiness review. Please make sure to read the instructions and follow all the steps.

Thank you!

@jrsapi

jrsapi commented Feb 8, 2021

Greetings @xing-yang,

Enhancements Freeze is 2 days away, Feb 9th EOD PST

The Enhancements team is aware that the KEP update is currently in progress (PR #1116). Please make sure to work on the PRR questionnaire and requirements and get them merged before the freeze. For PRR-related questions, or to boost the PR for PRR review, please reach out in the #prod-readiness Slack channel.

Any enhancements that do not complete the following requirements by the freeze will require an exception.

[IN PROGRESS] The KEP must be merged in an implementable state
[IN PROGRESS] The KEP must have test plans
[IN PROGRESS] The KEP must have graduation criteria
[IN PROGRESS] The KEP must have a production readiness review

@xing-yang
Contributor Author

Hi @jrsapi,
Thanks for the reminder! We still need more discussion to resolve some design issues, so this will not make it into 1.21.

@xing-yang xing-yang removed this from the v1.21 milestone Feb 8, 2021
@jrsapi jrsapi added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Feb 8, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2021
@YuikoTakada
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2021
@YuikoTakada
Contributor

Thank you for this issue.
It would be better to update this issue's description as follows:

We are trying to get the KEP merged as "Provisional" and continue with prototyping in 1.22. We want to do more testing before targeting Alpha, as this is a complicated problem.

In 1.23, we'll target Alpha.

Thanks!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2021
@YuikoTakada
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2021
@xing-yang
Contributor Author

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 30, 2021
@salaxander salaxander removed the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label Aug 31, 2021
@xing-yang xing-yang added lead-opted-in Denotes that an issue has been opted in to a release stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status and removed stage/beta Denotes an issue tracking an enhancement targeted for Beta status labels May 22, 2023
@carlory
Member

carlory commented May 24, 2023

Hi @xing-yang, I've submitted 2 PRs, one for the code and one for the website. Could you update the issue description?

@ruheenaansari34

Hello @xing-yang 👋, 1.28 Enhancements team here.

Just checking in as we approach the enhancements freeze at 01:00 UTC on Friday, 16th June 2023.

This enhancement is targeting stage stable for 1.28 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.28
  • KEP readme has an updated detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to take care of:

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@SergeyKanzhelev
Member

/milestone v1.28

@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone May 26, 2023
@Atharva-Shinde
Contributor

Hey @xing-yang
With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as tracked. Please keep the issue description up-to-date with appropriate stages as well. Thank you :)

@YuikoTakada
Contributor

YuikoTakada commented Jul 12, 2023

In order to move the non-graceful node shutdown feature to GA,
kubernetes/kubernetes#118848 needs to be merged by
01:00 UTC Wednesday 26th July 2023 / 17:00 PDT Tuesday 25th July 2023 (Week 11: Test Freeze),

according to https://github.com/kubernetes/sig-release/tree/master/releases/release-1.28 , right?

@ruheenaansari34

Hey again @xing-yang 👋

Just checking in as we approach Code Freeze at 01:00 UTC Wednesday, 19th July 2023.

Here’s the enhancement’s state for the upcoming code freeze:

  • All the PRs that are related to your enhancement are linked in the above issue description (for tracking purposes). This includes code, tests, and documentation-related PR/s.
  • All code-related PR/s are merged or are in a merge-ready state (i.e. they have approved and lgtm labels applied) by the code freeze deadline. This includes any tests related PR/s too.

For this enhancement, it looks like the following code-related PR/s are open; they need to be merged or be in a merge-ready state before the code freeze commences:

These are the code-related PR/s that I found on this KEP issue:

Please keep the issue description up-to-date with all the PR/s that are associated with this KEP and let me know if there are other PR/s in k/k we should be tracking for this KEP.

As always, we are here to help if any questions come up. Thanks!

@xing-yang
Contributor Author

xing-yang commented Jul 17, 2023

Hi @ruheenaansari34, this PR kubernetes/kubernetes#118848 is a test, so it should be merged by the test freeze deadline, which is 01:00 UTC Wednesday 26th July 2023 / 17:00 PDT Tuesday 25th July 2023.
All the code PRs that need to be merged by the code merge deadline are already merged. So we are good for tomorrow's code freeze. I've updated the issue description to reflect that. Thanks!

@Atharva-Shinde
Contributor

Hey @xing-yang, the documentation states that all code-related and test-related PRs should at least be in a merge-ready state by the Code Freeze. I understand the confusion :) Let me get back to you after a discussion with the release leads regarding the test freeze deadline. Thank you!

@xing-yang
Contributor Author

Hi @Atharva-Shinde, before moving a feature to Beta, we require e2e tests to be merged first; otherwise, the feature is blocked. This feature already has an e2e test and an integration test. When moving a feature to GA, we are adding additional tests, so those should not block the feature. Please let me know what you and the release team think. Thanks!

@Atharva-Shinde
Contributor

Hey @xing-yang, thanks for the clarification. So the additional test-related changes for this KEP will likely be merged before the Test Freeze, i.e. 01:00 UTC Wednesday, 26th July 2023. Is that correct?

@xing-yang
Contributor Author

xing-yang commented Jul 19, 2023

Hi @Atharva-Shinde, yes, that's the plan.

@xing-yang
Contributor Author

There's a change in plan. We are not going to merge kubernetes/kubernetes#118848 because it depends on a specific cloud provider. We are working on adding an integration test instead: kubernetes/kubernetes#119478. Will try to get it merged when the code freeze is lifted.

@imdmahajankanika

Hello @xing-yang! Does this feature require manual intervention? That is, if the problem occurs on a production cluster during non-working hours, does it need to wait until the cluster administrator arrives and adds the taint node.kubernetes.io/out-of-service on the problematic node?

@npolshakova

/remove-label lead-opted-in

@k8s-ci-robot k8s-ci-robot removed the lead-opted-in Denotes that an issue has been opted in to a release label Aug 27, 2023
@YuikoTakada
Contributor

Can we close this ticket? Non-graceful node shutdown moved to GA in v1.28.

@SergeyKanzhelev
Member

/close

@k8s-ci-robot
Contributor

@SergeyKanzhelev: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jpbetz added a commit to jpbetz/enhancements that referenced this issue Nov 22, 2023
Shadow contributions:
- 1.27 shadow reviews: 6
- 1.28 shadow reviews: 9
- 1.29 - did not shadow due to time constraints with new SIG API Machinery TL role

Planned contributions:
- 1.30+ contribute to PRR, able to review roughly 12 KEPs per release. I am happy to PRR more than 12 so long as they are SIG API Machinery KEPs, since I'll be reviewing those anyway.

Shadow reviewer promotion criteria:

Transitions from new to alpha

- kubernetes#3983
- kubernetes#3751

Transitions from alpha to beta

- kubernetes#3107
- kubernetes#2485

Transitions from beta to GA

- kubernetes#2268

Three enhancements that require coordination between multiple components.

- kubernetes#3751
- kubernetes#2485
- kubernetes#3107

Three enhancements that require version skew consideration (both HA and component skew): does behavior fail safely and eventually reconcile.

- kubernetes#2268 (component skew)
- kubernetes#2485 (component skew)
- kubernetes#3751 (HA skew - feature gated fields, component skew)
- kubernetes#2268 (HA skew of controller considered)

Three enhancements that are outside your primary domain.

- kubernetes#3983 (SIG Node)
- kubernetes#3751 (SIG Storage)
- kubernetes#2268 (SIG Node)
- kubernetes#3107 (SIG Storage)

Examples where the feature requires considering the case of administering thousands of clusters. This comes up frequently for host-based features in storage, node, or networking.

- Yes. E.g. kubernetes#2268 (ability to analyze cluster in aggregate considered, rescheduling considered)

Examples where the feature requires considering the case of very large clusters. This is commonly covered by metrics.

- Yes. E.g. kubernetes#3751 (new API call, volume of calls considered)
Projects
Status: Graduating
Status: Tracked