
Pod Scheduling Readiness #3521

Open · 11 of 12 tasks
Huang-Wei opened this issue Sep 16, 2022 · 95 comments
Labels: lead-opted-in, sig/scheduling, stage/stable

@Huang-Wei
Member

Huang-Wei commented Sep 16, 2022

Enhancement Description

Alpha

  1. approved cncf-cla: yes kind/kep lgtm sig/scheduling size/XXL tide/merge-method-squash
    ahg-g wojtek-t
  2. api-review approved area/code-generation area/e2e-test-framework area/test cncf-cla: yes kind/api-change kind/feature lgtm needs-priority release-note sig/apps sig/scheduling sig/testing size/XL triage/accepted
    ahg-g smarterclayton
  3. api-review approved area/code-generation area/e2e-test-framework area/stable-metrics area/test cncf-cla: yes kind/api-change kind/feature lgtm needs-priority release-note sig/api-machinery sig/apps sig/instrumentation sig/scheduling sig/testing size/XL triage/accepted
    ahg-g liggitt
    logicalhan
  4. approved area/code-generation area/e2e-test-framework area/stable-metrics area/test cncf-cla: yes kind/api-change kind/feature lgtm needs-priority release-note-none sig/api-machinery sig/apps sig/instrumentation sig/scheduling sig/testing size/L triage/accepted
    ahg-g aojea
    logicalhan
  5. approved cncf-cla: yes kind/bug lgtm needs-priority needs-triage release-note-none sig/scheduling size/L
    ahg-g
  6. approved cncf-cla: yes language/en lgtm sig/docs sig/scheduling size/L
    krol3

Beta

  1. approved cncf-cla: yes kind/kep lgtm sig/scheduling size/L
    ahg-g wojtek-t
  2. 10 of 10
    lifecycle/stale needs-triage sig/scheduling
    lianghao208
  3. approved cncf-cla: yes language/en lgtm sig/docs size/XS
    tengqm

Stable

  1. approved cncf-cla: yes kind/kep lgtm sig/scheduling size/M
    ahg-g wojtek-t
  2. approved area/code-generation area/test cncf-cla: yes kind/api-change kind/feature lgtm needs-priority needs-triage release-note sig/api-machinery sig/apps sig/node sig/scheduling sig/testing size/XL triage/accepted
    ahg-g

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 16, 2022
@Huang-Wei
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 16, 2022
@ahg-g
Member

ahg-g commented Sep 19, 2022

/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Sep 19, 2022
@ahg-g
Member

ahg-g commented Sep 19, 2022

/milestone v1.26

@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Sep 19, 2022
@rhockenbury rhockenbury added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Sep 20, 2022
@Atharva-Shinde
Contributor

Hey @kerthcet 👋, 1.26 Enhancements team here!

Just checking in as we approach Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022.

This enhancement is targeting stage alpha for 1.26 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP file using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable
  • KEP has an updated detailed test plan section filled out
  • KEP has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to:

  • Change the status in kep.yaml to implementable (I've also left a review comment)
  • Include the updated PR for this KEP in the issue description and get it merged before Enhancements Freeze to make this enhancement eligible for the 1.26 release.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well.
Thank you :)

@Huang-Wei
Member Author

Thanks @Atharva-Shinde.

@Atharva-Shinde
Contributor

Hello @Huang-Wei 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.

Please plan to get PR #3522 merged before Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow.

For the record, the current status of the enhancement is marked as at-risk :)

@Huang-Wei
Member Author

Thanks for the reminder. It's 99% done atm; just a few final comments are waiting for the approver's +1.

@Huang-Wei Huang-Wei changed the title Pod Schedule Readiness Pod Scheduling Readiness Oct 6, 2022
@rhockenbury

With #3522 merged, we have this marked as tracked for v1.26.

@marosset
Contributor

Hi @Huang-Wei 👋,

Checking in once more as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Please ensure all of these PRs are linked to this issue as well and mentioned in the initial issue description.

As always, we are here to help should questions come up. Thanks!

@Huang-Wei
Member Author

@marosset yes, those 3 PRs are the code implementation for this KEP's alpha stage. I just updated the issue description to link them.

@krol3

krol3 commented Nov 7, 2022

Hello @Huang-Wei 👋, 1.26 Release Docs Lead here. This enhancement is marked as ‘Needs Docs’ for the 1.26 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.26 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by November 9. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

If you have any doubts, reach out to us! Thank you!

@marosset
Contributor

marosset commented Nov 7, 2022

Hi @Huang-Wei 👋,

Checking in once more as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

As always, we are here to help should questions come up. Thanks!

@Huang-Wei
Member Author

@marosset ACK, I'm working with reviewers to get 2 pending PRs merged by tomorrow.

@Huang-Wei
Member Author

I'm working with reviewers to get 2 pending PRs merged by tomorrow.

All dev work has been merged.

@liggitt
Member

liggitt commented Mar 25, 2024

For people wanting to request a specific node but still use the scheduling lifecycle / scheduling gates, etc., the right approach is to do what the DaemonSet controller does and use nodeAffinity to target a single node without setting spec.nodeName:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - <node-name-here>
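
Putting the two pieces together, a pod that targets one specific node but still flows through the scheduling-gate lifecycle could look roughly like the sketch below; the gate name, pod name, node name, and image are placeholders, not taken from this thread:

apiVersion: v1
kind: Pod
metadata:
  name: gated-pinned-pod
spec:
  # The pod stays in SchedulingGated until an external controller removes the gate.
  schedulingGates:
  - name: example.com/quota-check
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - <node-name-here>
  containers:
  - name: app
    image: nginx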

@Huang-Wei
Member Author

.spec.nodeName was historically designed as the terminal state of a pod's placement - the node it is destined to run on. You may find quite a few enforcement mechanisms built atop it:

  • .spec.nodeName can only be set once (there was a flaw that we didn't make it RBAC-ed in the first place though)
  • .spec.nodeName cannot be cleared once set
  • recommended way to set .spec.nodeName is to create a v1.Binding request and delegate the "setting .spec.nodeName" to API Server - which is how kube-scheduler does it, though other schedulers don't always follow suit
  • setting .spec.nodeName directly can cause conflicting scheduling results (it behaves as if you had multiple schedulers running at the same time) - this was the motivation for having the scheduler take over the scheduling of all Pods, including DaemonSet pods

So, to adhere to the existing semantics, which we basically cannot change, do as Jordan said: express the desire of "placing a pod on node X" via nodeAffinity/nodeSelector rather than setting .spec.nodeName.
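
For illustration, a minimal sketch of the v1.Binding object a scheduler would POST to the pod's binding subresource (the pod and node names here are hypothetical); the API server then sets .spec.nodeName to the target node while handling the request:

apiVersion: v1
kind: Binding
metadata:
  name: my-pod        # must match the Pod being bound
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: node-1        # the node chosen for the Pod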

@liggitt
Member

liggitt commented Mar 25, 2024

.spec.nodeName can only be set once (there was a flaw that we didn't make it RBAC-ed in the first place though)

RBAC didn't exist until Kube 1.6, spec.nodeName was there from the start :-)

@pohly
Contributor

pohly commented Mar 26, 2024

Nice summary, just one follow-up for my own education:

recommended way to set .spec.nodeName is to create a v1.Binding request and delegate the "setting .spec.nodeName" to API Server

Why is this better than setting it directly (for example, via a patch)?

@iholder101
Contributor

iholder101 commented Mar 26, 2024

@alculquicondor

There was a time in Kubernetes history when Status was supposed to only contain information that could be restored from something else. The node name could not have been restored from somewhere else, so the decision was made to put it in the Spec.

That assumption is no longer true today. Status can be treated as a source of truth and it would be more appropriate for nodeName to be in Status. But it's not, because it would be a breaking change.

I (and all of us, I believe) fully agree. That's a bad, unrecommended and confusing field that would never have made it in today, and it is there only for historical reasons.

That said, that doesn't imply that the nodeName means or ever meant "I want to land on node X". It means: "this pod is in node X".

With this statement I don't fully agree.
You're treating this field as if it were purely a status field, but in essence it's not; it's a spec field. And while its use is not recommended, people still use it, and it is valid as long as the field is not deprecated. Not only that, but as a user it would be completely unintuitive to me that I need to set node affinity rather than use this field.

Furthermore, it's not accurate that it means "this pod is in node X". For example, if I try to run a pod with nodeName set but that asks for more resources than available on the node, I'll see this:

> k get pods -w
NAME     READY   STATUS        RESTARTS   AGE
fedora   0/1     OutOfmemory   0          4s

What actually happened here?
What happened is that the user said "the desired state for my pod is that it needs to run on node X". But, in practice, the actual state is that the pod cannot run on that node. Saying that this pod is "on node X" is not accurate. It's more accurate to say "the pod wants to run on node X, but it is not possible".

Yes, it is reasonable because that's not what the nodeName is intended for.

Would you also say that for standard Resource Quota?
In other words, would it be reasonable in your opinion that if a ResourceQuota exists on a namespace it would deny pods that set nodeName? I would assume that it would be completely unacceptable, and that the community would demand to support this scenario (and that's indeed how it's implemented). Would you agree?

The nodeName field is a field that is for schedulers to manage, not end users or other kinds of controllers. It is as if the users of this field say "I want to bypass scheduler or an external quota mechanism".

Letting the user say "I want to bypass an external quota mechanism" is absolutely against the concept of quota in the first place. The whole idea is forcing users under a certain namespace to comply with the quota. Effectively this means that external quota mechanisms have to work around that, like we do in the AAQ operator. In my opinion this is a major issue since one of the main goals of this KEP is to support such external mechanisms.

And again, a user saying "I want to bypass quota" is absolutely unacceptable with the standard Resource Quota, so I feel it's a bit unfair to apply this rule only to external mechanisms.


I guess that my perspective boils down to: nodeName is either a valid spec field that users may legitimately use to specify their pods' desired state, or it is an internal field that should never be used by users.

If it is a valid spec field - we should treat setting it as a completely valid use-case that should be supported with both standard and external k8s mechanisms.

If it's an internal field that should never be set by users - we should deprecate it and disallow its usage.

Saying "it's a valid spec field, but users should never use it, therefore it's fine to not support it, but we won't deprecate it because we don't want to break it" seems contradicting to me.

@iholder101
Contributor

For people wanting to request a specific node but still use the scheduling lifecycle / scheduling gates, etc., the right approach is to do what the DaemonSet controller does and use nodeAffinity to target a single node without setting spec.nodeName:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - <node-name-here>

Thanks @liggitt.

For my education: is there a use-case where someone would need to use nodeName instead of node-affinity? Or is it true that in 100% of cases it's better to set node affinity instead of nodeName?

@liggitt
Member

liggitt commented Mar 26, 2024

That said, that doesn't imply that the nodeName means or ever meant "I want to land on node X". It means: "this pod is in node X".

With this statement I don't fully agree.

As soon as a pod with this field is created, it appears in the corresponding node's pod watch (filtered to spec.nodeName=$myName). The field does mean the pod is scheduled to the node.

As you point out, the node can still reject it or terminate it, but the pod is being handled by the node at that point, for better or worse.
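
As a rough illustration (not from the thread), that node-scoped view can be reproduced from the CLI with a field selector; the node name is a placeholder:

kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name-here> --watch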

For my education: is there a use-case where someone would need to use nodeName instead of node-affinity? Or is it true that in 100% of cases it's better to set node affinity instead of nodeName?

If they are writing something that is intended to roll pod creation and scheduling into a single step, and are acting as both pod creator and scheduler, setting spec.nodeName on create is logically coherent. Anyone intending to use the normal scheduling flow should not set spec.nodeName.

If it's an internal field that should never be set by users - we should deprecate it and disallow its usage.

"Internal" is a blurry line for an API-driven system. Is a custom scheduler "internal"? Is a custom create-and-schedule-in-a-single-step integration "internal"?

We will not break the current use of the field. We can improve documentation about it.

@Huang-Wei
Member Author

Why is this better than setting it directly (for example, via a patch)?

@pohly setting .spec.nodeName directly is complex to guard against (you have to build admission control or leverage OPA), while a separate API endpoint (v1.Binding) is easy to guard with RBAC.
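
To make the RBAC angle concrete, a rough sketch (the role name is illustrative) of a ClusterRole that only a scheduler's ServiceAccount would be bound to; clients without it cannot create bindings, and therefore cannot get .spec.nodeName set through that path:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-binder
rules:
- apiGroups: [""]
  resources: ["pods/binding"]   # the binding subresource used by schedulers
  verbs: ["create"]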

@pohly
Contributor

pohly commented Mar 28, 2024

So the goal was to make setting .spec.nodeName only possible through v1.Binding, it's just not enforced? Is that still the intent or has that goal been discarded as infeasible (major breaking change)?

@Barakmor1

Barakmor1 commented Mar 28, 2024

Please try to look at this from a user perspective. Setting .spec.nodeName seems like the intuitive choice when you want a pod to be deployed on a specific node. It appears natural and convenient. However, the correct setting, which might not be immediately intuitive, involves a long and complex expression:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
            - key: metadata.name
              operator: In
              values:
              - <node-name-here>

If they are writing something that is intended to roll pod creation and scheduling into a single step, and are acting as both pod creator and scheduler, setting spec.nodeName on create is logically coherent. Anyone intending to use the normal scheduling flow should not set spec.nodeName.

Is there an example of a practical case when this is needed? I still don't see why nodeAffinity is not always the right choice in practice.

@alculquicondor
Member

There is another option:

spec:
  nodeSelector:
    kubernetes.io/hostname: <node-hostname>

@alculquicondor
Member

Is there an example of a practical case when this is needed?

To oversimplify: only use .spec.nodeName if you are a scheduler.

I still don't see why nodeAffinity is not always the right choice in practice.

From a user perspective, it is the right choice. Unless you are an administrator trying to run a pod in an emergency, for example, if the scheduler is down.

@Barakmor1

There is another option:

spec:
  nodeSelector:
    kubernetes.io/hostname: <node-hostname>

this label can be modified

@iholder101
Contributor

If they are writing something that is intended to roll pod creation and scheduling into a single step, and are acting as both pod creator and scheduler, setting spec.nodeName on create is logically coherent. Anyone intending to use the normal scheduling flow should not set spec.nodeName.

To oversimplify: only use .spec.nodeName if you are a scheduler.

@liggitt @alculquicondor

I'm trying to understand if there's an actual use-case for non-schedulers to use this field.
Except for the scheduler being down, what do I gain from setting nodeName instead of node affinity? Is there any difference from a user's perspective whatsoever? Even if I were to write a custom scheduler, what would I gain from using nodeName instead of node affinity?

I'm not sure I fully understand why we'd want to keep this field around. Do we want to grant users the possibility of bypassing the scheduler, especially if there is never a valuable use-case that justifies it?

The following crazy idea pops into my head:

  1. Add a .status.nodeName field. This is the field that should be watched by k8s components / external controllers to understand to which node the pod is scheduled.
  2. If deprecating .spec.nodeName is completely unacceptable for backward compatibility reasons, k8s can mutate it to the equivalent node-affinity, perhaps alongside a warning that notifies the user of what happened.
  3. Longer term - if possible - deprecate and remove this field entirely.

If there's a must-have concrete reason (I haven't seen one yet) to let users bypass the scheduler, we can introduce a field with a clearer name like spec.bypassScheduling so it would be clear what its use is.

Just a crazy idea :)

@liggitt
Member

liggitt commented Mar 28, 2024

Even if I were to write a custom scheduler, what would I gain from using nodeName instead of node affinity?

An integration that both created and scheduled the pod would set spec.nodeName directly, instead of creating the pod and immediately calling pods/binding to set spec.nodeName. Translating spec.nodeName on create into affinity would break that integration.

I'm not sure I fully understand why we'd want to keep this field around.

There are lots of readers of the field, so we would never remove it. We continue to allow writing it on pod create for compatibility with existing integrations that set it on create.

we can introduce a field with a clearer name like spec.bypassScheduling so it would be clear what its use is.

Requiring a new field to be set to keep existing behavior is just as breaking for compatibility :-/

@alculquicondor
Member

I'm trying to understand if there's an actual use-case for non-schedulers to use this field.

Again, oversimplifying, there is no use-case.

@alculquicondor
Member

this label can be modified

Yes, it can, but you might be breaking a few first-party and third-party controllers that assume that this label matches the nodeName or at least that it is unique. The label is documented as well-known, so it should be treated with care https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetesiohostname

@fabiand

fabiand commented Apr 5, 2024

A good example of how .spec.nodeName means "scheduling done": if you set nodeName together with a resource request the node cannot satisfy, no scheduler steps in to find a fitting node, and the pod fails with


status:
  phase: Failed

  message: 'Pod was rejected: Node didn''t have enough resource: cpu, requested: 400000000, used: 400038007, capacity: 159500'
  reason: OutOfcpu

  containerStatuses:
    - name: nginx
      state:
        terminated:
          exitCode: 137
          reason: ContainerStatusUnknown
          message: The container could not be located when the pod was terminated

      image: 'nginx:1.14.2'
      started: false
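
For reference, a minimal pod spec of the kind that produces this failure (the node name and request value are hypothetical); with nodeName set, no scheduler checks the request against the node's free capacity, and the kubelet rejects the pod instead:

spec:
  nodeName: <node-name-here>
  containers:
  - name: nginx
    image: nginx:1.14.2
    resources:
      requests:
        cpu: "400"   # deliberately more CPU than the node can offer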

To some extent .spec.nodeName could be considered a scheduling gate as well - a special one: only once it is set can the pod be (or rather, is it) scheduled.
However, due to its special meaning in the coordination with the kubelet, we cannot change its semantics.

I do not know the current state, but I do wonder - if we are not already doing this today - whether a Pod with scheduling gates and spec.nodeName should be rejected at admission time.

@vladikr

vladikr commented Apr 5, 2024

A good example of how .spec.nodeName means "scheduling done": if you set nodeName together with a resource request the node cannot satisfy, no scheduler steps in to find a fitting node, and the pod fails with


status:
  phase: Failed

  message: 'Pod was rejected: Node didn''t have enough resource: cpu, requested: 400000000, used: 400038007, capacity: 159500'
  reason: OutOfcpu

  containerStatuses:
    - name: nginx
      state:
        terminated:
          exitCode: 137
          reason: ContainerStatusUnknown
          message: The container could not be located when the pod was terminated

      image: 'nginx:1.14.2'
      started: false

To some extent .spec.nodeName could be considered a scheduling gate as well - a special one: only once it is set can the pod be (or rather, is it) scheduled. However, due to its special meaning in the coordination with the kubelet, we cannot change its semantics.

I do not know the current state, but I do wonder - if we are not already doing this today - whether a Pod with scheduling gates and spec.nodeName should be rejected at admission time.

@fabiand Yes, it will be rejected at admission.
However, the Application-Aware Quota adds the scheduling gate to pods at admission time as well.
Therefore, pods created with .spec.nodeName will get rejected in the namespaces where AAQ operates.
One example is kubectl debug node, which explicitly sets .spec.nodeName in the pod spec.

@fabiand

fabiand commented Apr 5, 2024

I share that it's a general problem, but due to the special handling of .spec.nodeName I do not see how we can resolve the problem in

  1. a backwards compatible manner
  2. keeping the user spec

I do fear that - in your example - kubectl debug or oc debug should change and use affinity instead.

The core problem is that kubelet starts to react once nodeName is set.

Was it considered to change kubelet to only start acting once nodeName is set and schedulingGates is empty?

@alculquicondor
Member

Was it considered to change kubelet to only start acting once nodeName is set and schedulingGates is empty?

According to the version skew policy, the change would have to be in the kubelet for 3 versions before we can relax the validation in apiserver.

I guess that could be backwards compatible if we start in 1.31 and we allow scheduling_gates + nodeName in 1.34.
@liggitt wdyt?

One example is with kubectl debug node which explicitly sets .spec.nodeName in the pod spec.

IMO, that falls under the special case where it might make sense to skip scheduler or an external quota system. You probably wouldn't even want to set requests in a debug pod.

@fabiand

fabiand commented Apr 5, 2024

probably wouldn't even want to set requests in a debug pod.

FWIW - I do wonder if debug pods should actually be guaranteed. I had a couple of cases where debug pods (as best effort) got killed quickly on busy nodes.

@liggitt
Member

liggitt commented Apr 5, 2024

I guess that could be backwards compatible if we start in 1.31 and we allow scheduling_gates + nodeName in 1.34.
@liggitt wdyt?

that seems like making the problem and confusion around use of spec.nodeName worse to me... I don't see a compelling reason to do that

@iholder101
Contributor

@alculquicondor

IMO, that falls under the special case where it might make sense to skip scheduler or an external quota system.

TBH, I'm still trying to understand how skipping the scheduler is ever helpful (when you're not using a custom scheduler).
Can you please elaborate on how skipping the scheduler helps in this case? In other words, if we changed kubectl debug to use node affinity instead of nodeName, what would the implications be?

You probably wouldn't even want to set requests in a debug pod.

While this might be correct, the question to me is who makes the decision. Granting a user a knob to skip quota mechanisms feels to me like granting a Linux user the ability to bypass permission checks when writing to a file. In both cases the whole idea is to restrict users and force them to comply with a certain policy. Handing the user the ability to bypass such mechanisms seems entirely contradictory to me, and de facto it makes external quota mechanisms impractical.

@liggitt

that seems like making the problem and confusion around use of spec.nodeName worse to me... I don't see a compelling reason to do that

Are you open to discussion on that?
IMHO this behavior is consistent with scheduling gates' intent to (as the KEP states) force creating pods in a "not-ready to schedule" state. I understand that technically the pod is not scheduled at all with nodeName set (although I think we all agree it's a surprising behavior that's kept for backward compatibility); however, having the kubelet wait for the scheduling gates to be removed before running the pod, with some kind of "not ready" condition, sounds very intuitive and consistent to me.

This way we can avoid breaking backward compatibility, support external quota mechanisms, and extend scheduling gates in a consistent manner, which IMHO makes the exceptional nodeName case less exceptional.

@liggitt
Member

liggitt commented Apr 8, 2024

having the kubelet wait for the scheduling gates to be removed before running the pod, with some kind of "not ready" condition, sounds very intuitive and consistent to me.

Not to me... expecting pods which are already assigned to a node to run through scheduling gate removal phases (which couldn't do anything to affect the selected node) seems more confusing than the current behavior which simply forbids that combination. I don't think we should relax validation there and increase confusion.

@fabiand

fabiand commented Apr 8, 2024

I (sadly) concur - if the nodeName is set, then it is a matter of fact that scheduling has happened. This is the contract we cannot break.

@fabiand

fabiand commented Apr 10, 2024

Fun fact: I created a Deployment with a pod template that had nodeName defined AND a resource request (cpu: 400k) which cannot be met. The Deployment had a replica count of 2. But due to this scheduling contradiction it ended up with 12k failed pods.

IOW: I wonder if this spec.nodeName is troublesome for other controllers in kube as well.

@Barakmor1

Barakmor1 commented Apr 11, 2024

I share that it's a general problem, but due to the special handling of .spec.nodeName I do not see how we can resolve the problem in

  1. a backwards compatible manner
  2. keeping the user spec

I do fear that - in your example - kubectl debug or oc debug should change and use affinity instead.

The reason kubectl debug or oc debug creates pods using .spec.nodeName instead of affinity is likely that it allows the debug pod to bypass taint constraints.

Although it might be achievable with tolerations as well.
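
A rough sketch of that alternative (assuming the goal is simply to land a debug pod on one specific, possibly tainted node through the normal scheduling flow): combine the nodeAffinity form shown above with a blanket toleration.

spec:
  tolerations:
  - operator: Exists        # tolerates any taint, so NoSchedule taints do not block placement
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - <node-name-here>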
