Scale down a deployment by removing specific pods (PodDeletionCost) #2255

Open
8 tasks done
ahg-g opened this issue Jan 12, 2021 · 63 comments
Labels
sig/apps Categorizes an issue or PR as relevant to SIG Apps. stage/beta Denotes an issue tracking an enhancement targeted for Beta status

Comments

@ahg-g
Member

ahg-g commented Jan 12, 2021

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 12, 2021
@ahg-g
Member Author

ahg-g commented Jan 12, 2021

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 12, 2021
@annajung annajung added stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Jan 27, 2021
@annajung annajung added this to the v1.21 milestone Jan 27, 2021
@ahg-g
Member Author

ahg-g commented Feb 3, 2021

@annajung @JamesLaverack James, you mentioned in the sig-apps Slack channel that this enhancement is at risk; can you clarify why? It meets the criteria.

@JamesLaverack
Member

@ahg-g Just to follow up here too, we discussed in Slack and this was due to a delay in reviewing. We've now marked this as "Tracked" on the enhancements spreadsheet for 1.21.

Thank you for getting back to us. :)

@ahg-g ahg-g changed the title Scale down a deployment by removing specific pods Scale down a deployment by removing specific pods (PodDeletionCost) Feb 17, 2021
@JamesLaverack
Member

Hi @ahg-g,

Since your Enhancement is scheduled to be in 1.21, please keep in mind the important upcoming dates:

  • Tuesday, March 9th: Week 9 — Code Freeze
  • Tuesday, March 16th: Week 10 — Docs Placeholder PR deadline
    • If this enhancement requires new docs or modification to existing docs, please follow the steps in the Open a placeholder PR doc to open a PR against k/website repo.

As a reminder, please link all of your k/k PR(s) and k/website PR(s) to this issue so we can track them.

Thanks!

@ahg-g
Member Author

ahg-g commented Feb 26, 2021

done.

@JamesLaverack
Member

Hi @ahg-g

The Enhancements team is currently tracking the following PRs.

As this PR is merged, can we mark this enhancement as complete for code freeze, or do you have other PR(s) being worked on as part of the release?

@ahg-g
Member Author

ahg-g commented Mar 2, 2021

Hi @JamesLaverack, yes, the k/k code is merged; the docs PR is still open though.

@JamesLaverack JamesLaverack added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Apr 25, 2021
@ahg-g
Member Author

ahg-g commented May 5, 2021

/stage beta

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels May 5, 2021
@ahg-g
Member Author

ahg-g commented May 5, 2021

/milestone v1.22

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 27, 2022
@thesuperzapper

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 30, 2022
@thesuperzapper

Hey all watching! After thinking more about how we can make pod-deletion-cost GA, I believe I have an idea that will address most of the annotation-related concerns of the current implementation (while still maintaining backward compatibility with annotations, if they are present).

I still need to write up a full proposal and KEP, but my initial thoughts can be found at:

The gist of the idea is that we can make pod-deletion-cost a more transient value (rather than only storing it in annotations) by extending the /apis/apps/v1/namespaces/{namespace}/deployments/{name}/scale API: when a caller sends a PATCH that reduces replicas, they can include the pod-deletion-cost of one or more Pods, and these costs would only affect the current down-scale (unlike the annotations, which must be manually cleared after scaling to remove their effect).
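
To make that concrete, here is a purely illustrative sketch of what such a PATCH body might look like. The podDeletionCosts field and the pod names are hypothetical; nothing like this exists in the current /scale subresource schema, it only shows the shape of the idea.

```python
# Purely illustrative sketch of the proposal above: "podDeletionCosts" is a
# hypothetical field, NOT part of the current Scale subresource, and the pod
# names are placeholders.
scale_patch = {
    "spec": {
        "replicas": 3,  # e.g. scaling the Deployment down from 5 to 3
        "podDeletionCosts": {  # hypothetical, applies only to this down-scale
            "worker-7d4f9c-abcde": -100,  # prefer deleting this pod
            "worker-7d4f9c-fghij": 500,   # keep this pod if possible
        },
    }
}
```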

@remiville

Many thanks for all your work and reflection on this subject. I'm strongly interested in the ability to choose which pods are evicted during scale-in, and I try to follow the corresponding discussions, feature developments, and proposals.

I searched for a long time for a way to achieve this correctly. I was happy with PodDeletionCost, but now I am a little disappointed, as it seems it will stay in beta (please do not remove this feature until an equivalent one is released).
To give my two cents, I will share my understanding of the issue and maybe, I hope, help solve it in a simple and broadly compatible manner (maybe I should post elsewhere; I'm not familiar with your processes).

My need (which may be different from yours) is to selectively evict or replace terminated pods, keeping a dynamic number of fresh pod replicas without terminating pods that are still running (I mean pods whose applications are currently processing something).
It is more or less a pool of pods with minimum and maximum replica counts, a current replica count varying with external demand, and a rule forbidding termination of any pod with activity inside.

I may be wrong, but I think the root cause of the problem is the incompatibility between automatic pod restarts and scale-in.
If the ReplicaSet automatically restarts terminated pods, it gives the application itself no chance to indicate which pod should be evicted during scale-in (I mean without using the API).

Without PodDeletionCost, one known workaround (sketched after this list) is to:

  • stop or delete the ReplicaSet
  • delete selected pods
  • decrease replica count accordingly
  • start or recreate the ReplicaSet.
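
Here is a rough sketch of that workaround using the official Python kubernetes client. The ReplicaSet name worker, the default namespace, and the pod names are assumptions for the example; error handling and waiting for the deletions to complete are omitted.

```python
# Rough sketch of the manual workaround above, using the official Python
# kubernetes client. The ReplicaSet name, namespace, and pod names are
# placeholders; error handling and waiting for deletions are omitted.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
apps = client.AppsV1Api()
core = client.CoreV1Api()

namespace, rs_name = "default", "worker"           # assumed names
pods_to_remove = ["worker-abcde", "worker-fghij"]  # pods chosen for eviction

# Snapshot the ReplicaSet spec so it can be recreated later.
rs = apps.read_namespaced_replica_set(rs_name, namespace)

# 1. Delete the ReplicaSet but orphan its pods so they keep running for now.
apps.delete_namespaced_replica_set(
    name=rs_name,
    namespace=namespace,
    body=client.V1DeleteOptions(propagation_policy="Orphan"),
)

# 2. Delete only the pods chosen for removal.
for pod_name in pods_to_remove:
    core.delete_namespaced_pod(pod_name, namespace)

# 3. Recreate the ReplicaSet with a correspondingly lower replica count; it
#    re-adopts the surviving orphaned pods that match its selector.
rs.metadata = client.V1ObjectMeta(name=rs_name, labels=rs.metadata.labels)
rs.spec.replicas = rs.spec.replicas - len(pods_to_remove)
rs.status = None
apps.create_namespaced_replica_set(namespace, rs)
```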

For me, this workaround illustrates the incompatibility between the ReplicaSet restart behavior and selecting which pods to evict during scale-in: currently the two cannot work when mixed together.

Also, I think we should avoid having any controller terminate a pod: the application inside the pod should terminate, causing its pod to terminate, and then a controller could evict only already-terminated pods.

Here is my proposal:

  • add an option to ReplicaSet, Deployment, etc. to not restart terminated pods (succeeded and/or failed).
    Currently the restart policy can only be Always.
  • during scale-in, prioritize terminated pods for eviction (maybe that's already the case?)

With these behaviors, scale-in would select pods to evict based on the termination status of the application inside each pod (here Succeeded or Failed) instead of external indicators (sketched below).
If a custom controller is used to maintain a dynamic number of replicas, it would be able to remove or replace terminated pods just by decreasing the replica count or deleting them.
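
To illustrate the shape of this proposal, here is a purely hypothetical Deployment manifest sketched as a Python dict: the restartTerminatedPods field does not exist in any Kubernetes API, and a Deployment's pod template is currently only allowed to use restartPolicy: Always.

```python
# Hypothetical illustration of the proposal above. "restartTerminatedPods" is NOT
# a real Kubernetes field, and Deployments currently reject any restartPolicy
# other than "Always"; this only sketches the desired behavior.
proposed_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "worker-pool"},
    "spec": {
        "replicas": 5,
        "restartTerminatedPods": False,  # hypothetical: leave Succeeded/Failed pods un-replaced
        "selector": {"matchLabels": {"app": "worker"}},
        "template": {
            "metadata": {"labels": {"app": "worker"}},
            "spec": {
                # Currently rejected for Deployments; shown only to illustrate the proposal.
                "restartPolicy": "OnFailure",
                "containers": [{"name": "worker", "image": "example.com/worker:latest"}],
            },
        },
    },
}
```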

If this proposal is acceptable and can work, it may be achievable with minimal coding effort.

What do you think?

@remiville

Maybe my need is different because I need to automatically replace or delete terminated pods.
I have been able to select which pods to remove from the ReplicaSet by not terminating them but setting the pod-deletion-cost annotation instead; a custom controller then decreases the replica count or deletes pods accordingly.
As evoked in PROPOSAL configurable down-scaling behaviour in ReplicaSets & Deployments, something like a pod-deletion-cost probe would be better than the annotation, letting the application indicate by itself that it should be prioritized for deletion.

I think there are two cases to distinguish during scale-in: the ability to remove terminated pods from the ReplicaSet (without replacing them, which implies a ReplicaSet restartPolicy other than Always), and the ability to remove running pods (using the probe).

@rhockenbury

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.22 milestone Oct 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2022
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2023
@thockin
Member

thockin commented Jan 14, 2023

@ahg-g I'm not in love with annotations as APIs. Do we REALLY think this is the best answer?

@ahg-g
Member Author

ahg-g commented Jan 14, 2023

@ahg-g I'm not in love with annotations as APIs. Do we REALLY think this is the best answer?

I think we have a reasonable counter proposal in kubernetes/kubernetes#107598 (comment); can we hold this in its current beta state until that proposal makes progress?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2023
@hoerup

hoerup commented Apr 14, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2023
@Atharva-Shinde Atharva-Shinde removed the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label May 14, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@thesuperzapper

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@ddelange

Would the following be an acceptable design pattern? (a minimal sketch follows the list)

  • the Pod gets a sidecar with permission to set its own PodDeletionCost
  • the sidecar polls metrics-server for the current CPU usage of its own Pod
  • the sidecar sets its own PodDeletionCost to the number of millicores returned by metrics-server
    • the next time scale-in happens, the most idle Pods in the ReplicaSet get deleted, and the busy Pods can stay busy
  • the dev tunes the poll interval to the use case and the load it causes on the cluster
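
A minimal sketch of such a sidecar loop is shown below, assuming the official Python kubernetes client, that POD_NAME and POD_NAMESPACE are injected via the Downward API, and that metrics-server exposes the metrics.k8s.io API; the RBAC needed to patch the pod and read pod metrics is not shown.

```python
# Minimal sketch of the sidecar idea above. Assumes the Python kubernetes client,
# POD_NAME / POD_NAMESPACE injected via the Downward API, metrics-server installed,
# and RBAC that allows patching this pod and reading pod metrics (not shown).
import os
import time

from kubernetes import client, config

config.load_incluster_config()
core = client.CoreV1Api()
custom = client.CustomObjectsApi()

pod = os.environ["POD_NAME"]
namespace = os.environ["POD_NAMESPACE"]
POLL_INTERVAL_SECONDS = 60  # tune to the use case / acceptable cluster load


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity string from metrics-server (e.g. '250m', '1',
    '123456789n') into whole millicores."""
    if quantity.endswith("n"):
        return int(quantity[:-1]) // 1_000_000
    if quantity.endswith("u"):
        return int(quantity[:-1]) // 1_000
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


while True:
    # Read this pod's current usage from the metrics.k8s.io API.
    metrics = custom.get_namespaced_custom_object(
        group="metrics.k8s.io",
        version="v1beta1",
        namespace=namespace,
        plural="pods",
        name=pod,
    )
    total_millicores = sum(
        cpu_millicores(c["usage"]["cpu"]) for c in metrics["containers"]
    )

    # Write the usage back as this pod's deletion cost: busier pods get a higher
    # cost, so the ReplicaSet controller prefers deleting idle pods on scale-in.
    core.patch_namespaced_pod(
        name=pod,
        namespace=namespace,
        body={
            "metadata": {
                "annotations": {
                    "controller.kubernetes.io/pod-deletion-cost": str(total_millicores)
                }
            }
        },
    )
    time.sleep(POLL_INTERVAL_SECONDS)
```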

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024
@thesuperzapper

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2024