
Expose metrics about resource requests and limits that represent the pod model #1748

Open
4 of 6 tasks
smarterclayton opened this issue May 7, 2020 · 62 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. stage/stable Denotes an issue tracking an enhancement targeted for Stable/GA status

Comments

@smarterclayton
Contributor

smarterclayton commented May 7, 2020

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 7, 2020
@smarterclayton
Contributor Author

/sig instrumentation
/sig node
/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 7, 2020
@harshanarayana

Hey there @smarterclayton, 1.19 Enhancements shadow here. I wanted to check in and see whether you think this Enhancement will be graduating in 1.19.

In order to have this be part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

If you do, I'll add it to the 1.19 tracking sheet (http://bit.ly/k8s-1-19-enhancements). Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

Thanks!

@smarterclayton
Contributor Author

I don't think we'll get the KEP to implementable and merged by Tuesday, so this should be targeted for 1.20.

@harshanarayana

Hey @smarterclayton, thanks for confirming the inclusion state. I've marked the Enhancement as Deferred in the Tracker and am updating the milestone accordingly.

/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone May 18, 2020
@harshanarayana harshanarayana added the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label May 18, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 16, 2020
@palnabarun
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2020
@kikisdeliveryservice
Member

Hi @smarterclayton !

Enhancements Lead here, do you still intend to target this for alpha in 1.20?

Thanks!
Kirsten

@smarterclayton
Contributor Author

Yes, this is targeting alpha for 1.20, assuming we can close the remaining questions in the KEP.

@kikisdeliveryservice kikisdeliveryservice added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status and removed tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team labels Sep 21, 2020
@kikisdeliveryservice
Member

kikisdeliveryservice commented Sep 21, 2020

Thanks Clayton!!

As a reminder, by Enhancements Freeze (October 6th), KEPs must:

  • be merged in an implementable state (yours is provisional)
  • have test plans (missing)
  • have graduation criteria (missing)

Best,
Kirsten

I also added the PR link to the issue description; we can update it again once it merges.

@mikejoh

mikejoh commented Sep 29, 2020

Hi @smarterclayton 👋!

I'm one of the Enhancement shadows for the 1.20 release cycle. This is a friendly reminder that the Enhancement freeze is on the 6th of October; I'm repeating the requirements needed by then:

  • The KEP must be merged in an implementable state.
    • It's provisional at the moment and I see that there's active work ongoing.
  • The KEP must have test plans.
    • This is missing at the moment.
  • The KEP must have graduation criteria.
    • This is also missing at the moment.

Thanks!

@smarterclayton
Contributor Author

Thanks for the reminder, updated those. Will be working with the sig.

@kikisdeliveryservice
Member

The current PR looks complete from an enhancements freeze POV; we'll monitor it to see if it merges in time.

@kikisdeliveryservice
Member

Hi @smarterclayton

Enhancements Freeze is now in effect. Unfortunately, your KEP PR has not yet merged. If you wish to be included in the 1.20 Release, please submit an Exception Request as soon as possible.

Best,
Kirsten
1.20 Enhancements Lead

@kikisdeliveryservice kikisdeliveryservice added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Oct 7, 2020
@kikisdeliveryservice kikisdeliveryservice removed this from the v1.20 milestone Oct 7, 2020
@Huang-Wei
Member

@dashpole @logicalhan have you happened to find any volunteers to continue the work?

@smarterclayton
Contributor Author

smarterclayton commented Sep 26, 2022

Oh man, we didn't take this to beta?! This is my fault. Let me talk to @dgrisonnet, who pinged me about it a day ago. Originally, the delay was due to gathering feedback from admins doing capacity planning, and I had been working with a few people on leveraging it more widely.

The use case I was most familiar with was OpenShift, where we replaced the dashboards that relied on the (old, incorrect, incomplete) kube-state-metrics for this. Among the folks who made the change, there was general agreement that the new metrics were superior and that the cardinality cost was worth it to replace the generally incorrect metrics from kube-state-metrics. (At the time, we felt that completely replicating the pod resource model code in ksm was not appropriate, and this was a better solution.) The next phase was getting community user input on building metric-based capacity dashboards and on whether the dimensions worked for that audience. I did a few analyses when planning out e2e CI runs and found that the metrics gave better human visibility when comparing bulk "used vs requested".
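To illustrate the bulk "used vs requested" comparison mentioned above, here is a minimal Python sketch that computes a per-namespace usage-to-request ratio. The sample records, field names, and values are hypothetical. In practice, the request side would come from the scheduler's resource metrics (assumed here to be `kube_pod_resource_request`), the usage side would come from a container usage metric, and the aggregation would more likely be written in PromQL than in Python.

```python
from collections import defaultdict

# Hypothetical per-pod samples (CPU cores). "requests" stands in for what the
# scheduler accounts for; "usage" stands in for observed consumption.
requests = [
    {"namespace": "web", "pod": "web-1", "cpu": 0.5},
    {"namespace": "web", "pod": "web-2", "cpu": 0.5},
    {"namespace": "batch", "pod": "job-1", "cpu": 2.0},
]
usage = [
    {"namespace": "web", "pod": "web-1", "cpu": 0.2},
    {"namespace": "web", "pod": "web-2", "cpu": 0.3},
    {"namespace": "batch", "pod": "job-1", "cpu": 1.5},
]


def sum_by(samples, key):
    """Sum the 'cpu' field of the samples, grouped by the given label."""
    totals = defaultdict(float)
    for s in samples:
        totals[s[key]] += s["cpu"]
    return dict(totals)


def used_vs_requested(requests, usage, key="namespace"):
    """Return the used/requested ratio per group; groups with no requests are skipped."""
    req = sum_by(requests, key)
    used = sum_by(usage, key)
    return {k: used.get(k, 0.0) / v for k, v in req.items() if v > 0}


ratios = used_vs_requested(requests, usage)
print(ratios)
```

Because the per-pod samples carry a `pod` label, the same data can be re-grouped by node, namespace, or pod without changing the source metric.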

@smarterclayton
Contributor Author

@Huang-Wei re:

I just realized this metric has a pod label, which IMO increase the cardinality a lot and yield a pressure on the scraper side. Did you hear any concern/feedback from the users? Per the KEP, all the goals can be satisfied by removing the pod dimension as in terms of a metric, its primary goal is to give a high-level overview on aggregated pods' reqs/limits. Pod-level metric doesn't seem that common. WDYT?

The original intent was to allow admins to build capacity planning dashboards and to pair these resource metrics with the pod-level resource usage metrics (CPU, memory, etc.), so the intent was very much to have a pod dimension. Do we have a proposal to remove, or make optional, the pod-level dimension on the CPU or memory consumption metrics? If so, such a change would apply to this metric as well, although this is already an optional endpoint for users who are concerned about cardinality.
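To make the cardinality trade-off in this exchange concrete, here is a minimal Python sketch that counts distinct time series (unique label sets) with and without the `pod` label. The label names and the 50-pod sample are hypothetical, and the count ignores everything else a real scrape would contain.

```python
# Hypothetical label sets for a per-pod resource request metric:
# 50 pods, each reporting a CPU and a memory series.
series = [
    {"namespace": "web", "node": "n1", "resource": res, "pod": f"web-{i}"}
    for res in ("cpu", "memory")
    for i in range(50)
]


def count_series(samples, drop=()):
    """Count distinct label sets after removing the labels named in `drop`."""
    return len({
        tuple(sorted((k, v) for k, v in s.items() if k not in drop))
        for s in samples
    })


with_pod = count_series(series)
without_pod = count_series(series, drop=("pod",))
print(with_pod, without_pod)
```

With the `pod` label, the series count grows linearly with the number of pods. Without it, the count depends only on the remaining dimensions, at the cost of losing per-pod drill-down.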

@smarterclayton
Contributor Author

To clarify, this has been in beta since 1.21 (#1748 (comment)). Was there some belief that it was not beta?

The last step would be to go to GA; I'm happy to push that over the line with @dgrisonnet.

@dgrisonnet
Member

I also thought this was still in Alpha for some reason even though we have a label marking the stability 😅

Still, let's try to get this over the finish line and gather feedback from users to learn whether they encountered any issues with these new metrics.

/assign @smarterclayton @dgrisonnet

@dgrisonnet
Member

Do we have a proposal to remove or make optional pod level cpu consumption or memory consumption dimensions? If so, such a change would apply to this metric as well, but as this is already an optional endpoint for users who are concerned about cardinality.

We already have a cardinality protection mechanism in Kubernetes: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2305-metrics-cardinality-enforcement so users could already tweak the dimensions if needed.
Given that, plus the fact that the endpoint is optional, it sounds fairly safe to expose these metrics without having to worry about potential cardinality explosions.
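The cardinality enforcement mechanism referenced above lets operators restrict which label values a metric may carry. Here is a minimal Python sketch of that idea. It assumes, as I understand KEP-2305, that values outside an allow-list are collapsed into a single placeholder value. The placeholder string `"unexpected"` and the helper name `constrain_labels` are assumptions for illustration, not the component-base API.

```python
def constrain_labels(labels, allow_list):
    """Replace label values that are not on the allow-list with a placeholder.

    allow_list maps a label name to the set of permitted values; labels with
    no entry are left untouched. The placeholder value is an assumption.
    """
    placeholder = "unexpected"
    out = {}
    for name, value in labels.items():
        allowed = allow_list.get(name)
        out[name] = value if (allowed is None or value in allowed) else placeholder
    return out


# Hypothetical allow-list: only one pod name is kept as a distinct value.
allow = {"pod": {"critical-pod"}}
samples = [{"namespace": "web", "pod": f"web-{i}"} for i in range(10)]
samples.append({"namespace": "web", "pod": "critical-pod"})

# 11 input series collapse to 2: the allowed pod, plus one placeholder series.
constrained = {tuple(sorted(constrain_labels(s, allow).items())) for s in samples}
print(len(constrained))
```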

@dgrisonnet
Member

I opened kubernetes/kube-state-metrics#1846 in kube-state-metrics to make the transition to the kube-scheduler metrics.

@ehashman ehashman added lead-opted-in Denotes that an issue has been opted in to a release and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Jan 19, 2023
@dgrisonnet
Member

@smarterclayton, we are now recommending that users switch to the scheduler resource metrics in kube-state-metrics, and there is an open PR to propagate the use of the new metrics in the Prometheus mixins: kubernetes-monitoring/kubernetes-mixin#815. Based on that, we should have a large enough user base for these metrics, so would it make sense to graduate the effort to stable?

@npolshakova npolshakova added this to the v1.27 milestone Jan 23, 2023
@npolshakova

npolshakova commented Feb 2, 2023

Hello @dgrisonnet 👋, 1.27 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PDT on Thursday 9th February 2023.

This enhancement is targeting stage stable for 1.27 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.27
  • KEP readme has an updated, detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

It looks like kubernetes/kubernetes#115454 and #3810 will address most of these issues.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@dgrisonnet
Member

Hi @npolshakova, I completed the different action items, please let me know if there is anything else I need to do.

@npolshakova

Great, I'm marking this enhancement as tracked for v1.27.
Thanks!

/remove-label tracked/no
/label tracked/yes

@k8s-ci-robot
Contributor

@npolshakova: Those labels are not set on the issue: tracked/no


@k8s-ci-robot k8s-ci-robot added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Feb 9, 2023
@npolshakova

npolshakova commented Mar 9, 2023

Hi @dgrisonnet,

Checking in as we approach 1.27 code freeze at 17:00 PDT on Tuesday 14th March 2023.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

Please let me know if there are any other PRs in k/k I should be tracking for this KEP.
As always, we are here to help should questions come up. Thanks!

@dgrisonnet
Member

Hi @npolshakova, I have updated everything, thank you for the reminder :)

@LukeMwila

Hi @dgrisonnet @smarterclayton, I’m reaching out from the 1.27 Release Docs team. This enhancement is marked as ‘Needs Docs’ for the 1.27 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.27 branch in the k/website repo. This PR can be just a placeholder at this time, and it must be created by March 16. For more information, please take a look at Documenting for a release to familiarize yourself with the documentation requirements for the release.

Please feel free to reach out with any questions. Thanks!

@dgrisonnet
Member

Hi @LukeMwila, I completely missed the fact that we had to write a doc for this KEP. I opened a placeholder PR for now: kubernetes/website#39970.

@smarterclayton do you perhaps remember what you had in mind for the doc? I was thinking about drafting something about capacity planning in general.

@Atharva-Shinde Atharva-Shinde removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team lead-opted-in Denotes that an issue has been opted in to a release labels May 14, 2023
@dashpole
Contributor

@dgrisonnet reminder to write the documentation for this. After that, we can move the KEP to implemented and close this.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 17, 2024
Projects
Status: Tracked
Status: Needs Triage