Include default requests/limits for child pods #348
Comments
Hi, @ohthehugemanatee. I think we have the same situation with the child nodes: resource usage is hard to predict because it depends on the number of pods per node and on the workload (for example, a lot of cronjobs means many short-lived pods and containers).
You are 100% right. As it is now, it is up to the user to install, check resource usage, and set the limits. The default values (not only the limits) are not production ready.
OK, so this is the first problem to solve. From my understanding, in a k8s situation the children should be very lightweight; it's the parent that really gets big. So a start would be to approach this only for the children. What about using the calculation from the docs as a starting point? Add the ephemerality and tiers-related config values to the helm chart values, so we can use helm arithmetic to arrive at a high estimate default value. In the output after install, add a line to say "we set memory limits based on an estimate of your child node RAM requirements. There is almost certainly room to reduce these limits, see https://learn.netdata.cloud/... for more information." Even better would be to request feedback on a GH issue if the defaults suck, but you might get that regardless. 😃
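To make the "high estimate" idea concrete, here is a rough Python sketch of the kind of arithmetic a chart template could perform. It is not the calculation from the Netdata docs: the function name, every parameter, and the numbers in the example call are hypothetical placeholders, chosen only to show the shape of "base overhead plus per-metric cost, padded with headroom and rounded up".

```python
import math


def estimate_child_memory_limit_mib(
    metrics_per_node: int,
    bytes_per_metric: int,
    base_overhead_mib: float,
    headroom: float = 1.5,
) -> int:
    """Return a deliberately generous memory limit in MiB.

    All inputs are illustrative assumptions, not values from the Netdata
    documentation. The result is rounded up to a multiple of 64 MiB so the
    default errs on the high side.
    """
    # Memory attributed to collected metrics, converted from bytes to MiB.
    working_set_mib = metrics_per_node * bytes_per_metric / (1024 * 1024)
    # Pad the total so short-lived pods and bursts do not hit the limit.
    padded_mib = (base_overhead_mib + working_set_mib) * headroom
    return math.ceil(padded_mib / 64) * 64


def as_k8s_quantity(mib: int) -> str:
    """Format a MiB count as a Kubernetes memory quantity string."""
    return f"{mib}Mi"


# Placeholder inputs only; real values would come from the chart's settings.
limit = estimate_child_memory_limit_mib(
    metrics_per_node=2000,
    bytes_per_metric=4096,
    base_overhead_mib=100,
)
print(as_k8s_quantity(limit))
```

In a real chart the same arithmetic would have to be written with Helm template functions, and the inputs would be the ephemerality and tier settings mentioned above rather than these stand-ins.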
Since a good deal of netdata's value is in keeping its footprint as small as possible, and instances are stateless, adding default limits for child pods seems to make sense. I would include parent and k8s-state too, but AFAIK those are relative to the size/complexity of the cluster and therefore hard to predict.
Setting resource requests is just good practice. Setting limits prevents runaway processes.
Based on the docs I suggest defaulting the child to