-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[new] promtail: add readline rate limit #5031
[new] promtail: add readline rate limit #5031
Conversation
@cyriltovena review PR please ,this is important feature. |
Thanks @liguozhong. Will it also be possible to limit based on tags? So we only limit one service and not all services within a promtail instance? |
done. |
It feels like @Whyeasy and @liguozhong have different usage for this ? @Whyeasy wanted to drop logs after a certain rate, @liguozhong you seem to use it to avoid spike of CPU but you keep the logs trading lag for CPU. |
I think @Whyeasy wants a rate limiter stage not a global stage. |
@liguozhong can you expand a bit more on your usage please ? |
Logs are very important and do not need to be discarded. Local files can be cached for a long time. This PR code can work well in our monitoring |
We collect the envoy access log of the istio component in k8s to draw the golden metrics of service access traffic. |
Because the default k8s promatil config file has 4 jobs, it is difficult for us to configure different current limiting rules for each job. But our business can accurately know the rate of access log generation per second. So global rate limiter is more suitable for promtail in k8s daemonset
|
@liguozhong I understand logs are important and we can indeed buffer a lot. But what if there is a service that keeps on spamming and the buffer only grows? At some point we would like to drop those logs for that particular service. That's the reason it would be nice to do it based on tags. This way we can say we drop logs for everything non-prod if it spams to much, but we buffer it for prod. |
Your usage is probably to add a stage pipeline operator like this, and I will commit another PR to handle your usage. |
That make sense to have 2 different feature set. Still I'm not understanding what problem the global limit is solving for you. |
Because of the limitation of the cpu limit of the pod in k8s. 1: When there is no rate limit, the cpu usage of the pod will exceed the cpu limit, causing the pod to enter a frozen state. 2: When with rate limit, the cpu usage of the pod will not exceed the cpu limit of the pod, so that the promtail pod will not enter the docker freeze state and work stably and continuously, so that there will be no data loss. If there is no global rate limit function, our online environment will continue to receive the following pod cpu alarms, and the promtail pod will continue to be in an unstable state。
|
Right ok so it's for throttling the CPU to avoid hitting the CPU limit. |
type Config struct { | ||
ReadlineRate float64 `yaml:"readline_rate" json:"readline_rate"` | ||
ReadlineBurst int `yaml:"readline_burst" json:"readline_burst"` | ||
ReadlineRateEnabled bool `yaml:"readline_rate_enabled,omitempty" json:"readline_rate_enabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a config to decided if we want to drop or not the log line. This makes it usable in multiple use case.
A bit like the stage one but global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clients/pkg/promtail/limit/config.go
Outdated
f.Float64Var(&cfg.ReadlineRate, prefix+"limit.readline-rate", 10000, "promtail readline Rate.") | ||
f.IntVar(&cfg.ReadlineBurst, prefix+"limit.readline-burst", 10000, "promtail readline Burst.") | ||
f.BoolVar(&cfg.ReadlineRateEnabled, prefix+"limit.readline-rate-enabled", true, "Set to false to disable readline rate limit.") | ||
f.BoolVar(&cfg.ReadlineRateAsyn, prefix+"limit.readline-rate-asyn", false, "Set to true to drop log when rate limit.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadlineRateDrop ? What does Asyn stand for here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, "Drop" is better than "Asyn" .
Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>
Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>
this PR,#5051 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great contribution, thanks @liguozhong 🎉
FYI for anyone finding this, the config name changed This was driving me nuts |
thanks |
refer issue : #2664
it is important feature,。
In our production environment, the cpu can be reduced from 3.5 core to 0.8 core, and business monitoring begins to stabilize。