[DISCUSS]: Improving the performance of the prometheus plugin #5755
Comments
Do we have other ways to give up some CPU time? There are so many hard-coded calls.
Yes, it looks strange, but that is the only way I know of.
I think we may only need to focus on some critical paths; not all of them need to be yielded. Could you generate a flamegraph?
I think this solution is similar to Kong/kong@3fc3961. Maybe we can also use a counter for it: https://github.com/Kong/kong/blob/d65101fe80fd7ac9870a84d34d81bda8bcb461ac/kong/tools/utils.lua#L1438
Is the counter for calling yield in a loop?
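For reference, a rough sketch of the counter-based yield helper being discussed, modelled loosely on the Kong utility linked above; the function name and the iteration threshold here are illustrative, not the exact upstream code:

```lua
-- Rough sketch of a counter-based yield helper: call it on every iteration
-- of a hot loop, but only actually yield once every YIELD_ITERATIONS calls,
-- so the overhead of yielding stays small.
local ngx_sleep = ngx.sleep

local YIELD_ITERATIONS = 500
local counter = YIELD_ITERATIONS

local function yield(in_loop)
    if in_loop then
        counter = counter - 1
        if counter > 0 then
            return
        end
        counter = YIELD_ITERATIONS
    end

    -- a zero-second sleep gives up the CPU so other light threads can run
    ngx_sleep(0)
end
```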
Dear all, is there a good solution for this issue?
I'm working on it, ref: knyar/nginx-lua-prometheus#131. I will submit another PR to upstream.
@tzssangglass knyar/nginx-lua-prometheus#131 has been merged, so is there any update on this issue? Thanks.
I have asked the maintainer whether there are plans to release a new version. From my previous observations, the maintainers usually wait for stability before releasing a new version, so let's wait a little longer.
A new version has been released: https://github.com/knyar/nginx-lua-prometheus/releases/tag/0.20220127. In the following days I will submit a PR to fix APISIX.
Issue description
The community has recently received reports from users of APISIX generating long-tail requests, e.g.
#5604
#5500
After talking to them and testing in their environments, we found that when the prometheus plugin is enabled on APISIX, the Prometheus server's data collection (accessing /apisix/prometheus/metrics) causes APISIX to delay some of the requests, i.e. long-tail requests.
According to my analysis, this is because the collect function is computationally heavy and takes up too many CPU time slices.
apisix/apisix/plugins/prometheus/exporter.lua, lines 281 to 289 in 5ae38f8
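The referenced snippet is not reproduced here. Roughly speaking (a simplified sketch under my own assumptions, not the exact exporter.lua or nginx-lua-prometheus code), collection walks every key of the metrics shared dict and formats it in one uninterrupted loop, so its cost grows with the number of metric/label combinations:

```lua
-- Simplified sketch of the collection hot path, assuming metrics are stored
-- in an ngx.shared dict: every key is fetched and formatted without ever
-- yielding back to the event loop.
local function collect(dict)
    local keys = dict:get_keys(0)      -- 0 means "return all keys"
    table.sort(keys)

    local output = {}
    for _, key in ipairs(keys) do
        local value = dict:get(key)
        if value then
            output[#output + 1] = key .. " " .. tostring(value)
        end
    end

    return table.concat(output, "\n")
end
```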
I have tried to do some optimisation, which has worked, but not as well as I would have liked.
Here are my optimizations.
And I've done some local verification.
Config
The upstream server is an OpenResty instance; the nginx.conf is https://github.com/apache/apisix/tree/master/benchmark/server/conf
The route config is:
Test
Use wrk to trigger Prometheus data collection.
Test
The trigger is as above.
Test
For more information on how to optimise, refer to: https://groups.google.com/g/openresty/c/fuY_vTS01eg
When collecting data for Prometheus, we use ngx.sleep(0) to give up the CPU time slice and give epoll a chance to process more requests. Based on the above optimisation ideas, we can assume that this optimisation is effective.
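A minimal sketch of where such a yield could sit, reusing the simplified collection loop from above (collect_with_yield and YIELD_EVERY are illustrative names, not APISIX code, and the batch size is arbitrary rather than a tuned value):

```lua
-- Illustrative only: the same collection loop as above, but yielding every
-- N keys so epoll gets a chance to serve other pending requests in between.
local YIELD_EVERY = 200   -- arbitrary batch size, not a tuned value

local function collect_with_yield(dict)
    local keys = dict:get_keys(0)
    table.sort(keys)

    local output = {}
    for i, key in ipairs(keys) do
        local value = dict:get(key)
        if value then
            output[#output + 1] = key .. " " .. tostring(value)
        end

        if i % YIELD_EVERY == 0 then
            ngx.sleep(0)  -- zero-length sleep yields the current light thread
        end
    end

    return table.concat(output, "\n")
end
```

Note that ngx.sleep(0) can only be called in phases that allow yielding, such as the content phase in which the /apisix/prometheus/metrics handler runs.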
But we need to make more optimizations to the prometheus plugin, or even modify nginx-lua-prometheus.
Thanks @jagerzhang @sandy420.
Would you like to improve this optimization?
Environment
apisix version: master
uname -a:
nginx -V or openresty -V:
curl http://127.0.0.1:9090/v1/server_info (to get the info from the server-info API):
luarocks --version: