
bug: prometheus metrics cause high cpu usage #7211

Closed

tangzhenhuang opened this issue Jun 8, 2022 · 13 comments

Comments

@tangzhenhuang
Contributor

tangzhenhuang commented Jun 8, 2022

Current Behavior

At present we have the prometheus plugin enabled. Since we have a large number of routes (4000+), the prometheus lua_shared_dict is sized at 512m (quite large). As the amount of metric data grows, prometheus.export_metrics() drives the CPU usage of one worker process very high, and requests handled by that worker are severely affected.
[screenshot: CPU usage of one worker process during metric export]
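
To make the cost concrete, here is a rough sketch of what a full export amounts to, assuming the exporter walks every key of the shared dict in one synchronous pass (the dict name matches APISIX's prometheus-metrics zone; the loop itself is illustrative, not the plugin's actual code):

```lua
-- Illustrative only: a metrics export is roughly one synchronous pass
-- over the entire lua_shared_dict, inside a single worker.
local dict = ngx.shared["prometheus-metrics"]

-- get_keys(0) returns ALL keys; with 4000+ routes and per-route label
-- combinations this is a huge table, built while holding the dict lock.
local keys = dict:get_keys(0)

for _, key in ipairs(keys) do
    -- every lookup and print happens in this worker's event loop, so
    -- concurrent requests on the same worker stall until the pass ends
    local value = dict:get(key)
    ngx.say(key, " ", value)
end
```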

Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

Expected Behavior

Error Logs

Steps to Reproduce

  1. Enable the prometheus plugin.
  2. Set the prometheus lua_shared_dict to 512m (see the config sketch after this list).
  3. Grow the Prometheus data set (e.g. by sending many requests across many routes).
  4. curl the metrics URL.
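
For steps 1 and 2, a minimal conf/config.yaml sketch; the key names follow APISIX's stock config-default.yaml, but double-check them against your version:

```yaml
plugins:                          # step 1: enable the plugin globally
  - prometheus                    # NOTE: listing `plugins` here replaces the
                                  # default list, so add the others you need
nginx_config:
  http:
    lua_shared_dict:
      prometheus-metrics: 512m    # step 2: enlarge the metric store
```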

Environment

  • APISIX version (run apisix version): 2.14.1
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@tangzhenhuang changed the title from "bug: prometheus metrics request affect normal business requests" to "bug: prometheus metrics cause high cpu usage" on Jun 8, 2022
@tokers
Contributor

tokers commented Jun 9, 2022

Do you mean the method prometheus:metric_data? https://github.com/api7/lua-resty-prometheus/blob/master/prometheus.lua#L775

@tangzhenhuang
Contributor Author

tangzhenhuang commented Jun 9, 2022

> Do you mean the method prometheus:metric_data? https://github.com/api7/lua-resty-prometheus/blob/master/prometheus.lua#L775

[screenshot: the metrics location being accessed]
Enable the prometheus plugin and access this location: /apisix/prometheus/metrics

@tangzhenhuang
Contributor Author

Note that the prometheus-related code has changed; my APISIX version is 2.14.1.

@spacewander
Member

> Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?
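
For anyone who wants to experiment with that kind of separation in open-source OpenResty, here is a minimal sketch built on lua-resty-core's ngx.process privileged agent. This is only an assumption about one possible wiring, not the Enterprise implementation; `collect_all_metrics` and the `prometheus-cache` dict are hypothetical stand-ins:

```lua
local process = require("ngx.process")

-- in init_by_lua*: fork one extra agent process that accepts no
-- client connections at all
local ok, err = process.enable_privileged_agent()
if not ok then
    ngx.log(ngx.ERR, "failed to enable privileged agent: ", err)
end

-- in init_worker_by_lua*: only the agent renders the metrics, on a
-- timer, into a cache that business workers can serve cheaply
local function render(premature)
    if premature then
        return
    end
    -- hypothetical helper standing in for the expensive
    -- prometheus:metric_data() walk
    local payload = collect_all_metrics()
    ngx.shared["prometheus-cache"]:set("payload", payload)
end

if process.type() == "privileged agent" then
    ngx.timer.every(15, render)
end
```

The /apisix/prometheus/metrics location would then only read the pre-rendered payload from the cache dict, so a scrape no longer blocks a business worker.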

@jagerzhang
Contributor

> Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

> We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?

Can this plan also solve the earlier problem #5755?

@membphis
Member

membphis commented Jun 9, 2022

> We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?

We do not have such a plan right now.

@xuminwlt

Same question for me. Is there anything we can do? Please don't close this.

@tangzhenhuang
Contributor Author

> Same question for me. Is there anything we can do? Please don't close this.

I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.
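
For reference, the kind of edit described above looks roughly like this; `metrics`, `prometheus`, and the label list mirror APISIX 2.x's apisix/plugins/prometheus/exporter.lua but may differ in your version, `latency` and `labels` stand for values the exporter already computes in its log phase, and the ENABLE_LATENCY_METRICS switch is something you would add yourself:

```lua
-- hypothetical switch near the top of exporter.lua
local ENABLE_LATENCY_METRICS = false

-- init phase: the per-route latency histogram multiplies label
-- combinations by bucket count, so skipping it shrinks the dict a lot
if ENABLE_LATENCY_METRICS then
    metrics.latency = prometheus:histogram("http_latency",
            "HTTP request latency in milliseconds per service in APISIX",
            {"type", "route", "service", "consumer", "node"})
end

-- log phase: guard the matching observation the same way
if ENABLE_LATENCY_METRICS and metrics.latency then
    metrics.latency:observe(latency, labels)
end
```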

@tangzhenhuang
Contributor Author

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

Or could we add some configuration options to the prometheus plugin to select which data to collect?
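
As a sketch of what such an option might look like — `enabled_metrics` is hypothetical and does not exist in APISIX today — the exporter could consult a plugin_attr allow-list before registering each metric:

```lua
local plugin = require("apisix.plugin")

-- Hypothetical attribute, e.g. in config.yaml:
--   plugin_attr:
--     prometheus:
--       enabled_metrics: ["http_status", "bandwidth"]
local attr = plugin.plugin_attr("prometheus") or {}
local enabled = attr.enabled_metrics

local function metric_enabled(name)
    if not enabled then
        return true   -- no allow-list configured: keep everything
    end
    for _, n in ipairs(enabled) do
        if n == name then
            return true
        end
    end
    return false
end

-- each registration in exporter.lua would then become conditional, e.g.:
-- if metric_enabled("latency") then
--     metrics.latency = prometheus:histogram(...)
-- end
```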

@spacewander
Member

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

> Or could we add some configuration options to the prometheus plugin to select which data to collect?

Similar to #4273?


@jagerzhang
Contributor

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

@crazyMonkey1995 Looking forward to your best practices.

@spacewander
Member

spacewander commented Jul 20, 2022

Let's discuss it at #7353. Closing this to keep all the discussion in one place.
