
bug: prometheus metrics cause high cpu usage #7211

Closed

tangzhenhuang opened this issue Jun 8, 2022 · 13 comments

Comments

@tangzhenhuang
Contributor

tangzhenhuang commented Jun 8, 2022

Current Behavior

At present we have the prometheus plugin enabled. Since we have a large number of routes (4000+), the prometheus lua_shared_dict is sized at 512m (quite large). As the amount of metric data grows, prometheus.export_metrics() drives the CPU usage of one worker process very high, and requests handled by that worker are severely affected.
[screenshot: CPU usage of one worker process during metric export]
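
To make the cost concrete, here is a rough sketch of what a full export amounts to, assuming the exporter walks every key of the shared dict in one synchronous pass (the dict name matches APISIX's prometheus-metrics zone; the loop itself is illustrative, not the plugin's actual code):

```lua
-- Illustrative only: a metrics export is roughly one synchronous pass
-- over the entire lua_shared_dict, inside a single worker.
local dict = ngx.shared["prometheus-metrics"]

-- get_keys(0) returns ALL keys; with 4000+ routes and per-route label
-- combinations this is a huge table, built while holding the dict lock.
local keys = dict:get_keys(0)

for _, key in ipairs(keys) do
    -- every lookup and print happens in this worker's event loop, so
    -- concurrent requests on the same worker stall until the pass ends
    local value = dict:get(key)
    ngx.say(key, " ", value)
end
```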

Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

Expected Behavior

Error Logs

Steps to Reproduce

  1. Enable the prometheus plugin.
  2. Set the prometheus lua_shared_dict to 512m (see the config sketch after this list).
  3. Grow the Prometheus data set (e.g. by sending many requests across many routes).
  4. curl the metrics URL.
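
For steps 1 and 2, a minimal conf/config.yaml sketch; the key names follow APISIX's stock config-default.yaml, but double-check them against your version:

```yaml
plugins:                          # step 1: enable the plugin globally
  - prometheus                    # NOTE: listing `plugins` here replaces the
                                  # default list, so add the others you need
nginx_config:
  http:
    lua_shared_dict:
      prometheus-metrics: 512m    # step 2: enlarge the metric store
```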

Environment

  • APISIX version (run apisix version): 2.14.1
  • Operating system (run uname -a):
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@tangzhenhuang changed the title from "bug: prometheus metrics request affect normal business requests" to "bug: prometheus metrics cause high cpu usage" on Jun 8, 2022
@tokers
Contributor

tokers commented Jun 9, 2022

Do you mean the method prometheus:metric_data? https://github.com/api7/lua-resty-prometheus/blob/master/prometheus.lua#L775

@tangzhenhuang
Contributor Author

tangzhenhuang commented Jun 9, 2022

> Do you mean the method prometheus:metric_data? https://github.com/api7/lua-resty-prometheus/blob/master/prometheus.lua#L775

[screenshot: the metrics location being accessed]
Enable the prometheus plugin and access this location: /apisix/prometheus/metrics

@tangzhenhuang
Contributor Author

Note that the prometheus-related code has changed; my APISIX version is 2.14.1.

@spacewander
Member

> Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?
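
For anyone who wants to experiment with that kind of separation in open-source OpenResty, here is a minimal sketch built on lua-resty-core's ngx.process privileged agent. This is only an assumption about one possible wiring, not the Enterprise implementation; `collect_all_metrics` and the `prometheus-cache` dict are hypothetical stand-ins:

```lua
local process = require("ngx.process")

-- in init_by_lua*: fork one extra agent process that accepts no
-- client connections at all
local ok, err = process.enable_privileged_agent()
if not ok then
    ngx.log(ngx.ERR, "failed to enable privileged agent: ", err)
end

-- in init_worker_by_lua*: only the agent renders the metrics, on a
-- timer, into a cache that business workers can serve cheaply
local function render(premature)
    if premature then
        return
    end
    -- hypothetical helper standing in for the expensive
    -- prometheus:metric_data() walk
    local payload = collect_all_metrics()
    ngx.shared["prometheus-cache"]:set("payload", payload)
end

if process.type() == "privileged agent" then
    ngx.timer.every(15, render)
end
```

The /apisix/prometheus/metrics location would then only read the pre-rendered payload from the cache dict, so a scrape no longer blocks a business worker.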

@jagerzhang
Contributor

> Maybe there is some way to separate the worker processes of the internal server (such as admin and prometheus) from the actual business server?

> We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?

Can this plan also solve the earlier problem #5755?

@membphis
Member

membphis commented Jun 9, 2022

> We have already done it in the Enterprise version and received good feedback. @membphis Is it possible to open source the change?

We do not have such a plan right now.

@xuminwlt

Same question for me. Is there anything we can do? Please don't close this.

@tangzhenhuang
Contributor Author

> Same question for me. Is there anything we can do? Please don't close this.

I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.
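
For reference, the kind of edit described above looks roughly like this; `metrics`, `prometheus`, and the label list mirror APISIX 2.x's apisix/plugins/prometheus/exporter.lua but may differ in your version, `latency` and `labels` stand for values the exporter already computes in its log phase, and the ENABLE_LATENCY_METRICS switch is something you would add yourself:

```lua
-- hypothetical switch near the top of exporter.lua
local ENABLE_LATENCY_METRICS = false

-- init phase: the per-route latency histogram multiplies label
-- combinations by bucket count, so skipping it shrinks the dict a lot
if ENABLE_LATENCY_METRICS then
    metrics.latency = prometheus:histogram("http_latency",
            "HTTP request latency in milliseconds per service in APISIX",
            {"type", "route", "service", "consumer", "node"})
end

-- log phase: guard the matching observation the same way
if ENABLE_LATENCY_METRICS and metrics.latency then
    metrics.latency:observe(latency, labels)
end
```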

@tangzhenhuang
Contributor Author

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

Or could we add some configuration options to the prometheus plugin to select which data to collect?
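
As a sketch of what such an option might look like — `enabled_metrics` is hypothetical and does not exist in APISIX today — the exporter could consult a plugin_attr allow-list before registering each metric:

```lua
local plugin = require("apisix.plugin")

-- Hypothetical attribute, e.g. in config.yaml:
--   plugin_attr:
--     prometheus:
--       enabled_metrics: ["http_status", "bandwidth"]
local attr = plugin.plugin_attr("prometheus") or {}
local enabled = attr.enabled_metrics

local function metric_enabled(name)
    if not enabled then
        return true   -- no allow-list configured: keep everything
    end
    for _, n in ipairs(enabled) do
        if n == name then
            return true
        end
    end
    return false
end

-- each registration in exporter.lua would then become conditional, e.g.:
-- if metric_enabled("latency") then
--     metrics.latency = prometheus:histogram(...)
-- end
```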

@spacewander
Member

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

> Or could we add some configuration options to the prometheus plugin to select which data to collect?

Similar to #4273?


@jagerzhang
Contributor

> Same question for me. Is there anything we can do? Please don't close this.

> I noticed that the latency metrics account for a very large share of the Prometheus data. Those metrics are not very important to me, so I removed them in exporter.lua, mounted via a ConfigMap. Memory usage is now greatly reduced, and so is the CPU usage during collection. Looking forward to better solutions.

@crazyMonkey1995 Looking forward to your best practices.

@spacewander
Member

spacewander commented Jul 20, 2022

Let's discuss it at #7353. Closing this to keep all the discussion in one place.
