
bug: high cpu and memory usage #9015

Closed
monkeyDluffy6017 opened this issue Mar 6, 2023 · 0 comments · Fixed by #9016

monkeyDluffy6017 commented Mar 6, 2023

Current Behavior

Scenario 1

A reload or a service discovery update replaces the upstream object, and the health checker is rebuilt when the next request arrives; an existing checker is reused only if it already belongs to the new upstream object:

if healthcheck_parent.checker and healthcheck_parent.checker_upstream == upstream then

With a large number of concurrent requests and only a few upstreams, the following interleaving can occur (positions 1–3 refer to points inside create_checker; see the annotated sketch below).

Requests a, b, and c all access the same upstream. Because healthcheck.new contains an ngx.sleep call, all three requests can reach position 1. Request a continues and successfully creates the checker. Request b then reaches position 2; since it refers to the same upstream object as request a, healthcheck_parent.checker is no longer nil, so request b calls cancel_clean_handler, which sets the corresponding clean function to nil, and then runs on to position 3, where add_target yields via another ngx.sleep call. Request c now starts executing, reaches position 2, also finds healthcheck_parent.checker non-nil, and calls cancel_clean_handler as well.

At this point request c returns 500, because the clean function it tries to cancel has already been set to nil by request b, and an error is raised.
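For clarity, here is a simplified reconstruction of the create_checker flow described above, with the three positions marked as comments. This is only a sketch, not the literal APISIX source; the checker_idx field, the shm name, and the release_checker handler are illustrative.

```lua
-- Simplified reconstruction of the flow described above (illustrative,
-- not the literal APISIX source). The numbered comments mark positions 1-3.
local healthcheck = require("resty.healthcheck")
local config_util = require("apisix.core.config_util")

-- illustrative clean handler: stops the old checker when the upstream goes away
local function release_checker(healthcheck_parent)
    healthcheck_parent.checker:clear()
    healthcheck_parent.checker:stop()
end

local function create_checker(upstream, healthcheck_parent)
    -- position 1: healthcheck.new contains an ngx.sleep call, so requests
    -- a, b and c can all be inside create_checker at the same time
    local checker, err = healthcheck.new({
        name = "upstream-checker",              -- illustrative name
        shm_name = "upstream-healthcheck",
        checks = upstream.checks,
    })
    if not checker then
        return nil, err
    end

    -- position 2: once request a has installed a checker for this upstream,
    -- healthcheck_parent.checker is no longer nil, so requests b and c both
    -- take this branch; the second cancel_clean_handler call finds the clean
    -- function already set to nil and throws the error shown in the logs
    if healthcheck_parent.checker
       and healthcheck_parent.checker_upstream == upstream then
        config_util.cancel_clean_handler(healthcheck_parent,
                                         healthcheck_parent.checker_idx)
    end

    -- position 3: add_target yields as well, which is what lets request c
    -- reach position 2 before request b has finished
    for _, node in ipairs(upstream.nodes) do
        checker:add_target(node.host, node.port)
    end

    healthcheck_parent.checker = checker
    healthcheck_parent.checker_upstream = upstream
    healthcheck_parent.checker_idx =
        config_util.add_clean_handler(healthcheck_parent, release_checker)

    return checker
end
```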

The checker created at position 1 by the failing request can therefore never be released, and the timed task registered inside it keeps performing JSON decode forever:
https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L217
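The reason an orphaned checker keeps consuming CPU is that lua-resty-healthcheck registers recurring timers whose callbacks hold a reference to the checker and repeatedly JSON-decode state from the shared dict; as long as those timers keep firing, the checker can never be garbage collected. A minimal sketch of that pattern (illustrative, not the library's literal code; the shm and key names are assumptions):

```lua
local cjson = require("cjson.safe")

-- Minimal sketch of the leak mechanism (illustrative, not the library's
-- actual code): a recurring timer callback keeps a reference to the checker
-- and repeatedly decodes JSON state from the shared dict. The timer itself
-- pins the checker in memory, so an "orphaned" checker is never collected.
local function start_status_timer(checker, shm_name, key, interval)
    local shm = ngx.shared[shm_name]
    return ngx.timer.every(interval, function(premature)
        if premature then
            return
        end
        local raw = shm:get(key)
        if raw then
            -- runs on every tick, for every leaked checker
            checker.targets = cjson.decode(raw)
        end
    end)
end
```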

If the QPS is high, thousands of checkers that can never be freed pile up, causing abnormal CPU and memory usage.

Scenario 2

When concurrent requests reach position 1 at the same time, more than one checker gets created for the same upstream; the extra checkers can never be released afterwards, again causing abnormal CPU and memory usage.
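Put differently, this is a check-then-create race: because healthcheck.new yields, several requests can pass the nil check before any checker has been assigned, each builds its own checker, and only the last assignment survives. A hypothetical interleaving (names illustrative):

```lua
-- Hypothetical interleaving for scenario 2 (checker_a / checker_b are illustrative):
--   request a: healthcheck_parent.checker == nil  ->  healthcheck.new()  (yields)
--   request b: healthcheck_parent.checker == nil  ->  healthcheck.new()  (yields)
--   request a: healthcheck_parent.checker = checker_a
--   request b: healthcheck_parent.checker = checker_b
-- checker_a is now unreachable from healthcheck_parent, but its timers keep
-- running, so it is never released.
```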


Conclusion

Currently, in concurrent scenarios a reload or an upstream update can create many health checkers, causing abnormal CPU and memory usage.
We need to ensure that only one health checker is created per upstream.
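One way to enforce this, sketched below purely as an illustration (it is not necessarily what the fix in #9016 does; the checker_creating field is an assumption), is to reuse an existing checker when one is already bound to the upstream and to flag the upstream while a checker is being created, so concurrent requests bail out instead of racing:

```lua
-- Sketch of a "single checker per upstream" guard (illustrative only; the
-- field name `checker_creating` and this structure are assumptions, not the
-- actual fix from apache/apisix#9016).
local healthcheck = require("resty.healthcheck")

local function fetch_checker(upstream, healthcheck_parent)
    -- reuse the checker that already belongs to this upstream object
    if healthcheck_parent.checker
       and healthcheck_parent.checker_upstream == upstream then
        return healthcheck_parent.checker
    end

    -- another request is already in the middle of creating a checker;
    -- skip instead of creating a duplicate
    if healthcheck_parent.checker_creating then
        return nil
    end
    healthcheck_parent.checker_creating = true   -- set BEFORE any yield point

    local checker, err = healthcheck.new({
        name = "upstream-checker",               -- illustrative
        shm_name = "upstream-healthcheck",
        checks = upstream.checks,
    })

    healthcheck_parent.checker_creating = nil
    if not checker then
        return nil, err
    end

    healthcheck_parent.checker = checker
    healthcheck_parent.checker_upstream = upstream
    return checker
end
```

Because the flag is set before any yield point, a second request that arrives while healthcheck.new is sleeping sees it and returns without creating a duplicate checker.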

Expected Behavior

CPU and memory usage stays normal after a reload or a service discovery update.

Error Logs

/usr/local/apisix/apisix/core/config_util.lua:79: attempt to call local 'f' (a nil value)
config_util.lua:73: cancel_clean_handler(): item.clean_handlers is nil when cancel_clean_handler

Steps to Reproduce

  1. One upstream with dozens of nodes
  2. High concurrency (4000+ qps)
  3. Active health check
  4. Reload

Environment

  • APISIX version (run apisix version): 2.13.1
  • Operating system (run uname -a): centos 7.6
  • OpenResty / Nginx version (run openresty -V or nginx -V): 1.19.3.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
monkeyDluffy6017 added a commit to monkeyDluffy6017/apisix that referenced this issue Mar 6, 2023
Concurrent requests after a reload or an update of the upstream nodes will
cause high CPU and memory usage; the checker created by healthcheck.new
in create_checker won't be released if the request crashes after
cancel_clean_handler fails.