Current Behavior

scenario 1

Reload or service discovery will update the upstream object, and the health checker is rebuilt when a request comes in.

apisix/apisix/upstream.lua
Line 102 in 69df734

With a large number of concurrent requests and a small number of upstreams, the following scenario can occur.
Requests a, b, and c all access the same upstream. Since there is an ngx.sleep call in healthcheck.new, all three requests may reach position 1. Request a continues execution and successfully creates the checker. Request b continues next; when it reaches position 2, it is operating on the same upstream object as request a, so healthcheck_parent.checker is not nil, and request b executes the cancel_clean_handler function, which sets the corresponding clean function to nil, then continues to position 3, where ngx.sleep is called inside the add_target function. Request c then resumes; when it reaches position 2, healthcheck_parent.checker is again not nil, so cancel_clean_handler is executed a second time. At this point the request returns 500: the corresponding clean function was already set to nil by request b, so calling it raises an error.

apisix/apisix/core/config_util.lua
Line 92 in 1acee1b

The checker created at position 1 can never be released, and a timer registered inside the checker keeps performing json decode:
https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L217
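To make the three positions concrete, here is a simplified sketch of the create_checker flow described above. It is a sketch only, not the verbatim APISIX source; field names such as upstream.key and checker_idx, and the exact cancel_clean_handler arguments, are illustrative assumptions:

```lua
local healthcheck = require("resty.healthcheck")
local config_util = require("apisix.core.config_util")

local function create_checker(upstream, healthcheck_parent)
    -- position 1: healthcheck.new calls ngx.sleep internally, so the
    -- current request yields here and other concurrent requests for the
    -- same upstream can interleave, each creating its own checker
    local checker = healthcheck.new({
        name = "upstream#" .. upstream.key,   -- illustrative field name
        shm_name = "upstream-healthcheck",
        checks = upstream.checks,
    })

    -- position 2: the clean handler of the previous checker is cancelled;
    -- when requests b and c both get here, c finds the handler already
    -- set to nil by b and the call fails, returning a 500
    if healthcheck_parent.checker then
        config_util.cancel_clean_handler(healthcheck_parent,
                                         healthcheck_parent.checker_idx)
    end

    -- position 3: add_target also yields via ngx.sleep, which is what
    -- lets request c reach position 2 while b is still inside this loop
    for _, node in ipairs(upstream.nodes) do
        checker:add_target(node.host, node.port, upstream.host)
    end

    healthcheck_parent.checker = checker
    return checker
end
```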
If the QPS is high, thousands of checkers will be created that can never be freed, causing CPU and memory anomalies.
scenario 2
When concurrent requests arrive at position 1 at the same time, each of them has already created a checker, and the redundant checkers cannot be released later, again resulting in CPU and memory anomalies.
Conclusion
Currently, in concurrent scenarios, a reload or release can create many health checkers, causing CPU and memory anomalies.
We need to ensure that only one health checker is created for a given upstream; a sketch of such a guard follows.
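One possible shape for that guard, as a minimal sketch (the checker_creating flag is an assumption for illustration, not a field the current code has): within one nginx worker, Lua code only interleaves at yield points, so setting a flag before the first yield inside healthcheck.new is enough to stop concurrent requests from creating duplicate checkers.

```lua
local healthcheck = require("resty.healthcheck")

local function fetch_checker(upstream, healthcheck_parent)
    -- fast path: a checker already exists for this upstream object
    if healthcheck_parent.checker then
        return healthcheck_parent.checker
    end

    -- checker_creating is set before any yield point below, so a request
    -- that interleaves in sees the flag and backs off instead of creating
    -- a second checker for the same upstream
    if healthcheck_parent.checker_creating then
        return nil
    end
    healthcheck_parent.checker_creating = true

    local checker = healthcheck.new({  -- position 1: yields via ngx.sleep
        name = "upstream#" .. tostring(upstream),
        shm_name = "upstream-healthcheck",
        checks = upstream.checks,
    })

    healthcheck_parent.checker = checker
    healthcheck_parent.checker_creating = nil
    return checker
end
```

A request that backs off can simply proceed without health data for that one access and pick up the shared checker the next time around.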
Expected Behavior
CPU and memory usage are normal after a reload or a service discovery update.
Error Logs
/usr/local/apisix/apisix/core/config_util.lua:79: attempt to call local 'f' (a nil value)
config_util.lua:73: cancel_clean_handler(): item.clean_handlers is nil when cancel_clean_handler
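The first log line matches the crash in scenario 1. As a rough illustration (not the actual config_util.lua source), the clean-handler slot is cleared by the first cancel, so the second concurrent cancel ends up calling a nil value:

```lua
-- illustration only: double cancellation of the same clean handler
local function cancel_clean_handler(item, idx)
    local f = item.clean_handlers[idx]
    item.clean_handlers[idx] = nil
    f()  -- request c arrives after request b already cleared slot idx,
         -- so f is nil: "attempt to call local 'f' (a nil value)"
end
```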
Steps to Reproduce
1. One upstream with dozens of nodes
2. High concurrency (4000+ QPS)
3. Active health check enabled
4. Reload
Environment
APISIX version (run apisix version): 2.13.1
Operating system (run uname -a): CentOS 7.6
OpenResty / Nginx version (run openresty -V or nginx -V): 1.19.3.1
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
APISIX Dashboard version, if relevant:
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version):