bug: active healthcheck increase the response latency #11756

jujiale · 2024-11-19T13:11:53Z

Description

Hello, APISIX team.
In our production environment we ran into a very strange problem.
We have a cluster of 9 APISIX instances.

The upstream uses service discovery through Eureka.
Our microservice is deployed in k8s and registered in Eureka, which exposes it to APISIX. The microservice has 150 instances.

We deployed it in July this year, but recently we found that latency has increased to 200ms+ (normal latency is about 100ms, but recently some requests take more than 200ms). When we send a request directly to one of the microservice instances, the latency is about 100ms, but when the request is proxied by APISIX, the latency increases.

We had the active health check enabled (TCP type). When we disabled the health check, the latency immediately recovered to about 100ms.

We also found something odd in the Prometheus metrics. The following metrics are 0:

```
apisix_shared_dict_free_space_bytes{name="worker-events"} 0
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 0
```

The shared dicts worker-events and prometheus-metrics are both configured with 10m.
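To watch for this condition across all 9 instances, the metrics output can be scanned for shared dicts that report no free space. The following is only a minimal sketch: the function name `exhausted_shared_dicts` and the threshold parameter are hypothetical, and it assumes each metric line carries only the `name` label, as in the output above.

```python
import re

# Matches lines such as:
#   apisix_shared_dict_free_space_bytes{name="worker-events"} 0
# Assumption: "name" is the only label on the metric line.
_FREE_SPACE = re.compile(
    r'^apisix_shared_dict_free_space_bytes\{name="([^"]+)"\}\s+([0-9.eE+-]+)\s*$'
)


def exhausted_shared_dicts(metrics_text, min_free_bytes=1):
    """Return the names of shared dicts reporting less than min_free_bytes free."""
    exhausted = []
    for line in metrics_text.splitlines():
        match = _FREE_SPACE.match(line.strip())
        if match and float(match.group(2)) < min_free_bytes:
            exhausted.append(match.group(1))
    return exhausted


# Sample taken from the metrics shown in this issue.
sample = """
apisix_shared_dict_free_space_bytes{name="worker-events"} 0
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 0
"""

print(exhausted_shared_dicts(sample))
```

Note that, as far as I understand the OpenResty documentation, `free_space` counts only completely free pages in the slab allocator, so a value of 0 suggests the dict is under pressure but does not by itself prove that every write will fail.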

I want to know whether the exhaustion of these shared dicts could cause the health check to increase latency.

I tried to reproduce it in our dev environment but could not. The error log indicates that prometheus has run out of memory.

I found another issue that mentions this, but it seems to have no activity: #11345

Hoping for your answer, thanks.

Environment

APISIX version (run apisix version):2.15.3
Operating system (run uname -a):
OpenResty / Nginx version (run openresty -V or nginx -V):
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):3.5.0
APISIX Dashboard version, if relevant:
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version):


jujiale · 2024-11-19T13:52:37Z

Because we have added other features to our APISIX, we cannot upgrade it to 3.x.
Our health check library is v3.2.0 of https://github.com/api7/lua-resty-healthcheck,
and we have merged some of the health check fixes that APISIX made in versions after 2.15.3.

jujiale · 2024-11-20T02:35:39Z

In error.log, we found that an old pod IP still exists in the health check:
2024/11/20 09:19:36 [warn] 16195#16195: *15291831195 [lua] healthcheck.lua:1383: log(): [healthcheck] (upstream#/xxx/upstreams/515481794896725698) healthy SUCCESS increment (10/2) for '10.98.xxx.155(10.98.xxx.155:8080)', context: ngx.timer, client: 172.xxx.29.xxx, server: 0.0.0.0:80
I confirmed that 10.98.xxx.155 is not one of our microservice pod IPs. It used to be, but it now belongs to another service. So if the upstream config in etcd does not change but the upstream nodes do (because we use Eureka discovery), it seems the health check does not remove the old pod IP.
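One way to check how many stale targets are still being probed is to extract the targets from the health check log lines and compare them with the node list that Eureka currently reports. This is a rough sketch only: the function names and the `current_nodes` value are hypothetical, the regular expression is based solely on the single log line quoted above, and the masked IPs are kept as they appear in this issue.

```python
import re

# Extracts "ip:port" from the target part of a health check log line, e.g.
#   ... for '10.98.xxx.155(10.98.xxx.155:8080)', context: ngx.timer ...
# Assumption: all health check log lines share the format of the one quoted above.
_TARGET = re.compile(r"for '[^'(]*\(([^):]+):(\d+)\)'")


def checked_targets(log_lines):
    """Collect the (ip, port) pairs that appear in health check log lines."""
    targets = set()
    for line in log_lines:
        if "[healthcheck]" not in line:
            continue
        match = _TARGET.search(line)
        if match:
            targets.add((match.group(1), int(match.group(2))))
    return targets


def stale_targets(log_lines, current_nodes):
    """Return targets that are still health checked but no longer in discovery."""
    return checked_targets(log_lines) - set(current_nodes)


sample_log = [
    "2024/11/20 09:19:36 [warn] 16195#16195: *15291831195 [lua] "
    "healthcheck.lua:1383: log(): [healthcheck] "
    "(upstream#/xxx/upstreams/515481794896725698) healthy SUCCESS increment "
    "(10/2) for '10.98.xxx.155(10.98.xxx.155:8080)', context: ngx.timer",
]

# Hypothetical node list, standing in for what Eureka currently returns.
current_nodes = [("10.98.xxx.201", 8080)]

print(stale_targets(sample_log, current_nodes))
```

If the returned set keeps growing as pods are rescheduled, that would be consistent with old nodes never being removed from the checker.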

jujiale · 2024-11-20T09:30:19Z

We also found that even after we disable the health check, packet captures taken with tcpdump show the APISIX instance still performing active health checks.

jujiale · 2024-11-20T15:17:28Z

I believe this is a bug.

github-project-automation bot added this to Apache APISIX backlog Nov 19, 2024

github-project-automation bot moved this to 📋 Backlog in Apache APISIX backlog Nov 19, 2024

dosubot bot added the question label Nov 19, 2024

jujiale changed the title ~~help request: active healthcheck increase the response latency~~ bug: active healthcheck increase the response latency Nov 20, 2024
