conduit-proxy pegging cpu #519
Comments
Grafana by default calls out to grafana.com to check for updates. As users of Conduit do not have direct control over updating Grafana, this update check is not needed. Disable Grafana's update check via grafana.ini. This is also a workaround for #155, the root cause of #519.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Review to disable the update check: #521.
@siggy wrote (elsewhere):
Yes, we need to find out why the proxy has gone haywire in this situation.
Marking this P0 to at least develop a working theory of this issue so that we can assess impact.
An infinite loop exists in the TCP proxy, and it can be triggered by any raw TCP connection (including HTTPS requests). The connection is proxied successfully, but instead of closing, it remains open and the proxy's CPU usage stays extremely high indefinitely.

`Duplex::poll` waits until _both_ halves have returned `Ready`, so it calls `half_in.copy_into()`/`half_out.copy_into()` repeatedly, even after one of them has returned `Async::Ready`: a half that has already shut down may still be polled again. Because the `!dst.is_shutdown` guard, intended to prevent the destination from shutting down twice, sits inside the copy loop, the function never returns if it is polled again after having returned `Async::Ready` once.

I've fixed this by moving the guard against double shutdowns out of the loop, so that the function returns `Async::Ready` again if it is polled after shutting down the destination. I've also included a unit test guarding against regressions of this bug; the unit test fails against master.

Fixes #519

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Andrew Seigner <andrew@sig.gy>
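For reference, a minimal self-contained sketch of the failure mode and the fix. The real code lives in conduit-proxy and uses futures 0.1's `Poll`/`Async`; the types and the byte-counting "copy" below are simplified stand-ins, not the actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Async {
    Ready,
    NotReady,
}

struct HalfDuplex {
    is_shutdown: bool,
    bytes_remaining: usize, // stand-in for the real buffered copy state
}

impl HalfDuplex {
    fn copy_into(&mut self, dst: &mut HalfDuplex) -> Async {
        // The fix: the double-shutdown guard sits *before* the loop, so a
        // half that already shut `dst` down just reports Ready again. The
        // buggy version checked `!dst.is_shutdown` inside the loop, leaving
        // no reachable return once the flag was set, i.e. an infinite loop.
        if dst.is_shutdown {
            return Async::Ready;
        }
        loop {
            if self.bytes_remaining == 0 {
                // EOF: shut the destination down, exactly once.
                dst.is_shutdown = true;
                return Async::Ready;
            }
            self.bytes_remaining -= 1; // "copy" one unit of data
        }
    }
}

struct Duplex {
    half_in: HalfDuplex,
    half_out: HalfDuplex,
}

impl Duplex {
    // Waits until *both* halves are Ready, which is exactly why a half
    // that finishes early gets polled again, and why copy_into must keep
    // returning Ready after completion.
    fn poll(&mut self) -> Async {
        let in_done = self.half_in.copy_into(&mut self.half_out) == Async::Ready;
        let out_done = self.half_out.copy_into(&mut self.half_in) == Async::Ready;
        if in_done && out_done {
            Async::Ready
        } else {
            Async::NotReady
        }
    }
}

fn main() {
    let mut duplex = Duplex {
        half_in: HalfDuplex { is_shutdown: false, bytes_remaining: 3 },
        half_out: HalfDuplex { is_shutdown: false, bytes_remaining: 5 },
    };
    assert_eq!(duplex.poll(), Async::Ready);
    // Re-polling after completion must return Ready again rather than spin;
    // this mirrors the shape of the regression test described above.
    assert_eq!(duplex.poll(), Async::Ready);
}
```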
Building and running Conduit from master, I observe the conduit-proxy in the Grafana pod pegging the CPU about a minute after startup.
Grafana is complaining about not being able to connect to grafana.com, which is a manifestation of #155. Confirmed this is the cause of the CPU pegging.
Workarounds
We can work around this in one of two ways.
Ignore outbound 443
Ignore port 443 on the outbound Grafana proxy-init container:
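For illustration, a hypothetical sketch of what that patch could look like, assuming the proxy-init container accepts an `--outbound-ports-to-ignore` flag (its linkerd2 successor does; conduit's actual flags, image, and port values may differ):

```yaml
# Hypothetical: tell proxy-init's iptables setup to leave outbound port 443
# alone, so Grafana's HTTPS calls to grafana.com bypass conduit-proxy.
# The flag name, image tag, and port/UID values here are assumptions.
initContainers:
- name: conduit-init
  image: gcr.io/runconduit/proxy-init:v0.3.0
  args:
  - --incoming-proxy-port
  - "4143"
  - --outgoing-proxy-port
  - "4140"
  - --proxy-uid
  - "2102"
  - --outbound-ports-to-ignore
  - "443"
```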
Disable update check
Disable the Grafana update check in `grafana.ini`:
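The relevant Grafana setting lives under the `[analytics]` section of `grafana.ini`:

```ini
[analytics]
# When true, Grafana periodically calls out to grafana.com to check for
# updates; disabling it stops that outbound HTTPS traffic entirely.
check_for_updates = false
```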
Repro
```sh
DOCKER_TRACE=1 bin/docker-build
bin/conduit install --proxy-log-level debug,conduit_proxy=debug | kubectl apply -f -
```
Note that curl'ing Prometheus from the Grafana container hangs:
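For reference, a command along these lines exercises it (the Prometheus service DNS name is assumed from conduit's default install; the pod name is the one from the `docker stats` output below):

```sh
# Runs curl inside the Grafana container; with the bug present this hangs
# instead of returning Prometheus's response.
kubectl exec -n conduit grafana-68954c7876-txnsd -c grafana -- \
  curl -v 'http://prometheus.conduit.svc.cluster.local:9090/api/v1/query?query=up'
```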
CPU
```
$ docker stats $(docker ps --format={{.Names}})
CONTAINER ID   NAME                                                                                         CPU %    MEM USAGE / LIMIT   MEM %   NET I/O   BLOCK I/O   PIDS
...
dc0e821e277f   k8s_conduit-proxy_grafana-68954c7876-txnsd_conduit_13a09de6-2184-11e8-b369-025000000001_0    95.08%   984KiB / 11.71GiB   0.01%   0B / 0B   0B / 0B     2
...
```
Logs
Grafana
conduit-proxy