fix(api) support HTTP/2 status and metrics requests #8687

rainest · 2022-04-14T21:46:20Z

Summary

Rework the status and metrics endpoint implementations to make them not break on HTTP/2.

Full changelog

Add a new Unix socket listen that runs NGINX's stub_status on requests for /nginx_status at $PREFIX/nginx_status.sock.
Use lua-resty-http to access the new status socket listen in the status endpoint instead of using ngx.location.capture (which breaks if invoked during an HTTP/2 request) on the existing internal locations.
Ditto for Prometheus /metrics.

Issue reference

Discovered that Kong/kubernetes-ingress-controller#2343 did not work with HTTPS during post-release testing. We'd prefer to change the Kong end rather than disable HTTP/2 across the board on the controller end.

Implements @dndx's suggestion from #8528

Draft problems

Edit: 671e15e works around this: #8687 (comment)

This exposes a possible issue in spec/02-integration/02-cmd/02-start_stop_spec.lua:

cat: /home/runner/work/kong/kong/servroot/pids/nginx.pid: No such file or directory
1146.04   OK 635: kong start/stop #postgres deprecated properties prints a warning to stderr client_max_body_size
        __________
        FAILED  spec/02-integration/02-cmd/02-start_stop_spec.lua:641: kong start/stop #postgres deprecated properties prints a warning to stderr client_body_buffer_size
spec/02-integration/02-cmd/02-start_stop_spec.lua:608: 2022/04/14 20:33:48 [warn] the 'client_body_buffer_size' configuration property is deprecated, use 'nginx_http_client_body_buffer_size' instead
Error: ./kong/cmd/start.lua:64: nginx: [emerg] bind() to unix:/home/runner/work/kong/kong/servroot/nginx_status.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/home/runner/work/kong/kong/servroot/nginx_status.sock failed (98: Address already in use)

This change adds a unix socket that is almost always (either admin or status listen enabled) present. I think I'm hitting something in those tests that has multiple overlapping Kong instances. The cat error just before the conflict suggests we were trying to check if an instance was still up and/or kill it but failed due to a missing pidfile, but I'm not familiar enough with the tests to say for sure. I can't reproduce it when testing locally--I get no PID conflicts, just complaints about Cassandra and Postgres availability in 3 out of 39 tests. The issue flakes/does not occur consistently.

Manual testing (do we have HTTP/2 test client stuff yet?): example.txt

Add a new Unix socket listen that runs NGINX's stub_status on requests for /nginx_status at $PREFIX/nginx_status.sock. Use lua-resty-http to access the new status socket listen in the status endpoint instead of using ngx.location.capture (which breaks if invoked during an HTTP/2 request) on the existing internal locations.

Wait for the nginx_status.sock socket to clear before starting a Kong instance.

rainest · 2022-04-18T22:03:14Z

Alright, well, workaround is less than ideal for using a specific socket path, but does apparently avoid whatever the issue was. Guessing that since kill_all only sends a signal there's no guarantee when NGINX actually stops, but since the function's done the after_each() exits and continues with the next test.

Also tried:

pl.delete(<prefix>/*.sock), which I'm guessing didn't work because either it doesn't expand globs or doesn't work on socket files.
Moving the clean_prefix() call to after_each. This broke because some tests get unhappy if the prefix isn't around yet--the nginx_main_daemon = off can't deal with the extra logs about creating it again.

There's now an unrelated flake with one of the Cassandra tests that appears to just be "Cassandra slow", but I don't have permissions to trigger re-tests.

rainest · 2022-04-18T22:15:56Z

#8690 probably obsoletes this, so closing on the assumption we'll use that.

pull-request-size bot added the size/L label Apr 14, 2022

github-actions bot added core/admin-api core/templates plugins/prometheus labels Apr 14, 2022

rainest force-pushed the fix/status-sock branch 5 times, most recently from 099c947 to fa46f3a Compare April 18, 2022 21:52

fix(tests) check socket before start/stop tests

671e15e

Wait for the nginx_status.sock socket to clear before starting a Kong instance.

rainest force-pushed the fix/status-sock branch from fa46f3a to 671e15e Compare April 18, 2022 21:55

rainest marked this pull request as ready for review April 18, 2022 22:03

rainest requested a review from a team as a code owner April 18, 2022 22:03

rainest closed this Apr 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api) support HTTP/2 status and metrics requests #8687

fix(api) support HTTP/2 status and metrics requests #8687

rainest commented Apr 14, 2022 •

edited

Loading

rainest commented Apr 18, 2022 •

edited

Loading

rainest commented Apr 18, 2022 •

edited

Loading

fix(api) support HTTP/2 status and metrics requests #8687

fix(api) support HTTP/2 status and metrics requests #8687

Conversation

rainest commented Apr 14, 2022 • edited Loading

Summary

Full changelog

Issue reference

Draft problems

rainest commented Apr 18, 2022 • edited Loading

rainest commented Apr 18, 2022 • edited Loading

rainest commented Apr 14, 2022 •

edited

Loading

rainest commented Apr 18, 2022 •

edited

Loading

rainest commented Apr 18, 2022 •

edited

Loading