Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appservice: Replace nginx with Starlette reverse proxy, add session status API #37

Merged
merged 7 commits into from
Sep 26, 2022

Conversation

martinpitt
Copy link
Member

@martinpitt martinpitt commented Sep 15, 2022

nginx does not get along well with unavailable proxy_pass targets -- as soon as one session pod unexpectedly goes away (crashes, idle timeouts, networking flakes), it errors out and no session routes can be resolved any more. There are workarounds [1], but I can't get them to work properly. Also, nginx does not allow us to hook into (dis)connection events to implement the session status API (see issue #28).

So it's finally time to grow to something more flexible.

Fixes #36
Fixes #28

[1] https://sandro-keil.de/blog/let-nginx-start-if-upstream-host-is-unavailable-or-down/


@tiran FYI

@martinpitt martinpitt force-pushed the bye-nginx branch 5 times, most recently from 7624947 to be8ebb0 Compare September 15, 2022 15:49
@martinpitt
Copy link
Member Author

martinpitt commented Sep 19, 2022

I've spent some time trying the workarounds in aio-libs/aiohttp#4581 , and I can't get it to work. I keep getting aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed crashes with the chunked transfer-encoding that cockpit-ws does. I'm afraid we'll need another proxying solution.

@martinpitt martinpitt force-pushed the bye-nginx branch 2 times, most recently from efbd955 to 216ff8f Compare September 19, 2022 08:33
@martinpitt martinpitt marked this pull request as ready for review September 19, 2022 08:42
@martinpitt martinpitt mentioned this pull request Sep 19, 2022
@martinpitt martinpitt changed the title appservice: Replace nginx with aiohttp reverse proxy appservice: Replace nginx with aiohttp reverse proxy, add session status API Sep 19, 2022
appservice/multiplexer.py Outdated Show resolved Hide resolved
3scale/nginx.conf Outdated Show resolved Hide resolved
appservice/multiplexer.py Outdated Show resolved Hide resolved
appservice/multiplexer.py Show resolved Hide resolved
@martinpitt
Copy link
Member Author

This works well enough with podman now, but aio-libs/aiohttp#4581 still completely breaks deployment on 3scale/k8s. I've wasted another two hours on trying to find a workaround..

@martinpitt martinpitt removed the request for review from jelly September 20, 2022 11:54
@martinpitt martinpitt mentioned this pull request Sep 20, 2022
@martinpitt martinpitt marked this pull request as draft September 20, 2022 11:56
@martinpitt martinpitt changed the title appservice: Replace nginx with aiohttp reverse proxy, add session status API appservice: Replace nginx with Starlette reverse proxy, add session status API Sep 21, 2022
@martinpitt
Copy link
Member Author

@tiran created a starlette PoC yesterday, many thanks! I rewrote the appservice/proxy based on his code it seems to work nicely at least with local podman. I'm going to try that on kubernetes next.

However, I would like to keep the aiohttp implementation in git history, just in case we need it again some day. It's also a nice reference to look at.

appservice/multiplexer.py Outdated Show resolved Hide resolved
appservice/multiplexer.py Outdated Show resolved Hide resolved
@martinpitt martinpitt force-pushed the bye-nginx branch 4 times, most recently from 32befda to d425997 Compare September 21, 2022 12:59
@martinpitt
Copy link
Member Author

This works with our podman emulation now, but still not on actual k8s. Needs a bigger hammer.

3scale sets this to "Upgrade", not "upgrade".
nginx does not get along well with unavailable proxy_pass targets -- as
soon as one session pod unexpectedly goes away (crashes, idle timeouts,
networking flakes), it errors out and no session routes can be resolved
any more. There are workarounds [1], but I can't get them to work
properly. Also, nginx does not allow us to hook into (dis)connection
events to implement the session status API (see issue #28).

So it's finally time to grow to something more flexible. Rewrite
multiplexer.py using aiohttp [2], which now handles both our own session
control API as well as the dynamic reverse proxying to the session pods.

This should still be considered as a PoC, but at least there are fewer
moving parts and an easier-to-understand architecture now.

Check behaviour with broken session pods in the tests: Crash a session
pod, and validate that the other sessions still behave correctly.

Fixes #36

[1] https://sandro-keil.de/blog/let-nginx-start-if-upstream-host-is-unavailable-or-down/
[2] https://docs.aiohttp.org/en/stable/index.html
This is larger than Debian, but using UBI is a requirement for deploying
to 3scale.
Track session status. Right after creation, it starts as `wait_target`.
After the target machine has connected (first connection to the /ws
path), it changes to `running`. We'll probably add more states in the
future.

Add /sessions/ID/status API to query it, so that e.g. the target machine
and the UI can poll this to know when to connect.

Fixes #28
aiohttp has a long-standing race condition with reading replies [1]. This is
completely killing cockpit on kubernetes, and so far there is no workaround.

Thus move to a different framework. Starlette [1] is quite pleasant and
performant, and supports async and websockets. This also allows us to export
the API to consoledot as OpenAPI.

Many thanks to Christian Heimes @tiran for his initial code!

[1] aio-libs/aiohttp#4581
[2] https://www.starlette.io/
That sole `Connection: upgrade` header gets attached to all requests on
/wss/* paths, even for normal HTTP requests which don't have an
`Upgrade:` header. uvicorn rejects these with "Unsupported upgrade
request".

Monkey-patch the h11 module to deliver patched headers.

See https://issues.redhat.com/browse/RHCLOUD-21326
@martinpitt
Copy link
Member Author

Huzzah! Fortunately that was simple, just a differently capitalized header value between our fake 3scale and the actual one. This now works fine on k8s!

@martinpitt martinpitt marked this pull request as ready for review September 22, 2022 04:34
@martinpitt martinpitt requested review from tiran and jelly and removed request for tiran September 22, 2022 04:34
@martinpitt
Copy link
Member Author

Ah, interesting (but unrelated) failure due to a race condition with redis startup. I can reproduce locally with

--- webconsoleapp-local.yaml
+++ webconsoleapp-local.yaml
@@ -35,6 +35,7 @@ spec:
 
     - name: redis
       image: docker.io/redis
+      command: ["sh", "-ec", "sleep 2; redis-server"]
 
     - name: 3scale
       image: docker.io/library/nginx

The pod starts up in parallel with appservice, so redis may not yet be
ready. This caused an occasional CI failure, which can be reproduced
with delaying the redis startup in webconsoleapp-local.yaml:

    command: ["sh", "-ec", "sleep 2; redis-server"]
'user': 'cockpit-wsinstance',
}

async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds=PODMAN_SOCKET)) as podman:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if the podman socket is gone, this just throws an exception? Otherwise this will return an unitialized status, content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. with uds=PODMAN_SOCKET + 'x', you get a httpcore.ConnectError: [Errno 2] No such file or directory.

@jelly jelly merged commit 7225fdd into main Sep 26, 2022
@jelly jelly deleted the bye-nginx branch September 26, 2022 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get along with non-responding session pods add session status API
3 participants