server: UI dead if internal net.Pipe closes #42828
Labels
A-webui-general: Issues on the DB Console that span multiple areas or don't have another clear category.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Describe the problem
The cockroach server creates a singleton net.Pipe through which it connects the status server to the local gRPC server.
cockroach/pkg/server/server.go
Lines 1264 to 1265 in f97dc13
It wraps one side of this pipe in a singleListener on which it serves the gRPC server:
cockroach/pkg/server/server.go
Line 933 in f97dc13
This listener no-ops on close and always returns the same side of the pipe. This is problematic if the underlying Conn is closed. I can't speak exactly to why the Conn is closed, but it seems to line up with a period of unavailability of the remote side. My best guess is that there was a client timeout somewhere.
The error on authentication requests looks like:
The observable behavior is that login requests display System Unavailable. If you perform the login request with Accept: application/json you get the following response:
To Reproduce
Not sure how exactly to repro. It seems easy enough to contrive the situation. Coming up with a realistic repro is probably worthwhile. The situation occurred during a rolling restart and upgrade of a four node cluster. At some point during the upgrade a large number of ranges were transiently unavailable as they submitted crash reports (specifically this one: #42802).
The upgrade was from 19.1.5 (maybe 19.1.4) to 19.2.1.
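A contrived version of the failure can be sketched in a few lines of Go. This is not the server code: pipeListener is a hypothetical stand-in for singleListener, written under the assumption that it behaves as described above (Accept always hands back the same conn, Close is a no-op). Closing the client side by hand stands in for whatever closed the conn in the real cluster.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// pipeListener is a hypothetical stand-in for singleListener: Accept always
// returns the same conn and Close is a no-op.
type pipeListener struct{ conn net.Conn }

func (l *pipeListener) Accept() (net.Conn, error) { return l.conn, nil }
func (l *pipeListener) Close() error              { return nil }
func (l *pipeListener) Addr() net.Addr            { return l.conn.LocalAddr() }

// writeAfterPeerClose closes the client side of the pipe, then accepts from
// the listener twice and tries to write on whatever it returns.
func writeAfterPeerClose() error {
	client, server := net.Pipe()
	var l net.Listener = &pipeListener{conn: server}

	// Stand-in for the unexplained close seen in the real cluster.
	if err := client.Close(); err != nil {
		return fmt.Errorf("closing client side: %w", err)
	}

	var lastErr error
	for i := 0; i < 2; i++ {
		c, err := l.Accept()
		if err != nil {
			return fmt.Errorf("accept: %w", err)
		}
		// The listener has no way to hand out anything but the dead conn.
		_, lastErr = c.Write([]byte("ping"))
	}
	return lastErr
}

// stuck reports whether the listener keeps handing out a closed pipe.
func stuck() bool {
	return errors.Is(writeAfterPeerClose(), io.ErrClosedPipe)
}

func main() {
	fmt.Println("listener stuck on closed pipe:", stuck())
}
```

Once either side of a net.Pipe is closed, writes on the other side fail with io.ErrClosedPipe, and nothing in this listener ever replaces the conn. A realistic repro would still need to find out what closes the conn in the first place.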
Expected behavior
The expected behavior is that the admin UI remain available.
Possible solution
We should detect the close on the net.Pipe and create a new one. This can all happen underneath the listener. The dialer we inject into the client should then look up the most up-to-date conn from the listener. Probably we should rate limit all of this so we don't get caught in a tight loop.
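A rough sketch of that idea, using only the standard library. All names here (repipeListener, trackedConn, Dial, minGap, errTooSoon) are hypothetical and not taken from the cockroach code base. It assumes that a close can be detected by wrapping both sides of the pipe, and it uses a simple minimum gap between re-creations as the rate limit. The injected client dialer would call Dial, while the gRPC server keeps calling Accept.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

var errTooSoon = errors.New("pipe was re-created too recently")

// trackedConn flags its pipe as dead when either side is closed.
type trackedConn struct {
	net.Conn
	dead *atomic.Bool
}

func (c trackedConn) Close() error {
	c.dead.Store(true)
	return c.Conn.Close()
}

// repipeListener (hypothetical) owns the current pipe and replaces it on demand.
type repipeListener struct {
	minGap  time.Duration // minimum time between re-creations
	accepts chan net.Conn // server side of the current pipe, until accepted
	addr    net.Addr

	mu      sync.Mutex
	client  net.Conn
	dead    *atomic.Bool
	created time.Time
}

func newRepipeListener(minGap time.Duration) *repipeListener {
	l := &repipeListener{minGap: minGap, accepts: make(chan net.Conn, 1)}
	l.renewLocked() // no other goroutine can see l yet
	l.addr = l.client.LocalAddr()
	return l
}

// renewLocked creates a fresh pipe and queues its server side for Accept.
func (l *repipeListener) renewLocked() {
	client, server := net.Pipe()
	dead := new(atomic.Bool)
	l.client, l.dead, l.created = trackedConn{Conn: client, dead: dead}, dead, time.Now()
	select {
	case stale := <-l.accepts: // server side of the old pipe, never accepted
		_ = stale.Close()
	default:
	}
	l.accepts <- trackedConn{Conn: server, dead: dead}
}

// Accept blocks until a pipe that has not been accepted yet is available.
func (l *repipeListener) Accept() (net.Conn, error) { return <-l.accepts, nil }
func (l *repipeListener) Close() error              { return nil } // shutdown is out of scope here
func (l *repipeListener) Addr() net.Addr            { return l.addr }

// Dial returns the client side of the current pipe, replacing a closed pipe first.
func (l *repipeListener) Dial() (net.Conn, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.dead.Load() {
		if time.Since(l.created) < l.minGap {
			return nil, fmt.Errorf("%w (min gap %s)", errTooSoon, l.minGap)
		}
		l.renewLocked()
	}
	return l.client, nil
}

// recoverAfterClose closes the first pipe, then sends msg over its replacement.
func recoverAfterClose(minGap time.Duration, msg string) (string, error) {
	l := newRepipeListener(minGap)
	first, _ := l.Dial() // a fresh listener always has a live pipe
	_ = first.Close()    // simulate the failure
	client, err := l.Dial()
	if err != nil {
		return "", err
	}
	server, _ := l.Accept()
	go client.Write([]byte(msg)) // net.Pipe writes block until the peer reads
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(server, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() { fmt.Println(recoverAfterClose(0, "ping")) }
```

With a large minGap, the second Dial returns an error wrapping errTooSoon instead of creating a new pipe, which is what keeps a repeatedly failing conn from turning into a tight loop. The sketch only notices explicit Close calls made through the wrapper; a real fix would likely also need to treat read and write errors as a signal and to handle listener shutdown.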