Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix grpc recover and logging #155

Merged
merged 2 commits into from
Apr 8, 2022

Conversation

DasSkelett
Copy link
Member

@DasSkelett DasSkelett commented Apr 5, 2022

Problem

While looking at #119 I discovered that the existing RecoveryMiddleware doesn't actually work for panics in gRPC code / API calls. This is because the grpc library handles the requests in goroutines, and recover() can only work for panics in the same goroutine.
https://github.com/grpc/grpc-go/blob/431ea809a7676e1da8d09c33ae0d31fcba85f1ff/server.go#L920-L923

And while it does recover for non-gRPC calls, like the login flow, the logging doesn't actually work because ctxlogrus is a noop logger in this case.

This can make setting up OIDC annoying like in #154, because it's just failing and not telling you why.

The OIDC callback handler is broken if the name claim is missing, despite it being optional in the spec and in the rest of the application.

Changes

We handle a missing name claim in the OIDC User Info gracefully.

RecoveryMiddleware now logs using traces.Logger, not ctxlogrus.

For gRPC we make use of github.com/grpc-ecosystem/go-grpc-middleware's grpc_recovery interceptor to recover from panics. It has a custom handler to attach the trace id to the error message so it's visible to the client (e.g. in the Grpc-Message header).

The log lines will look like this:

panic in gRPC API code:

ERRO[0006]options.go:224 finished unary call with code Internal        error="rpc error: code = Internal desc = something trace = 884660dd-ead7-41c4-b1bb-8f36162ed198" grpc.code=Internal grpc.method=Info grpc.service=proto.Server grpc.start_time="2022-04-05T15:29:36+02:00" grpc.time_ms=0.019 span.kind=server system=grpc trace.id=884660dd-ead7-41c4-b1bb-8f36162ed198

panic in other HTTP handling code (actual log entry will have new lines escaped to \n):

ERRO[0103]middleware.go:23 something                                     stack="goroutine 61 [running]:
runtime/debug.Stack()
	go/src/runtime/debug/stack.go:24 +0x65
github.com/freifunkMUC/wg-access-server/internal/services.RecoveryMiddleware.func1.1()
	wg-access-server/internal/services/middleware.go:22 +0xa5
panic({0xc9e0a0, 0xef9640})
	go/src/runtime/panic.go:838 +0x207
github.com/freifunkMUC/wg-access-server/pkg/authnz/authconfig.(*OIDCConfig).loginHandler.func1({0x8451b1?, 0xc000410190?}, 0xc0000f40e0?)
	wg-access-server/pkg/authnz/authconfig/oidc.go:74 +0x27
github.com/freifunkMUC/wg-access-server/pkg/authnz/authconfig.(*OIDCConfig).Provider.func1({0xf01770, 0xc000758000}, 0xdcb568?, 0x5?)
	wg-access-server/pkg/authnz/authconfig/oidc.go:63 +0x56
github.com/freifunkMUC/wg-access-server/pkg/authnz.New.func2({0xf01770, 0xc000758000}, 0xc0001b6870?)
	wg-access-server/pkg/authnz/router.go:63 +0xfc
net/http.HandlerFunc.ServeHTTP(0xc000328600?, {0xf01770?, 0xc000758000?}, 0xc00017f710?)
	go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc00012e0c0, {0xf01770, 0xc000758000}, 0xc000328500)
	mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
github.com/freifunkMUC/wg-access-server/pkg/authnz.(*AuthMiddleware).Middleware.func1({0xf01770, 0xc000758000}, 0xc000328500)
	wg-access-server/pkg/authnz/router.go:93 +0x494
net/http.HandlerFunc.ServeHTTP(0xc9e0a0?, {0xf01770?, 0xc000758000?}, 0x88?)
	go/src/net/http/server.go:2084 +0x2f
github.com/freifunkMUC/wg-access-server/internal/services.RecoveryMiddleware.func1({0xf01770?, 0xc000758000?}, 0xc00017f801?)
	wg-access-server/internal/services/middleware.go:28 +0x83
net/http.HandlerFunc.ServeHTTP(0xf01e80?, {0xf01770?, 0xc000758000?}, 0xc00017f960?)
	go/src/net/http/server.go:2084 +0x2f
github.com/freifunkMUC/wg-access-server/internal/services.TracesMiddleware.func1({0xf01770, 0xc000758000}, 0xc000328300)
	wg-access-server/internal/services/middleware.go:13 +0x230
net/http.HandlerFunc.ServeHTTP(0xc000328100?, {0xf01770?, 0xc000758000?}, 0x0?)
	go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc00012e000, {0xf01770, 0xc000758000}, 0xc0001fe700)
	mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc000685f20?}, {0xf01770, 0xc000758000}, 0xc0001fe700)
	go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc0000bf220, {0xf01e80, 0xc000685e00})
	go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	go/src/net/http/server.go:3071 +0x4db
" trace.id=2052a13b-66ca-4a86-986f-36aa3dad2691

Fixes #118
Fixes #154

@DasSkelett DasSkelett added bug Something isn't working go Pull requests that update Go code labels Apr 5, 2022
@nathanael-h
Copy link

Thanks for looking this topic. I built the docker image based on your branch to run your fix, I think I missed something because I have the same log level as before.

I also tried to pull to docker image built by the CI on the PR, but it does not work. Is the image published after the build on PRs?

docker pull ghcr.io/freifunkmuc/wg-access-server:pr-155
Error response from daemon: manifest unknown

@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 5, 2022

Thanks for looking this topic. I built the docker image based on your branch to run your fix, I think I missed something because I have the same log level as before.

Is there absolutely nothing nothing in the logs, or do you at least see the default lines like starting wireguard server or wireguard VPN network is but just no error message?

I also tried to pull to docker image built by the CI on the PR, but it does not work. Is the image published after the build on PRs?

No, the CI images aren't pushed

@nathanael-h
Copy link

I have normal logs, like on v0.6.0.

Here is the output:

Creating wg-access-server ... done
Creating traefik          ... done
Attaching to wg-access-server, traefik
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="starting wireguard server on :51820" file="main.go:128"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="set interface up" file="iface_linux.go:26"
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="wireguard VPN network is 10.41.0.1/16" file="main.go:143"
traefik             | time="2022-04-05T15:54:52Z" level=info msg="Configuration loaded from flags."
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="storing data in SQL backend sqlite3" file="contracts.go:73"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="creating gorm watcher" file="gormwatcher.go:17"
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="starting dns server on 10.41.0.1:53 with upstreams: 127.0.0.11" file="server.go:32"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="OnAdd callback name github.com/freifunkMUC/wg-access-server/internal/devices.(*DeviceManager).StartSync.func1" file="gormwatcher.go:26"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="OnDelete callback name github.com/freifunkMUC/wg-access-server/internal/devices.(*DeviceManager).StartSync.func2" file="gormwatcher.go:35"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="SELECT * FROM \"devices\"  " file="sql.go:32" module=gorm rows=4 src_ref="/code/internal/storage/sql.go:158" type=sql values="[]"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="found 4 device(s)" file="sql.go:161"
wg-access-server    | time="2022-04-05T15:54:52Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="serving website from ./website/build" file="website_router.go:37"
wg-access-server    | time="2022-04-05T15:54:52Z" level=info msg="web ui listening on :8000" file="main.go:265"
wg-access-server    | time="2022-04-05T15:55:22Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-05T15:55:52Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-05T15:56:22Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-05T15:56:52Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-05T15:57:22Z" level=debug msg="metadata sync executing" file="metadata.go:18"


I think I should check my build, and maybe recreate it.

@DasSkelett
Copy link
Member Author

Yeah, please try to confirm that your build does indeed include my changes. If it is though I'll have to go looking for how this can happen again.

@nathanael-h
Copy link

nathanael-h commented Apr 6, 2022

So yesterday I built master while thinking I was on your branch with the fix. I rebuilt the fix and now I have logs corresponding to the error. Thanks !

Do you think it could be possible to have new lines per \n? (Maybe also tab for \t)

traefik             | time="2022-04-06T07:44:35Z" level=info msg="Configuration loaded from flags."
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="starting wireguard server on :51820" file="main.go:118"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="set interface up" file="iface_linux.go:26"
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="wireguard VPN network is 10.41.0.1/16" file="main.go:133"
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="storing data in SQL backend sqlite3" file="contracts.go:73"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="creating gorm watcher" file="gormwatcher.go:17"
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="starting dns server on 10.41.0.1:53 with upstreams: 127.0.0.11" file="server.go:32"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="OnAdd callback name github.com/freifunkMUC/wg-access-server/internal/devices.(*DeviceManager).StartSync.func1" file="gormwatcher.go:26"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="OnDelete callback name github.com/freifunkMUC/wg-access-server/internal/devices.(*DeviceManager).StartSync.func2" file="gormwatcher.go:35"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="SELECT * FROM \"devices\"  " file="sql.go:32" module=gorm rows=4 src_ref="/code/internal/storage/sql.go:158" type=sql values="[]"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="found 4 device(s)" file="sql.go:161"
wg-access-server    | time="2022-04-06T07:44:34Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="serving website from ./website/build" file="website_router.go:37"
wg-access-server    | time="2022-04-06T07:44:34Z" level=info msg="web ui listening on :8000" file="main.go:255"
wg-access-server    | time="2022-04-06T07:45:04Z" level=debug msg="metadata sync executing" file="metadata.go:18"
wg-access-server    | time="2022-04-06T07:45:08Z" level=error msg="interface conversion: interface {} is nil, not string" file="middleware.go:23" stack="goroutine 137 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/freifunkMUC/wg-access-server/internal/services.RecoveryMiddleware.func1.1()\n\t/code/internal/services/middleware.go:22 +0xa5\npanic({0xce4b20, 0xc0003d1230})\n\t/usr/local/go/src/runtime/panic.go:838 +0x207\ngithub.com/freifunkMUC/wg-access-server/pkg/authnz/authconfig.(*OIDCConfig).callbackHandler.func1({0xf00b10, 0xc000669a40}, 0xc0001ee100)\n\t/code/pkg/authnz/authconfig/oidc.go:146 +0xa7e\nnet/http.HandlerFunc.ServeHTTP(0xc0001ee000?, {0xf00b10?, 0xc000669a40?}, 0xc00020b710?)\n\t/usr/local/go/src/net/http/server.go:2084 +0x2f\ngithub.com/gorilla/mux.(*Router).ServeHTTP(0xc0003acd80, {0xf00b10, 0xc000669a40}, 0xc0003f7f00)\n\t/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf\ngithub.com/freifunkMUC/wg-access-server/pkg/authnz.(*AuthMiddleware).Middleware.func1({0xf00b10, 0xc000669a40}, 0xc0003f7f00)\n\t/code/pkg/authnz/router.go:93 +0x494\nnet/http.HandlerFunc.ServeHTTP(0xc9d1e0?, {0xf00b10?, 0xc000669a40?}, 0x88?)\n\t/usr/local/go/src/net/http/server.go:2084 +0x2f\ngithub.com/freifunkMUC/wg-access-server/internal/services.RecoveryMiddleware.func1({0xf00b10?, 0xc000669a40?}, 0xc00020b801?)\n\t/code/internal/services/middleware.go:28 +0x83\nnet/http.HandlerFunc.ServeHTTP(0xf01220?, {0xf00b10?, 0xc000669a40?}, 0xc00020b960?)\n\t/usr/local/go/src/net/http/server.go:2084 +0x2f\ngithub.com/freifunkMUC/wg-access-server/internal/services.TracesMiddleware.func1({0xf00b10, 0xc000669a40}, 0xc0003f7e00)\n\t/code/internal/services/middleware.go:13 +0x230\nnet/http.HandlerFunc.ServeHTTP(0xc0003f7d00?, {0xf00b10?, 0xc000669a40?}, 0x0?)\n\t/usr/local/go/src/net/http/server.go:2084 +0x2f\ngithub.com/gorilla/mux.(*Router).ServeHTTP(0xc0003accc0, {0xf00b10, 0xc000669a40}, 0xc0003f7c00)\n\t/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf\nnet/http.serverHandler.ServeHTTP({0xc0004d2660?}, {0xf00b10, 0xc000669a40}, 0xc0003f7c00)\n\t/usr/local/go/src/net/http/server.go:2916 +0x43b\nnet/http.(*conn).serve(0xc0001ba0a0, {0xf01220, 0xc0000e5cb0})\n\t/usr/local/go/src/net/http/server.go:1966 +0x5d7\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:3071 +0x4db\n" trace.id=596b4d60-8fcd-4444-9001-be1c8cc68baf
wg-access-server    | time="2022-04-06T07:45:34Z" level=debug msg="metadata sync executing" file="metadata.go:18"

@DasSkelett
Copy link
Member Author

Thanks! So the callback handler doesn't handle the missing name claim well, which is indeed optional according to the OIDC spec: https://openid.net/specs/openid-connect-core-1_0.html#UserInfoResponse
I've pushed a commit to handle it gracefully if it is missing, if you want you can pull this branch again and rebuild it, it should work now.

Do you think it could be possible to have new lines per \n? (Maybe also tab for \t)

The logging library currently escapes it automatically, which is necessary because the stacktrace is logged in the stack field. I think it should stay this way so log readers/aggregators can parse it correctly. The stacktrace is not something that should occur in normal operation, and if it does it's easy enough to replace them with proper line breaks and tabs if needed.

@nathanael-h
Copy link

Actually we changed our OIDC provider so that it sends a name so I could really try the last commit. Anyway it should help some others and is closer to the spec. Thats good!

I understand to point about the log output. Ok for me.

This PR is good for me. 👏

@DasSkelett DasSkelett merged commit 788da41 into freifunkMUC:master Apr 8, 2022
@DasSkelett DasSkelett deleted the fix/grpc-logging branch April 8, 2022 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auth OIDC and Gitlab errors, but nothing in logs Server errors don't actually log
3 participants