-
-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: initial tracing support #1623
feat: initial tracing support #1623
Conversation
I've intentionally left out the documentation as I'd like to get some feedback on the provided configuration options first |
Also: if tracing is available, the requestId middleware will respect the trace id. |
1 similar comment
Also: if tracing is available, the requestId middleware will respect the trace id. |
b9a5689
to
c8fcb55
Compare
internal/config/config.go
Outdated
@@ -151,7 +156,7 @@ type Configuration struct { | |||
AdminTransPath string `name:"path" usage:"the path of the file to import from/export to"` | |||
AdminMediaPruneDryRun bool `name:"dry-run" usage:"perform a dry run and only log number of items eligible for pruning"` | |||
|
|||
RequestIDHeader string `name:"request-id-header" usage:"Header to extract the Request ID from. Eg.,'X-Request-Id'"` | |||
RequestIDHeader string `name:"request-id-header" usage:"Header to extract the Request ID from. Eg.,'X-Request-Id'. If tracing is enabled, this option has no effect as the trace ID is used as request id"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs reconsidering. We currently pick up the Request-ID from a header like X-Request-ID which can be set by someone's LB/proxy. Having this ignored when tracing is active is not great, because we lose the link between the two.
I'm wondering if we can reverse it, whereby the RequestID is used as the TraceID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be compatible with the trace context standard, the format of the trace ID has to be very specific. We could however introduce a toggle to switch between logging the trace id and an external request id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could have both. That decouples the two systems, and then add the requestid as an attribute on the trace instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that would work as well. However, I would still like to see the traceId in the logs at some point. This allows for correlation of logs (e.g errors or weird messages) and traces. At that point we'd have two unrelated IDs in a single log line which could cause confusion.
Just for consideration: if you generate a valid traceparent
on your loadbalancer (or even enable native tracing), the requestId middleware would behave the same as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the requestID middleware, you see it installs a log-hook. That's what adds a requestid=xxxx
line to the logs currently. The tracing middleware can do the same.
gotosocial/internal/middleware/requestid.go
Lines 79 to 86 in 196cd88
func AddRequestID(header string) gin.HandlerFunc { | |
log.Hook(func(ctx context.Context, kvs []kv.Field) []kv.Field { | |
if id, _ := ctx.Value(ridCtxKey).(string); id != "" { | |
// Add stored request ID to log entry fields. | |
return append(kvs, kv.Field{K: "requestID", V: id}) | |
} | |
return kvs | |
}) |
I don't think having both ID's in there is a problem, since they're namespaced. traceID=xxxx
is not the same thing as requestID=xxx
and that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping the systems separated makes a lot of sense, but then I'd also introduce a way to disable either system completely.
Adding the external request ID (if enabled) to the trace ID is something I can look into, however this could get complicated very quickly when taking the build-time opt-out into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping the systems separated makes a lot of sense, but then I'd also introduce a way to disable either system completely.
Yeah, that makes sense. Even without tracing not everyone might want that so it seems reasonable to let folks disable it. Since it's barely any code and doesn't rely on external dependencies I think a config option is fine and we don't need a build-time option for this one.
Adding the external request ID (if enabled) to the trace ID is something I can look into, however this could get complicated very quickly when taking the build-time opt-out into account.
I think this might be easier if you swap the middleware order around? If requestID goes first, then you can call
func RequestID(ctx context.Context) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I've split up the two systems and extracted the requestID into the span. This is currently quite cumbersome and results in quite ugly middleware initialization.
I think I'll just implement the relevant parts of the otelgin library directly, which would allow reducing the number of middlewares needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm coming in a bit late here, hadn't seen this discussion yet. I think having the requestid as a property on the trace id makes sense. In what situation would you want to log the trace id? In case of an error or something, so you can go look through your traces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly - if some error occurs I could look up the trace ID and see the entire context (e.g source of the request, previous queries and timings etc.)
Most observability solutions also allow you to correlate logs with traces directly so you can quickly switch between these two for faster debugging
f37b91c
to
ab079cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far :) Thanks for doing all this!
internal/config/defaults.go
Outdated
@@ -174,5 +179,6 @@ var Defaults = Configuration{ | |||
|
|||
AdminMediaPruneDryRun: true, | |||
|
|||
RequestIDHeader: "X-Request-Id", | |||
RequestIDEnabled: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain in what circumstances would you want to set this to false? Previously we had this enabled by default (and not turn-offable) so I'm wondering what it is in the tracing stuff that means we might wanna turn this off. Not a barbed question, btw, I just don't understand :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I'd disable this if tracing is enabled. My reverse proxy can collect trace data as well, so I'd have a full picture of the entire lifespan of the request without the requestId middleware. If tracing is desired, having a separate, unrelated requestID in the log line increases noise and makes it harder to pick out desired information.
Now if my reverse proxy would not support tracing, I'd leave this enabled to correlate between the proxy and GtS
I'm marking this as a draft and reopen when I've rewritten the middleware. Will keep the PR open however to keep the discussion around the specific approach taken |
ab079cb
to
4824261
Compare
4824261
to
d68fa3d
Compare
Hey @theSuess ! sorry for the delay in reviewing this, a few things got stacked up. Could you possibly rebase it? Then I can review it properly and hopefully merge it in :) |
d68fa3d
to
35df0b7
Compare
47fc067
to
0afe257
Compare
I've rebased the tracing stuff onto the latest changes I noticed that the error page now reports the request ID so I'll remove the toggle for enabling/disabling the requestId middleware as it seems mandatory now |
This splits up the two systems. However if a request ID is set, it is also added as an attribute in the recorded span
this improves the flow of conditionally injecting the tracing or requestid middleware by no longer requiring two middlewares for tracing support
Sorry for the lateness of getting to this PR (work has been a bitch recently lol), it looks good but I did have one question regarding the choice of metrics library itself. Would we be able to use a lighter alternative like: https://github.com/VictoriaMetrics/metrics For an example of reduced binary sizes by library comparison: https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d |
Ah finally my comment posts!! I have been trying for like 3hrs now lol |
Note that this PR is for tracing - not metrics. It just happens that OTEL can do both. I can take a look around if I can find a more lightweight tracing lib, but I'm not aware of one as of now. |
Ah gotcha. I wasn't actually aware of any difference until now, I had always had metrics / tracing in the same meaning in my head! In that case I'm happy with this PR, all looks good :) |
nice work! 🚀 |
Description
This pull request implements basic tracing support for HTTP requests and database queries. It accomplishes this by utilizing the opentelemetry go sdk as well as pre-made integrations for gin and bun.
To reduce binary size, tracing support can be disabled at build time by building GtS with the
notracing
build tag. This will reduce the final binary size by about 4 MB.If you want to test this, the easiest way I found was to set up Grafana and Tempo locally using a compose file similar to the following:
Resulting traces look like this:
The tracing configuration currently supports the jaeger and otlp grpc protocols. IMHO these are enough for most users.
closes #1230
Checklist
go fmt ./...
andgolangci-lint run
.