feat: initial tracing support #1623

theSuess · 2023-03-14T17:29:58Z

Description

This pull request implements basic tracing support for HTTP requests and database queries. It does so using the OpenTelemetry Go SDK together with pre-made integrations for gin and bun.

To reduce binary size, tracing support can be disabled at build time by building GtS with the notracing build tag. This will reduce the final binary size by about 4 MB.

If you want to test this, the easiest way I found was to set up Grafana and Tempo locally using a compose file similar to the following:

---
volumes:
  grafana_data:
services:
  grafana:
    image: grafana/grafana:9.3.2
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    ports:
      - "3001:3000"
  tempo:
    image: grafana/tempo:latest
    command:
      [
        "-storage.trace.backend=local",
        "-storage.trace.wal.path=/tmp/tempo/wal",
        "-storage.trace.local.path=/tmp/tempo/blocks",
        "-server.http-listen-port=3200",
      ]
    ports:
      - "14268:14268" # jaeger ingest
      - "3200:3200" # tempo
      - "4317:4317" # otlp grpc

Resulting traces look like this: (screenshot not included)

The tracing configuration currently supports the Jaeger and OTLP gRPC protocols. IMHO these are enough for most users.

closes #1230

Checklist

I/we have read the GoToSocial contribution guidelines.
I/we have discussed the proposed changes already, either in an issue on the repository, or in the Matrix chat.
I/we have performed a self-review of added code.
I/we have written code that is legible and maintainable by others.
I/we have commented the added code, particularly in hard-to-understand areas.
I/we have made any necessary changes to documentation.
I/we have added tests that cover new code.
I/we have run tests and they pass locally with the changes.
I/we have run go fmt ./... and golangci-lint run.

theSuess · 2023-03-14T17:30:29Z

I've intentionally left out the documentation, as I'd like to get some feedback on the provided configuration options first.

theSuess · 2023-03-14T17:36:24Z

Also: if tracing is available, the requestId middleware will respect the trace id.
This should also work in distributed settings where GtS receives the parent trace id via the traceparent header
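For readers unfamiliar with the traceparent header: in W3C Trace Context version 00 it carries four dash-separated fields (version, trace ID, parent span ID, flags). The following is a minimal standard-library sketch of pulling the trace ID out of such a value. It is only an illustration of the header format; the function name `traceIDFromTraceparent` is made up here, and the PR itself presumably relies on the OpenTelemetry propagators rather than hand-written parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// isLowerHex reports whether s consists of exactly n lowercase hex digits.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// traceIDFromTraceparent (hypothetical helper) extracts the trace ID from a
// version-00 traceparent value: "00-<32 hex>-<16 hex>-<2 hex>".
// It returns false if the value is malformed or uses another version.
func traceIDFromTraceparent(header string) (string, bool) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", false
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(parentID, 16) || !isLowerHex(flags, 2) {
		return "", false
	}
	// All-zero trace IDs and parent IDs are invalid per the specification.
	if strings.Trim(traceID, "0") == "" || strings.Trim(parentID, "0") == "" {
		return "", false
	}
	return traceID, true
}

func main() {
	id, ok := traceIDFromTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(id, ok) // prints: 4bf92f3577b34da6a3ce929d0e0e4736 true

	_, ok = traceIDFromTraceparent("not-a-traceparent")
	fmt.Println(ok) // prints: false
}
```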

daenney · 2023-03-14T18:34:46Z

internal/config/config.go

@@ -151,7 +156,7 @@ type Configuration struct {
 	AdminTransPath        string `name:"path" usage:"the path of the file to import from/export to"`
 	AdminMediaPruneDryRun bool   `name:"dry-run" usage:"perform a dry run and only log number of items eligible for pruning"`

-	RequestIDHeader string `name:"request-id-header" usage:"Header to extract the Request ID from. Eg.,'X-Request-Id'"`
+	RequestIDHeader string `name:"request-id-header" usage:"Header to extract the Request ID from. Eg.,'X-Request-Id'. If tracing is enabled, this option has no effect as the trace ID is used as request id"`


I think this needs reconsidering. We currently pick up the Request-ID from a header like X-Request-ID which can be set by someone's LB/proxy. Having this ignored when tracing is active is not great, because we lose the link between the two.

I'm wondering if we can reverse it, whereby the RequestID is used as the TraceID

To be compatible with the trace context standard, the format of the trace ID has to be very specific. We could however introduce a toggle to switch between logging the trace id and an external request id
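To make the format constraint concrete: a W3C trace ID is 16 bytes written as 32 lowercase hex characters, and the all-zero value is invalid. The sketch below checks whether an incoming request ID happens to fit that shape and otherwise falls back to a random trace ID. Both function names are hypothetical, and the fallback is only an assumption made for illustration; nobody in this thread proposed exactly this behaviour.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// validTraceID reports whether id has the shape of a W3C trace ID:
// exactly 32 lowercase hex characters, not all zero.
func validTraceID(id string) bool {
	if len(id) != 32 {
		return false
	}
	nonZero := false
	for i := 0; i < len(id); i++ {
		c := id[i]
		switch {
		case c >= '1' && c <= '9', c >= 'a' && c <= 'f':
			nonZero = true
		case c == '0':
			// Allowed, but does not make the ID non-zero.
		default:
			return false
		}
	}
	return nonZero
}

// traceIDFor (hypothetical) reuses the request ID as the trace ID when it is
// valid, and otherwise generates a random one. The second return value
// reports whether the request ID was reused.
func traceIDFor(requestID string) (string, bool) {
	if validTraceID(requestID) {
		return requestID, true
	}
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b[:]), false
}

func main() {
	samples := []string{
		"4bf92f3577b34da6a3ce929d0e0e4736",     // already trace-ID shaped
		"550e8400-e29b-41d4-a716-446655440000", // UUID with dashes
		"req-123",                              // arbitrary proxy-assigned ID
	}
	for _, s := range samples {
		_, reused := traceIDFor(s)
		fmt.Printf("%q reused as trace ID: %v\n", s, reused)
	}
}
```

Only the first sample is reused, which shows why an arbitrary X-Request-Id generally cannot double as the trace ID: the link between the two is lost whenever the fallback is taken.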

Alternatively we could have both. That decouples the two systems, and then add the requestid as an attribute on the trace instead?

Sure, that would work as well. However, I would still like to see the trace ID in the logs at some point. This allows for correlation of logs (e.g. errors or weird messages) and traces. At that point we'd have two unrelated IDs in a single log line, which could cause confusion.

Just for consideration: if you generate a valid traceparent on your loadbalancer (or even enable native tracing), the requestId middleware would behave the same as before

If you look at the requestID middleware, you see it installs a log-hook. That's what adds a requestid=xxxx line to the logs currently. The tracing middleware can do the same.

gotosocial/internal/middleware/requestid.go

Lines 79 to 86 in 196cd88

func AddRequestID(header string) gin.HandlerFunc {
	log.Hook(func(ctx context.Context, kvs []kv.Field) []kv.Field {
		if id, _ := ctx.Value(ridCtxKey).(string); id != "" {
			// Add stored request ID to log entry fields.
			return append(kvs, kv.Field{K: "requestID", V: id})
		}
		return kvs
	})

I don't think having both IDs in there is a problem, since they're namespaced. traceID=xxxx is not the same thing as requestID=xxx and that's fine.
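A rough sketch of what two independent, namespaced log hooks could look like, using only the standard library. `Field` and `Hook` are stand-ins for the `kv.Field` type and the function passed to `log.Hook` in the excerpt above; `tidCtxKey`, `traceIDHook` and `logFields` are hypothetical names, and how the trace ID ends up in the context is assumed rather than taken from the PR.

```go
package main

import (
	"context"
	"fmt"
)

// Field is a stand-in for kv.Field from the excerpt above.
type Field struct {
	K string
	V interface{}
}

// Hook mirrors the shape of the function passed to log.Hook.
type Hook func(ctx context.Context, kvs []Field) []Field

// ctxKey is unexported so these keys cannot collide with other packages.
type ctxKey int

const (
	ridCtxKey ctxKey = iota // request ID
	tidCtxKey               // trace ID (hypothetical key)
)

// requestIDHook appends requestID=... when a request ID is stored in ctx.
func requestIDHook(ctx context.Context, kvs []Field) []Field {
	if id, _ := ctx.Value(ridCtxKey).(string); id != "" {
		return append(kvs, Field{K: "requestID", V: id})
	}
	return kvs
}

// traceIDHook appends traceID=... when a trace ID is stored in ctx.
func traceIDHook(ctx context.Context, kvs []Field) []Field {
	if id, _ := ctx.Value(tidCtxKey).(string); id != "" {
		return append(kvs, Field{K: "traceID", V: id})
	}
	return kvs
}

// logFields builds a context carrying both IDs and runs every hook in order,
// returning the fields that would be attached to a log entry.
func logFields(requestID, traceID string) []Field {
	ctx := context.WithValue(context.Background(), ridCtxKey, requestID)
	ctx = context.WithValue(ctx, tidCtxKey, traceID)
	var kvs []Field
	for _, h := range []Hook{requestIDHook, traceIDHook} {
		kvs = h(ctx, kvs)
	}
	return kvs
}

func main() {
	for _, f := range logFields("req-123", "4bf92f3577b34da6a3ce929d0e0e4736") {
		fmt.Printf("%s=%v ", f.K, f.V)
	}
	fmt.Println()
}
```

Because each hook only looks at its own context key, either system can be switched off without touching the other; an empty ID simply produces no field.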

I think keeping the systems separated makes a lot of sense, but then I'd also introduce a way to disable either system completely.

Adding the external request ID (if enabled) to the trace ID is something I can look into, however this could get complicated very quickly when taking the build-time opt-out into account.

> I think keeping the systems separated makes a lot of sense, but then I'd also introduce a way to disable either system completely.

Yeah, that makes sense. Even without tracing not everyone might want that so it seems reasonable to let folks disable it. Since it's barely any code and doesn't rely on external dependencies I think a config option is fine and we don't need a build-time option for this one.

> Adding the external request ID (if enabled) to the trace ID is something I can look into, however this could get complicated very quickly when taking the build-time opt-out into account.

I think this might be easier if you swap the middleware order around? If requestID goes first, then you can call

gotosocial/internal/middleware/requestid.go

Line 73 in 196cd88

func RequestID(ctx context.Context) string {

in the tracing middleware and add it to the span there. I think.
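The ordering idea can be sketched with plain net/http middleware instead of gin. Everything below is illustrative: `span` is a toy stand-in for a real tracing span, the attribute name `http.request.id` is made up, and this `withRequestID` only reads the header, which may differ from what the actual middleware does. The point is just that the tracing layer can call `RequestID(ctx)` because the request-ID layer ran first.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ridKey struct{}

// RequestID returns the request ID stored in ctx, or "" if there is none.
func RequestID(ctx context.Context) string {
	id, _ := ctx.Value(ridKey{}).(string)
	return id
}

// span is a hypothetical stand-in for a tracing span with string attributes.
type span struct{ attrs map[string]string }

// withRequestID copies the given header into the request context.
func withRequestID(header string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if id := r.Header.Get(header); id != "" {
			r = r.WithContext(context.WithValue(r.Context(), ridKey{}, id))
		}
		next.ServeHTTP(w, r)
	})
}

// withTracing creates a span per request, attaches the request ID (if any)
// as an attribute, and passes the finished span to record.
func withTracing(record func(span), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		s := span{attrs: map[string]string{}}
		if id := RequestID(r.Context()); id != "" {
			s.attrs["http.request.id"] = id // hypothetical attribute name
		}
		next.ServeHTTP(w, r)
		record(s)
	})
}

// serve sends one request through the chain and returns the span attributes.
func serve(requestID string) map[string]string {
	var got span
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	// withRequestID is the outermost layer, so it runs before withTracing.
	h := withRequestID("X-Request-Id", withTracing(func(s span) { got = s }, final))

	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if requestID != "" {
		req.Header.Set("X-Request-Id", requestID)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return got.attrs
}

func main() {
	fmt.Println(serve("req-123")) // prints: map[http.request.id:req-123]
}
```

If the two layers were swapped, `RequestID(r.Context())` would return an empty string inside the tracing layer and the attribute would never be set.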

Okay, so I've split up the two systems and extracted the requestID into the span. This is currently quite cumbersome and results in quite ugly middleware initialization.

I think I'll just implement the relevant parts of the otelgin library directly, which would allow reducing the number of middlewares needed.

Sorry I'm coming in a bit late here, hadn't seen this discussion yet. I think having the requestid as a property on the trace id makes sense. In what situation would you want to log the trace id? In case of an error or something, so you can go look through your traces?

Exactly - if some error occurs I could look up the trace ID and see the entire context (e.g. source of the request, previous queries and timings etc.)

Most observability solutions also allow you to correlate logs with traces directly so you can quickly switch between these two for faster debugging

tsmethurst

Looking good so far :) Thanks for doing all this!

internal/config/helpers.gen.go

tsmethurst · 2023-03-16T11:04:11Z

internal/config/defaults.go

@@ -174,5 +179,6 @@ var Defaults = Configuration{

 	AdminMediaPruneDryRun: true,

-	RequestIDHeader: "X-Request-Id",
+	RequestIDEnabled: true,


Could you explain in what circumstances would you want to set this to false? Previously we had this enabled by default (and not turn-offable) so I'm wondering what it is in the tracing stuff that means we might wanna turn this off. Not a barbed question, btw, I just don't understand :P

Sure! I'd disable this if tracing is enabled. My reverse proxy can collect trace data as well, so I'd have a full picture of the entire lifespan of the request without the requestId middleware. If tracing is desired, having a separate, unrelated requestID in the log line increases noise and makes it harder to pick out desired information.

If my reverse proxy did not support tracing, I'd leave this enabled to correlate requests between the proxy and GtS.

theSuess · 2023-03-17T15:49:58Z

I'm marking this as a draft and will reopen it once I've rewritten the middleware. I'll keep the PR open, however, to preserve the discussion around the specific approach taken.

tsmethurst · 2023-03-27T14:06:43Z

Hey @theSuess ! sorry for the delay in reviewing this, a few things got stacked up. Could you possibly rebase it? Then I can review it properly and hopefully merge it in :)

theSuess · 2023-04-28T17:38:37Z

I've rebased the tracing stuff onto the latest changes

I noticed that the error page now reports the request ID so I'll remove the toggle for enabling/disabling the requestId middleware as it seems mandatory now

NyaaaWhatsUpDoc · 2023-05-09T13:33:34Z

Sorry for the lateness of getting to this PR (work has been a bitch recently lol), it looks good but I did have one question regarding the choice of metrics library itself. Would we be able to use a lighter alternative like: https://github.com/VictoriaMetrics/metrics

For an example of reduced binary sizes by library comparison: https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d

NyaaaWhatsUpDoc · 2023-05-09T13:33:53Z

Ah finally my comment posts!! I have been trying for like 3hrs now lol

theSuess · 2023-05-09T16:16:26Z

> Sorry for the lateness of getting to this PR (work has been a bitch recently lol), it looks good but I did have one question regarding the choice of metrics library itself. Would we be able to use a lighter alternative like: https://github.com/VictoriaMetrics/metrics
>
> For an example of reduced binary sizes by library comparison: https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d

Note that this PR is for tracing - not metrics. It just happens that OTEL can do both. I can take a look around if I can find a more lightweight tracing lib, but I'm not aware of one as of now.

NyaaaWhatsUpDoc · 2023-05-09T17:01:35Z

> > Sorry for the lateness of getting to this PR (work has been a bitch recently lol), it looks good but I did have one question regarding the choice of metrics library itself. Would we be able to use a lighter alternative like: VictoriaMetrics/metrics
> > For an example of reduced binary sizes by library comparison: valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d
>
> Note that this PR is for tracing - not metrics. It just happens that OTEL can do both. I can take a look around if I can find a more lightweight tracing lib, but I'm not aware of one as of now.

Ah gotcha. I wasn't actually aware of any difference until now, I had always had metrics / tracing in the same meaning in my head! In that case I'm happy with this PR, all looks good :)

NyaaaWhatsUpDoc · 2023-05-09T17:19:41Z

nice work! 🚀

theSuess force-pushed the tracing-support branch from b9a5689 to c8fcb55 on March 14, 2023 17:39

daenney reviewed Mar 14, 2023

theSuess force-pushed the tracing-support branch 2 times, most recently from f37b91c to ab079cb on March 15, 2023 17:27

tsmethurst reviewed Mar 16, 2023

theSuess marked this pull request as draft March 17, 2023 15:49

theSuess force-pushed the tracing-support branch from ab079cb to 4824261 on March 18, 2023 09:07

theSuess marked this pull request as ready for review March 18, 2023 09:14

theSuess force-pushed the tracing-support branch from 4824261 to d68fa3d on March 18, 2023 09:16

theSuess force-pushed the tracing-support branch from d68fa3d to 35df0b7 on March 29, 2023 15:54

theSuess force-pushed the tracing-support branch 3 times, most recently from 47fc067 to 0afe257 on April 28, 2023 16:59

tsmethurst reviewed May 1, 2023

theSuess force-pushed the tracing-support branch from 0afe257 to c75a990 on May 2, 2023 15:17

theSuess added 7 commits May 7, 2023 07:35

feat: initial tracing support

91d5abf

feat: split up requestid and tracing

ab66524

This splits up the two systems. However if a request ID is set, it is also added as an attribute in the recorded span

feat: independently toggle tracing and requestid

8f29e89

fix: conditionally add tracing hook to bun

e50e64d

refactor: roll own tracing middleware implementation

f9a0a90

this improves the flow of conditionally injecting the tracing or requestid middleware by no longer requiring two middlewares for tracing support

docs: add tracing documentation

3184773

chore: remove toggle for requestID

af967e2

theSuess force-pushed the tracing-support branch from c75a990 to af967e2 on May 7, 2023 05:50

NyaaaWhatsUpDoc merged commit 6392e00 into superseriousbusiness:main May 9, 2023

This was referenced Oct 31, 2023

[feature] Prometheus metrics implementation #1218

Closed

[feature] Initial metrics #2334

Merged
