This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Draft: Migrate to OpenTelemetry tracing #13400
Closed
MadLittleMods wants to merge 75 commits into develop from madlittlemods/11850-migrate-to-opentelemetry
75 commits
0cc610e MadLittleMods: Migrate to OpenTelemetry tracing
2fe6911 MadLittleMods: Some shim and some new
6984cef MadLittleMods: Progress towards OTEL
6406fd5 MadLittleMods: Server running
2428172 MadLittleMods: Export to Jaeger (things are showing up)
0d7a2b9 MadLittleMods: Revert changes to Sentry scopes (not OTEL)
9e1de86 MadLittleMods: We use the config for the Jaeger exporter now
f6c3b22 MadLittleMods: Fix some lints
3a25996 MadLittleMods: Fixup some todos
1b0840e MadLittleMods: Fix some lints
1d208fa MadLittleMods: Fix invalid attribute type
2011ac2 MadLittleMods: Fix using wrong type of context (`Context` vs `SpanContext`)
19d20b5 MadLittleMods: Record exception
786dd9b MadLittleMods: Explain weird function
7c135b9 MadLittleMods: Easier to follow local vs remote span tracing
d29a4af MadLittleMods: Move to start_active_span
041acdf MadLittleMods: Working second test although it's a bit pointless testing whether ope…
d848156 MadLittleMods: Passing tests and context manager doesn't seem to be needed
070195a MadLittleMods: Use correct type for what start_as_current_span returns
7772f50 MadLittleMods: Use HTTP_HOST attribute
322da51 MadLittleMods: Fix some lints
33fd24e MadLittleMods: todos
a9fb504 MadLittleMods: Implement start_active_span_from_edu for OTEL
8e902b8 MadLittleMods: Remove what's left of scopemanager
00be06c MadLittleMods: Try to align read from edu content
6255a1a MadLittleMods: Fix tests and some lints
b3cdbad MadLittleMods: PoC force tracing
d15fa45 MadLittleMods: Non-working try baggage to inherit force tracing/sampling
6bb7cb7 MadLittleMods: Revert "Non-working try baggage to inherit force tracing/sampling"
dbd9005 MadLittleMods: Revert crazy custom sampler and span process to try force tracing for…
0f93ec8 MadLittleMods: Fix lints
36d6648 MadLittleMods: Remove type ignore comments
fb0e820 MadLittleMods: More clear method names
b09651a MadLittleMods: Always return config path for config error
da396a2 MadLittleMods: Add test for what happens when side by side spans in with statement
ad71bc3 MadLittleMods: End on exit is already the default expected behavior
59facea MadLittleMods: Restore logging current_context (not sure why removed
9d6fcf3 MadLittleMods: Clean up some opentracing text references
fcc4220 MadLittleMods: Update docs
d72cacf MadLittleMods: Add changelog
ba4a46a MadLittleMods: Seems to (see test_side_by_side_spans)
72c718d MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
c26fa2d MadLittleMods: Move to 72 schema version
5999132 MadLittleMods: Fix lints
2491665 MadLittleMods: Fix remnant
16d17f7 MadLittleMods: Fix table missing column
b6f5665 MadLittleMods: Use latested Twisted from source to fix contextvar issues causing OTE…
699dad0 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
270db42 MadLittleMods: Update treq to match minimum Twisted Python versions
f5da762 MadLittleMods: Revert "Update treq to match minimum Twisted Python versions"
ccd4752 MadLittleMods: Fix tracing imports after merging in develop
d7166a0 MadLittleMods: Update docs/tracing.md
7566375 MadLittleMods: Try fix Twisted/treq problems
7024d7b MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
8def7e4 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
50f0342 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
f73bc59 MadLittleMods: Try to resolve poetry deps
a15592d MadLittleMods: Poetry install again
32b9d16 MadLittleMods: poetry update
6c40dfa MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
ad3e324 MadLittleMods: Install otel deps from develop
15e242e MadLittleMods: OTEL install with DMR
d730a46 MadLittleMods: Update Twisted to lastest
ed11237 MadLittleMods: Remove linting from CI for now
19c6f6e MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
b77d49f MadLittleMods: Hopefully fix problem when OTEL not installed with non recording span
a027c6e MadLittleMods: Maybe fix positional argument mismatch for DummyLink
84f91e3 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
b86869f MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
e4b9898 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
4a495ac MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
7d70acd MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
627951e MadLittleMods: Fix poetry.lock conflicts
d993cb0 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
7acb365 MadLittleMods: Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Migrate from OpenTracing to OpenTelemetry (config changes necessary). | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,94 +1,3 @@ | ||
# OpenTracing | ||
|
||
## Background | ||
|
||
OpenTracing is a semi-standard being adopted by a number of distributed | ||
tracing platforms. It is a common api for facilitating vendor-agnostic | ||
tracing instrumentation. That is, we can use the OpenTracing api and | ||
select one of a number of tracer implementations to do the heavy lifting | ||
in the background. Our current selected implementation is Jaeger. | ||
|
||
OpenTracing is a tool which gives an insight into the causal | ||
relationship of work done in and between servers. The servers each track | ||
events and report them to a centralised server - in Synapse's case: | ||
Jaeger. The basic unit used to represent events is the span. The span | ||
roughly represents a single piece of work that was done and the time at | ||
which it occurred. A span can have child spans, meaning that the work of | ||
the child had to be completed for the parent span to complete, or it can | ||
have follow-on spans which represent work that is undertaken as a result | ||
of the parent but is not depended on by the parent to in order to | ||
finish. | ||
|
||
Since this is undertaken in a distributed environment a request to | ||
another server, such as an RPC or a simple GET, can be considered a span | ||
(a unit or work) for the local server. This causal link is what | ||
OpenTracing aims to capture and visualise. In order to do this metadata | ||
about the local server's span, i.e the 'span context', needs to be | ||
included with the request to the remote. | ||
|
||
It is up to the remote server to decide what it does with the spans it | ||
creates. This is called the sampling policy and it can be configured | ||
through Jaeger's settings. | ||
|
||
For OpenTracing concepts see | ||
<https://opentracing.io/docs/overview/what-is-tracing/>. | ||
|
||
For more information about Jaeger's implementation see | ||
<https://www.jaegertracing.io/docs/> | ||
|
||
## Setting up OpenTracing | ||
|
||
To receive OpenTracing spans, start up a Jaeger server. This can be done | ||
using docker like so: | ||
|
||
```sh | ||
docker run -d --name jaeger \ | ||
-p 6831:6831/udp \ | ||
-p 6832:6832/udp \ | ||
-p 5778:5778 \ | ||
-p 16686:16686 \ | ||
-p 14268:14268 \ | ||
jaegertracing/all-in-one:1 | ||
``` | ||
|
||
Latest documentation is probably at | ||
https://www.jaegertracing.io/docs/latest/getting-started. | ||
|
||
## Enable OpenTracing in Synapse | ||
|
||
OpenTracing is not enabled by default. It must be enabled in the | ||
homeserver config by adding the `opentracing` option to your config file. You can find | ||
documentation about how to do this in the [config manual under the header 'Opentracing'](usage/configuration/config_documentation.md#opentracing). | ||
See below for an example Opentracing configuration: | ||
|
||
```yaml | ||
opentracing: | ||
enabled: true | ||
homeserver_whitelist: | ||
- "mytrustedhomeserver.org" | ||
- "*.myotherhomeservers.com" | ||
``` | ||
|
||
## Homeserver whitelisting | ||
|
||
The homeserver whitelist is configured using regular expressions. A list | ||
of regular expressions can be given and their union will be compared | ||
when propagating any spans contexts to another homeserver. | ||
|
||
Though it's mostly safe to send and receive span contexts to and from | ||
untrusted users since span contexts are usually opaque ids it can lead | ||
to two problems, namely: | ||
|
||
- If the span context is marked as sampled by the sending homeserver | ||
the receiver will sample it. Therefore two homeservers with wildly | ||
different sampling policies could incur higher sampling counts than | ||
intended. | ||
- Sending servers can attach arbitrary data to spans, known as | ||
'baggage'. For safety this has been disabled in Synapse but that | ||
doesn't prevent another server sending you baggage which will be | ||
logged to OpenTracing's logs. | ||
|
||
## Configuring Jaeger | ||
|
||
Sampling strategies can be set as in this document: | ||
<https://www.jaegertracing.io/docs/latest/sampling/>. | ||
Synapse now uses OpenTelemetry and the [documentation for tracing has moved](./tracing.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
# Tracing | ||
|
||
## Background | ||
|
||
OpenTelemetry is a semi-standard being adopted by a number of distributed | ||
tracing platforms. It is a common API for facilitating vendor-agnostic | ||
tracing instrumentation. | ||
|
||
Tracing is a tool which gives an insight into the causal | ||
relationship of work done in and between servers. The servers each track | ||
events and report them to a centralised server - in Synapse's case: | ||
Jaeger. The basic unit used to represent events is the span. The span | ||
roughly represents a single piece of work that was done and the time at | ||
which it occurred. A span can have child spans, meaning that the work of | ||
the child had to be completed for the parent span to complete, or it can | ||
have follow-on spans which represent work that is undertaken as a result | ||
of the parent but that the parent does not depend on in order to | ||
finish. | ||
|
||
Since this is undertaken in a distributed environment, a request to | ||
another server, such as an RPC or a simple GET, can be considered a span | ||
(a unit of work) for the local server. This causal link is what | ||
tracing aims to capture and visualise. In order to do this, metadata | ||
about the local server's span, i.e. the 'span context', needs to be | ||
included with the request to the remote. | ||
|
||
It is up to the remote server to decide what it does with the spans it | ||
creates. This is called the sampling policy and it can be configured | ||
through Jaeger's settings. | ||
|
||
For OpenTelemetry concepts, see | ||
<https://opentelemetry.io/docs/concepts/>. | ||
|
||
For more information about the Python implementation of OpenTelemetry we're using, see | ||
<https://opentelemetry.io/docs/instrumentation/python/> | ||
|
||
For more information about Jaeger, see | ||
<https://www.jaegertracing.io/docs/> | ||
|
||
## Setting up tracing | ||
|
||
To receive tracing spans, start up a Jaeger server. This can be done | ||
using docker like so: | ||
|
||
```sh | ||
docker run -d --name jaeger \ | ||
-p 6831:6831/udp \ | ||
-p 6832:6832/udp \ | ||
-p 5778:5778 \ | ||
-p 16686:16686 \ | ||
-p 14268:14268 \ | ||
jaegertracing/all-in-one:1 | ||
``` | ||
|
||
Latest documentation is probably at | ||
https://www.jaegertracing.io/docs/latest/getting-started. | ||
|
||
## Enable tracing in Synapse | ||
|
||
Tracing is not enabled by default. It must be enabled in the | ||
homeserver config by adding the `tracing` option to your config file. You can find | ||
documentation about how to do this in the [config manual under the header 'Tracing'](usage/configuration/config_documentation.md#tracing). | ||
See below for an example tracing configuration: | ||
|
||
```yaml | ||
tracing: | ||
enabled: true | ||
homeserver_whitelist: | ||
- "mytrustedhomeserver.org" | ||
- "*.myotherhomeservers.com" | ||
``` | ||
|
||
## Homeserver whitelisting | ||
|
||
The homeserver whitelist is configured using regular expressions. A list | ||
of regular expressions can be given, and their union is matched against | ||
the destination when propagating any span contexts to another homeserver. | ||
|
||
Though it's mostly safe to send and receive span contexts to and from | ||
untrusted users, since span contexts are usually opaque IDs, it can lead | ||
to two problems, namely: | ||
|
||
- If the span context is marked as sampled by the sending homeserver | ||
the receiver will sample it. Therefore two homeservers with wildly | ||
different sampling policies could incur higher sampling counts than | ||
intended. | ||
- Sending servers can attach arbitrary data to spans, known as | ||
'baggage'. For safety this has been disabled in Synapse but that | ||
doesn't prevent another server sending you baggage which will be | ||
logged in the trace. |
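For readers of the "Homeserver whitelisting" section above, here is a minimal sketch of how a list of regular expressions could be combined into a single union and checked against a destination before a span context is propagated. It is not taken from this PR: `compile_whitelist` and `should_propagate` are hypothetical names, and matching the whole server name with `fullmatch` is an assumption made for the illustration.

```python
import re
from typing import Iterable, Optional, Pattern


def compile_whitelist(patterns: Iterable[str]) -> Optional[Pattern[str]]:
    """Combine the configured patterns into one regex (their union).

    Hypothetical helper for illustration; not Synapse's actual code.
    """
    patterns = list(patterns)
    if not patterns:
        # An empty whitelist means no homeserver is trusted.
        return None
    # Wrap each pattern in a non-capturing group so "|" cannot leak
    # across pattern boundaries.
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def should_propagate(whitelist: Optional[Pattern[str]], destination: str) -> bool:
    """Return True if a span context may be sent to `destination`.

    Assumption: the whole server name must match, hence `fullmatch`.
    """
    return whitelist is not None and whitelist.fullmatch(destination) is not None


# Patterns written as valid Python regular expressions.
whitelist = compile_whitelist(
    [r"mytrustedhomeserver\.org", r".*\.myotherhomeservers\.com"]
)

print(should_propagate(whitelist, "eu.myotherhomeservers.com"))
print(should_propagate(whitelist, "untrusted.example.com"))
```

One thing the sketch makes visible: under Python's `re` module, a pattern that begins with a bare `*` (such as `*.myotherhomeservers.com`) raises `re.error` when compiled, so the wildcard entry is written here as `.*\.myotherhomeservers\.com`. Whether the whitelist entries are treated as plain regular expressions in exactly this way is an assumption of the sketch.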
Any other changelogs to base this kind of change on?