Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter spans before sending them to the opentelemetry layer #2894

Merged
merged 30 commits into from
Apr 25, 2023

Conversation

Geal
Copy link
Contributor

@Geal Geal commented Mar 31, 2023

the sampling configuration in the opentelemetry layer only applies when the span closes, so in the meantime a lot of data is created just to be dropped. This experiments with a filter than can sample spans before the opentelemetry layer

@github-actions

This comment has been minimized.

Geal added 4 commits April 4, 2023 09:43
the filter is not part of the reloadable tracer (which is on the
opentelemetry side), so we use an atomic (lighter than a RwLock),
and there's no AtomicF64 in std, so we transmute back and forth
with a u64
@Geal Geal marked this pull request as ready for review April 7, 2023 14:01
@BrynCooke
Copy link
Contributor

Perf test please

@BrynCooke
Copy link
Contributor

Perf test please

@Geal Geal requested review from garypen and bnjjj April 17, 2023 09:34
@Geal
Copy link
Contributor Author

Geal commented Apr 17, 2023

the PR should be good to review now. I merged dev, fixed the tests and removed some cruft, it's ice and small now.
@BrynCooke it seems the perf tests can get stuck, like this one: https://github.com/apollographql/router/runs/12794146900

Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending the suggestion that @garypen made.

@Geal Geal enabled auto-merge (squash) April 21, 2023 10:07
Copy link
Contributor

@SimonSapin SimonSapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good assuming my previous comment is addressed one way or another

the span might have been enabled by another layer
@Geal Geal disabled auto-merge April 25, 2023 08:14
@Geal Geal merged commit bdd33f7 into staging-perf Apr 25, 2023
@Geal Geal deleted the geal/filter-spans branch April 25, 2023 09:45
Geal added a commit that referenced this pull request Apr 25, 2023
The Opentelemetry layer accumulates data from all spans before sending
it to the opentelemetry span exporters. This means that sampling
decisions are done after a lot of data was already allocated. And that
this work is done even if no exporter is set up.

This adds a sampling filter around the opentelemetry layer, to do the
sampling decision as early as possible, and avoid this work entirely if
no exporter is configured
Geal added a commit that referenced this pull request May 15, 2023
Batch of performance improvements:

* linux: replace default allocator with jemalloc (#2882) 
* Add a private part to the Context structure (#2802) 
* lighter path manipulation in response formatting (#2854) 
* prevent span attributes from being formatted to write logs (#2890) 
* Parse Accept headers once instead of three times (#2847) 
* use a parking-lot mutex in Context (#2885) 
* Filter spans before sending them to the opentelemetry layer (#2894)
BrynCooke pushed a commit that referenced this pull request May 18, 2023
If an otel context has been created then spans should be sampled regardless of if the sampler says to sample. The presence of a context means that we got a context from the client.

Fixes #3103
BrynCooke pushed a commit that referenced this pull request May 18, 2023
If an otel context has been created then spans should be sampled regardless of if the sampler says to sample. The presence of a context means that we got a context from the client.

Fixes #3103
BrynCooke pushed a commit that referenced this pull request May 18, 2023
If an otel context has been created then spans should be sampled regardless of if the sampler says to sample. The presence of a context means that we got a context from the client.

Fixes #3103
BrynCooke pushed a commit that referenced this pull request May 18, 2023
BrynCooke pushed a commit that referenced this pull request May 18, 2023
If there is no otel context set up we let the otel layer do its thing.
If there is a context set up then we can filter early depending on if the span is being sampeld or not.

Fixes #3103
BrynCooke pushed a commit that referenced this pull request May 18, 2023
…2894)"

This reverts commit ecd5d57.

The reson for this is that parent based sampling definitely regressed and there are some ongoing questions around correctness.
We will look again after the release as this did appear to give a good speedup.
BrynCooke added a commit that referenced this pull request May 18, 2023
…2894)" (#3111)

This reverts commit ecd5d57.

The reason for this is that parent based sampling definitely regressed
and there are some ongoing questions around correctness. We will look
again after the release as this did appear to give a good speedup.

Related:
#2894 
#3103 

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`

Co-authored-by: bryn <bryn@apollographql.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants