Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: fixes the logic in convertJaegerTraceToProfile function #1679

Closed
wants to merge 1 commit into from

Conversation

petethepig
Copy link
Member

@petethepig petethepig commented Nov 4, 2022

Addresses #1671 / jaegertracing/jaeger-ui#1012

Trace

Screen Shot 2022-11-04 at 3 15 57 PM

Flamegraph Before

before

Flamegraph After

after

Notes

One of the characteristics of the trace from the original issue was that it had parallel executions, particularly in logic-proxy: upstream node. That's why it looked so strange before.

This PR makes it so that in such cases when the sum of children nodes' duration is above parent duration, parent duration is set to the sum of children nodes' duration.

One unfortunate consequence of this is that now durations of the top nodes are inflated and don't always match the durations in span, e.g look at the top eyeball-facing-service: request. While this is correct in a sense that that's how much wall time was spent processing a request, it might be confusing to some.

@bobrik What do you think about this?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2022

size-limit report 📦

Path Size Loading time (3g) Running time (snapdragon) Total time
webapp/public/assets/app.js 485.37 KB (+0.01% 🔺) 9.8 s (+0.01% 🔺) 4.1 s (+2.99% 🔺) 13.8 s
webapp/public/assets/app.css 18.62 KB (0%) 373 ms (0%) 0 ms (+100% 🔺) 373 ms
webapp/public/assets/styles.css 9.54 KB (0%) 191 ms (0%) 0 ms (+100% 🔺) 191 ms
packages/pyroscope-flamegraph/dist/index.js 129.73 KB (+0.01% 🔺) 2.6 s (+0.01% 🔺) 1.2 s (-24.23% 🔽) 3.8 s
packages/pyroscope-flamegraph/dist/index.node.js 130.19 KB (+0.01% 🔺) 2.7 s (+0.01% 🔺) 752 ms (+3.8% 🔺) 3.4 s
packages/pyroscope-flamegraph/dist/index.css 7.88 KB (0%) 158 ms (0%) 0 ms (+100% 🔺) 158 ms

@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Base: 66.72% // Head: 66.56% // Decreases project coverage by -0.15% ⚠️

Coverage data is based on head (7d6b2b2) compared to base (04ab88a).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1679      +/-   ##
==========================================
- Coverage   66.72%   66.56%   -0.15%     
==========================================
  Files         156      156              
  Lines        5260     5274      +14     
  Branches     1202     1206       +4     
==========================================
+ Hits         3509     3510       +1     
- Misses       1745     1758      +13     
  Partials        6        6              
Impacted Files Coverage Δ
...megraph/src/convert/convertJaegerTraceToProfile.ts 16.33% <0.00%> (ø)
webapp/javascript/ui/StatusMessage/index.tsx 61.91% <0.00%> (-3.09%) ⬇️
webapp/javascript/redux/reducers/tracing.ts 19.45% <0.00%> (-1.16%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@bobrik
Copy link

bobrik commented Nov 8, 2022

While this is correct in a sense that that's how much wall time was spent processing a request

In my example the whole request took 162ms, but your top flamegraph slab is 268ms. That doesn't seem to match your description. That doesn't seem right.

@Rperry2174
Copy link
Contributor

Rperry2174 commented Nov 8, 2022

@bobrik did you see the note at the bottom?

One of the characteristics of the trace from the original issue was that it had parallel executions, particularly in logic-proxy: upstream node. That's why it looked so strange before.
This PR makes it so that in such cases when the sum of children nodes' duration is above parent duration, parent duration is set to the sum of children nodes' duration.
One unfortunate consequence of this is that now durations of the top nodes are inflated and don't always match the durations in span, e.g look at the top eyeball-facing-service: request. While this is correct in a sense that that's how much wall time was spent processing a request, it might be confusing to some.

This is why there is a discrepancy between the request length and what the flamegraph shows... what do you think the expected behavior should be here since flamegraphs don't actually have a concept of "parallel" execution

While the full request only took 162ms, the flamegraph shows 268ms because there was 268ms of "work" being done (just some of it was happening in parallel). This is the most accurate way to represent this situation that we could think of so would love to hear your thoughts on how you plan to use this flamegraph and what the expected output would be in your opinion

@bobrik
Copy link

bobrik commented Nov 9, 2022

did you see the note at the bottom?

Yes, I quoted from it. I don't agree with the "that's how much wall time was spent processing a request" framing.

While the full request only took 162ms, the flamegraph shows 268ms because there was 268ms of "work" being done (just some of it was happening in parallel).

Tracing is generally used to gauge latency (wall time), not work (which is usually a proxy for cpu time) as flamegraphs.

I don't know if there's a good way to translate concurrent spans into flamegraphs. It might be acceptable to represent them as you do with your change visually, but show the actual timing values on the slabs. As a concrete example for my trace:

[ logic-proxy: upstream (70.1%, 133.91ms)                                                         ]
[ logic-proxy: upstream_ttfb (60.39%, 112.75ms) ][ customer-code: http_request (60.39%, 112.69ms) ]

Here we scale width of upstream_ttfb and http_request to make them fit into upstream, but keep them proportional.

If you can add some sort of warning to upstream span showing that it has children adding up to more than their parent (or maybe even just parallel running children), that would be even better to avoid confusion.

Let me know if this makes sense.

@Rperry2174
Copy link
Contributor

hey @bobrik yes this makes sense and might be a good compromise we'll look into seeing what changes we'd need to make in order to allow the flamegraph component to make them "overwrite" the true values, but this may take a little while to dig into

@cescoferraro
Copy link

cescoferraro commented Nov 16, 2022

@petethepig how can I try this PR on my project?
@Rperry2174 we at Tracetest really need this

@cescoferraro
Copy link

cescoferraro commented Nov 16, 2022

I published this PR as cescoferraro-flamegraph

@eh-am
Copy link
Collaborator

eh-am commented Nov 16, 2022

Hey @cescoferraro, even though you already published a PR, we have a CI step that publishes the flamegraph (https://github.com/pyroscope-io/pyroscope/actions/runs/3397437411/jobs/5649595841). You can point to @pyroscope/flamegraph@0.25.2-1679-ab8cd3f.0

image

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@petethepig petethepig closed this Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants