chore: fixes the logic in convertJaegerTraceToProfile function #1679

petethepig · 2022-11-04T22:17:24Z

Addresses #1671 / jaegertracing/jaeger-ui#1012

Trace

Flamegraph Before

Flamegraph After

Notes

One of the characteristics of the trace from the original issue was that it had parallel executions, particularly in logic-proxy: upstream node. That's why it looked so strange before.

This PR makes it so that in such cases when the sum of children nodes' duration is above parent duration, parent duration is set to the sum of children nodes' duration.

One unfortunate consequence of this is that now durations of the top nodes are inflated and don't always match the durations in span, e.g look at the top eyeball-facing-service: request. While this is correct in a sense that that's how much wall time was spent processing a request, it might be confusing to some.

@bobrik What do you think about this?

github-actions · 2022-11-04T22:20:57Z

size-limit report 📦

Path	Size	Loading time (3g)	Running time (snapdragon)	Total time
webapp/public/assets/app.js	485.37 KB (+0.01% 🔺)	9.8 s (+0.01% 🔺)	4.1 s (+2.99% 🔺)	13.8 s
webapp/public/assets/app.css	18.62 KB (0%)	373 ms (0%)	0 ms (+100% 🔺)	373 ms
webapp/public/assets/styles.css	9.54 KB (0%)	191 ms (0%)	0 ms (+100% 🔺)	191 ms
packages/pyroscope-flamegraph/dist/index.js	129.73 KB (+0.01% 🔺)	2.6 s (+0.01% 🔺)	1.2 s (-24.23% 🔽)	3.8 s
packages/pyroscope-flamegraph/dist/index.node.js	130.19 KB (+0.01% 🔺)	2.7 s (+0.01% 🔺)	752 ms (+3.8% 🔺)	3.4 s
packages/pyroscope-flamegraph/dist/index.css	7.88 KB (0%)	158 ms (0%)	0 ms (+100% 🔺)	158 ms

codecov · 2022-11-04T22:22:20Z

Codecov Report

Base: 66.72% // Head: 66.56% // Decreases project coverage by -0.15% ⚠️

Coverage data is based on head (7d6b2b2) compared to base (04ab88a).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1679      +/-   ##
==========================================
- Coverage   66.72%   66.56%   -0.15%     
==========================================
  Files         156      156              
  Lines        5260     5274      +14     
  Branches     1202     1206       +4     
==========================================
+ Hits         3509     3510       +1     
- Misses       1745     1758      +13     
  Partials        6        6

Impacted Files	Coverage Δ
...megraph/src/convert/convertJaegerTraceToProfile.ts	`16.33% <0.00%> (ø)`
webapp/javascript/ui/StatusMessage/index.tsx	`61.91% <0.00%> (-3.09%)`	⬇️
webapp/javascript/redux/reducers/tracing.ts	`19.45% <0.00%> (-1.16%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

bobrik · 2022-11-08T01:05:10Z

While this is correct in a sense that that's how much wall time was spent processing a request

In my example the whole request took 162ms, but your top flamegraph slab is 268ms. That doesn't seem to match your description. That doesn't seem right.

Rperry2174 · 2022-11-08T06:39:47Z

@bobrik did you see the note at the bottom?

One of the characteristics of the trace from the original issue was that it had parallel executions, particularly in logic-proxy: upstream node. That's why it looked so strange before.
This PR makes it so that in such cases when the sum of children nodes' duration is above parent duration, parent duration is set to the sum of children nodes' duration.
One unfortunate consequence of this is that now durations of the top nodes are inflated and don't always match the durations in span, e.g look at the top eyeball-facing-service: request. While this is correct in a sense that that's how much wall time was spent processing a request, it might be confusing to some.

This is why there is a discrepancy between the request length and what the flamegraph shows... what do you think the expected behavior should be here since flamegraphs don't actually have a concept of "parallel" execution

While the full request only took 162ms, the flamegraph shows 268ms because there was 268ms of "work" being done (just some of it was happening in parallel). This is the most accurate way to represent this situation that we could think of so would love to hear your thoughts on how you plan to use this flamegraph and what the expected output would be in your opinion

bobrik · 2022-11-09T03:22:18Z

did you see the note at the bottom?

Yes, I quoted from it. I don't agree with the "that's how much wall time was spent processing a request" framing.

While the full request only took 162ms, the flamegraph shows 268ms because there was 268ms of "work" being done (just some of it was happening in parallel).

Tracing is generally used to gauge latency (wall time), not work (which is usually a proxy for cpu time) as flamegraphs.

I don't know if there's a good way to translate concurrent spans into flamegraphs. It might be acceptable to represent them as you do with your change visually, but show the actual timing values on the slabs. As a concrete example for my trace:

[ logic-proxy: upstream (70.1%, 133.91ms)                                                         ]
[ logic-proxy: upstream_ttfb (60.39%, 112.75ms) ][ customer-code: http_request (60.39%, 112.69ms) ]

Here we scale width of upstream_ttfb and http_request to make them fit into upstream, but keep them proportional.

If you can add some sort of warning to upstream span showing that it has children adding up to more than their parent (or maybe even just parallel running children), that would be even better to avoid confusion.

Let me know if this makes sense.

Rperry2174 · 2022-11-11T15:43:15Z

hey @bobrik yes this makes sense and might be a good compromise we'll look into seeing what changes we'd need to make in order to allow the flamegraph component to make them "overwrite" the true values, but this may take a little while to dig into

cescoferraro · 2022-11-16T03:26:22Z

@petethepig how can I try this PR on my project?
@Rperry2174 we at Tracetest really need this

cescoferraro · 2022-11-16T13:27:11Z

I published this PR as cescoferraro-flamegraph

eh-am · 2022-11-16T15:13:26Z

Hey @cescoferraro, even though you already published a PR, we have a CI step that publishes the flamegraph (https://github.com/pyroscope-io/pyroscope/actions/runs/3397437411/jobs/5649595841). You can point to @pyroscope/flamegraph@0.25.2-1679-ab8cd3f.0

CLAassistant · 2023-03-15T13:13:39Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

chore: fixes the logic in convertJaegerTraceToProfile function

7d6b2b2

petethepig mentioned this pull request Nov 6, 2022

Incorrect output of convertJaegerTraceToProfile #1671

Open

petethepig self-assigned this Nov 7, 2022

cescoferraro mentioned this pull request Nov 15, 2022

Add Flame Graph as a Trace view kubeshop/tracetest#1484

Merged

4 tasks

jorgeepc mentioned this pull request Nov 25, 2022

[trace visualization] Implement flame graph kubeshop/tracetest#1417

Closed

petethepig closed this Jul 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: fixes the logic in convertJaegerTraceToProfile function #1679

chore: fixes the logic in convertJaegerTraceToProfile function #1679

petethepig commented Nov 4, 2022 •

edited by Rperry2174

Loading

github-actions bot commented Nov 4, 2022

codecov bot commented Nov 4, 2022 •

edited

Loading

bobrik commented Nov 8, 2022

Rperry2174 commented Nov 8, 2022 •

edited

Loading

bobrik commented Nov 9, 2022

Rperry2174 commented Nov 11, 2022

cescoferraro commented Nov 16, 2022 •

edited

Loading

cescoferraro commented Nov 16, 2022 •

edited

Loading

eh-am commented Nov 16, 2022

CLAassistant commented Mar 15, 2023

chore: fixes the logic in convertJaegerTraceToProfile function #1679

chore: fixes the logic in convertJaegerTraceToProfile function #1679

Conversation

petethepig commented Nov 4, 2022 • edited by Rperry2174 Loading

Trace

Flamegraph Before

Flamegraph After

Notes

github-actions bot commented Nov 4, 2022

size-limit report 📦

codecov bot commented Nov 4, 2022 • edited Loading

Codecov Report

bobrik commented Nov 8, 2022

Rperry2174 commented Nov 8, 2022 • edited Loading

bobrik commented Nov 9, 2022

Rperry2174 commented Nov 11, 2022

cescoferraro commented Nov 16, 2022 • edited Loading

cescoferraro commented Nov 16, 2022 • edited Loading

eh-am commented Nov 16, 2022

CLAassistant commented Mar 15, 2023

petethepig commented Nov 4, 2022 •

edited by Rperry2174

Loading

codecov bot commented Nov 4, 2022 •

edited

Loading

Rperry2174 commented Nov 8, 2022 •

edited

Loading

cescoferraro commented Nov 16, 2022 •

edited

Loading

cescoferraro commented Nov 16, 2022 •

edited

Loading