Different begin segment display between AWS xray and plugin #237

Open
bendanye opened this issue Jul 3, 2024 · 13 comments
Assignees
Labels
datasource/X-Ray type/bug Something isn't working

Comments

@bendanye

bendanye commented Jul 3, 2024

What happened:
The trace display differs between the AWS X-Ray console and the plugin.

  • The AWS X-Ray console shows A (nginx) as the beginning of the trace.
  • The plugin shows B (company) as the beginning of the trace (or at least at the top of the trace).

What you expected to happen:
The display should be consistent, whether the trace is viewed in the X-Ray console or in the plugin.

Screenshots
xray

plugin

Environment:

  • Grafana version: 11.0.0
  • Plugin version: 2.8.3
  • OS Grafana is installed on: EKS cluster using helm chart
@bendanye bendanye added datasource/X-Ray type/bug Something isn't working labels Jul 3, 2024
@idastambuk
Contributor

Hi @bendanye, so far I'm unfortunately unable to reproduce this.
Can you tell us if the Node Graph visualization also shows a different segment than nginx?
Also, is the company segment shown anywhere in the graph or segments timeline in the X-Ray console, or is it just there instead of the nginx segment?

@bendanye
Author

bendanye commented Jul 5, 2024

Hi @idastambuk
The company segment is shown somewhere. From the screenshots you can see that company is shown after nginx in the plugin, while X-Ray shows nginx first.
plugin segments
xray segments

The Node Graph visualization shows the same thing in both X-Ray and the plugin (I have attached the screenshots).
plugin graph
xray graph

@idastambuk
Contributor

Hi @bendanye, thanks for the screenshots! Looking at them, it seems there are multiple top-level segments (including __nginx and _company) and the plugin is just displaying them in a different order (alphabetical?) than the AWS console. Is this indeed the issue?

@bendanye
Author

bendanye commented Jul 6, 2024

Hi @idastambuk, I think this is very misleading: if you do not expand the node graph, it might seem that the first request goes to company instead of nginx. We also associate the segment at the top with the first request.

@njvrzm
Contributor

njvrzm commented Jul 8, 2024

We're still trying to reproduce this - meanwhile, could you provide a bit more detailed information? Specifically, in the trace list where you're seeing the spans out of the expected order, could you check the "Start Time" shown in the span details and let us know if the Company span that's above has a later start time than the nginx span that's below? From your screenshots it looks like its start time is earlier, which could explain why it's being sorted the way it is.

It would also be helpful to know if the parent span is set correctly. You can open the query inspector, go to the Data column, and find the spans that look out of order. Is the Nginx span an ancestor of the Company span?
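
For anyone checking the ancestor relationship by hand, a minimal sketch of the check follows. It assumes each row from the query inspector can be reduced to a `spanId` and an optional `parentSpanId`; the `SpanRow` type, the `isAncestor` helper, and the sample IDs are hypothetical and not part of the plugin.

```typescript
// Hypothetical row shape: only the two ID fields matter for this check.
interface SpanRow {
  spanId: string;
  parentSpanId?: string;
}

// Walks up the parent chain from `descendantId` and reports whether
// `ancestorId` is reached. The `seen` set guards against cyclic data.
function isAncestor(rows: SpanRow[], ancestorId: string, descendantId: string): boolean {
  const parentOf = new Map<string, string | undefined>();
  for (const row of rows) {
    parentOf.set(row.spanId, row.parentSpanId);
  }
  const seen = new Set<string>();
  let current = parentOf.get(descendantId);
  while (current !== undefined && !seen.has(current)) {
    if (current === ancestorId) {
      return true;
    }
    seen.add(current);
    current = parentOf.get(current);
  }
  return false;
}

// Made-up sample: nginx -> middle -> company.
const sampleRows: SpanRow[] = [
  { spanId: 'nginx' },
  { spanId: 'middle', parentSpanId: 'nginx' },
  { spanId: 'company', parentSpanId: 'middle' },
];

console.log(isAncestor(sampleRows, 'nginx', 'company'));
console.log(isAncestor(sampleRows, 'company', 'nginx'));
```

If the check returns false in both directions, the two spans are siblings or separate roots, and their relative order depends only on how the roots are sorted.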

@bendanye
Author

bendanye commented Jul 9, 2024

could you check the "Start Time" shown in the span details and let us know if the Company span that's above has a later start time than the nginx span that's below

The start time for company is 3.9, while the start time for nginx is 3.8.

It would also be helpful to know if the parent span is set correctly - you can open the query inspector, go to the Data column, and find the spans that look out of order. Is the Nginx span an ancestor of the Company span?

This is what I am seeing:

query inspector

@njvrzm
Contributor

njvrzm commented Jul 9, 2024

Thanks very much for the details, @bendanye, they're very helpful. I think we have an idea for a fix - I'll discuss with the team later and we'll see about getting this into our backlog.

@idastambuk
Contributor

FYI, the suggested solution here is to sort top-level segments (those with no parent span) by their startTime, ascending.
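
A minimal sketch of that idea, not the plugin's actual code: the `RootCandidate` shape and the `sortRootSpans` helper are hypothetical, and `startTime` is assumed to be a plain number in a single consistent unit. The sample values are made up.

```typescript
// Hypothetical span shape; field names mirror the ones used in this thread.
interface RootCandidate {
  spanId: string;
  parentSpanId?: string;
  name: string;
  startTime: number; // assumed: same unit for every span
}

// Keeps only spans without a parent and orders them by startTime, ascending.
// `filter` returns a new array, so the input array is not mutated by `sort`.
function sortRootSpans(spans: RootCandidate[]): RootCandidate[] {
  return spans
    .filter((span) => !span.parentSpanId)
    .sort((a, b) => a.startTime - b.startTime);
}

// Made-up sample data: two roots arriving out of order, plus one child.
const incoming: RootCandidate[] = [
  { spanId: 'b', name: 'company', startTime: 3900 },
  { spanId: 'a', name: 'nginx', startTime: 3800 },
  { spanId: 'c', parentSpanId: 'a', name: 'child', startTime: 3850 },
];

const sortedRootNames = sortRootSpans(incoming).map((span) => span.name);
console.log(sortedRootNames);
```

With this ordering, the root that started first is always listed first, regardless of the order in which the segments arrive.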

@idastambuk idastambuk self-assigned this Aug 15, 2024
@idastambuk
Contributor

Hi @bendanye, I'm taking another look at this and I'm having trouble reproducing the wrong sorting. Our Trace View visualization sorts the top-level spans by their startTime, provided the start time is in milliseconds, no matter which order they arrive in from X-Ray.

Additionally, this screenshot seems to show that the spans are sorted correctly by start time, since the span timeline (the colored lines) goes from left to right. I'm wondering whether this is instead a problem with the response containing different data, or with the X-Ray console using some other parameters to sort.
Image

While looking at the screenshots, it seems they contain data from two separate traces that doesn't match. Would it be possible to get the following for one single trace that is still giving you issues:

  1. A screenshot of the top-level segments timeline in Grafana's trace view, compared to the segments timeline in the X-Ray console in AWS.
  2. The data (table) view in Grafana, but only spans without a parentSpanId. You can accomplish this by sorting the table by parentSpanId, which will put the empty cells first.

We really appreciate your help with reproducing this!

@bendanye
Author

Hi @idastambuk

Sure, here is the most recent result:

query_inspector service operation

@idastambuk
Contributor

Hi @bendanye, thanks for the screenshots. Looking at the data, it does seem that our plugin is correctly displaying the data we're getting from X-Ray: the start_time of the lookup root span is before the start time of the nginx root span:

Image

It IS strange that both root spans start AFTER their child spans. Additionally, the nested nginx spans clearly start before the lookup spans. I'm not sure there's anything we can do here, since the data from X-Ray seems to be inconsistent. This could be an instrumentation problem where the root spans are recorded incorrectly. Are you able to double-check this?
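
One way to spot this kind of inconsistency in exported span data is sketched below. It is only an illustration: the `TimedSpan` shape and the `findLateParents` helper are hypothetical, the sample values are made up, and the check only compares each span with its direct parent.

```typescript
// Hypothetical span shape for the consistency check.
interface TimedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  startTime: number; // assumed: same unit for every span
}

interface LateParent {
  parent: string;
  child: string;
}

// Reports every direct parent/child pair where the parent starts after the child.
function findLateParents(spans: TimedSpan[]): LateParent[] {
  const byId = new Map<string, TimedSpan>();
  for (const span of spans) {
    byId.set(span.spanId, span);
  }
  const late: LateParent[] = [];
  for (const child of spans) {
    if (!child.parentSpanId) {
      continue; // root spans have no parent to compare against
    }
    const parent = byId.get(child.parentSpanId);
    if (parent !== undefined && parent.startTime > child.startTime) {
      late.push({ parent: parent.name, child: child.name });
    }
  }
  return late;
}

// Made-up sample: the nginx root starts after its own child, the lookup root does not.
const recorded: TimedSpan[] = [
  { spanId: 'r1', name: 'nginx', startTime: 200 },
  { spanId: 'c1', parentSpanId: 'r1', name: 'nginx-child', startTime: 100 },
  { spanId: 'r2', name: 'lookup', startTime: 150 },
  { spanId: 'c2', parentSpanId: 'r2', name: 'lookup-child', startTime: 160 },
];

const lateParents = findLateParents(recorded);
console.log(lateParents);
```

Any pair reported here points at the recorded data rather than at the sorting, since a parent normally starts no later than its children.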

@bendanye
Author

bendanye commented Sep 2, 2024

Hi @idastambuk, I have checked again: if I expand both nginx and lookup, the start times show that nginx is earlier than lookup.

Image
Image

@idastambuk
Contributor

Hi @bendanye, the start times you show are for child spans. As mentioned, their timing is definitely unusual, but the root span sorting is based on the startTime of the root spans, and that appears to be data coming into the plugin, not something calculated IN the plugin.

Projects
Status: Waiting
Development

No branches or pull requests

3 participants