[APM] Use duration instead of percent on breakdown chart #83665

Open
smith opened this issue Nov 18, 2020 · 8 comments
Labels
apm:service-overview, discuss, Team:APM

Comments

@smith
Contributor

smith commented Nov 18, 2020

For #81135 we want to show the total duration instead of the percent on the breakdown ("Average duration by span type") chart on the service overview and transactions overview pages.

The series for the area chart are using stackMode="percentage" currently. We want the same look, but to be based on the total duration instead of the percentage.
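As a minimal sketch of the difference (not taken from the Kibana code base; the `Bucket` type and both function names are hypothetical), a percentage-stacked chart effectively plots each span type's share of the bucket total, while a duration-based chart would plot the raw values. Assuming durations are available per span type in microseconds:

```typescript
// Hypothetical shape: time spent (in microseconds) per span type in one time bucket.
type Bucket = { timestamp: number; durationsUs: Record<string, number> };

// What a percentage-stacked chart effectively shows: each type's share of the bucket total.
function toPercentShares(bucket: Bucket): Record<string, number> {
  const spanTypes = Object.keys(bucket.durationsUs);
  let total = 0;
  for (const spanType of spanTypes) {
    total += bucket.durationsUs[spanType];
  }
  const shares: Record<string, number> = {};
  for (const spanType of spanTypes) {
    shares[spanType] = total === 0 ? 0 : bucket.durationsUs[spanType] / total;
  }
  return shares;
}

// What a duration-based stacked chart would show: the raw values, unchanged.
function toDurations(bucket: Bucket): Record<string, number> {
  return { ...bucket.durationsUs };
}

// Two made-up buckets: the second is ten times slower, with the same proportions.
const calm: Bucket = { timestamp: 0, durationsUs: { app: 30000, db: 10000 } };
const spike: Bucket = { timestamp: 60000, durationsUs: { app: 300000, db: 100000 } };

console.log(toPercentShares(calm), toPercentShares(spike)); // identical shares
console.log(toDurations(calm), toDurations(spike)); // the spike is visible
```

With these made-up numbers the two buckets have identical shares, so the spike only shows up in the duration-based values.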

Follow-up for #81719.

smith added the Team:APM, v7.11.0, and apm:service-overview labels on Nov 18, 2020
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@felixbarny
Member

Back then we deliberately did not use the total duration.

#34716

As we already have a graph for the Transaction duration, this graph should not show the duration but rather the percentage. Especially for spikey response times, it makes it easier to get an overview of the relative contribution to response per span.type. By hovering over the breakdown graph and looking at the tooltips from the transaction duration graph, you can still see the total execution time at that particular time.

Did any of the preconditions or assumptions change?

@formgeist
Contributor

> Did any of the preconditions or assumptions change?

We've definitely had feedback that the percentage breakdown makes it harder to see compared to the duration-based breakdown. I can primarily speak from a workflow and UX perspective - I'd like to have @nehaduggal or @alex-fedotyev elaborate on the user feedback and intent.

The latency chart (previously transaction duration) has changed and will change significantly in the coming iterations to serve more as a reference for time-based events like alerts, anomalies, and deployments. That's the role of the chart in the new Overview, and in the Transaction redesign (still WIP) the "basic" metrics will be more condensed serving again as reference points from that initial overview analysis (or supporting other transaction types or narrowed down results by use of filters). The role of the "time spent" chart changes significantly to provide a more detailed look into the application and one that we'll enhance in the aforementioned Transactions redesign. If you want we can set up some time to go through this.

@alex-fedotyev

@felixbarny - this is very interesting, as with customers we are observing the opposite, i.e. it is extremely hard to identify where the problem is while looking at span by type percentage chart.

Here is an example:
image
Looking at the screenshot above, it is impossible to tell whether the fluctuations in the chart correspond to a real issue or are minor. Because the absolute impact is missing, the chart is of limited use on its own; it needs the response time chart and mouse-sync across the two charts.

Here is the corresponding response time chart for the same timeframe:
image

The proposal to move from a percentage scale to an absolute scale is meant to show, on the same chart, both which resource is impacting performance and when the issue(s) occur.
This would be complemented by the list of top dependencies on the same service overview page.
image

Lastly, I put together a sample Kibana dashboard imagining what it could look like in the new design.
image

In simple terms: folks in operations and SRE roles often look at time series data, and the ability to quickly match shapes across charts breaks down when percentage charts are mixed with absolute value charts.

Does this provide more context?

@felixbarny
Member

It does, and it makes a lot of sense. I guess it depends on the use case and the situation. But what I still worry about a bit is that you won't be able to get a sense of where time is spent in the areas where there's no spike. That could be solved in multiple ways:

  • by users just selecting a timeframe without a spike
  • by adding back the gauges for the average percentage of app etc in the current timeframe (some of that we actually now have in the dependencies table)
  • Letting users toggle between percent and duration
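The toggle option could be sketched roughly as follows. This is an illustration only, not the APM UI implementation; `BreakdownMode`, `buildSeries`, `formatTick`, and the input shape are all hypothetical names, and durations are assumed to be in microseconds:

```typescript
type BreakdownMode = "percent" | "duration";
type Point = { x: number; y: number };

// Hypothetical input: one entry per span type, with values aligned by bucket index.
type DurationSeries = { spanType: string; timestamps: number[]; durationsUs: number[] };

// Build chart series for the selected mode from the same underlying durations.
function buildSeries(
  input: DurationSeries[],
  mode: BreakdownMode
): { spanType: string; data: Point[] }[] {
  const bucketCount = input.length > 0 ? input[0].timestamps.length : 0;
  // Total time per bucket across all span types, used for the percent mode.
  const totals: number[] = [];
  for (let i = 0; i < bucketCount; i++) {
    let total = 0;
    for (const series of input) {
      total += series.durationsUs[i];
    }
    totals.push(total);
  }
  return input.map((series) => ({
    spanType: series.spanType,
    data: series.timestamps.map((x, i) => {
      const share = totals[i] === 0 ? 0 : series.durationsUs[i] / totals[i];
      return { x, y: mode === "percent" ? share : series.durationsUs[i] };
    }),
  }));
}

// The y-axis labels have to follow the mode as well.
function formatTick(value: number, mode: BreakdownMode): string {
  return mode === "percent"
    ? `${(value * 100).toFixed(0)}%`
    : `${(value / 1000).toFixed(1)} ms`;
}

// Made-up sample data: the second bucket is a latency spike.
const sample: DurationSeries[] = [
  { spanType: "app", timestamps: [0, 60000], durationsUs: [30000, 300000] },
  { spanType: "db", timestamps: [0, 60000], durationsUs: [10000, 100000] },
];

console.log(buildSeries(sample, "percent")[0].data);
console.log(buildSeries(sample, "duration")[0].data);
console.log(formatTick(0.75, "percent"), formatTick(30000, "duration"));
```

Both modes are derived from the same data, so switching only changes the y values and the tick format.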

Just some thoughts :)

@sorenlouv
Member

sorenlouv commented Jan 12, 2021

> it is extremely hard to identify where the problem is while looking at span by type percentage chart.

If a spike in latency is caused by a database outage, isn't it easier to spot that the DB is acting up when the breakdown chart uses relative units instead of absolute ones?
I agree that the user cannot identify the spike in the first place purely from the breakdown chart, but isn't that what they have latency charts for?

> Letting users toggle between percent and duration

If we can make a non-intrusive UI for this and it's trivial to implement we should consider doing this.

@formgeist
Contributor

> > Letting users toggle between percent and duration
>
> If we can make a non-intrusive UI for this and it's trivial to implement we should consider doing this.

I can also see some benefit in allowing the user to toggle between the two charts, and I think we can do that quite easily by adding the options to the panel.

span-type-breakdown-toggle

The only thing that came to mind was that we might not want to do it on the Transactions pages, because the new layout and design describe the "Average duration by span type" chart as becoming the primary time-series chart with annotations. This means that a user who switches to the percent chart will not see the annotations. I'm not sure whether we're going to reconsider our prioritization of the charts on the Transactions views because of this, or just accept that the time-series annotations are not shown when the user toggles to the %-chart.

image

@botelastic

botelastic bot commented Nov 1, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic bot added the stale label on Nov 1, 2021
stale bot removed the stale label on Jul 4, 2023
6 participants