Disk Usage visualization always 0 #12435

Closed
Michi01 opened this issue Jun 5, 2019 · 15 comments · Fixed by #17272
Labels: bug, Metricbeat, Team:Integrations

Comments

@Michi01

Michi01 commented Jun 5, 2019

ruflin added the Team:Integrations and Metricbeat labels Jun 5, 2019
exekias added the bug label Jun 5, 2019
@gaby

gaby commented Jun 5, 2019

This bug seems to come and go between versions. The main problem is Kibana's lack of a proper "Last Value" aggregation.

The "Top Value" aggregation doesn't return the top value if there are documents with empty or null values. One way of getting around this is to sort by the same field and then order by timestamp.

The same issue happens with CPU and memory usage in the default Metricbeat dashboard. I just delete them and use the InfraUI.
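
The workaround above (take the newest document that actually has a value) can be sketched as an Elasticsearch request body. This is only an illustration, not the dashboard's real definition: the helper `latest_value_query` and the aggregation names are hypothetical, and the field names are assumed from the Metricbeat system module. The code only builds a dict; it does not talk to a cluster.

```python
import json


def latest_value_query(field: str, group_by: str) -> dict:
    """Build a search body that returns the newest non-null `field` per `group_by` term.

    Hypothetical helper for illustration; nothing is sent to Elasticsearch.
    """
    return {
        "size": 0,
        # Skip documents where the field is missing or null.
        "query": {"bool": {"filter": [{"exists": {"field": field}}]}},
        "aggs": {
            "per_group": {
                "terms": {"field": group_by},
                "aggs": {
                    "latest": {
                        # Newest document first, keep only one hit per group.
                        "top_hits": {
                            "size": 1,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                            "_source": [field],
                        }
                    }
                },
            }
        },
    }


body = latest_value_query("system.filesystem.used.pct", "system.filesystem.mount_point")
print(json.dumps(body, indent=2))
```

The `exists` filter is what keeps empty documents from winning the "latest" slot.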

@kaiyan-sheng
Contributor

While trying to reproduce this problem, I realized that the Disk Usage visualization is ordered by Doc Count instead of by the Average of system.filesystem.used.pct. Is that intended?

@exekias
Contributor

exekias commented Jun 6, 2019

@kaiyan-sheng I don't think so, probably sorting by value makes more sense.

As per the report, pinging @simianhacker here, do you know if the proposed alternative to "Top value" could work better here?

@exekias
Contributor

exekias commented Jun 27, 2019

This may be related to elastic/kibana#16124, where changing the interval to >=1m seems to work. The reason is probably that this metricset only reports every minute.

@narph
Contributor

narph commented Jul 5, 2019

@exekias, I encountered this recently. It seems that in all visualizations except the time series, TSVB only shows the value of the last bucket and not the data of the whole selected time range. So if no documents are found in the last bucket, the value will be 0. Increasing the bucket interval so that it includes at least one meaningful doc will fix the issue.
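
The effect can be reproduced with a small simulation. This is a simplified sketch, not TSVB's actual bucketing logic: the function `last_bucket_average` is hypothetical, the last bucket is modelled as a plain window ending at "now", and the sample data is made up.

```python
from typing import List, Tuple


def last_bucket_average(docs: List[Tuple[int, float]], range_end: int, interval: int) -> float:
    """Average the values of docs in the final bucket [range_end - interval, range_end).

    Returns 0.0 for an empty bucket, mirroring the 0 shown in the visualization.
    """
    start = range_end - interval
    values = [value for ts, value in docs if start <= ts < range_end]
    return sum(values) / len(values) if values else 0.0


# One document per minute over 15 minutes (timestamps in seconds), all at 42% usage.
docs = [(ts, 0.42) for ts in range(0, 900, 60)]
range_end = 890  # "now" is 50 s after the newest document at 840 s

print(last_bucket_average(docs, range_end, interval=10))  # 0.0: bucket [880, 890) is empty
print(last_bucket_average(docs, range_end, interval=60))  # 0.42: bucket [830, 890) holds the doc at 840
```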

Recommendations (thanks @timroes) are to change the visualization. We can create a regular "Horizontal Bar" chart from the visualize editor with the same config:

  • A terms aggregation on system.filesystem.mount_point
  • As y-axis value, an Average of system.filesystem.used.pct

We might want to configure the field system.filesystem.used.pct in the index pattern settings to show as a percentage everywhere (which the chart would pick up).
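
For reference, the chart configuration described above corresponds roughly to the following Elasticsearch aggregation, written here as a Python dict. It is a sketch under assumptions: the aggregation names and the bucket count of 5 are arbitrary choices, and the request Kibana actually generates may differ. Buckets are ordered by the metric rather than by document count, as suggested earlier in the thread.

```python
import json

# Aggregation names ("mount_points", "avg_used_pct") and size=5 are arbitrary
# choices for this sketch; the field names come from the discussion above.
horizontal_bar_query = {
    "size": 0,
    "aggs": {
        "mount_points": {
            "terms": {
                "field": "system.filesystem.mount_point",
                "size": 5,
                # Order buckets by the metric rather than by document count.
                "order": {"avg_used_pct": "desc"},
            },
            "aggs": {
                "avg_used_pct": {"avg": {"field": "system.filesystem.used.pct"}}
            },
        }
    },
}

print(json.dumps(horizontal_bar_query, indent=2))
```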

@exekias
Contributor

exekias commented Jul 5, 2019

Awesome, thank you for digging into this! It sounds like increasing the bucket interval would be the smallest change, so the visualization keeps its look and feel, WDYT? It seems clear now that this is caused by the filesystem metricset reporting only every minute instead of every 10s.

Do you want to take this? 😇

@narph
Contributor

narph commented Jul 5, 2019

@exekias, I can take this.
Setting the interval to 1min (instead of auto in the visualization) will indeed fix the issue if users stick with the default period configuration of 1min for filesystem.
I am afraid that if users set it to 2 mins, 5 mins or larger, they will encounter empty buckets again and the percentage will show 0 again.

@exekias
Contributor

exekias commented Jul 5, 2019

That's true. On the other hand, the other option would average over the whole time range, wouldn't it? That probably means that for larger ranges (> 15m) the value stops being meaningful, as you probably want the current value instead of the average for the whole range.
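
A quick numeric illustration of that concern, using made-up samples for a disk that fills up over an hour (the numbers are hypothetical, not taken from any report in this thread):

```python
# Hypothetical samples: (minutes ago, used fraction) for a disk filling up over an hour.
samples = [(60, 0.50), (45, 0.60), (30, 0.70), (15, 0.80), (0, 0.90)]

whole_range_average = sum(value for _, value in samples) / len(samples)
latest_value = min(samples, key=lambda s: s[0])[1]  # smallest "minutes ago" is the newest

print(f"average over range: {whole_range_average:.2f}")  # 0.70
print(f"latest value:       {latest_value:.2f}")         # 0.90
```

The average understates the current usage by 20 percentage points here, and the gap grows with the length of the selected range.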

It sounds like the >=1m interval solution would probably work for our defaults, and people who change them may need to adjust the visualization.

What we would really need to fix this once and for all would be a way to retrieve the last value for the metric, is that correct?

@willemdh

willemdh commented Jul 5, 2019

What we would really need to fix this once and for all would be a way to retrieve the last value for the metric, is that correct?

That would help imho

narph assigned narph and unassigned exekias Jul 5, 2019
@narph
Contributor

narph commented Jul 8, 2019

@exekias, I am not sure what the initial goal was, but since we are using the time series here with the current setup, I assumed we are looking for an average value based on the items in the latest bucket.

A bit about TSVB:

In every visualization besides the time series, only the "last bucket" is drawn, i.e. the value that would be shown at the far right of the time series chart.
There may be future plans to introduce an option to select the date range instead, so users can see the whole time range and not just the last bucket.

From the feedback above, it seems that we do not want the whole time range but just the latest value, so the workaround of setting the interval to 1min will work only if users have set the interval for filesystem to 1 min.
If they set it to <1min we will encounter the issue above (0%), and if they set it to >1min there is a chance of having 2, 3, 4 or more docs in one bucket, over which the current setup calculates an average.

If the goal is to display the latest value here, then we will also have to change the aggregation type to "Top Hit", but we will still encounter the same issues with the interval.
The recommended path is still to go for a classical visualization and use the "Top Hit" aggregation (count 1).
A solution should also be found for the Disk Used visualization.
Quick example (we can work on labels, design, etc.):
(screenshot: CaptureCPU)

@exekias
Contributor

exekias commented Jul 8, 2019

Thank you for your research! That sounds reasonable to me 👍

I wonder what we can do then for Gauge type visualizations (the one on the left of your screenshot).

@exekias
Contributor

exekias commented Jul 16, 2019

Just saw elastic/kibana#3578, which may be helpful with Gauge in the future.

@kostasb
Contributor

kostasb commented Oct 30, 2019

Are there any plans to resolve this issue? It has been troubling our users as it provides incorrect monitoring data in the Metricbeat dashboards where these visualizations are used.

Currently, possible workarounds are:

  • Change the visualization's default interval to "1m" instead of Auto, to match the default collection interval of the system module.
  • Change the system module's configuration for stats collection interval to 10s, which is the time bucket size requested for the default time picker duration of "last 15m". Modifying the collection interval may result in increased resource utilization on the monitored system.

@ivicamihalic

I have fixed this by changing the field type of the system.filesystem.used.pct field from long to float.

Steps:

image
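
One plausible reading of why the mapping change helps, assuming the field had ended up mapped as long: used.pct is reported as a fraction between 0 and 1, and storing a fraction in an integer-typed field drops everything after the decimal point. The helper `coerce_to_long` below is hypothetical and only mimics that truncation in plain Python; it does not call Elasticsearch.

```python
def coerce_to_long(value: float) -> int:
    """Mimic storing a fractional number in an integer-typed field: the fraction is dropped."""
    return int(value)


used_pct = 0.42  # a filesystem that is 42% full, reported as a fraction

print(coerce_to_long(used_pct))  # 0, which would render as 0%
print(float(used_pct))           # 0.42, which would render as 42%
```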

@exekias
Contributor

exekias commented Nov 20, 2019

I'm hitting elastic/kibana#49854 while trying to fix this one, as panel options don't seem to get restored when loading the dashboard.
