Identify partial buckets #2803
Comments
Also clearing the milestone here since this seems important and hopefully isn't super hard to implement; maybe we can get it in earlier. Would like to see input from @stormpython and @jthomassie here.
Removing clipping seems like an easy enough fix. We would simply default to what we do normally. And we are given access to the time range the user specifies, so we could shade the range of times on both ends that you see in the images above. I do not think it would be very difficult to implement. However, we will have to see whether removing clipping has any adverse side effects, since we've set up a lot of things in the code base to deal with it. In closing: easy to implement, unsure of the unintended side effects. But hopefully, there are only a few if any at all.
IIRC, one of the reasons for clipping was range selection. If a user brushed all the way to the end of the chart, they would actually end up expanding their selection in one direction rather than narrowing it overall as intended, because without clipping the axis extends beyond the bounds of the search. We'll need some way to halt the brush to avoid that. Adding the grey bars or some other visual identifier to the end of the range would accomplish that.
I get that grouped bars are not showing all groups, but why are we so concerned about rendering the whole partial buckets when they often don't even show valid data? I think we should really focus on a solution that makes those buckets useful instead of just visible.
Shrug, I'm not totally against clipping, this solution might make implementation a bit simpler, but maybe not. Still open to ideas here.
Fix texts for elastic#2803
Any other solution to this? I don't want my data to disappear.
There has been some discussion surrounding the partial buckets created when using a date_histogram together with the time filter. As a stopgap solution, we have decided to remove the current "clipping" behavior, as it negatively impacts grouped-style bar charts. This ticket is for additional discussion regarding how we will more completely fix the underlying issue.
the problem
When using a date_histogram, elasticsearch creates time buckets rounded based on the interval. If our timefilter was set to 8:30am - 10:30am, and our interval was set to "hourly", elasticsearch would bucket on the whole hour and return buckets for 8am, 9am, and 10am. We can already detect an issue: we requested 2 hours' worth of data bucketed hourly, but we ended up with 3 buckets.
Furthermore, the documents used to fill these buckets with metrics would still be limited by the time filter, so we would end up with one whole bucket (9am) and two partial buckets (8am and 10am), each covering only half an hour of data. If we had chosen to calculate a sum aggregation on these buckets, the values in the partial buckets would be pretty much useless.
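To make the rounding concrete, here is a minimal sketch (not Kibana or elasticsearch code) of how a timestamp maps to its bucket and how much of a bucket the time filter actually covers. It assumes fixed-size intervals, buckets keyed by their start time, and naive datetimes treated as UTC; the names `bucket_key` and `covered_fraction` and the date used are made up for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_key(ts, interval):
    """Round a timestamp down to the start of its fixed-size bucket."""
    return ts - ((ts - EPOCH) % interval)


def covered_fraction(key, interval, start, end):
    """Fraction of the bucket [key, key + interval) that lies inside [start, end]."""
    overlap_start = max(key, start)
    overlap_end = min(key + interval, end)
    overlap = max(overlap_end - overlap_start, timedelta(0))
    return overlap / interval


hour = timedelta(hours=1)
start = datetime(2015, 1, 29, 8, 30)  # time filter: 8:30am
end = datetime(2015, 1, 29, 10, 30)   # time filter: 10:30am

first_key = bucket_key(start, hour)   # the 8am bucket
print(covered_fraction(first_key, hour, start, end))         # 0.5
print(covered_fraction(first_key + hour, hour, start, end))  # 1.0
```

The 8am bucket only sees half an hour of documents, so a sum over it is not comparable to the sum over the 9am bucket.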
possible solutions
The first option below seems to be a popular one, though after detailing how it would work I am of the opinion it only points out the issue and doesn't actually solve it. Options 2-4 are no doubt the more difficult ones, and each of the possible implementations has its own drawbacks. They do, however, spare us from having to implement disclaimers or warnings, as the data should be what we assume users intend to request.
one
We could identify the partial buckets using visual indicators (which might be hidden on mouseover) and warn the user that the charted values might not be an accurate view of the data.
This approach has the benefit of showing the user exactly what their input produced in elasticsearch. Unfortunately, that means the data will still be invalid in some cases, and the update also draws more attention to the issue only to tell you to ignore it.
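A rough sketch of the detection half of this option, assuming we have the bucket list from a date_histogram response (each bucket carrying a `key` in epoch milliseconds), the interval in milliseconds, and the time filter bounds. `mark_partial` is a hypothetical helper, not existing Kibana code, and the sample uses a hypothetical 8:30am - 11:30am range so that two whole buckets sit between the partial ones.

```python
HOUR_MS = 60 * 60 * 1000


def mark_partial(buckets, interval_ms, start_ms, end_ms):
    """Return copies of the buckets with a boolean 'partial' flag added.

    A bucket is partial when it starts before the time filter does
    or ends after the time filter does.
    """
    marked = []
    for bucket in buckets:
        bucket_start = bucket["key"]
        bucket_end = bucket_start + interval_ms
        partial = bucket_start < start_ms or bucket_end > end_ms
        marked.append({**bucket, "partial": partial})
    return marked


# Keys for the 8am, 9am, 10am and 11am buckets on the epoch day.
# Real responses carry more fields (doc_count, metric values, ...).
buckets = [{"key": h * HOUR_MS} for h in (8, 9, 10, 11)]
start_ms = 8 * HOUR_MS + HOUR_MS // 2   # 8:30am
end_ms = 11 * HOUR_MS + HOUR_MS // 2    # 11:30am

marked = mark_partial(buckets, HOUR_MS, start_ms, end_ms)
print([b["partial"] for b in marked])  # [True, False, False, True]
```

The chart could then shade or annotate any bucket whose flag is set, which is all this option really does.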
two
Calculate the buckets we would prefer manually, and specify them via a filter aggregation. This puts the complexity in Kibana, and it isn't really clear what the edge cases are. It also forces us to decide whether to limit the final bucket based on the time range or extend it to the end of the final interval.
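One way this could look, as a sketch only: buckets are aligned to the start of the time filter instead of the clock, and each one becomes a named `range` filter inside a `filters` aggregation. The alignment rule, the `@timestamp` field name, the `manual_buckets_agg` helper and its `clip_last` flag are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def manual_buckets_agg(start, end, interval, field="@timestamp", clip_last=True):
    """Build a `filters` aggregation body with one range filter per bucket.

    Buckets start at `start` and advance by `interval`. When `clip_last`
    is true, the final bucket is cut off at `end`; otherwise it runs to
    the end of its interval, past the time filter.
    """
    filters = {}
    lower = start
    while lower < end:
        upper = lower + interval
        if clip_last and upper > end:
            upper = end
        # Half-open range [lower, upper), so a document is never counted twice.
        filters[lower.isoformat()] = {
            "range": {field: {"gte": lower.isoformat(), "lt": upper.isoformat()}}
        }
        lower = lower + interval
    return {"filters": {"filters": filters}}


hour = timedelta(hours=1)
start = datetime(2015, 1, 29, 8, 30)
end = datetime(2015, 1, 29, 10, 30)

agg = manual_buckets_agg(start, end, hour)
print(list(agg["filters"]["filters"]))
# ['2015-01-29T08:30:00', '2015-01-29T09:30:00']
```

With this range the two buckets line up exactly. Once the range is not a multiple of the interval, the `clip_last` choice decides whether the last bucket is short or reaches past the time filter, which is the decision described above.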
three
Extend the date_histogram aggregation to support custom rounding rules, target bucket counts, etc. While this is definitely worth asking for, it probably won't be a simple thing to get in.
four
Extend the time range to the closest interval boundary, regardless of the time filter requested. This could have undesired effects for large intervals paired with small timeframes. It would also require that we either clip the bars or show dates that are not supposed to be in the data.
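A small sketch of what the widening might look like, assuming fixed-size intervals and naive datetimes treated as UTC. `expand_to_interval` is a hypothetical helper: it rounds the start down and the end up to the nearest interval boundary.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def expand_to_interval(start, end, interval):
    """Widen [start, end] so that both ends fall on interval boundaries."""
    new_start = start - ((start - EPOCH) % interval)
    remainder = (end - EPOCH) % interval
    if remainder == timedelta(0):
        new_end = end  # already on a boundary
    else:
        new_end = end + (interval - remainder)
    return new_start, new_end


start = datetime(2015, 1, 29, 8, 30)
end = datetime(2015, 1, 29, 10, 30)

# Hourly interval: 8:30am - 10:30am becomes 8:00am - 11:00am.
print(expand_to_interval(start, end, timedelta(hours=1)))

# Daily interval with a ten minute range: the query grows to a full day.
print(expand_to_interval(start, start + timedelta(minutes=10), timedelta(days=1)))
```

The second call shows the concern with large intervals and small timeframes: a ten minute request turns into 24 hours of data.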