Identify partial buckets #2803

Closed
spalger opened this issue Jan 29, 2015 · 7 comments

@spalger
Contributor

spalger commented Jan 29, 2015

There has been some discussion surrounding the partial buckets created when using a date_histogram together with the time filter. As a stopgap solution, we have decided to remove the current "clipping" behavior, as it negatively impacts grouped-style bar charts. This ticket is for additional discussion regarding how we will more completely fix the underlying issue.

the problem

When using a date_histogram, Elasticsearch creates time buckets rounded based on the interval. If our time filter was set to 8:30am - 10:30am, and our interval was set to "hourly", Elasticsearch would bucket on the whole hour and return buckets for 8am, 9am, and 10am. We can already detect an issue: we requested 2 hours' worth of data bucketed hourly, but we ended up with 3 buckets.

Furthermore, the documents used to fill these buckets with metrics would still be limited by the time filter, so we would end up with one whole bucket (9am) and two partial buckets (8am & 10am), each covering only half an hour. If we had chosen to calculate a sum aggregation on these buckets, the values in the partial buckets would be pretty much useless, since they cannot be compared with the whole bucket.
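
To make the rounding concrete, here is a minimal sketch (plain Python, not Kibana or Elasticsearch code) that estimates how much of each interval-aligned bucket a time filter actually covers. The function name `bucket_coverage` is made up for illustration, and it assumes naive UTC datetimes, buckets keyed by their rounded-down start, and a filter end treated as exclusive.

```python
from datetime import datetime, timedelta


def bucket_coverage(start, end, interval):
    """Return (bucket_start, fraction_covered) for each bucket touching [start, end)."""
    epoch = datetime(1970, 1, 1)
    # Round the filter start down to an interval boundary, the way
    # date_histogram keys are assumed to be rounded here.
    key = start - ((start - epoch) % interval)
    result = []
    while key < end:
        # Portion of this bucket that lies inside the time filter.
        overlap = min(end, key + interval) - max(start, key)
        result.append((key, overlap / interval))
        key += interval
    return result


for key, fraction in bucket_coverage(
    datetime(2015, 1, 29, 8, 30),
    datetime(2015, 1, 29, 10, 30),
    timedelta(hours=1),
):
    label = "whole" if fraction == 1 else "partial"
    print(key.strftime("%H:%M"), f"{fraction:.0%}", label)
```

Any bucket with a fraction below 100% only saw part of its interval, which is why a sum over it is not comparable with its neighbours.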


possible solutions

The first option below seems to be a popular one, though after detailing how it would work I am of the opinion that it only points out the issue and doesn't actually solve it. Options two through four are no doubt more difficult, and each of the possible implementations has its own drawbacks. They do, however, spare us from having to implement disclaimers or warnings, as the data should be what we assume users intend to request.

one

We could identify the partial buckets using visual indicators (which might be hidden on mouseover) and warn the user that the charted values might not be an accurate view of the data.
[image: untitled-1]

This approach has the benefit of showing the user exactly what their input produced in Elasticsearch. Unfortunately, that means the data will still be invalid in some cases, and the change draws more attention to the issue only to tell you to ignore it.
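
As a rough sketch of what "identifying" could mean on the data side, the snippet below adds a `partial` flag to each bucket so a chart layer could style it differently. It assumes buckets shaped like date_histogram response buckets (`key` in epoch milliseconds plus `doc_count`); the function name, the `partial` field, and the doc counts are hypothetical.

```python
def mark_partial_buckets(buckets, interval_ms, filter_from_ms, filter_to_ms):
    """Copy each bucket and add a 'partial' flag for the chart layer to style."""
    marked = []
    for bucket in buckets:
        bucket_start = bucket["key"]
        bucket_end = bucket_start + interval_ms
        # A bucket is partial when any part of its interval lies outside the filter.
        partial = bucket_start < filter_from_ms or bucket_end > filter_to_ms
        marked.append({**bucket, "partial": partial})
    return marked


HOUR = 60 * 60 * 1000
base = 1422518400000  # 2015-01-29T08:00:00Z in epoch milliseconds

# Made-up doc counts for the 8am, 9am, and 10am buckets.
buckets = [
    {"key": base, "doc_count": 12},
    {"key": base + HOUR, "doc_count": 30},
    {"key": base + 2 * HOUR, "doc_count": 14},
]

# Time filter of 8:30am - 10:30am.
marked = mark_partial_buckets(buckets, HOUR, base + HOUR // 2, base + 2 * HOUR + HOUR // 2)
for bucket in marked:
    print(bucket["key"], bucket["doc_count"], "partial" if bucket["partial"] else "whole")
```

Note that this only labels the buckets; the values inside the flagged ones are exactly as incomplete as before.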

two

Calculate the buckets we would prefer manually, and specify them via a filter aggregation. This puts the complexity in Kibana, and it isn't really clear what the edge cases are. It also forces us to decide whether to limit the final bucket based on the time range or to extend it to the end of the final interval.
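
A minimal sketch of this idea, assuming buckets are anchored at the start of the time filter and expressed as one `range` filter per bucket inside a `filters` aggregation. The helper name, the aggregation name `manual_buckets`, the `@timestamp` field, and the `clip_last` flag are all assumptions for illustration; `clip_last` stands in for the "limit or extend the final bucket" decision.

```python
from datetime import datetime, timedelta
import json


def manual_bucket_aggs(start, end, interval, clip_last=True, field="@timestamp"):
    """Build a filters aggregation with one range filter per hand-computed bucket."""
    filters = {}
    lower = start
    while lower < end:
        upper = lower + interval
        if clip_last:
            # Limit the final bucket to the end of the time range.
            upper = min(upper, end)
        filters[lower.isoformat()] = {
            "range": {field: {"gte": lower.isoformat(), "lt": upper.isoformat()}}
        }
        lower = upper
    return {"aggs": {"manual_buckets": {"filters": {"filters": filters}}}}


body = manual_bucket_aggs(
    datetime(2015, 1, 29, 8, 30),
    datetime(2015, 1, 29, 11, 0),
    timedelta(hours=1),
)
print(json.dumps(body, indent=2))
```

With `clip_last=False` the last bucket runs a full interval past the requested end instead, so either choice leaves one bucket that needs explaining whenever the range is not a multiple of the interval.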

three

Extend the date_histogram aggregation to support custom rounding rules, target bucket counts, etc. While this is definitely worth asking for, it probably won't be a simple thing to get in.

four

Extend the time range to the closest interval boundary, regardless of the time filter requested. This could have undesired effects for large intervals paired with small timeframes. It would also require that we either clip the bars or show dates that are not supposed to be in the data.
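
For reference, a small sketch of what extending the range could look like, again assuming naive UTC datetimes and interval boundaries counted from the Unix epoch. `expand_to_interval` is a hypothetical helper, not an existing Kibana function.

```python
from datetime import datetime, timedelta


def expand_to_interval(start, end, interval):
    """Widen [start, end) outward to the nearest interval boundaries."""
    epoch = datetime(1970, 1, 1)
    # Floor the start to the previous boundary.
    expanded_start = start - ((start - epoch) % interval)
    # Ceil the end to the next boundary, unless it already sits on one.
    overshoot = (end - epoch) % interval
    expanded_end = end + (interval - overshoot) if overshoot else end
    return expanded_start, expanded_end


requested = (datetime(2015, 1, 29, 8, 30), datetime(2015, 1, 29, 10, 30))

# Hourly interval: the query grows by half an hour on each side.
print(expand_to_interval(*requested, timedelta(hours=1)))

# Daily interval: the same two-hour request grows to a full day.
print(expand_to_interval(*requested, timedelta(days=1)))
```

The second call shows the undesired effect mentioned above: a two-hour filter with a daily interval ends up querying twelve times as much time as the user asked for.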

@spalger spalger added this to the 4.1.0 milestone Jan 29, 2015
@rashidkpc
Contributor

Here's a mockup that doesn't include clipping and might more accurately show what we think might work:

[image: screen shot 2015-01-29 at 7 26 29 am]

@rashidkpc rashidkpc removed this from the 4.1.0 milestone Jan 29, 2015
@rashidkpc
Contributor

Also clearing the milestone here since this seems important and hopefully isn't super hard to implement, so maybe we can get it in earlier. Would like to see input from @stormpython and @jthomassie here.

@stormpython
Contributor

Removing clipping seems like an easy enough fix. We would simply default to what we do normally. And we are given access to the time range the user specifies, so we could shade the range of times on both ends that you see in the images above. I do not think it would be very difficult to implement.

However, we will have to see if there are any adverse side effects to removing clipping, since we've set a lot of things up to deal with it in the code base.

In closing: easy to implement, unsure of the unintended side effects. But hopefully, there are only a few if any at all.

@rashidkpc
Contributor

IIRC one of the reasons for clipping was range selection: if a user brushed all the way to the end of the chart, they would actually end up expanding their selection in one direction rather than narrowing it overall as intended, because the lack of clipping causes the axis to extend beyond the bounds of the search. We'll need some way to halt the brush to avoid that.

Adding the grey bars or some other visual identifier to the end of the range would accomplish that.
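
One way to picture "halting the brush" is to clamp the brushed selection to the bounds of the current time filter before applying it. This is only a sketch with a hypothetical `clamp_brush` helper, assuming both the selection and the bounds are (from, to) pairs in epoch milliseconds.

```python
def clamp_brush(selection, bounds):
    """Clamp a brushed (from, to) range so it never exceeds the filter bounds."""
    sel_from, sel_to = selection
    min_ms, max_ms = bounds
    clamped_from = max(sel_from, min_ms)
    clamped_to = min(sel_to, max_ms)
    if clamped_from >= clamped_to:
        # The brush lies entirely in the region outside the filter.
        return None
    return clamped_from, clamped_to


HOUR = 60 * 60 * 1000
bounds = (8 * HOUR + HOUR // 2, 10 * HOUR + HOUR // 2)  # filter of 8:30 - 10:30

# Brushing from 9:00 to the end of the axis (11:00) stops at 10:30.
print(clamp_brush((9 * HOUR, 11 * HOUR), bounds))
```

With this in place a brush can only narrow the time filter, never widen it.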

@spalger
Contributor Author

spalger commented Jan 29, 2015

I get that grouped bars are not showing all groups, but why are we so concerned about rendering the whole partial buckets when they often don't even show valid data?

I think we should really focus on a solution that makes those buckets useful instead of just visible.

@rashidkpc
Contributor

Shrug, I'm not totally against clipping, this solution might make implementation a bit simpler, but maybe not. Still open to ideas here.

stormpython added a commit to stormpython/kibana that referenced this issue Feb 3, 2015
@spalger spalger closed this as completed Mar 26, 2015
@francisca-lima

Any other solution to this? I don't want my data to disappear.
