Custom controls for split menu #472

Closed · adrianmroz opened this issue Aug 19, 2019 · 14 comments

@adrianmroz (Collaborator) commented Aug 19, 2019

We need customization of the controls inside the split menu.

A few examples:

  1. For time splits on line/bar charts we shouldn't allow picking a sort (it should always be ascending on the dimension), and the limit should be set to None.
  2. Sometimes we want a custom set of available limits, per split or per datacube. Does it depend on the visualisation?

Food for thought: maybe we should limit the "visualisation resolution" behaviour? Right now Turnilo sometimes changes a lot under the user, and it happens on every change. If we picked good defaults at the start and limited the user's options, most of that behaviour could be deleted. It would be good to keep behaviour for correcting wrong decisions (removing a continuous split on the line chart, or adding a split on totals). The core UX of filter/split/measure should stay flexible; the menus inside could be customised.
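As food for the discussion, a minimal sketch of what such customization could look like. The shape is entirely hypothetical: `SplitMenuConfig`, `limits` and `sortLocked` are invented names, not Turnilo's actual configuration.

```typescript
// Hypothetical per-dimension split-menu configuration (not Turnilo's API).
interface SplitMenuConfig {
  // Limit values offered in the split menu; null stands for "None".
  limits: Array<number | null>;
  // When true, the sort control is hidden and fixed to ascending
  // on the split dimension (e.g. time splits on line/bar charts).
  sortLocked?: boolean;
}

// Time split on a line/bar chart: no sort picking, limit fixed to None.
const timeSplit: SplitMenuConfig = {
  limits: [null],
  sortLocked: true,
};

// A string dimension with a custom set of available limits.
const pagePathSplit: SplitMenuConfig = {
  limits: [5, 10, 25, 50, 100, 500],
};
```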

@adrianmroz (Collaborator, Author)

The Visualisation Resolution mechanism conflates a few responsibilities; a sketch of how they could be separated follows the list.

  1. A way of marking the current visualisation definition as unfit, or less fit than others. For example, when adding an additional split on Totals, we should mark Totals as unfit and switch to Table. That could and should run on every essence change; most of the time it would do nothing.

  2. Adjusting the parameters of the visualisation. For example, correcting sorts on the line chart or correcting limits with nested splits. This should never be needed: we shouldn't allow the user to change to incorrect params in the first place. And if the user changes the visualisation, we should provide good visualisation-specific defaults. For example, after changing to the line chart, we should fix the sort on the continuous dimension.
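A rough sketch of how the two responsibilities could be separated, assuming a heavily simplified essence model; `Essence`, `VisualizationManifest`, `evaluateFitness`, `applyDefaults` and `resolve` are hypothetical names, not Turnilo's actual API.

```typescript
type Fitness = "fits" | "unfit";

interface Essence {
  visualization: string;
  splits: { dimension: string; limit: number | null }[];
}

interface VisualizationManifest {
  name: string;
  // Responsibility 1: a cheap fitness check, run on every essence change.
  evaluateFitness(essence: Essence): Fitness;
  // Responsibility 2: visualization-specific defaults, applied only when
  // the user switches to this visualization (never behind their back).
  applyDefaults(essence: Essence): Essence;
}

const totals: VisualizationManifest = {
  name: "totals",
  evaluateFitness: e => (e.splits.length === 0 ? "fits" : "unfit"),
  applyDefaults: e => ({ ...e, splits: [] }),
};

const table: VisualizationManifest = {
  name: "table",
  evaluateFitness: () => "fits", // a table can render any split combination
  applyDefaults: e => e,
};

// Run on every essence change: if the current visualization became unfit
// (e.g. a split was added on Totals), switch to the first fitting one.
function resolve(essence: Essence, manifests: VisualizationManifest[]): Essence {
  const current = manifests.find(m => m.name === essence.visualization);
  if (current && current.evaluateFitness(essence) === "fits") return essence;
  const fallback = manifests.find(m => m.evaluateFitness(essence) === "fits");
  return fallback
    ? fallback.applyDefaults({ ...essence, visualization: fallback.name })
    : essence;
}
```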

@alexbusu (Contributor) commented Nov 7, 2019

@adrianmroz So, let's have customizable limits for split dimensions. 😄
Currently this value is hardcoded.
About the incorrect params that the user might change to: the developer is the one who sets the available limits, and it's his responsibility to set the right values; the user will only be shown the available options.
[screenshot: the available limit options in the split menu]

@alexbusu (Contributor) commented Nov 7, 2019

An (extended) idea is to configure limits for each level of splitting. E.g. for the 1st level one could allow [50, 100, 200] while for the 2nd and 3rd just [5, 10, 15].
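A tiny sketch of that idea, with invented names (`limitsPerSplitLevel`, `limitOptionsFor`):

```typescript
// Hypothetical limit options configured per nesting level; index 0 is the 1st split.
const limitsPerSplitLevel: number[][] = [
  [50, 100, 200], // 1st split
  [5, 10, 15],    // 2nd split
  [5, 10, 15],    // 3rd split
];

// Splits deeper than the configured levels fall back to the last entry.
function limitOptionsFor(level: number): number[] {
  return limitsPerSplitLevel[Math.min(level, limitsPerSplitLevel.length - 1)];
}
```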

@adrianmroz (Collaborator, Author)

> So, let's have customizable limits for split dimensions. 😄
> Currently this value is hardcoded.

Yup, that's the main driver for this issue :D

> About the incorrect params that the user might change to: the developer is the one who sets the available limits

That's a little bit harder: there are still some dynamic cases where the user can select something that is then immediately changed back. We need something better than that, and it depends on the visualisation and the current state. Hence this issue.

> An (extended) idea is to configure limits for each level of splitting. E.g. for the 1st level one could allow [50, 100, 200] while for the 2nd and 3rd just [5, 10, 15].

I see only one reason to show different limits on consecutive splits: the count of underlying Druid queries. We have that configured, so we could "estimate" the options. If you select 10 on the first split, feel free to select 10 on the next one. But if you pick 1000 on the first, we probably need to limit the second one to 5.
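A sketch of that estimation, assuming plywood issues one group-by per parent bucket for nested splits (the 1 + l1 + l1 * l2 + … pattern worked out later in this thread); `estimateQueryCount`, `availableLimitOptions` and the `maxQueries` budget are hypothetical names:

```typescript
// Queries plywood needs for nested splits: 1 for the first split,
// l1 for the second level, l1 * l2 for the third, and so on.
function estimateQueryCount(limits: number[]): number {
  let queries = 0;
  let parents = 1; // parent buckets feeding the current split level
  for (const limit of limits) {
    queries += parents; // one query per parent bucket
    parents *= limit;
  }
  return queries;
}

// Dynamic select boxes: offer only the limit options that keep
// the total query count within the configured budget.
function availableLimitOptions(
  chosenLimits: number[], // limits already picked on earlier splits
  options: number[],      // candidate limits for the next split
  maxQueries: number      // hypothetical query budget from the config
): number[] {
  return options.filter(o => estimateQueryCount([...chosenLimits, o]) <= maxQueries);
}
```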

@alexbusu (Contributor) commented Nov 7, 2019

> If you select 10 on the first split, feel free to select 10 on the next one. But if you pick 1000 on the first, we probably need to limit the second one to 5.

I'm not sure Druid cares that much about the output limit (well, a little bit in the case of TopN queries, yes, which is what a data split uses; but I saw a threshold value of 5000 hardcoded somewhere). The computation is made over the whole time range and the given filters anyway.

@adrianmroz (Collaborator, Author)

Oh boy, this is a very big issue for the plywood-Druid integration and how plywood generates group-bys for nested splits. We have a setting specifically for this: #205

@alexbusu (Contributor)

> Oh boy, this is a very big issue for the plywood-Druid integration and how plywood generates group-bys for nested splits. We have a setting specifically for this: #205

Can this be left to the developer to decide how much they would like to load the Druid cluster?

@adrianmroz (Collaborator, Author)

What do you mean by "the developer"?

Because of plywood internals, the number of Druid queries depends on what the user selects in the UI. If the user selects a state that yields more than the maximum number of queries (and it is easy to do so), plywood will return incomplete data and Turnilo can't know that. That would result in wrong data being presented, which defeats the purpose of such a tool.

So it is not a decision about load but about correctness.

And this should be decided by the administrator of the data source, in the config. I assume that's what you mean by developer?

@alexbusu (Contributor)

> An (extended) idea is to configure limits for each level of splitting. E.g. for the 1st level one could allow [50, 100, 200] while for the 2nd and 3rd just [5, 10, 15].

I'm talking about the existing functionality: one can now select the first [5, 10, 25, 50, 100] items on each split. A "bad" scenario would be 3 split dimensions with the last (max) value of 100 each.
I was suggesting making these values configurable on a nesting-level basis. For example, on the 1st level one (the dev) could set [10, 50, 100, 200, 500], while on the 2nd+ levels they could set the allowed values to [5, 10, 15]. Then, depending on how this tool (Turnilo) is used, in case Druid can't handle so many queries, these values can be adjusted to fit the business needs.

About the correctness, I can't tell how one could guarantee that, as I don't know how plywood works; I suppose it should throw when no data can be retrieved. Anyway, how can one be sure that in the 1st case I presented (3 split dimensions x 100 rows each) the result would be accurate? And how is this different from the 2nd scenario (1st dimension <= 500 rows, followed by N dimensions <= 15 rows)?

(I hope you're not trying to tell me that if I select 500 rows in the first split, then 500 queries will hit Druid for the second split dimension.)

What do you think?

@adrianmroz (Collaborator, Author)

> I'm talking about the existing functionality: one can now select the first [5, 10, 25, 50, 100] items on each split. A "bad" scenario would be 3 split dimensions with the last (max) value of 100 each.
> I was suggesting making these values configurable on a nesting-level basis. For example, on the 1st level one (the dev) could set [10, 50, 100, 200, 500], while on the 2nd+ levels they could set the allowed values to [5, 10, 15].

We would like to have configuration per dimension, not per split (like custom granularities). So user age would have custom limits and page path would have different limit options. Then we could have some mechanism that adjusts the possible values depending on the other splits' limit settings. And we're talking about limit values around 1000.

So let's say the dimension page path has limits [10, 50, 100, 500, 1000] and the dimension country has [10, 50, 100, 200, 500]. If you select one split, you can select the highest values. But if you select two, you can't select 1000 on the first split and 500 on the second. (A worked example follows below.)
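Using the hypothetical `estimateQueryCount` sketch from earlier in the thread and an assumed budget of `maxQueries = 1000`, the example works out like this:

```typescript
estimateQueryCount([1000]);      // 1    (a single split is always one query)
estimateQueryCount([100, 500]);  // 101  (within a budget of 1000)
estimateQueryCount([1000, 500]); // 1001 (over budget; the select boxes would rule this out)
estimateQueryCount([500, 15]);   // 501  (the case computed below)
```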

> About the correctness, I can't tell how one could guarantee that, as I don't know how plywood works; I suppose it should throw when no data can be retrieved.

The problem is that we need to guess when data didn't arrive. We need to calculate the number of queries and compare it with the maxQueries configuration value. And I think we should not let the user select incorrect values, instead of throwing errors: dynamic select boxes.

> Anyway, how can one be sure that in the 1st case I presented (3 split dimensions x 100 rows each) the result would be accurate?

We can estimate: that's 1 + 100 + 100^2 = 10101 queries.

> And how is this different from the 2nd scenario (1st dimension <= 500 rows, followed by N dimensions <= 15 rows)?

It is 1 + 500 + 500 * 15 = 8001 queries.

> (I hope you're not trying to tell me that if I select 500 rows in the first split, then 500 queries will hit Druid for the second split dimension.)

If you select limit 500 on the first split (an ordinal dimension) and a second split with a limit greater than one, plywood will send 501 queries.

We hadn't considered your idea of configuring limits per split. We'll run it by our users; maybe it could solve some issues!

@alexbusu (Contributor)

> If you select limit 500 on the first split (an ordinal dimension) and a second split with a limit greater than one, plywood will send 501 queries.

Yes, it actually makes sense, because of the limits on the 2nd+ split dimensions.

> I see only one reason to show different limits on consecutive splits: the count of underlying Druid queries. We have that configured, so we could "estimate" the options. If you select 10 on the first split, feel free to select 10 on the next one. But if you pick 1000 on the first, we probably need to limit the second one to 5.

Probably this would be the way to go.

Also, if #539 gets implemented, then these per-dimension limits would have no effect, since that query would be a groupBy and the limit applies to the whole result set (a single query). That "data-grid" feature would also satisfy our needs, since we want more results (~200 rows) for the first split only, with no further splits, and with a single groupBy query one can let the user input their own limit value :)

@adrianmroz (Collaborator, Author)

I wouldn't count on group-by being enforced for #539.

@oshermaf

Bringing this issue back: I'm curious whether any work has been done on it since the end of 2019? I'm mainly talking about customizable limits for split dimensions (or keeping the limit values hard-coded, but adding higher possible values).

@adrianmroz-allegro (Contributor)

Closed; the work will be done in #755 and #756.
