ESQL: Add MATCHING_ROW and VALUE_AT #106152
This adds two functions: `MATCHING_ROW` and `VALUE_AT`.

`MATCHING_ROW` takes pairs of values, where the second value of each pair must be a constant, and maps the variable value to its offset in the constant value. It looks like:

```
FROM inventory
| EVAL r=MATCHING_ROW(size, ["XS", "S", "M", "L", "XL"])
```

That'd generate these hypothetical results:

```
Cool-Shirt | 20.00 | XL | 4
Expensive-Shirt | 120.00 | XL | 4
Cool-Shirt | 20.00 | S | 1
```

`VALUE_AT` takes an index and an array of values and returns the value at that offset. So:

```
FROM employees
| EVAL languages_word = VALUE_AT(languages, ["zero", "one", "two", "three", "four"])
| SORT emp_no
| LIMIT 4
| KEEP first_name, languages, languages_word
```

Would make:

```
Georgi | 2 | two
Bezalel | 5 | null
Parto | 4 | four
Chirstian | 5 | null
```

You can combine them together:

```
FROM inventory
| EVAL r=MATCHING_ROW(size, ["XS", "S", "M", "L", "XL"])
| EVAL avg_price=VALUE_AT(r, [null, 20.00, null, null, 70.00])
| DROP r
| WHERE price > avg_price
```

Which would yield:

```
Expensive-Shirt | 120.00 | XL | 70.00
```

If *that* looks familiar then you've been paying close attention! It's another join strategy, specifically one that makes sense when the data you are joining against is small. That is precisely what should happen for the `INLINESTATS` command, which we implemented in the grammar a long time ago but never implemented in the engine.
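To make the semantics concrete, here is a minimal Python model of the two functions as the description presents them. It is only a sketch: `matching_row` and `value_at` are hypothetical stand-ins rather than the PR's Java implementation, the inventory rows mirror the hypothetical results above, and returning `None` for a value that has no match is an assumption (the description only shows the null result for an out-of-range index in `VALUE_AT`).

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def matching_row(value: T, constants: Sequence[T]) -> Optional[int]:
    """Return the offset of `value` in `constants`, or None if absent (assumed)."""
    try:
        return list(constants).index(value)
    except ValueError:
        return None


def value_at(index: Optional[int], values: Sequence[T]) -> Optional[T]:
    """Return values[index], or None when the index is null or out of range."""
    if index is None or not 0 <= index < len(values):
        return None
    return values[index]


# Hypothetical inventory rows: (name, price, size).
inventory = [
    ("Cool-Shirt", 20.00, "XL"),
    ("Expensive-Shirt", 120.00, "XL"),
    ("Cool-Shirt", 20.00, "S"),
]
sizes = ["XS", "S", "M", "L", "XL"]
avg_prices = [None, 20.00, None, None, 70.00]

joined = []
for name, price, size in inventory:
    r = matching_row(size, sizes)        # EVAL r = MATCHING_ROW(size, [...])
    avg_price = value_at(r, avg_prices)  # EVAL avg_price = VALUE_AT(r, [...])
    if avg_price is not None and price > avg_price:  # WHERE price > avg_price
        joined.append((name, price, size, avg_price))
```

The two calls together behave like a join against a small in-memory table keyed by `size`, which is the point the description makes about `INLINESTATS`.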
A guide for those looking: `MATCHING_ROW` is presently only implemented for a few types, but it's designed to delegate to `BlockHash`, which already has code to turn values into ordinals and to resolve values to those ordinals. It's designed to solve the harder problem of turning streams of blocks into ordinals. I just needed to add the ability to look up values instead of adding them.

`VALUE_AT` abuses our syntax for multivalue fields to get them parsed as arrays. I then convert them to `Block`s when building the executor factory. That makes it easy to copy values.
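The difference between adding values and looking them up can be sketched with a toy ordinal table. This is an illustrative Python model, not the real `BlockHash` API: the class name `OrdinalHash` and its `add` and `lookup` methods are hypothetical.

```python
from typing import Dict, Hashable, List, Optional


class OrdinalHash:
    """Toy value -> ordinal table (hypothetical; not the real BlockHash)."""

    def __init__(self) -> None:
        self._ordinals: Dict[Hashable, int] = {}

    def add(self, values: List[Hashable]) -> List[int]:
        """Give unseen values the next free ordinal; return each value's ordinal."""
        out = []
        for v in values:
            out.append(self._ordinals.setdefault(v, len(self._ordinals)))
        return out

    def lookup(self, values: List[Hashable]) -> List[Optional[int]]:
        """Resolve values to existing ordinals without growing the table."""
        return [self._ordinals.get(v) for v in values]


table = OrdinalHash()
table.add(["XS", "S", "M", "L", "XL"])    # build side: the constant values
found = table.lookup(["XL", "S", "XXL"])  # probe side: the variable column
```

In this model `add` is what a grouping operation needs, while `lookup` is the read-only path a function like `MATCHING_ROW` would rely on: unknown values resolve to nothing instead of being assigned a new ordinal.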
Now! Problems:
- I'm abusing the multivalue parsing syntax.
- I don't perform memory tracking on the blocks.
- I'd like to be able to push blocks in those parameters rather than `List`. That'd save a lot of memory.
- Obviously, I've not implemented `INLINESTATS`, just the data-node side of it.
- For this to work for `INLINESTATS`, the `MATCHING_ROW` function needs to match all values of all columns, so `MATCHING_ROW([1, 2], [1, 2, 3])` will return `[0, 1]`, but this will get multiplicative when combining more than one field. How do we make sure not to make huge `Block`s?
- Do we want to expose these to people as functions or hide them as details of the `INLINESTATS` command? I could hide them behind a pragma for now so we don't have to make a choice. I do want to test them as individual functions kind of like I've done here.
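The multiplicative growth raised in the list can be illustrated with a small Python sketch. It assumes, hypothetically, that a multi-column match emits one combined result per combination of per-column matches. The PR does not specify that behavior, and the names `match_all` and `match_rows` are made up for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def match_all(values: Sequence[int], constants: Sequence[int]) -> List[int]:
    """Offsets in `constants` of every value of a multivalued field."""
    positions = {c: i for i, c in enumerate(constants)}
    return [positions[v] for v in values if v in positions]


def match_rows(
    columns: Sequence[Tuple[Sequence[int], Sequence[int]]]
) -> List[Tuple[int, ...]]:
    """Assumed behavior: one output per combination of per-column matches."""
    per_column = [match_all(values, constants) for values, constants in columns]
    return list(product(*per_column))


# Single column, as in the example from the list above.
single = match_all([1, 2], [1, 2, 3])

# Two multivalued columns: 2 matches x 3 matches.
combined = match_rows([([1, 2], [1, 2, 3]), ([10, 20, 30], [10, 20, 30])])
```

Under that assumption, two matches in one column and three in the other already produce six combinations for a single input row, so the output size is the product of the per-column match counts.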
I think this isn't true. I think, at least for now, we're better off doing our standard stuff and only supporting single-valued fields for
Replaced by the hash lookup and column lookup operators I've recently added.