Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Add support for table parameters #108169

Merged
merged 5 commits into from
May 2, 2024

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 1, 2024

This adds support for specifying tables in the request for use with the forthcoming LOOKUP command. Example:

curl -XPOST 'localhost:9200/_query?error_trace&pretty' -d'{
    "query": "FROM foo | LOOKUP services ON hostname | STATS MAX(bytes_out) BY service",
    "tables": {
        "services": {
            "hostname:keyword": [ "a1",   "a2",   "a3",  "b1",  "b2"],
            "service:keyword": ["auth", "auth", "auth", "web", "web"]
        }
    },
    "version": "2024.04.01"
}'

The format of these is very experimental. We're not sure at all about the API so I just picked the simplest thing we could merge and iterate on. I went with name:type for the keys because I think the types will be important when merging, especially given that we're merging with hashes.

At the moment you can't use this parameter unless you are on a snapshot version. And there's a limit of 1mb for the data in these tables because we, at the moment, can't account for them with our regular BlockFactory. Once we can account for them we can raise this limit.

This adds support for specifying tables in the request for use with the
forthcoming `LOOKUP` command. Example:
```
curl -XPOST 'localhost:9200/_query?error_trace&pretty' -d'{
    "query": "FROM foo | LOOKUP services ON hostname | STATS MAX(bytes_out) BY service",
    "tables": {
        "t1": {
            "hostname:keyword": [ "a1",   "a2",   "a3",  "b1",  "b2"],
            "service:keyword": ["auth", "auth", "auth", "web", "web"]
        }
    },
    "version": "2024.04.01"
}'
```

The format of these is very experimental. We're not sure at all about
the API so I just picked the simplest thing we could merge and iterate
on. I went with `name:type` for the keys because I think the types will
be important when merging, especially given that we're merging with
hashes.

At the moment you can't use this parameter unless you are on a snapshot
version. And there's a limit of 1mb for the data in these tables because
we, at the moment, can't account for them with our regular `BlockFactory`.
Once we can account for them we can raise this limit.
@nik9000 nik9000 requested review from costin and dnhatn May 1, 2024 21:02
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 1, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented May 1, 2024

I've marked this as ES|QL-ui impacting because kibana will likely want to specify these tables. It'll probably want to support user uploaded CSVs and all kinds of other fun things. And this is one way to land the CSV files.

We're going to merge this one without finalizing the syntax just so we can have something to experiment with. So if you see this merged don't think of any decisions about syntax as having been made - just that we're moving towards LOOKUP and INLINESTATS.

nik9000 added 2 commits May 1, 2024 17:38
They were causing trouble. We can add them when we need them.
* Support for {@code LOOKUP} command
*/
public static final NodeFeature LOOKUP_COMMAND = new NodeFeature("esql.lookup_command");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cherry-picked this one in from the Lookup PR but didn't plug it in.

@@ -346,3 +346,27 @@ setup:
- match: {values.0: ["1",2.0,null,true,123,123]}
- match: {values.1: ["1",2.0,null,true,123,123]}
- match: {values.2: ["1",2.0,null,true,123,123]}

---
lookup:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is in the lookup PR and seems worth having. It doesn't run at the member, but it illustrates how you'd use this.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Nik!

private final XContentParser p;
private int length;

ParseTables(EsqlQueryRequest request, XContentParser p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a XContent parsing test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

"[" + RequestXContent.PRAGMA_FIELD + "] only allowed in snapshot builds",
validationException
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should reject requests with tables in non-snapshot build for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH! I totally had that in there at some point. I'll add it back.

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 2, 2024
@elasticsearchmachine elasticsearchmachine merged commit 09256ef into elastic:main May 2, 2024
15 checks passed
@nik9000 nik9000 deleted the inline_blocks branch May 2, 2024 13:22
elasticsearchmachine pushed a commit that referenced this pull request Jun 7, 2024
This adds support for `LOOKUP`, a command that implements a sort of
inline `ENRICH`, using data that is passed in the request:

```
$ curl -uelastic:password -HContent-Type:application/json -XPOST \
    'localhost:9200/_query?error_trace&pretty&format=txt' \
-d'{
    "query": "ROW a=1::LONG | LOOKUP t ON a",
    "tables": {
        "t": {
            "a:long":     [    1,     4,     2],
            "v1:integer": [   10,    11,    12],
            "v2:keyword": ["cat", "dog", "wow"]
        }
    },
    "version": "2024.04.01"
}'
      v1       |      v2       |       a       
---------------+---------------+---------------
10             |cat            |1
```

This required these PRs: * #107624 * #107634 * #107701 * #107762 *
#107923 * #107894 * #107982 * #108012 * #108020 * #108169 * #108191 *
#108334 * #108482 * #108696 * #109040 * #109045

Closes #107306
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) ES|QL-ui Impacts ES|QL UI >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants