ESQL: Use faster field caps #105067

nik9000 · 2024-02-02T13:48:47Z

The field capabilities has an internal-only option to produce un-merged output. This expands that option to be available to any caller inside of Elasticsearch and uses it in ES|QL to speed up queries across many indices with many fields. Across 5,000 indices with a couple thousand fields each (metricbeat) the FROM * query went from 600ms to 60ms. Across 50,000 indices that went from 6600ms to 600ms. 600ms is still too slow for such a simple query, but one step at a time!

This is faster because field capabilities wants to present a field-centric result but ES|QL actually needs a different flavor of field-centric result with some differences smoothed away. If we take over the merging process we can use a few tools that the field caps API uses internally to be fast - mostly the sha256 of the mapping - to save on doing work that wasn't available in the other view. Also, two merges is more expensive than one.

That 90% reduction in runtime doesn't banish field caps from the flamegraphs. You still see it, but it's now much less prominent. And you don't see the merging process at all. Now it's all data-node side operations field caps.

Relates to #103369

The field capabilities has an internal-only option to produce un-merged output. This expands that option to be available to any caller inside of Elasticsearch and uses it in ES|QL to speed up queries across many indices with many fields. Across 5,000 indices with a couple thousand fields each (metricbeat) the `FROM *` query went from 600ms to 60ms. Across 50,000 indices that went from 6600ms to 600ms. 600ms is still too slow for such a simple query, but one step at a time! This is faster because field capabilities wants to present a field-centric result but ES|QL actually needs a different flavor of field-centric result with some differences smoothed away. If we take over the merging process we can use a few tools that the field caps API uses internally to be fast - mostly the sha256 of the mapping - to save on doing work that wasn't available in the other view. Also, two merges is more expensive than one. That 90% reduction in runtime doesn't banish field caps from the flamegraphs. You still see it, but it's now much less prominent. And you don't see the merging process at all. Now it's all data-node side operations field caps. Relates to elastic#103369

elasticsearchmachine · 2024-02-02T13:49:11Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000

I imagine it'd be nice to port all of the resolution infrastructure to un-merged results but there is lot of it, and a lot of layers that I don't understand. So I did the simplest thing I could think of - I made my own version that i could be sure was fairly efficient and then asserted it had the same output as the original.

nik9000 · 2024-02-02T13:49:23Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/EsqlPlugin.java

+                ),
+                new EsqlIndexResolver(
+                    services.client(),
+                    EsqlDataTypeRegistry.INSTANCE


I'm only keep the original IndexResolver so I can compare the results.

nik9000 · 2024-02-02T13:52:28Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

-            );
+            esqlIndexResolver.resolveAsMergedMapping(table.index(), fieldNames, false, Map.of(), listener);
+
+//            indexResolver.resolveAsMergedMapping(table.index(), fieldNames, false, Map.of(), new ActionListener<>() {


I've been swapping the line above with these lines to make sure I output the same result.

nik9000 · 2024-02-02T14:28:17Z

I'm not quit mimicing EsField#isAlias and, looking around, I'm not sure what it's used for. As far as I can tell we serialize it in ESQL and don't use it. And in EQL it's passed in and thrown out.

… into faster_field_caps

Sometimes we produce output with a *ton* of `null`s and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block. When running `FROM *` in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that elastic#105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope. Relates to elastic#103369

astefan

I really like those numbers in performance improvements and it seems we are going in the right direction.

But, the most scary part about the IndexResolver removal is that it has some years of maintenance and fixes and some of the tests that are indirectly using it live in other projects. For example, one of the most important test suites I've created that deals with many edge cases is FieldExtractorTestCase in ES SQL.

If we are to use our own flavor of the IndexResolver (and I am in favor of this given the preliminary performance gains) we'd need to port some of the existent tests in SQL and QL projects here, in ESQL.
isAlias is being used to build the hierarchy of fields when the fieldcaps response is not telling us if the type of a field is actually an alias type. It is possible this was useful to ES SQL to extract data from _source when the fields API didn't exist but I am not 100% sure it is still needed now (FieldExtractorTestCase.testAliasFromDocValueField is a test that checks this type of hierarchy).

nik9000 · 2024-02-06T18:09:04Z

I talked to @astefan and the plan now is to try and port some tests from SQL that have more complex mappings to ESQL. And to run both resolution mechanisms if assertions are enabled - basically to uncomment that big testing blob I have. He'll do a little more research on his side so he can review this in more depth. I'll port the test.

nik9000 · 2024-02-09T18:45:56Z

For example, one of the most important test suites I've created that deals with many edge cases is FieldExtractorTestCase in ES SQL.

I've ported it!

nik9000 · 2024-02-09T18:47:00Z

@astefan, I've ported the tests from SQL and everything seems to be passing. Could you have another look?

Sometimes we produce output with a *ton* of `null`s and when we do a significant amount of time is spent on tracking the memory usage of each chunk of null colunms. Whole milliseconds in a request that takes dozens of milliseconds. We can avoid all of this by sharing all constant-null vectors produced for a single block. When running `FROM *` in ESQL on a data-less coordinating node this cuts reading reading pages from 15% of the time to 3.4% of the time, cutting the entire operation from 634ms to 599ms. Note that #105067 is already open and should save about 540ms from the operation. It's unlikely that combining these two will yield a 59ms operation, but we live in hope. Relates to #103369

elasticsearchmachine · 2024-02-09T20:12:23Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

astefan

I think the tests are fine in FieldExtractorTestCase 👍

I started digging and trying out different things. I looked at IndexResolverTests from sql and I think we should port that one as well (in that one we have a text for fix in #100869 for example). For one of those tests I tried out - testMergeIncompatibleCapabilities - I get a ql and esql didn't make the same resolution: types differ [INTEGER] != [UNSUPPORTED] message.
It is because doc_values: false, which means not aggregateable in one of the indices.

Going further and ignoring the awesome EsqlIndexResolver-IndexResolver checks you added in EsqlSession, for that specific mapping in testMergeIncompatibleCapabilities and query from test* | stats max(emp_no) I get Cannot use field [emp_no] due to ambiguities being mapped as aggregatable except in [test2] which is ok..., BUT we have no tests for these types of errors :-(.

So:

IndexResolverTests needs some of its tests ported to ESQL. I am not particularly fond of this class because for some tests it's loading the mappings "manually" (not actually relying on field_caps natural output) and then asserting behavior on them
we need some coverage for these mapping incompatibilities that reach the planning (the Analyzer in this case). I'll look into this today

astefan · 2024-02-12T15:56:08Z

...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java

+
+    /**
+     * <pre>
+     * "int_field": {


Is not a big deal, more a consistency issue: this is called int_field, the test itself has integer_field. The subfield here is text while in the test is str.

I'll clean this one up. I always write int instead of integer when I work with that mapper. Not sure why. OTOH, I tend to use str when I mean text|keyword. Old habit. I'll double check it.

nik9000 · 2024-02-22T13:56:14Z

...ingle-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/RestEsqlIT.java

@@ -185,8 +185,7 @@ public void testIncompatibleMappingsErrors() throws IOException {
        assertException("from test_alias | where _size is not null | limit 1", "Unknown column [_size]");
        assertException(
            "from test_alias | where message.hash is not null | limit 1",
-            "Cannot use field [message.hash] due to ambiguities",
-            "incompatible types: [integer] in [index2], [murmur3] in [index1]"
+            "Cannot use field [message.hash] with unsupported type [murmur3]"


I'm not sure how this works, but I don't think it's a killer for the approach, but we should decide. This example makes it look like using two field types will give you the ambiguities error. But in other examples the QL index resolution stuff will give you the unsupported error. We could do either one on the ESQL side, but I picked doing the unsupported error because it's the first such error message I found. I'd be happy to switch it, but that'll break other tests which I'll have to coax back to grene.

These tests here are new (added with https://github.com/elastic/elasticsearch/pull/105544/files) because we didn't cover this in tests.

In any case, my personal opinion is that your approach is the better approach: tell the user early that there is field that's unsupported, rather than telling "these two fields have mixed incompatible types" and the user then trying to make them compatible (reindexing or whatever) and later to find out that the effort was in vain because the type they chose to make compatible is the wrong type.

I'm glad you like it!

nik9000 · 2024-02-22T15:28:42Z

I've talked with @andreabucci and I'm going to port VerifierErrorMessagesTests - at least, as much of it as applies.

nik9000 · 2024-02-22T15:48:21Z

I've talked with @andreabucci

sorry, wrong @. I just typed @and and hit enter. Sorry!

costin

LGTM

costin · 2024-02-22T17:40:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/execution/PlanExecutor.java

    private final PreAnalyzer preAnalyzer;
    private final FunctionRegistry functionRegistry;
    private final Mapper mapper;
    private final Metrics metrics;
    private final Verifier verifier;

-    public PlanExecutor(IndexResolver indexResolver) {
+    public PlanExecutor(IndexResolver indexResolver, EsqlIndexResolver esqlIndexResolver) {


Why keep IndexResolver around? To check the behavior is the same?

costin · 2024-02-22T17:56:26Z

...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java

+        );
+        String err = EntityUtils.toString(e.getResponse().getEntity());
+        assertThat(err, containsString("Cannot use field [f] with unsupported type [ip_range]"));
+        assertThat(err, containsString("Cannot use field [f.raw] with unsupported type [ip_range]"));


Great points @astefan . Tl;dr is that ESQL already has and will have better handling of field extraction and thus conflict resolution than SQL and by having a different resolved, we should be able to evolve this better.
in SQL extraction is done through field caps hence why we expected a whole field hierarchy to match - it might not have been perfect but it got us pretty far.

In ESQL that assumption no longer holds - #100875 allowed subfields to allow the following mapping:

index 1 - field.name keyword
index 2 - field = keyword

In Index 1, field is object, as parent for field.name. This conflicts with field in index2 which is a keyword - using the SQL behavior we discarded field.name as unsupported. However there's no field.name in index2 and thus there's no mapping conflict - in other words, field is problematic but field.name is not.

I think we're in a good enough place with this and considering the performance savings I suggest we move forward with it - there's a chance there might be some subtle cases we haven't covered yet and by changing the behavior we're going to run into them but right now we have enough testing in place to be confident about them.
Furthermore I would backport this behavior in 8.13 to test it as much as possible. Potentially introduce an internal flag to have the fallback behavior for a while when running into problems.

nik9000 · 2024-02-22T19:03:54Z

Potentially introduce an internal flag to have the fallback behavior for a while when running into problems.

A flag conflicts with any behavior changes we do after this. At some point we'll want to diverge and then we'd have to be really careful with testing with the flag on and off. That's trick!

astefan

LGTM. I'm really happy with this PR. @nik9000 thank you for the patience and the performance improvements this change brings.

nik9000 · 2024-02-26T04:11:30Z

LGTM. I'm really happy with this PR. @nik9000 thank you for the patience and the performance improvements this change brings.

Thanks for your patience with me too!

elasticsearchmachine · 2024-02-27T12:19:02Z

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 105067

nik9000 · 2024-02-28T08:36:11Z

The backport operation could not be completed due to the following error:

oh. Well. That makes sense. I'll do it myself.

nik9000 · 2024-03-04T13:55:25Z

Too late to backport I think.

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v8.13.0 labels Feb 2, 2024

nik9000 requested a review from astefan February 2, 2024 13:48

Update docs/changelog/105067.yaml

827e3d6

nik9000 commented Feb 2, 2024

View reviewed changes

nik9000 added 2 commits February 2, 2024 12:49

Precommit

1148dce

Merge remote-tracking branch 'refs/remotes/nik9000/faster_field_caps'…

7b34c67

… into faster_field_caps

This was referenced Feb 2, 2024

ESQL: Speed up reading many nulls #105088

Merged

ESQL: Improve performance for queries with small limit across large clusters #103369

Closed

astefan reviewed Feb 5, 2024

View reviewed changes

nik9000 added 5 commits February 6, 2024 13:09

Merge branch 'main' into faster_field_caps

a6c12a9

WIP

3dc6c45

WIP

a609833

Finish porting

f88319a

All

e8bd7be

nik9000 marked this pull request as ready for review February 9, 2024 20:12

Merge branch 'main' into faster_field_caps

0f147c5

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 9, 2024

astefan reviewed Feb 13, 2024

View reviewed changes

nik9000 added 3 commits February 21, 2024 17:21

Merge branch 'main' into faster_field_caps

5cdd866

Format!

3bdca55

One more

ab58335

nik9000 commented Feb 22, 2024

View reviewed changes

Merge branch 'main' into faster_field_caps

96a13d5

costin approved these changes Feb 22, 2024

View reviewed changes

More tests

5d5b4be

nik9000 added 2 commits February 22, 2024 14:47

Merge branch 'main' into faster_field_caps

ae4cad8

Closer

5a066ba

astefan approved these changes Feb 23, 2024

View reviewed changes

Almost!

1dd1c8d

nik9000 added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge labels Feb 26, 2024

nik9000 added 4 commits February 25, 2024 23:51

Merge branch 'main' into faster_field_caps

8457072

Merge branch 'main' into faster_field_caps

1719ba3

Ski olds

5fe7cc0

Oh come on

0e69134

elasticsearchmachine merged commit 73a170b into elastic:main Feb 27, 2024
14 checks passed

nik9000 deleted the faster_field_caps branch February 27, 2024 12:18

elasticsearchmachine added the backport pending label Feb 27, 2024

nik9000 removed backport pending auto-backport-and-merge labels Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ESQL: Use faster field caps #105067

ESQL: Use faster field caps #105067

nik9000 commented Feb 2, 2024

elasticsearchmachine commented Feb 2, 2024

nik9000 left a comment

nik9000 Feb 2, 2024

nik9000 Feb 2, 2024

nik9000 commented Feb 2, 2024

astefan left a comment

nik9000 commented Feb 6, 2024

nik9000 commented Feb 9, 2024

nik9000 commented Feb 9, 2024

elasticsearchmachine commented Feb 9, 2024

astefan left a comment

astefan Feb 12, 2024

nik9000 Feb 13, 2024

nik9000 Feb 22, 2024

astefan Feb 22, 2024

nik9000 Feb 22, 2024

nik9000 commented Feb 22, 2024

nik9000 commented Feb 22, 2024

costin left a comment

costin Feb 22, 2024

nik9000 Feb 22, 2024

costin Feb 22, 2024

nik9000 commented Feb 22, 2024

astefan left a comment

nik9000 commented Feb 26, 2024

elasticsearchmachine commented Feb 27, 2024

nik9000 commented Feb 28, 2024

nik9000 commented Mar 4, 2024

ESQL: Use faster field caps #105067

ESQL: Use faster field caps #105067

Conversation

nik9000 commented Feb 2, 2024

elasticsearchmachine commented Feb 2, 2024

nik9000 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nik9000 commented Feb 2, 2024

astefan left a comment

Choose a reason for hiding this comment

nik9000 commented Feb 6, 2024

nik9000 commented Feb 9, 2024

nik9000 commented Feb 9, 2024

elasticsearchmachine commented Feb 9, 2024

astefan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nik9000 commented Feb 22, 2024

nik9000 commented Feb 22, 2024

costin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nik9000 commented Feb 22, 2024

astefan left a comment

Choose a reason for hiding this comment

nik9000 commented Feb 26, 2024

elasticsearchmachine commented Feb 27, 2024

💔 Backport failed

nik9000 commented Feb 28, 2024

nik9000 commented Mar 4, 2024