Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Remove the NESTED DataType #111495

Merged
merged 11 commits into from
Aug 6, 2024
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jul 31, 2024

We don't support nested fields at the moment and when we do I'm not sure it'll be a DataType. Maybe. Probably. But let's deal with that when we support it.

We don't support nested fields at the moment and when we do I'm not sure
it'll *be* a DataType. Maybe. Probably. But let's deal with that when we
support it.
@@ -82,7 +81,7 @@ private static void walkMapping(String name, Object value, Map<String, EsField>
// extract field type
DataType esDataType = getType(content);
final Map<String, EsField> properties;
if (esDataType == OBJECT || esDataType == NESTED) {
if (esDataType == OBJECT) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from Types in QL, but it's only used in testing in ESQL. I think this safe for now.

// allow compound object even if they are unknown (but not NESTED)
if (type != NESTED && fieldProperties.isEmpty() == false) {
// allow compound object even if they are unknown
if (fieldProperties.isEmpty() == false) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most interesting line, I think. But we don't make NESTED any more so I feel like this is pretty safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would change things. It'd make the sub-fields of the NESTED object appear. I think. I'll double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is ok. Stuff like object with sub-fields still work 👍 ; I tested an empty index with this kind of mapping:

            "name": {
                "type": "object",
                "properties": {
                    "first_name": {
                        "type": "keyword"
                    },
                    "last_name": {
                        "type": "keyword"
                    }
                }
            }

and we get back only name.first_name and name.last_name. Which I think it's ok? Now I'm wondering if OBJECT has the same treatment as NESTED, since it disappears.

Also, nested doesn't seem to accept fields, but what it does (and I forgot about it) is include_in_parent but I don't think we ever supported this since _field_caps behaves identically with include_in_parent true or false.

@nik9000 nik9000 requested a review from astefan July 31, 2024 20:15
@nik9000 nik9000 marked this pull request as ready for review July 31, 2024 20:15
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 31, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented Aug 1, 2024

I talked with @astefan and I'm going to make a yaml test that uses nested just to make sure we're not doing something really unexpected. It should be fine, but let's be super careful here.

@nik9000
Copy link
Member Author

nik9000 commented Aug 1, 2024

Mechanically what happens for nested fields and all fields under them is that the IndexResolver hits the nested field and marks it as unsupported. Then it hits the children of the nested field and marks them as unsupported because the parent is unsupported.

@nik9000
Copy link
Member Author

nik9000 commented Aug 1, 2024

Mechanically what happens for nested fields and all fields under them is that the IndexResolver hits the nested field and marks it as unsupported. Then it hits the children of the nested field and marks them as unsupported because the parent is unsupported.

We don't want this. At least, it's a change and maybe not a good one. nested fields aren't the same as non-nested fields. Previously we were removing them entirely from ESQL. It just didn't see them. With this change as it stands now they'd start showing up as nested. I don't think we want that, at least, not now. I'm going to try and filter them out earlier.

@nik9000
Copy link
Member Author

nik9000 commented Aug 1, 2024

OK! I've moved the filtering of nested fields lower. That should be better....

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to remove this data type and I hope we are careful enough with this step.
Maybe add two-three more to 40_unsupported_types.yml with a mapping that more paranoid:

            "name": {
                "type": "object",
                "properties": {
                    "first_name": {
                        "type": "keyword"
                    },
                    "last_name": {
                        "type": "keyword"
                    }
                }
            },
            "bla": {
                "type": "object",
                "properties": {
                    "obj": {
                        "type": "object",
                        "properties": {
                            "first_name": {
                                "type": "keyword"
                            },
                            "last_name": {
                                "type": "keyword"
                            }
                        }
                    }
                }
            },
            "nested": {
                "type": "nested",
                "properties": {
                    "sub_nested": {
                        "type": "nested",
                        "properties": {
                            "first_name": {
                                "type": "keyword"
                            }
                        }
                    },
                    "last_name": {
                        "type": "keyword"
                    }
                }
            }

// allow compound object even if they are unknown (but not NESTED)
if (type != NESTED && fieldProperties.isEmpty() == false) {
// allow compound object even if they are unknown
if (fieldProperties.isEmpty() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is ok. Stuff like object with sub-fields still work 👍 ; I tested an empty index with this kind of mapping:

            "name": {
                "type": "object",
                "properties": {
                    "first_name": {
                        "type": "keyword"
                    },
                    "last_name": {
                        "type": "keyword"
                    }
                }
            }

and we get back only name.first_name and name.last_name. Which I think it's ok? Now I'm wondering if OBJECT has the same treatment as NESTED, since it disappears.

Also, nested doesn't seem to accept fields, but what it does (and I forgot about it) is include_in_parent but I don't think we ever supported this since _field_caps behaves identically with include_in_parent true or false.

@nik9000
Copy link
Member Author

nik9000 commented Aug 2, 2024

Maybe add two-three more to 40_unsupported_types.yml with a mapping that more paranoid:

👍

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Aug 5, 2024
@nik9000
Copy link
Member Author

nik9000 commented Aug 5, 2024

That snapshot failures are real - old versions sometimes send nested as the data type:

[2024-08-05T12:22:54,015][WARN ][o.e.x.e.a.EsqlResponseListener] [test-cluster-0] Request failed with status [INTERNAL_SERVER_ERROR]: java.io.IOException: Unknown DataType for type name: nested
	at org.elasticsearch.xpack.esql.core.type.DataType.readFrom(DataType.java:449)
	at org.elasticsearch.xpack.esql.core.type.EsField.<init>(EsField.java:57)
	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:56)
	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:32)
	at org.elasticsearch.xpack.esql.io.stream.PlanNamedTypes.lambda$readEsIndex$3(PlanNamedTypes.java:771)
	at org.elasticsearch.common.io.stream.StreamInput.readImmutableMap(StreamInput.java:
	at org.elasticsearch.xpack.esql.io.stream.PlanNamedTypes.readEsIndex(PlanNamedTypes.java:771)
	at org.elasticsearch.xpack.esql.io.stream.PlanNamedTypes.readEsRelation(PlanNamedTypes.java:545)

When we serialize EsRelation we serialize the original EsIndex in addition to the `

@nik9000
Copy link
Member Author

nik9000 commented Aug 6, 2024

When we serialize EsRelation we serialize the original EsIndex in addition to the `

Fields. To the fields.

@astefan check out the transport change I just pushed. It looks like we serialize the mapping of the index in addition to the FieldAttributes. I'm not sure we do anything with it. I'm tempted to remove it to be honest. But not here. The transport change I've added gets the job done and limits the blast radius of the unexpected serialization.

@nik9000 nik9000 removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Aug 6, 2024
@nik9000
Copy link
Member Author

nik9000 commented Aug 6, 2024

@astefan I pushed a change just now to explain the reading change better. It's to do with the way we reuse a function for serialization.

@astefan
Copy link
Contributor

astefan commented Aug 6, 2024

Thank you @nik9000

@nik9000
Copy link
Member Author

nik9000 commented Aug 6, 2024

Thank you @nik9000

That sounds like and approal. I'm mash the merge button. Let me know if you have any more ideas around this one. It's been a trip!

@nik9000 nik9000 merged commit 3d6b7aa into elastic:main Aug 6, 2024
15 checks passed
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Aug 7, 2024
* upstream/main: (132 commits)
  Fix compile after several merges
  Update docs with new behavior on skip conditions (elastic#111640)
  Skip on any instance of node or version features being present (elastic#111268)
  Skip on any node capability being present (elastic#111585)
  [DOCS] Publishes Anthropic inference service docs. (elastic#111619)
  Introduce `ChunkedZipResponse` (elastic#109820)
  [Gradle] fix esql compile cacheability (elastic#111651)
  Mute org.elasticsearch.datastreams.logsdb.qa.StandardVersusLogsIndexModeChallengeRestIT testTermsQuery elastic#111666
  Mute org.elasticsearch.datastreams.logsdb.qa.StandardVersusLogsIndexModeChallengeRestIT testMatchAllQuery elastic#111664
  Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testMatchCommand elastic#111661
  Mute org.elasticsearch.xpack.esql.optimizer.LocalPhysicalPlanOptimizerTests testMatchCommandWithMultipleMatches {default} elastic#111660
  Mute org.elasticsearch.xpack.esql.optimizer.LocalPhysicalPlanOptimizerTests testMatchCommand {default} elastic#111659
  Mute org.elasticsearch.xpack.esql.optimizer.LocalPhysicalPlanOptimizerTests testMatchCommandWithWhereClause {default} elastic#111658
  LogsDB qa tests - add specific matcher for source (elastic#111568)
  ESQL: Move `randomLiteral` (elastic#111647)
  [ESQL] Clean up UNSUPPORTED type blocks (elastic#111648)
  ESQL: Remove the `NESTED` DataType (elastic#111495)
  ESQL: Move more out of esql-core (elastic#111604)
  Improve MvPSeriesWeightedSum edge case and add more tests (elastic#111552)
  Add link to flood-stage watermark exception message (elastic#111315)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
mhl-b pushed a commit that referenced this pull request Aug 8, 2024
We don't support nested fields at the moment and when we do I'm not sure
it'll *be* a DataType. Maybe. Probably. But let's deal with that when we
support it.
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
We don't support nested fields at the moment and when we do I'm not sure
it'll *be* a DataType. Maybe. Probably. But let's deal with that when we
support it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants