-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove support for 'external values' in document parsing? #56063
Comments
Pinging @elastic/es-search (:Search/Mapping) |
Pinging @elastic/es-core-infra (:Core/Infra/Plugins) |
This issue showed up in the core/infra team's issue triage because it has the |
Thanks @williamrandolph, that makes sense to me. I just had a bad habit of using labels to ping teams for awareness, whereas they really indicate ownership. |
I had a thought on a nice way to implement this, which would be that rather that setting an external value on the parse context, we could instead switch out the XContentParser. Field mappers that consume object-type structures such as the completion mapper could create a new XContentParser over their parsed values and pass that on to any subfields. This could also allow us to add some guards on single-valued field mappers to prevent their subfields calling |
#72203) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to #56063
elastic#72203) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to elastic#56063
#72203) (#72448) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to #56063
Question: given that we removed support for external values in #72448 , are we done with this issue? Where do we stand with the three items above that are not marked done, namely deprecate and remove multifields on geo_point, deprecate and remove nested completion mappers, and remove ParserContext.switchParser() ? |
No we aren't done with this issue unfortunately. Here is the discussion on why #72203 doesn't close the issue and what remains to be done: #72203 (comment). |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
For fields whose value can be a JSON array or object like geopoints, the standard way of supporting multi-fields doesn’t work. When parsing a document, the parent mapper consumes the whole complex value, so when the parser is passed on to its subfields, they aren't able to parse the source value. To help support this case, parent fields can pass a parsed object through
ParseContext#externalValue
. Subfields can then use this parsed object instead of parsing the source value.This support for 'external values' has a few downsides:
FieldMapper#parseCreateField
, each field mapper must check whether an external value is set, regardless of whether it makes sense to pass custom values to the mapper. Making sure to check for an external value is a silent contract and is easy to forget.I wanted to raise the idea of removing support for external values. Within our own code, here are the mappers that would be affected:
ParentJoinFieldMapper
passes the IDs to its internalParentIdFieldMapper
objects. This could be replaced by a custom method onParentIdFieldMapper
.PercolatorFieldMapper
passes the query builder to an internalBinaryFieldMapper
. This could also be replaced by a custom method.CompletionFieldMapper
passes a parsed completion object to its sub-fields (Completion types with multi-fields support #34081). I wonder how important it is to support multi-fields for completion mappings, or if we could just drop support.GeoPointFieldMapper
passes a geohash to its sub-fields. This allows it to have multi-fields that are geopoints or even keywords. I also wonder about the use cases + importance of this functionality.It looks like external values were also used in the attachment mapper, but this mapper was removed and migrated to an ingest processor.
Here is a proposed process for removing this support
ParserContext.switchParser()
The text was updated successfully, but these errors were encountered: