Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow metadata fields in the _source #61590

Merged
merged 15 commits into from
Sep 18, 2020
Merged

Conversation

csoulios
Copy link
Contributor

So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set

This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not.
This way each metadata field can configure if it can be included in the document _source

@csoulios csoulios added :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.10.0 labels Aug 26, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Aug 26, 2020
Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that DocumentParser will require some additional changes -- it looks like FieldMapper#parse will never called on a metadata field, even if isAllowedInSource is enabled and the value is present in the _source. It'd be great to add a test using a dummy MetadataFieldMapper to check that everything is wired up correctly.

@csoulios
Copy link
Contributor Author

csoulios commented Sep 9, 2020

@elasticmachine run elasticsearch-ci/packaging-sample-windows

/**
* @return Whether a metadata field can be included in the document _source.
*/
default boolean isAllowedInSource() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really need this new flag isAllowedInSource. Maybe we could avoid adding a new flag and instead do the following:

  • For metadata fields that are not allowed in _source, make sure that MetadataFieldMapper#parse throws an descriptive error.
  • Many metadata fields currently have logic in parse that's unrelated to _source parsing. We could make sure to move it into the special methods preParse or postParse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtibshirani I agree with you that delegating this to MetadataFieldMapper#parse() would be a simpler solution. However I see two caveats:

  • The approach with isAllowedInSource() method detects that a metadata field is being parsed when parsing the field name. This is early in the parsing process. An exception is thrown and the field value parsing is skipped.

  • As you mention, I have seen classes (such as IgnoredFieldMapper and VersionFieldMapper) that override parse() method to do nothing on one hand, while calling super.parse() from preParse()/postParse() methods. This approach is too complicated imho, but I am not sure that refactoring this is very simple. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is early in the parsing process. An exception is thrown and the field value parsing is skipped.

I don't think we need to optimize performance in this case because it's an error condition (and should be a rare one too).

This approach is too complicated imho, but I am not sure that refactoring this is very simple.

I agree that the logic is complex/ hard to read! I haven't deeply looked into the refactoring myself -- maybe you could try it out and we can rediscuss the approach if you see a roadblock ?

throw new MapperParsingException("Field [" + currentFieldName + "] is a metadata field and cannot be added inside"
+ " a document. Use the index API request parameters.");
if (context.mapperService().isFieldAllowedInSource(context.path().pathAsText(currentFieldName))) {
// If token is a metadata field and is allowed in source, parse its value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for rejecting null and non-leaf values here?

Copy link
Contributor Author

@csoulios csoulios Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi and I had a discussion about the possible use cases we want to cover with this feature in the near future and agreed that for the sake of simplicity we should only accept non-null values (no objects or arrays).

I don't have any strong opinions and I am happy to revisit this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think null should be accepted but non-leaf values seemed more challenging. Now that I re-think about it, it shouldn't be an issue as long as we consume the value from the root (object or not).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So +1 to handle object, arrays and null values and let the metadata field mapper deals with it.

Copy link
Contributor Author

@csoulios csoulios Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me work on this again

@csoulios
Copy link
Contributor Author

@jtibshirani I pushed changes that address your comments. Please let me know what you think. :)

}

@Override
protected void parseCreateField(ParseContext context) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overriding parse here, could we override parseCreateField?

@Override
protected void parseCreateField(ParseContext context) throws IOException {
    throw new MapperParsingException("Field [" + name() + "] is a metadata field and cannot be added inside"
        + " a document. Use the index API request parameters.");
}

Then we could do the following:

  • For meta fields that cannot be specified in _source, move all the relevant logic out of parseCreateField and into preParse or postParse as appropriate.
  • Remove the new doParse method, as it's no longer needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I had a hard time thinking a simple and clean way to refactor this part.

Initially, I tried the approach you suggested, but FieldMapper#parse() does all the exception handling by encapsulating all exceptions thrown by parseCreateField() offering a preview of the parsed value.

(See

)

This means that the

    throw new MapperParsingException("Field [" + name() + "] is a metadata field and cannot be added inside"
        + " a document. Use the index API request parameters.");

will be encapsulated in the following exception:

     throw new MapperParsingException("failed to parse field [{}] of type [{}] in document with id '{}'. " +
                "Preview of field's value: '{}'", e, fieldType().name(), fieldType().typeName(),
                context.sourceToParse().id(), valuePreview);

On the other hand, methods preParse() and postParse() delegate parsing to parse() for the same reason (to handle parsing exception). If we were to move all parsing in those two methods, we would have to replicate exception handling in those methods as well.

Since all FieldMapper#parse() does is to delegate parsing to parseCreateField() and only do the exception handling, I decided to redirect this functionality in 'MetadataFieldMapper#doParse()and call this method frompreParse()andpostParse()`.

Copy link
Contributor

@jtibshirani jtibshirani Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me that approach you tried/ I suggested is the cleanest way to go. My reasoning...

This means that the ... will be encapsulated in the following exception:

This seems like an okay compromise to me, all the exception information is there so the user can determine the cause. It's a little confusing that we wrap the message, but this should be a rare situation.

On the other hand, methods preParse() and postParse() delegate parsing to parse() for the same reason (to handle parsing exception).

I think that exception handling code is only really helpful when dealing with incorrect user-supplied values. Since preParse and postParse work with internal data such as context.sourceToParse().id(), it would indicate a serious logic error. So I don't think we'd need the same general exception handling strategy where we create a nice user-facing message.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me, I have one last comment. Thanks for all the iterations on this!

private static Mapper getMapper(final ParseContext context, ObjectMapper objectMapper, String fieldName, String[] subfields) {
String fieldPath = context.path().pathAsText(fieldName);
if (context.mapperService().isMetadataField(fieldPath)) {
for (MetadataFieldMapper metadataFieldMapper : context.docMapper().mapping().getMetadataMappers()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed we're doing a linear scan through all the metadata mappers. Maybe we should make sure to store them as a map from name -> mapper to avoid this overhead?

Copy link
Contributor Author

@csoulios csoulios Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I had this point in mind for improvement. I just saw that linear scan happens at:

for (MetadataFieldMapper metadataMapper : metadataFieldsMappers) {

I understand that pre/postParse() internalParseDocument() is called once per document, while the linear scan in getMapper() is executed once per metadata field.

I will fix this asap!

Copy link
Contributor Author

@csoulios csoulios Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this last bit. Can you please have one last look?

Thank you for your patience and excellent guidance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me.

It feels redundant that we store an array of metadata mappers, a map from class -> mapper, and a map from name -> mapper. This could be simplified, but we don't necessarily need to do it in this PR.

@csoulios csoulios merged commit 55294e5 into elastic:master Sep 18, 2020
@csoulios csoulios deleted the metadata-source branch September 18, 2020 06:45
csoulios added a commit that referenced this pull request Sep 18, 2020
Backports #61590 to 7.x

    So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set

    This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not.
    This way each metadata field can configure if it can be included in the document _source
@andreidan
Copy link
Contributor

Removed backport pending as this was backported via #62616

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants