Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow field mappers to retrieve fields from source. #56928

Merged
merged 7 commits into from
May 28, 2020

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented May 19, 2020

This PR adds new method FieldMapper#lookupValues(SourceLookup) that extracts and parses the source values. This lets us return values like numbers and dates in a consistent format, and also handle special data types like constant_keyword. The lookupValues method calls into parseSourceValue, which mappers can override to specify how values should be parsed.

Older draft PR: #56473

Relates to #55363.

@jtibshirani jtibshirani added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels May 19, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 19, 2020
@jtibshirani
Copy link
Contributor Author

jtibshirani commented May 19, 2020

As part of this PR, we need to decide on the 'standard format' for each field type. I'll work to collect feedback/ discussion on this point. Here's the current proposal:

  • boolean: parse into boolean
  • binary: no change to value
  • completion: no change to value
  • constant_keyword: don't consult source, return the constant value
  • date: parse to date string based on mapping format
  • date_nanos: parse to long nanos based on mapping format
  • ip: collapse 0s in IPv6 addresses (to be consistent with docvalues_fields)
  • keyword: convert value to string, do not apply normalization
  • numeric types (integer, float, etc.): parse value to the right type, for example for a long field "42.3" becomes 42
  • range types (integer_range, etc.): parse each bound to the right type, this includes both numbers, dates, and IPs
  • rank_feature: convert value to float
  • scaled_float: convert value to float, taking scaling factor into account
  • text, annotated_text: convert value to string

Some notable examples where we don't parse at all:

  • histogram, rank_features, vector: no change to value, don't attempt to convert each component into a number. For me the complexity of digging through and standardizing these structured values outweighed the benefit.
  • murmur3, token_count: simply convert to string. It didn't seem practical to always recompute a hash/ token count, and I couldn't see an important use case for returning these values.

We never return a field that has been dropped through ignore_malformed. This way, the values returned for particular field will always have the same type and format. The idea is that the value is ignored in all high-level parts of the search request, including the query, aggs, and field retrieval.

Geo fields aren't handled yet, I plan to do this in a follow-up PR where we can have a dedicated discussion.

@jtibshirani jtibshirani changed the title Introduce a method FieldMapper#lookupValues to retrieve fields from source. Introduce FieldMapper#lookupValues to retrieve fields from source. May 19, 2020
@jtibshirani jtibshirani force-pushed the lookup-source-value branch 3 times, most recently from 66e6018 to 00e879d Compare May 19, 2020 04:01
@jtibshirani jtibshirani changed the title Introduce FieldMapper#lookupValues to retrieve fields from source. Allow field mappers to retrieve fields from source. May 19, 2020
@nik9000
Copy link
Member

nik9000 commented May 20, 2020

date: parse to long millis based on mapping format

This is one place where format is pretty important, I think. We should probably apply whatever the default "format" is for this field. Though that can totally be a follow up.

Also in a follow up, folks should be able to specify the format, I think. That way folks can say things like "give this is millis-since-epoch even if the default format is ISO8601".

I'm not sure if we need time zone support too. I know the aggs have it, but that feels like a big much at this point.

@nik9000
Copy link
Member

nik9000 commented May 20, 2020

ip: no change to value

Should this canonicalize the IP?

@jtibshirani
Copy link
Contributor Author

This is one place where format is pretty important, I think. We should probably apply whatever the default "format" is for this field. Though that can totally be a follow up.

I'll handle this now, I do think it fits best within this PR!

Also in a follow up, folks should be able to specify the format, I think. That way folks can say things like "give this is millis-since-epoch even if the default format is ISO8601".

+1, this is tracked in the meta-issue as a follow-up (#55363).

Should this canonicalize the IP?

Sorry if I'm missing something obvious, what's an example of the canonicalization you have in mind ?

@jtibshirani jtibshirani requested a review from astefan May 20, 2020 18:56
@nik9000
Copy link
Member

nik9000 commented May 20, 2020

Sorry if I'm missing something obvious, what's an example of the canonicalization you have in mind ?

I believe round tripping IPs through doc values would apply this.

@nik9000
Copy link
Member

nik9000 commented May 20, 2020

I believe round tripping IPs through doc values would apply this.

I think it might be as simple as parsing and toStringing to make that happen though.

@jtibshirani
Copy link
Contributor Author

A heads up that I rebased and force-pushed to incorporate a refactor (#56976).

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Left few minor comments.

@jtibshirani
Copy link
Contributor Author

Thanks @astefan for the review!

@nik9000 it would be great to get your review as well. Here are the changes since last time:

  • We now format dates using the format in the mapping. This applies to date, date_nanos, and date_range.
  • IPs are standardized using the utilities in InetAddresses. I don't have a strong intuition here, but it felt like a good choice since it matches what's returned in docvalue_fields.

@jtibshirani jtibshirani force-pushed the lookup-source-value branch from a5c3ce3 to f2d259b Compare May 28, 2020 04:16
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Sorry for the delay.

@jtibshirani jtibshirani merged this pull request into elastic:field-retrieval May 28, 2020
@jtibshirani jtibshirani deleted the lookup-source-value branch May 28, 2020 15:11
jtibshirani added a commit that referenced this pull request May 28, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request May 28, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jun 8, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jun 17, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jun 26, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jun 30, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jun 30, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 8, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 14, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 15, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 18, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 21, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit that referenced this pull request Jul 23, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jul 27, 2020
This PR adds new method `FieldMapper#lookupValues(SourceLookup)` that extracts
and parses the source values. This lets us return values like numbers and dates
in a consistent format, and also handle special data types like
`constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
which mappers can override to specify how values should be parsed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants