Search 'fields' option design + implementation #55363

jtibshirani · 2020-04-16T22:40:40Z

Original issue: #49028
Feature branch: field-retrieval
Docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-fields.html

Motivation

Often a user wants to retrieve a particular set of fields during a search. Currently, we don't support this usage pattern in a good way. In short, given a list of fields, there is no easy way to load all of their values:

We can’t load all of them from doc values. Some fields like text fields may not have doc values at all, or we may exceed the limit for a reasonable number of doc value fields to load.
It’s not easy to load all of them through source. For example, if the field is a field alias, it’s difficult to determine where to find its value in the source.

Better field retrieval support is becoming even more important now that we're introducing more field types that don’t fit the typical pattern, like constant_keyword and the proposed runtime fields (#48063).

Feature Summary

We plan to add a new fields section to the search request, which users would specify instead of using source filtering to load fields from source:

POST logs-*/_search
{
  "query": { "match_all": {} },
  "fields": [
    "file.*",
    {
      "field": "event.timestamp",
      "format": "epoch_millis"
    },
    ...
  ]
}

Both full field names and wildcard patterns are accepted. Only leaf fields are returned; the API will not allow fetching object values. The fields are returned as a flat list in the fields section of each hit, the same as we do for docvalue_fields and script_fields.

Overall, the API gives a friendly way to load fields from source:

If a non-standard field like a field alias, multi-field, or constant_keyword is specified in fields, then we’ll consult the mappings to find and return the right value.
The fields are returned in a flat list, as opposed to structured JSON.
For date and numeric field types, we would support the same format parameter as we do for docvalue_fields to allow for adjusting the format of the results.
Each value would be returned in a 'canonical' format -- for example if a field is mapped as an integer, it will be returned as an integer even if it was specified as a string in the _source.
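To make the 'canonical' format point concrete, here is a minimal Python sketch. It is only an illustration of the idea and not the Elasticsearch implementation: the function name canonical_value and the small set of handled types are assumptions made for this example.

```python
from typing import Any

# Hypothetical helper for illustration only; not part of Elasticsearch.
def canonical_value(raw: Any, field_type: str) -> Any:
    """Return `raw` in the form implied by the mapped field type."""
    if field_type in ("integer", "long"):
        # "42" in _source comes back as the integer 42.
        return int(raw)
    if field_type in ("float", "double"):
        return float(raw)
    # Other types are passed through unchanged in this sketch.
    return raw


print(canonical_value("42", "integer"))    # value indexed as a string
print(canonical_value("abc", "keyword"))   # untouched
```

The real feature would delegate this decision to each field mapper; the sketch only shows the shape of the contract.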

Some clarifications:

In this first pass, the API will not attempt to load from stored fields or doc values.
For simplicity of parsing, values will always be returned in an array, even if there is only one value present.
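The response shape described above (flat list, leaf fields only, wildcard patterns, values always in an array) can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the actual fetch phase: leaf_values and fetch_fields are hypothetical names, and fnmatchcase is used as a stand-in for wildcard matching even though it also understands `?` and `[...]`, which the API may not.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical helpers for illustration only.
def leaf_values(source: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every leaf value in a nested _source."""
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Objects are never returned themselves, only their leaves.
            yield from leaf_values(value, prefix=f"{path}.")
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    yield from leaf_values(item, prefix=f"{path}.")
                else:
                    yield path, item
        else:
            yield path, value


def fetch_fields(source: Dict[str, Any], patterns: List[str]) -> Dict[str, List[Any]]:
    """Collect matching leaf fields into a flat dict of path -> list of values."""
    result: Dict[str, List[Any]] = {}
    for path, value in leaf_values(source):
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            # Values are always wrapped in a list, even when there is one.
            result.setdefault(path, []).append(value)
    return result


source = {
    "file": {"name": "a.txt", "size": 10},
    "event": {"timestamp": 1},
    "tags": ["x", "y"],
}
print(fetch_fields(source, ["file.*", "tags"]))
```

Note that requesting "file" alone returns nothing in this sketch, mirroring the rule that object values cannot be fetched.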

Implementation Plan

Future improvements:

Move FieldMapper#lookupValues to MappedFieldType. (?)
Handle meta fields like _size.
Make use of more efficient source parsing: Partially parse source documents to speed up source access #52591.
Support the API in inner_hits.

Open Questions

If a wildcard pattern matches both a parent field and one of its multi-fields, should we just return the parent to avoid returning the same value twice? A similar question holds for field aliases and their target fields.
Should the API return fields in _source that have been disabled in the mappings (enabled: false)?
For keyword fields, should we apply the normalizer or return the original value?

elasticmachine · 2020-04-16T22:40:42Z

Pinging @elastic/es-search (:Search/Search)

This commit adds the capability to `FieldTypeLookup` to retrieve a field's paths in the _source. When retrieving a field's values, we consult these source paths to make sure we load the relevant values. This allows us to handle requests for field aliases and multi-fields. We also retrieve values that were copied into the field through copy_to. To me this is what users would expect out of the API, and it's consistent with what comes back from `docvalue_fields` and `stored_fields`. However, it does add some complexity, and was not something flagged as important by any of the clients I spoke to about this API. I'm looking for feedback on this point. Relates to #55363.
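The idea of looking up a field's paths in the _source can be sketched in Python as below. This is a simplified model and not the `FieldTypeLookup` code: the flattened MAPPING dict, its example fields, and the function source_paths are all assumptions made for illustration.

```python
from typing import Any, Dict, Set

# Hypothetical flattened mapping: full field name -> mapping definition.
MAPPING: Dict[str, Dict[str, Any]] = {
    "user.name": {
        "type": "text",
        "copy_to": ["all_names"],
        "fields": {"raw": {"type": "keyword"}},  # multi-field user.name.raw
    },
    "author": {"type": "alias", "path": "user.name"},
    "all_names": {"type": "text"},
}


def source_paths(field: str, mapping: Dict[str, Dict[str, Any]]) -> Set[str]:
    """Return the _source paths whose values feed into `field`."""
    definition = mapping.get(field)
    if definition is None:
        # Possibly a multi-field: its values live at the parent's path.
        parent, _, sub = field.rpartition(".")
        if parent in mapping and sub in mapping[parent].get("fields", {}):
            return source_paths(parent, mapping)
        return set()
    if definition.get("type") == "alias":
        # An alias has no values of its own; follow it to the target.
        return source_paths(definition["path"], mapping)
    paths = {field}
    # Include every field that copies its values into this one via copy_to.
    for other, other_definition in mapping.items():
        if field in other_definition.get("copy_to", []):
            paths.add(other)
    return paths


print(source_paths("author", MAPPING))
print(sorted(source_paths("all_names", MAPPING)))
```

With the paths in hand, retrieval reduces to reading each path from the _source and merging the values.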

This PR replaces the marker interface with the method FieldMapper#parsesArrayValue. I find this cleaner and it will help with the fields retrieval work (elastic#55363). The refactor also ensures that only field mappers can declare they parse array values. Previously other types like ObjectMapper could implement the marker interface and be passed array values, which doesn't make sense.

jtibshirani · 2020-06-30T00:28:16Z

I thought more about the question of whether we should apply normalizer before returning a keyword, and to me it makes sense to apply normalization by default:

The name 'normalizer' itself suggests that it is performing standardization, which is exactly the sort of thing we claim to do when returning values. I was hesitant earlier because normalization felt more like text analysis to me, but I think it's typically value standardization (and we just happen to re-use the analyzer framework).
It's most consistent with other parts of the search response, in particular terms aggregations.
If the original value is actually interesting in terms of casing, accents, etc., then a user will probably avoid normalizing, and instead perform a case-insensitive search (coming soon :)). Or, they would set up a multi-field where one field contains the original, and the other a normalized form.
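As a rough illustration of what returning the normalized value means, the Python sketch below approximates a normalizer that lowercases and strips accents. It is only an approximation under stated assumptions: normalize_keyword is a hypothetical name, and real normalizers are configured per field in the mappings and may behave differently (for example, full ASCII folding covers more characters than this).

```python
import unicodedata

# Hypothetical stand-in for a keyword normalizer with lowercase and
# accent-stripping filters; not the Elasticsearch implementation.
def normalize_keyword(value: str) -> str:
    # Decompose accented characters into base character + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.lower()


# The _source holds the original; the fields response would hold the result.
print(normalize_keyword("Crème BRÛLÉE"))
```

A user who needs the original casing or accents would keep a second, non-normalized field, as suggested above.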

Tagging @jpountz @jimczi @javanna @nik9000 in case they have thoughts on the above, happy to discuss here!

jimczi · 2020-06-30T20:57:14Z

+1 to support normalizer ootb. As you said it's consistent with aggregations and script. That would also make the docvalue_fields alternative less appealing, which I find important since we want to simplify the reasoning when retrieving fields.

javanna · 2020-07-01T08:44:24Z

@jtibshirani when you say "by default" does that mean it can be disabled? From the point of view of "when you retrieve, you get what you sent, when you search and aggregate, you search on and get back what was indexed" I am torn, I would personally expect the raw value loaded from source. Though if users can control what they get, I would not mind that the default is normalized.

jtibshirani · 2020-07-07T23:02:43Z

@javanna I was indeed wondering if we could make the behavior configurable, but don't have immediate plans to do so. It's always nice to avoid configuration options and have strong defaults.

I have also been on the fence about this; I can see arguments both ways. I suggest that we move forward with normalizing the values for now. I'm going to ask the teams planning to use this feature (SQL, ML, Kibana) to try to integrate with it before we ship it, and I have a short list of questions to ask them, which includes keyword normalization. The questions are tracked in the issue description.

jtibshirani · 2020-07-16T18:46:08Z

We discussed how geo fields should be returned with @talevy and @imotov. A summary of our discussion:

Geo data should be returned in a consistent format. We accept a variety of formats during indexing, and feel it would be helpful for clients if all fields were returned in a single format.
The default format should be 'geojson', since this matches our usual JSON return format, and it's natural for Kibana. We should also support well-known text, which is best for SQL. Note that geo points (in addition to shapes) will also be returned in geojson.
The user will be able to select the format by setting format: wkt or format: geojson.
Ideally if the source value is already in the right format to return, we won't re-parse it to a geolib object then re-serialize it.
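A minimal Python sketch of the 'single output format' idea for geo points follows. It handles only three of the input shapes accepted at index time (an object with lat/lon, a "lat,lon" string, and a [lon, lat] array); geohashes and WKT input are left out. The names parse_point and format_point are hypothetical, and this does not reflect how the server actually parses or serializes geometry.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical helpers for illustration only.
def parse_point(value: Any) -> Tuple[float, float]:
    """Return (lon, lat) from a few of the accepted geo_point input shapes."""
    if isinstance(value, dict):
        return float(value["lon"]), float(value["lat"])
    if isinstance(value, str):
        # String form is "lat,lon".
        lat, lon = (float(part) for part in value.split(","))
        return lon, lat
    if isinstance(value, (list, tuple)):
        # Array form is [lon, lat].
        lon, lat = value
        return float(lon), float(lat)
    raise ValueError(f"unsupported point value: {value!r}")


def format_point(value: Any, fmt: str = "geojson") -> Union[Dict[str, Any], str]:
    """Render a point as GeoJSON (the default) or well-known text."""
    lon, lat = parse_point(value)
    if fmt == "geojson":
        return {"type": "Point", "coordinates": [lon, lat]}
    if fmt == "wkt":
        return f"POINT ({lon} {lat})"
    raise ValueError(f"unsupported format: {fmt!r}")


print(format_point("41.12,-71.34"))
print(format_point({"lat": 41.12, "lon": -71.34}, fmt="wkt"))
```

Whatever shape was indexed, the caller sees one format, which is the consistency the discussion is after.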

jtibshirani · 2020-07-16T20:24:14Z

An additional note: now that we'll return points in geojson format by default, for consistency we should accept this format when indexing points. We currently don't allow geojson; the work to add support is tracked in #47815.

…60100) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.

…60258) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.

jtibshirani · 2020-07-28T20:58:13Z

The feature branch was merged in #60100. I'll open new issues/PRs for the follow-up improvements mentioned in the description.

jtibshirani added the >feature, :Search/Search, and Meta labels Apr 16, 2020

jtibshirani self-assigned this Apr 16, 2020

This was referenced Apr 17, 2020

A high level way of retrieving values for certain fields #49028

Closed

Add a simple 'fetch fields' phase. #55639

Merged

jtibshirani mentioned this issue Apr 28, 2020

In field retrieval API, handle non-standard source paths. #55889

Merged

jtibshirani changed the title ~~High-level field retrieval API design + implementation~~ Field retrieval API design + implementation Apr 28, 2020

jtibshirani mentioned this issue May 1, 2020

Remove support for 'external values' in document parsing? #56063

Open

5 tasks

rjernst added the Team:Search label May 4, 2020

jtibshirani mentioned this issue May 19, 2020

Allow field mappers to retrieve fields from source. #56928

Merged

jtibshirani mentioned this issue Jun 2, 2020

Remove the 'array value parser' marker interface. #57571

Merged

jtibshirani mentioned this issue Jun 30, 2020

Add docs for the fields retrieval API. #58787

Merged

wylieconlon mentioned this issue Jul 2, 2020

[Lens] Field list is empty when missing permissions to view the mappings elastic/kibana#70520

Closed

jtibshirani mentioned this issue Jul 8, 2020

Apply keyword normalizers in the field retrieval API. #59260

Merged

javanna mentioned this issue Jul 9, 2020

Add support for runtime fields #59332

Closed

30 tasks

flash1293 mentioned this issue Jul 16, 2020

[Lens] Improve field existence check data fetching elastic/kibana#72012

Closed

jimczi mentioned this issue Jul 20, 2020

Highlighting should leverage the new field retrieval API #59931

Closed

This was referenced Jul 23, 2020

Add search 'fields' option to support high-level field retrieval. #60100

Merged

Support spatial fields in field retrieval API. #59821

Merged

jtibshirani mentioned this issue Jul 28, 2020

Streamline serialization when retrieving spatial data through fields. #60259

Closed

jtibshirani closed this as completed Jul 28, 2020

jtibshirani mentioned this issue Aug 11, 2020

Follow-up improvements to field retrieval. #60985

Closed

8 tasks

kertal mentioned this issue Aug 19, 2020

[Discover] Experimental usage of ES fields API elastic/kibana#75407

Closed

7 tasks

jtibshirani changed the title ~~Field retrieval API design + implementation~~ Search 'fields' option design + implementation Aug 27, 2020

Mpdreamz mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

stevejgordon mentioned this issue Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed
