Add docs for the fields retrieval API.
jtibshirani committed Jun 30, 2020
1 parent e94971e commit 9317815
Showing 1 changed file with 184 additions and 77 deletions.
261 changes: 184 additions & 77 deletions docs/reference/search/search-fields.asciidoc
@@ -4,33 +4,203 @@

By default, each hit in the search response includes the document
<<mapping-source-field,`_source`>>, which is the entire JSON object that was
provided when indexing the document. If you only need certain source fields in
the search response, you can use the <<source-filtering,source filtering>> to
restrict what parts of the source are returned.
provided when indexing the document. To retrieve specific fields in the search
response, you can use the `fields` parameter:

Returning fields using only the document source has some limitations:
[source,console]
----
GET /_search
{
"query": {
"term": {
"user.id": "8a4f500d"
}
},
"fields": ["user.name", "timestamp"],
"_source": false
}
----

* The `_source` field does not include <<multi-fields, multi-fields>> or
<<alias, field aliases>>. Likewise, a field in the source does not contain
values copied using the <<copy-to,`copy_to`>> mapping parameter.
* Since the `_source` is stored as a single field in Lucene, the whole source
object must be loaded and parsed, even if only a small number of fields are
needed.
The `fields` parameter consults both a document's `_source` and the index
mappings to load and return values. Because it makes use of the mappings,
`fields` has some advantages over referencing the `_source` directly: it
accepts <<multi-fields, multi-fields>> and <<alias, field aliases>>, and
also formats field values like dates in a consistent way.

To avoid these limitations, you can:
A document's `_source` is stored as a single field in Lucene, so the whole
`_source` object must be loaded and parsed even if only a small number of
fields are requested. To avoid this limitation, you can try another option for
loading fields:

* Use the <<docvalue-fields, `docvalue_fields`>>
parameter to get values for selected fields. This can be a good
choice when returning a fairly small number of fields that support doc values,
such as keywords and dates.
* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to get the values for specific stored fields. (Fields that use the <<mapping-store,`store`>> mapping option.)
* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
get the values for specific stored fields (fields that use the
<<mapping-store,`store`>> mapping option).

You can find more detailed information on each of these methods in the
following sections:

* <<source-filtering>>
* <<search-fields-param>>
* <<docvalue-fields>>
* <<stored-fields>>
* <<source-filtering>>

[discrete]
[[search-fields-param]]
=== Fields

The `fields` parameter allows for retrieving a list of document fields in
the search response. It consults both the document `_source` and the index
mappings to return each value in a standardized way that matches its mapping
type. By default, date fields are formatted according to the
<<mapping-date-format,date format>> parameter in their mappings.

.*Example*
[%collapsible]
====
The following search request uses the `fields` parameter to retrieve values
for the `clientip` field, all fields starting with `location.`, and the
`timestamp` field:
[source,console]
----
POST logs-*/_search
{
"query": {
"match_all": {}
},
"fields": [
"clientip",
"location.*", <1>
{
"field": "timestamp",
"format": "epoch_millis" <2>
}
],
"_source": false
}
----
<1> Both full field names and wildcard patterns are accepted.
<2> Using object notation, you can pass a `format` parameter to apply a custom
format for the field's values. This is currently supported for
<<date,`date` fields>> and <<date_nanos, `date_nanos` fields>>, which
accept a <<mapping-date-format,date format>>.
The values are returned as a flat list in the `fields` section in each hit:
[source,console-result]
----
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "http-logs",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"clientip" : [
"192.0.2.0"
],
"location.city" : [
"Toronto"
],
"location.country" : [
"Canada"
],
"timestamp" : [
"1593274413000"
]
}
}
]
}
}
----
// TESTRESPONSE[skip:no test set-up]
Only leaf fields are returned -- `fields` does not allow for fetching entire
objects.
====
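
Because every entry in a hit's `fields` section is a list, even for a
single-valued field, client code usually has to unwrap the values. The following
Python snippet is a minimal sketch of that step, not part of the {es} API: the
helper name `first_values` is hypothetical, and the sample hit is copied from
the example response above.

```python
from typing import Any, Dict, List


def first_values(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Unwrap single-element value lists from a hit's `fields` section.

    Hypothetical helper for illustration only. Fields with several values
    keep their list form so that no data is dropped.
    """
    fields: Dict[str, List[Any]] = hit.get("fields", {})
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in fields.items()
    }


# Sample hit taken from the example response in this section.
hit = {
    "_index": "http-logs",
    "_id": "1",
    "_score": 1.0,
    "fields": {
        "clientip": ["192.0.2.0"],
        "location.city": ["Toronto"],
        "location.country": ["Canada"],
        "timestamp": ["1593274413000"],
    },
}

flat = first_values(hit)
```

Keeping multi-valued fields as lists is a design choice for this sketch. A
client that always expects lists can skip the unwrapping and read
`hit["fields"]` directly.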

The `fields` parameter handles field types like <<alias, field aliases>> and
<<constant-keyword, `constant_keyword`>> whose values aren't always present in
the `_source`. Other mapping options are also respected, including
<<ignore-above, `ignore_above`>>, <<ignore-malformed, `ignore_malformed`>> and
<<null-value, `null_value`>>.

[discrete]
[[docvalue-fields]]
=== Doc value fields

You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
<<doc-values,doc values>> for one or more fields in the search response.

Doc values store the same values as the `_source` but in an on-disk,
column-based structure that's optimized for sorting and aggregations. Since each
field is stored separately, {es} only reads the field values that were requested
and can avoid loading the whole document `_source`.

Doc values are stored for supported fields by default. However, doc values are
not supported for <<text,`text`>> or
{plugins}/mapper-annotated-text-usage.html[`text_annotated`] fields.

.*Example*
[%collapsible]
====
The following search request uses the `docvalue_fields` parameter to retrieve
doc values for the `clientip` field, all fields starting with `location.`, and the
`timestamp` field:
[source,console]
----
GET /_search
{
"query": {
"match_all": {}
},
"docvalue_fields": [
"clientip", <1>
"location.*",
{
"field": "timestamp",
"format": "epoch_millis" <2>
}
]
}
----
<1> Both full field names and wildcard patterns are accepted.
<2> Using object notation, you can pass a `format` parameter to apply a custom
format for the field's doc values. <<date,Date fields>> support a
<<mapping-date-format,date `format`>>. <<number,Numeric fields>> support a
https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html[DecimalFormat
pattern]. Other field datatypes do not support the `format` parameter.
====

TIP: You cannot use the `docvalue_fields` parameter to retrieve doc values for
nested objects. If you specify a nested object, the search returns an empty
array (`[ ]`) for the field. To access nested fields, use the
<<request-body-search-inner-hits, `inner_hits`>> parameter's `docvalue_fields`
property.
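
The request body above mixes two notations: plain strings for field names and
wildcard patterns, and objects when a `format` is needed. The following Python
snippet is a rough sketch of how a client might assemble such a body. The
function name `docvalue_fields_body` and its `(field, format)` tuple input are
assumptions made for illustration; only the resulting JSON shape comes from the
example above.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def docvalue_fields_body(
    specs: List[Tuple[str, Optional[str]]]
) -> Dict[str, Any]:
    """Build a search body with a `docvalue_fields` list.

    Hypothetical helper: each spec is (field name or pattern, format or None).
    Entries without a format use string notation, the rest object notation.
    """
    entries: List[Any] = []
    for field, fmt in specs:
        if fmt is None:
            entries.append(field)
        else:
            entries.append({"field": field, "format": fmt})
    return {"query": {"match_all": {}}, "docvalue_fields": entries}


body = docvalue_fields_body(
    [
        ("clientip", None),
        ("location.*", None),
        ("timestamp", "epoch_millis"),
    ]
)

# Serialize the body so it can be sent with any HTTP client.
payload = json.dumps(body, indent=2)
```

The sketch only builds the JSON payload. Sending it to `GET /_search` is left
to whichever HTTP or {es} client is in use.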

[discrete]
[[source-filtering]]
@@ -122,69 +292,6 @@ GET /_search
----
====


[discrete]
[[docvalue-fields]]
=== Doc value fields

You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
<<doc-values,doc values>> for one or more fields in the search response.

Doc values store the same values as the `_source` but in an on-disk,
column-based structure that's optimized for sorting and aggregations. Since each
field is stored separately, {es} only reads the field values that were requested
and can avoid loading the whole document `_source`.

Doc values are stored for supported fields by default. However, doc values are
not supported for <<text,`text`>> or
{plugins}/mapper-annotated-text-usage.html[`text_annotated`] fields.

.*Example*
[%collapsible]
====
The following search request uses the `docvalue_fields` parameter to
retrieve doc values for the following fields:
* Fields with names starting with `my_ip`
* `my_keyword_field`
* Fields with names ending with `_date_field`
[source,console]
----
GET /_search
{
"query": {
"match_all": {}
},
"docvalue_fields": [
"my_ip*", <1>
{
"field": "my_keyword_field" <2>
},
{
"field": "*_date_field",
"format": "epoch_millis" <3>
}
]
}
----
<1> Wildcard pattern used to match field names, specified as a string.
<2> Field name, specified as an object.
<3> With the object notation, you can use the `format` parameter to specify a
format for the field's returned doc values. <<date,Date fields>> support a
<<mapping-date-format,date `format`>>. <<number,Numeric fields>> support a
https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html[DecimalFormat
pattern]. Other field datatypes do not support the `format` parameter.
====

TIP: You cannot use the `docvalue_fields` parameter to retrieve doc values for
nested objects. If you specify a nested object, the search returns an empty
array (`[ ]`) for the field. To access nested fields, use the
<<request-body-search-inner-hits, `inner_hits`>> parameter's `docvalue_fields`
property.


[discrete]
[[stored-fields]]
=== Stored fields