Introduce the new convention for multi-fields text indexing to the README. (#140)

* Introduce the new convention for multi-fields text indexing to the README.
* Be a little more explicit in the changelog for #137
webmat authored Oct 24, 2018
1 parent 4ec8988 commit f9d5f01
Showing 3 changed files with 70 additions and 48 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,8 @@ All notable changes to this project will be documented in this file based on the
* Remove `*.timezone.offset.sec` fields as too specific for ECS at the moment. #134
* Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original
* Rename `url.host.name` to `url.hostname` to better align with industry convention.
* Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original. #137
* Only two fields using `text` indexing at this time are `message` and `error.message`.

### Bugfixes

58 changes: 34 additions & 24 deletions README.md
@@ -458,40 +458,50 @@ Contributions of additional uses cases on top of ECS are welcome.

### Multi-fields text indexing

ElasticSearch can index text multiple ways:
Elasticsearch can index text multiple ways:

* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
indexing allows for full text search, or searching arbitrary words that
are part of the field.
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
[exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
indexing allows for much faster
[exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
[prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).
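To make the distinction concrete, here is a minimal sketch of the request bodies involved, built as plain Python dictionaries in the Elasticsearch query DSL. The field names `url.path` (keyword) and `message` (text) are taken from this change; the helper names and the search values are hypothetical and only serve the illustration.

```python
import json


def exact_match(field, value):
    """Term query: exact match filtering, best run on a keyword field."""
    return {"query": {"term": {field: value}}}


def prefix_search(field, prefix):
    """Prefix query: matches values that start with the given prefix."""
    return {"query": {"prefix": {field: prefix}}}


def top_values(field, size=10):
    """Terms aggregation: counts the most frequent values of a keyword field."""
    return {
        "size": 0,  # return only the aggregation, no search hits
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


def full_text(field, words):
    """Match query: full text search, requires a text-indexed field."""
    return {"query": {"match": {field: words}}}


if __name__ == "__main__":
    # The values below are made up for the example.
    for body in (
        exact_match("url.path", "/index.html"),
        prefix_search("url.path", "/api/"),
        top_values("url.path"),
        full_text("message", "connection refused"),
    ):
        print(json.dumps(body))
```

The first three bodies target a `keyword` field, the last one a `text` field.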

In some cases, only one type of indexing makes sense for a field.
By default, unless your index mapping or index template specifies otherwise
(as the ECS index template does),
Elasticsearch indexes text fields as `text` at the canonical field name,
and indexes a second time as `keyword`, nested in a multi-field.

However there are cases where both types of indexing can be useful, and we want
to index both ways.
As an example, log messages can sometimes be short enough that it makes sense
to sort them by frequency (that's an aggregation). They can also be long and
varied enough that full text search can be useful on them.
Default Elasticsearch convention:

Whenever both types of indexing are helpful, we use multi-fields indexing. The
convention used is the following:
* Canonical field: `myfield` is `text`
* Multi-field: `myfield.keyword` is `keyword`

* `foo`: `text` indexing.
The top level of the field (its plain name) is used for full text search.
* `foo.raw`: `keyword` indexing.
The nested field has suffix `.raw` and is what you will use for aggregations.
* Performance tip: when filtering your stream in Kibana (or elsewhere), if you
are filtering for an exact match or doing a prefix search,
both `text` and `keyword` field can be used, but doing so on the `keyword`
field (named `.raw`) will be much faster and less memory intensive.
For monitoring use cases, `keyword` indexing is needed almost exclusively, with
full text search on very few fields. Given this premise, ECS defaults
all text indexing to `keyword` at the top level (with very few exceptions).
Any use case that requires full text search indexing on additional fields
can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
for full text search. Doing so does not conflict with ECS,
as the canonical field name will remain `keyword` indexed.

**Keyword only fields**
ECS multi-field convention for text:

The fields that only make sense as type `keyword` are not named `foo.raw`, the
plain field (`foo`) will be of type `keyword`, with no nested field.
* Canonical field: `myfield` is `keyword`
* Multi-field: `myfield.text` is `text`
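The two conventions can be sketched side by side as mapping fragments. This is a minimal illustration in Python, assuming the standard Elasticsearch multi-field syntax (the `fields` key); the function names are hypothetical, and `myfield` is the placeholder name used above. The `ignore_above` value of 256 reflects what Elasticsearch's dynamic mapping generates for strings, not a setting taken from this change.

```python
import json


def default_text_mapping():
    """Default Elasticsearch convention: `text` at the canonical name,
    with a `keyword` multi-field named `.keyword`."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def ecs_text_mapping(with_full_text=False):
    """ECS convention: `keyword` at the canonical name.

    Pass with_full_text=True to add the optional `.text` multi-field
    for use cases that need full text search on this field.
    """
    mapping = {"type": "keyword"}
    if with_full_text:
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


if __name__ == "__main__":
    properties = {"myfield": ecs_text_mapping(with_full_text=True)}
    print(json.dumps(properties, indent=2))
```

With the ECS fragment, `myfield` serves filtering and aggregations, while `myfield.text` is available for full text search.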

#### Exceptions

The only exceptions to this convention are fields `message` and `error.message`,
which are indexed for full text search only, with no multi-field.
These two fields don't follow the new convention because they are deemed too big
of a breaking change with these two widely used fields in Beats.

Any future field that will be indexed for full text search in ECS will however
follow the multi-field convention where `text` indexing is nested in the multi-field.
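The rule and its two exceptions can be summarized in a short sketch. The helper `ecs_mapping_for` is hypothetical and not part of ECS tooling; it only encodes what the text above states for string fields.

```python
# Fields indexed for full text search only, with no multi-field,
# as listed in the Exceptions section.
TEXT_ONLY_FIELDS = {"message", "error.message"}


def ecs_mapping_for(field, needs_full_text=False):
    """Return a mapping fragment for a string field under the ECS convention."""
    if field in TEXT_ONLY_FIELDS:
        # Exception: `text` at the canonical name, no multi-field.
        return {"type": "text"}
    mapping = {"type": "keyword"}
    if needs_full_text:
        # Convention: `text` indexing is nested in the `.text` multi-field.
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


if __name__ == "__main__":
    for name in ("message", "error.message", "url.path"):
        print(name, ecs_mapping_for(name))
```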

### IDs are keywords not integers

58 changes: 34 additions & 24 deletions docs/implementing.md
@@ -26,40 +26,50 @@

### Multi-fields text indexing

ElasticSearch can index text multiple ways:
Elasticsearch can index text multiple ways:

* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
indexing allows for full text search, or searching arbitrary words that
are part of the field.
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
[exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
indexing allows for much faster
[exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
[prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).

In some cases, only one type of indexing makes sense for a field.
By default, unless your index mapping or index template specifies otherwise
(as the ECS index template does),
Elasticsearch indexes text fields as `text` at the canonical field name,
and indexes a second time as `keyword`, nested in a multi-field.

However there are cases where both types of indexing can be useful, and we want
to index both ways.
As an example, log messages can sometimes be short enough that it makes sense
to sort them by frequency (that's an aggregation). They can also be long and
varied enough that full text search can be useful on them.
Default Elasticsearch convention:

Whenever both types of indexing are helpful, we use multi-fields indexing. The
convention used is the following:
* Canonical field: `myfield` is `text`
* Multi-field: `myfield.keyword` is `keyword`

* `foo`: `text` indexing.
The top level of the field (its plain name) is used for full text search.
* `foo.raw`: `keyword` indexing.
The nested field has suffix `.raw` and is what you will use for aggregations.
* Performance tip: when filtering your stream in Kibana (or elsewhere), if you
are filtering for an exact match or doing a prefix search,
both `text` and `keyword` field can be used, but doing so on the `keyword`
field (named `.raw`) will be much faster and less memory intensive.
For monitoring use cases, `keyword` indexing is needed almost exclusively, with
full text search on very few fields. Given this premise, ECS defaults
all text indexing to `keyword` at the top level (with very few exceptions).
Any use case that requires full text search indexing on additional fields
can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
for full text search. Doing so does not conflict with ECS,
as the canonical field name will remain `keyword` indexed.

**Keyword only fields**
ECS multi-field convention for text:

The fields that only make sense as type `keyword` are not named `foo.raw`, the
plain field (`foo`) will be of type `keyword`, with no nested field.
* Canonical field: `myfield` is `keyword`
* Multi-field: `myfield.text` is `text`

#### Exceptions

The only exceptions to this convention are fields `message` and `error.message`,
which are indexed for full text search only, with no multi-field.
These two fields don't follow the new convention because they are deemed too big
of a breaking change with these two widely used fields in Beats.

Any future field that will be indexed for full text search in ECS will however
follow the multi-field convention where `text` indexing is nested in the multi-field.

### IDs are keywords not integers
