From 7bc8651757c6b28f01c07aa0d4f3ec3aacd758e2 Mon Sep 17 00:00:00 2001
From: Mathieu Martin
Date: Fri, 19 Oct 2018 16:07:07 -0400
Subject: [PATCH 1/3] Introduce the new convention for multi-fields text indexing to the README.

---
 README.md            | 53 +++++++++++++++++++++++++-------------------
 docs/implementing.md | 53 +++++++++++++++++++++++++-------------------
 2 files changed, 60 insertions(+), 46 deletions(-)

diff --git a/README.md b/README.md
index 0fe9e727ad..ac380a0ca8 100644
--- a/README.md
+++ b/README.md
@@ -460,38 +460,45 @@ Contributions of additional uses cases on top of ECS are welcome.

 ElasticSearch can index text multiple ways:

-* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
+  indexing allows for full text search, or searching arbitrary words that
   are part of the field.
-* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
-  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
-  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
+  indexing allows for much faster
+  [exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
+  [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
   and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
   (what Kibana visualizations are built on).

-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping specifies otherwise, ElasticSearch indexes
+text field as `text` at the canonical field name, and indexes as second time
+as `keyword` in a nested field:

-However there are cases where both types of indexing can be useful, and we want
-to index both ways.
-As an example, log messages can sometimes be short enough that it makes sense
-to sort them by frequency (that's an aggregation). They can also be long and
-varied enough that full text search can be useful on them.
+* Canonical field: `myfield` is `text`
+* Nested field: `myfield.keyword` is `keyword`

-Whenever both types of indexing are helpful, we use multi-fields indexing. The
-convention used is the following:
+For monitoring use cases, we need almost exclusively `keyword` indexing, with
+full text search on very few field fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with only two exceptions).
+Any use case that requires full text search indexing on additional fields
+can simply add a nested field for full text search.
+Doing so does not conflict with ECS, as the canonical field name will remain
+`keyword` indexed.

-* `foo`: `text` indexing.
-  The top level of the field (its plain name) is used for full text search.
-* `foo.raw`: `keyword` indexing.
-  The nested field has suffix `.raw` and is what you will use for aggregations.
-  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
-    are filtering for an exact match or doing a prefix search,
-    both `text` and `keyword` field can be used, but doing so on the `keyword`
-    field (named `.raw`) will be much faster and less memory intensive.
+ECS multi-field convention for text:

-**Keyword only fields**
+* Canonical field: `myfield` is `keyword`
+* Nested field: `myfield.text` is `text`

-The fields that only make sense as type `keyword` are not named `foo.raw`, the
-plain field (`foo`) will be of type `keyword`, with no nested field.
+#### Exceptions
+
+The only two exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no nested field.
+These two fields don't follow the new convention because they are deemed too big
+of a breaking change with these two widely used fields in Beats.
+
+Any future field that will be indexed for full text search in ECS will however
+follow the multi-field convention where `text` indexing is the nested field.

 ### IDs are keywords not integers

diff --git a/docs/implementing.md b/docs/implementing.md
index f515018751..3a563c8bf3 100644
--- a/docs/implementing.md
+++ b/docs/implementing.md
@@ -28,38 +28,45 @@

 ElasticSearch can index text multiple ways:

-* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that
+* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
+  indexing allows for full text search, or searching arbitrary words that
   are part of the field.
-* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster
-  [exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html)
-  and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
+* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html)
+  indexing allows for much faster
+  [exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html),
+  [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html),
   and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
   (what Kibana visualizations are built on).

-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping specifies otherwise, ElasticSearch indexes
+text field as `text` at the canonical field name, and indexes as second time
+as `keyword` in a nested field:

-However there are cases where both types of indexing can be useful, and we want
-to index both ways.
-As an example, log messages can sometimes be short enough that it makes sense
-to sort them by frequency (that's an aggregation). They can also be long and
-varied enough that full text search can be useful on them.
+* Canonical field: `myfield` is `text`
+* Nested field: `myfield.keyword` is `keyword`

-Whenever both types of indexing are helpful, we use multi-fields indexing. The
-convention used is the following:
+For monitoring use cases, we need almost exclusively `keyword` indexing, with
+full text search on very few field fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with only two exceptions).
+Any use case that requires full text search indexing on additional fields
+can simply add a nested field for full text search.
+Doing so does not conflict with ECS, as the canonical field name will remain
+`keyword` indexed.

-* `foo`: `text` indexing.
-  The top level of the field (its plain name) is used for full text search.
-* `foo.raw`: `keyword` indexing.
-  The nested field has suffix `.raw` and is what you will use for aggregations.
-  * Performance tip: when filtering your stream in Kibana (or elsewhere), if you
-    are filtering for an exact match or doing a prefix search,
-    both `text` and `keyword` field can be used, but doing so on the `keyword`
-    field (named `.raw`) will be much faster and less memory intensive.
+ECS multi-field convention for text:

-**Keyword only fields**
+* Canonical field: `myfield` is `keyword`
+* Nested field: `myfield.text` is `text`

-The fields that only make sense as type `keyword` are not named `foo.raw`, the
-plain field (`foo`) will be of type `keyword`, with no nested field.
+#### Exceptions
+
+The only two exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no nested field.
+These two fields don't follow the new convention because they are deemed too big
+of a breaking change with these two widely used fields in Beats.
+
+Any future field that will be indexed for full text search in ECS will however
+follow the multi-field convention where `text` indexing is the nested field.

 ### IDs are keywords not integers

From 535d1c277d2ec7d7cd24b4c8193734b8bad81df3 Mon Sep 17 00:00:00 2001
From: Mathieu Martin
Date: Fri, 19 Oct 2018 16:42:13 -0400
Subject: [PATCH 2/3] Be a little more explicit in the changelog for @ruflin's PR #137

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e52a9a7c10..a7da08eb56 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,8 @@ All notable changes to this project will be documented in this file based on the
 * Remove `*.timezone.offset.sec` fields as too specific for ECS at the moment. #134
 * Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original
 * Rename `url.host.name` to `url.hostname` to better align with industry convention.
+* Make the following fields keyword: device.vendor, file.path, file.target_path, http.response.body, network.name, organization.name, url.href, url.path, url.query, user_agent.original. #137
+  * Only two fields using `text` indexing at this time are `message` and `error.message`.

 ### Bugfixes

From eb6968546387d4a5e364690be8cae2053eaf8b7d Mon Sep 17 00:00:00 2001
From: Mathieu Martin
Date: Mon, 22 Oct 2018 15:03:15 -0400
Subject: [PATCH 3/3] feedback.apply

---
 README.md            | 33 ++++++++++++++++++---------------
 docs/implementing.md | 33 ++++++++++++++++++---------------
 2 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index ac380a0ca8..6c5af604c2 100644
--- a/README.md
+++ b/README.md
@@ -458,7 +458,7 @@ Contributions of additional uses cases on top of ECS are welcome.

 ### Multi-fields text indexing

-ElasticSearch can index text multiple ways:
+Elasticsearch can index text multiple ways:

 * [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html)
   indexing allows for full text search, or searching arbitrary words that
@@ -470,35 +470,38 @@ ElasticSearch can index text multiple ways:
   and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
   (what Kibana visualizations are built on).

-By default, unless your index mapping specifies otherwise, ElasticSearch indexes
-text field as `text` at the canonical field name, and indexes as second time
-as `keyword` in a nested field:
+By default, unless your index mapping or index template specifies otherwise
+(as the ECS index template does),
+Elasticsearch indexes text field as `text` at the canonical field name,
+and indexes a second time as `keyword`, nested in a multi-field.
+
+Default Elasticsearch convention:

 * Canonical field: `myfield` is `text`
-* Nested field: `myfield.keyword` is `keyword`
+* Multi-field: `myfield.keyword` is `keyword`

-For monitoring use cases, we need almost exclusively `keyword` indexing, with
-full text search on very few field fields. Given this premise, ECS defaults
-all text indexing to `keyword` at the top level (with only two exceptions).
+For monitoring use cases, `keyword` indexing is needed almost exclusively, with
+full text search on very few fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with very few exceptions).
 Any use case that requires full text search indexing on additional fields
-can simply add a nested field for full text search.
-Doing so does not conflict with ECS, as the canonical field name will remain
-`keyword` indexed.
+can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
+for full text search. Doing so does not conflict with ECS,
+as the canonical field name will remain `keyword` indexed.

 ECS multi-field convention for text:

 * Canonical field: `myfield` is `keyword`
-* Nested field: `myfield.text` is `text`
+* Multi-field: `myfield.text` is `text`

 #### Exceptions

-The only two exceptions to this convention are fields `message` and `error.message`,
-which are indexed for full text search only, with no nested field.
+The only exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no multi-field.
 These two fields don't follow the new convention because they are deemed too big
 of a breaking change with these two widely used fields in Beats.

 Any future field that will be indexed for full text search in ECS will however
-follow the multi-field convention where `text` indexing is the nested field.
### IDs are keywords not integers diff --git a/docs/implementing.md b/docs/implementing.md index 3a563c8bf3..d1a006dd71 100644 --- a/docs/implementing.md +++ b/docs/implementing.md @@ -26,7 +26,7 @@ ### Multi-fields text indexing -ElasticSearch can index text multiple ways: +Elasticsearch can index text multiple ways: * [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that @@ -38,35 +38,38 @@ ElasticSearch can index text multiple ways: and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) (what Kibana visualizations are built on). -By default, unless your index mapping specifies otherwise, ElasticSearch indexes -text field as `text` at the canonical field name, and indexes as second time -as `keyword` in a nested field: +By default, unless your index mapping or index template specifies otherwise +(as the ECS index template does), +Elasticsearch indexes text field as `text` at the canonical field name, +and indexes a second time as `keyword`, nested in a multi-field. + +Default Elasticsearch convention: * Canonical field: `myfield` is `text` -* Nested field: `myfield.keyword` is `keyword` +* Multi-field: `myfield.keyword` is `keyword` -For monitoring use cases, we need almost exclusively `keyword` indexing, with -full text search on very few field fields. Given this premise, ECS defaults -all text indexing to `keyword` at the top level (with only two exceptions). +For monitoring use cases, `keyword` indexing is needed almost exclusively, with +full text search on very few fields. Given this premise, ECS defaults +all text indexing to `keyword` at the top level (with very few exceptions). Any use case that requires full text search indexing on additional fields -can simply add a nested field for full text search. 
-Doing so does not conflict with ECS, as the canonical field name will remain
-`keyword` indexed.
+can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html)
+for full text search. Doing so does not conflict with ECS,
+as the canonical field name will remain `keyword` indexed.

 ECS multi-field convention for text:

 * Canonical field: `myfield` is `keyword`
-* Nested field: `myfield.text` is `text`
+* Multi-field: `myfield.text` is `text`

 #### Exceptions

-The only two exceptions to this convention are fields `message` and `error.message`,
-which are indexed for full text search only, with no nested field.
+The only exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no multi-field.
 These two fields don't follow the new convention because they are deemed too big
 of a breaking change with these two widely used fields in Beats.

 Any future field that will be indexed for full text search in ECS will however
-follow the multi-field convention where `text` indexing is nested in the multi-field.
+follow the multi-field convention where `text` indexing is nested in the multi-field.

 ### IDs are keywords not integers
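
Illustration only, not part of the patch series: the convention documented above (canonical field as `keyword`, optional `.text` multi-field, with `message` as a `text`-only exception) could be sketched as a mapping fragment roughly like this. The helper name `ecs_text_field` is hypothetical and `myfield` is the placeholder used in the README text; the `fields` mapping parameter is the standard Elasticsearch way to declare a multi-field. The actual ECS index template may be organised differently.

```python
import json


def ecs_text_field(with_full_text=False):
    """Build the mapping for one string field (hypothetical helper).

    The canonical field is `keyword`; full text search is opt-in through
    a `.text` multi-field declared with the `fields` mapping parameter.
    """
    mapping = {"type": "keyword"}
    if with_full_text:
        # Adds `<field>.text` next to the canonical keyword field.
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


# `myfield` is a placeholder name, as in the README text.
properties = {
    "myfield": ecs_text_field(with_full_text=True),
    # Exception described above: `message` is `text` only, no multi-field.
    "message": {"type": "text"},
}

# Print the fragment that would sit under "mappings" in an index template.
print(json.dumps({"properties": properties}, indent=2))
```

With such a mapping, term filters and aggregations would target `myfield`, while full text queries would target `myfield.text`.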