Introduce the new convention for multi-fields text indexing to the README. #140
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -458,40 +458,50 @@ Contributions of additional uses cases on top of ECS are welcome. | |
|
||
### Multi-fields text indexing | ||
|
||
ElasticSearch can index text multiple ways: | ||
Elasticsearch can index text multiple ways: | ||
|
||
* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) indexing allows for full text search, or searching arbitrary words that | ||
* [text](https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html) | ||
indexing allows for full text search, or searching arbitrary words that | ||
are part of the field. | ||
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) indexing allows for much faster | ||
[exact match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html) | ||
and [prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html), | ||
* [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html) | ||
indexing allows for much faster | ||
[exact match filtering](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html), | ||
[prefix search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html), | ||
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) | ||
(what Kibana visualizations are built on). | ||
|
||
In some cases, only one type of indexing makes sense for a field. | ||
By default, unless your index mapping or index template specifies otherwise | ||
(as the ECS index template does), | ||
Elasticsearch indexes a text field as `text` at the canonical field name, | |
and indexes a second time as `keyword`, nested in a multi-field. | ||
|
||
However there are cases where both types of indexing can be useful, and we want | ||
to index both ways. | ||
As an example, log messages can sometimes be short enough that it makes sense | ||
to sort them by frequency (that's an aggregation). They can also be long and | ||
varied enough that full text search can be useful on them. | ||
Default Elasticsearch convention: | ||
|
||
Whenever both types of indexing are helpful, we use multi-fields indexing. The | ||
convention used is the following: | ||
* Canonical field: `myfield` is `text` | ||
* Multi-field: `myfield.keyword` is `keyword` | ||
|
||
* `foo`: `text` indexing. | ||
The top level of the field (its plain name) is used for full text search. | ||
* `foo.raw`: `keyword` indexing. | ||
The nested field has suffix `.raw` and is what you will use for aggregations. | ||
* Performance tip: when filtering your stream in Kibana (or elsewhere), if you | ||
are filtering for an exact match or doing a prefix search, | ||
both `text` and `keyword` fields can be used, but doing so on the `keyword` | ||
field (named `.raw`) will be much faster and less memory intensive. | ||
For monitoring use cases, `keyword` indexing is needed almost exclusively, with | ||
full text search on very few fields. Given this premise, ECS defaults | ||
all text indexing to `keyword` at the top level (with very few exceptions). | ||
Any use case that requires full text search indexing on additional fields | ||
can simply add a [multi-field](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) | ||
for full text search. Doing so does not conflict with ECS, | ||
as the canonical field name will remain `keyword` indexed. | ||
|
||
**Keyword only fields** | ||
ECS multi-field convention for text: | ||
|
||
The fields that only make sense as type `keyword` are not named `foo.raw`, the | ||
plain field (`foo`) will be of type `keyword`, with no nested field. | ||
* Canonical field: `myfield` is `keyword` | ||
* Multi-field: `myfield.text` is `text` | ||
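
The two conventions listed above can be sketched as mapping fragments. This is a minimal illustration, not the ECS index template itself: the helper names `ecs_string_mapping` and `index_body` are hypothetical, `myfield` is the placeholder from the lists above, and the request body assumes the typeless `mappings.properties` layout used by Elasticsearch 7 and later.

```python
import json

# What Elasticsearch's default dynamic mapping produces for a string field:
# `text` at the canonical name, `keyword` nested as a multi-field.
DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def ecs_string_mapping(full_text=False):
    """Hypothetical helper: `keyword` at the canonical name, with an
    optional `text` multi-field for full text search."""
    mapping = {"type": "keyword"}
    if full_text:
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


def index_body():
    """Index creation body declaring `myfield` with the ECS convention.
    Assumes the typeless mapping layout (Elasticsearch 7+)."""
    return {"mappings": {"properties": {"myfield": ecs_string_mapping(full_text=True)}}}


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
```

With this mapping, `myfield` serves exact match filtering and aggregations, while `myfield.text` serves full text search.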
|
||
#### Exceptions | ||
Review comment: I don't think we need to document this. |
Review comment: Good catch, forgot to drop that old text. |
Review comment: Sorry, misread that. I thought we still had a reference to I think it's worthwhile to document the break from this new convention. These are widely used fields and they will behave exactly the reverse of this new convention we're introducing. I think it's helpful to make sure it's clear. Or alternately, I would make them follow the new convention, but declare them right away as a multi-field. E.g. |
Review comment: As mentioned before, these are not exceptions for me. |
||
|
||
The only exceptions to this convention are fields `message` and `error.message`, | ||
which are indexed for full text search only, with no multi-field. | ||
These two fields don't follow the new convention because they are deemed too big | ||
Review comment: I think the purpose of |
Review comment: Not sure what you mean by this. Are you mixing this up with my comment here #138 (review) 😄 ? |
||
of a breaking change with these two widely used fields in Beats. | ||
|
||
Any future field that will be indexed for full text search in ECS will however | ||
Review comment: Let's really skip the Exception part here as I think we need to discuss when we encounter other fields with text purpose what we should do with it and not get ahead of us in the docs here. |
||
follow the multi-field convention where `text` indexing is nested in the multi-field. | ||
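
As a rough sketch of what the convention and its two exceptions mean when querying, the snippet below picks the field name to target for each kind of search. The helper names are hypothetical, and it assumes that any field other than `message` and `error.message` has a `.text` multi-field declared in the mapping.

```python
# Fields described above as exceptions: `text` only, no multi-field.
TEXT_ONLY_FIELDS = {"message", "error.message"}


def full_text_field(field):
    """Name of the field to target for full text search."""
    # Exceptions are `text` at the canonical name; all other fields are
    # assumed to carry a `.text` multi-field.
    return field if field in TEXT_ONLY_FIELDS else f"{field}.text"


def match_query(field, words):
    """Full text search using the Elasticsearch `match` query."""
    return {"query": {"match": {full_text_field(field): words}}}


def term_query(field, value):
    """Exact match filtering using the `term` query on a `keyword` field."""
    if field in TEXT_ONLY_FIELDS:
        raise ValueError(f"{field} has no keyword indexing in this convention")
    return {"query": {"term": {field: {"value": value}}}}
```

For example, `match_query("message", "timeout")` targets `message` directly, while `match_query("myfield", "timeout")` targets `myfield.text`.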
|
||
### IDs are keywords not integers | ||
|
||
|
Review comment: Instead of writing this here, could we link to the ECS docs? We should if possible not explain how Elasticsearch works in ECS.