-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce the new convention for multi-fields text indexing to the README. #140
Conversation
If there's ever a change or a concept introduced by ECS that may rain 🔥on us, this is this new convention 😄 I'd like the opinions of a few people on this one please :-) I think doing the nesting differently (with full text search being the nested field, instead of keyword being the nested field) is something we don't have a choice, if we want to avoid future breaking changes as much as possible. It's probably the most controversial part, though. Then there's the question of "what should we name the nested field?", which is less critical. I think |
Note that the build fails because of an unrelated change introduced in #139 |
README.md
Outdated
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) | ||
(what Kibana visualizations are built on). | ||
|
||
In some cases, only one type of indexing makes sense for a field. | ||
By default, unless your index mapping specifies otherwise, ElasticSearch indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elasticsearch
README.md
Outdated
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) | ||
(what Kibana visualizations are built on). | ||
|
||
In some cases, only one type of indexing makes sense for a field. | ||
By default, unless your index mapping specifies otherwise, ElasticSearch indexes | ||
text field as `text` at the canonical field name, and indexes as second time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note here: The ECS templates has keyword as default specified: https://github.com/elastic/ecs/blob/master/template.json#L13
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In our case, the index template does specify otherwise. Perhaps I should indeed reword this a bit, to avoid confusion.
I was referring to the broader use case when Elasticsearch, not while using the ECS index template.
README.md
Outdated
to sort them by frequency (that's an aggregation). They can also be long and | ||
varied enough that full text search can be useful on them. | ||
* Canonical field: `myfield` is `text` | ||
* Nested field: `myfield.keyword` is `keyword` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not used nested field
here normally people think of nested datatype in Eleasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I can see how that can cause confusion. Looking at the documentation for multi-fields, there is no specific wording to reference that other field.
Do you have a suggestion on what we should call that nested field?
A "multi-field" sounds to me like it refers to all the fields pointing at the same data, not specifically the nested field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of canonical fields, I would use core data types (coming from the es docs) and use multi fields here. Something like the field inside the multi field
?
README.md
Outdated
|
||
Whenever both types of indexing are helpful, we use multi-fields indexing. The | ||
convention used is the following: | ||
For monitoring use cases, we need almost exclusively `keyword` indexing, with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For monitoring use cases, we need almost exclusively `keyword` indexing, with | |
For monitoring use cases, almost exclusively `keyword` indexing is needed, with |
README.md
Outdated
convention used is the following: | ||
For monitoring use cases, we need almost exclusively `keyword` indexing, with | ||
full text search on very few field fields. Given this premise, ECS defaults | ||
all text indexing to `keyword` at the top level (with only two exceptions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all text indexing to `keyword` at the top level (with only two exceptions). | |
all text indexing to `keyword` at the top level (with only few exceptions). |
I don't like the docs to become outdated just because we add or remove one.
README.md
Outdated
full text search on very few field fields. Given this premise, ECS defaults | ||
all text indexing to `keyword` at the top level (with only two exceptions). | ||
Any use case that requires full text search indexing on additional fields | ||
can simply add a nested field for full text search. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a multi field not a nested field. Best also link here to the multi field docs.
|
||
The fields that only make sense as type `keyword` are not named `foo.raw`, the | ||
plain field (`foo`) will be of type `keyword`, with no nested field. | ||
#### Exceptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, forgot to drop that old text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, misread that. I thought we still had a reference to .raw
fields.
I think it's worthwhile to document the break from this new convention. These are widely used fields and they will behave exactly the reverse of this new convention we're introducing. I think it's helpful to make sure it's clear.
Or alternately, I would make them follow the new convention, but declare them right away as a multi-field. E.g. message
is keyword
and message.text
is text
. This would avoid the uncomfortable explanation of explaining two exceptions, and would let people do fast exact match filtering based on the message
field without having to reintroduce message.keyword
;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, these are not exceptions for me.
|
||
The only two exceptions to this convention are fields `message` and `error.message`, | ||
which are indexed for full text search only, with no nested field. | ||
These two fields don't follow the new convention because they are deemed too big |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the purpose of message
is to be index so even if we would not have it as text today I think it should be text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by this. message
is indexed as text
right now.
Are you mixing this up with my comment here #138 (review) 😄 ?
These two fields don't follow the new convention because they are deemed too big | ||
of a breaking change with these two widely used fields in Beats. | ||
|
||
Any future field that will be indexed for full text search in ECS will however |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's really skip the Exception part here as I think we need to discuss when we encounter other fields with text purpose what we should do with it and not get ahead of us in the docs here.
1ebc3f2
to
8ece1e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with getting these changes in and revise them later again.
In general I don't think we should explain Elasticsearch but only reference to the docs here but rather focus on explaining how it works in ECS based on the assumption, users will use the template.
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) | ||
(what Kibana visualizations are built on). | ||
|
||
In some cases, only one type of indexing makes sense for a field. | ||
By default, unless your index mapping or index template specifies otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of writing this here, could we link to the ECS docs? We should if possible not explain how Elasticsearch works in ECS.
8ece1e5
to
8fa1983
Compare
Rebased on top of the autopep8 fix. |
Since it's been approved yesterday, I'll merge and we can indeed tweak later. |
8fa1983
to
eb69685
Compare
The nested field name is open for discussion. I started with the name
*.text
because I think it will be intuitive even for less technical users. If they want
to search the text, they need to targed
fieldname.text
. To me it soundsless technical than
*.analyzed
or*.fts
.I'm however open to discussing this point before merging.