Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the new convention for multi-fields text indexing to the README. #140

Merged
merged 3 commits into from
Oct 24, 2018

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Oct 19, 2018

The nested field name is open for discussion. I started with the name *.text
because I think it will be intuitive even for less technical users. If they want
to search the text, they need to targed fieldname.text. To me it sounds
less technical than *.analyzed or *.fts.

I'm however open to discussing this point before merging.

@webmat webmat mentioned this pull request Oct 19, 2018
26 tasks
@webmat
Copy link
Contributor Author

webmat commented Oct 19, 2018

If there's ever a change or a concept introduced by ECS that may rain 🔥on us, this is this new convention 😄

I'd like the opinions of a few people on this one please :-)

I think doing the nesting differently (with full text search being the nested field, instead of keyword being the nested field) is something we don't have a choice, if we want to avoid future breaking changes as much as possible. It's probably the most controversial part, though.

Then there's the question of "what should we name the nested field?", which is less critical. I think *.text is the least technical word, so that's what I went with in this initial writing of the documentation. But I'm open to suggestions.

@webmat
Copy link
Contributor Author

webmat commented Oct 19, 2018

Note that the build fails because of an unrelated change introduced in #139

README.md Outdated
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).

In some cases, only one type of indexing makes sense for a field.
By default, unless your index mapping specifies otherwise, ElasticSearch indexes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elasticsearch

README.md Outdated
and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).

In some cases, only one type of indexing makes sense for a field.
By default, unless your index mapping specifies otherwise, ElasticSearch indexes
text field as `text` at the canonical field name, and indexes as second time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note here: The ECS templates has keyword as default specified: https://github.com/elastic/ecs/blob/master/template.json#L13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In our case, the index template does specify otherwise. Perhaps I should indeed reword this a bit, to avoid confusion.

I was referring to the broader use case when Elasticsearch, not while using the ECS index template.

README.md Outdated
to sort them by frequency (that's an aggregation). They can also be long and
varied enough that full text search can be useful on them.
* Canonical field: `myfield` is `text`
* Nested field: `myfield.keyword` is `keyword`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not used nested field here normally people think of nested datatype in Eleasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I can see how that can cause confusion. Looking at the documentation for multi-fields, there is no specific wording to reference that other field.

Do you have a suggestion on what we should call that nested field?

A "multi-field" sounds to me like it refers to all the fields pointing at the same data, not specifically the nested field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of canonical fields, I would use core data types (coming from the es docs) and use multi fields here. Something like the field inside the multi field?

README.md Outdated

Whenever both types of indexing are helpful, we use multi-fields indexing. The
convention used is the following:
For monitoring use cases, we need almost exclusively `keyword` indexing, with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For monitoring use cases, we need almost exclusively `keyword` indexing, with
For monitoring use cases, almost exclusively `keyword` indexing is needed, with

README.md Outdated
convention used is the following:
For monitoring use cases, we need almost exclusively `keyword` indexing, with
full text search on very few field fields. Given this premise, ECS defaults
all text indexing to `keyword` at the top level (with only two exceptions).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
all text indexing to `keyword` at the top level (with only two exceptions).
all text indexing to `keyword` at the top level (with only few exceptions).

I don't like the docs to become outdated just because we add or remove one.

README.md Outdated
full text search on very few field fields. Given this premise, ECS defaults
all text indexing to `keyword` at the top level (with only two exceptions).
Any use case that requires full text search indexing on additional fields
can simply add a nested field for full text search.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a multi field not a nested field. Best also link here to the multi field docs.


The fields that only make sense as type `keyword` are not named `foo.raw`, the
plain field (`foo`) will be of type `keyword`, with no nested field.
#### Exceptions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to document this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, forgot to drop that old text.

Copy link
Contributor Author

@webmat webmat Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, misread that. I thought we still had a reference to .raw fields.

I think it's worthwhile to document the break from this new convention. These are widely used fields and they will behave exactly the reverse of this new convention we're introducing. I think it's helpful to make sure it's clear.

Or alternately, I would make them follow the new convention, but declare them right away as a multi-field. E.g. message is keyword and message.text is text. This would avoid the uncomfortable explanation of explaining two exceptions, and would let people do fast exact match filtering based on the message field without having to reintroduce message.keyword ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, these are not exceptions for me.


The only two exceptions to this convention are fields `message` and `error.message`,
which are indexed for full text search only, with no nested field.
These two fields don't follow the new convention because they are deemed too big
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose of message is to be index so even if we would not have it as text today I think it should be text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by this. message is indexed as text right now.

Are you mixing this up with my comment here #138 (review) 😄 ?

These two fields don't follow the new convention because they are deemed too big
of a breaking change with these two widely used fields in Beats.

Any future field that will be indexed for full text search in ECS will however
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's really skip the Exception part here as I think we need to discuss when we encounter other fields with text purpose what we should do with it and not get ahead of us in the docs here.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with getting these changes in and revise them later again.

In general I don't think we should explain Elasticsearch but only reference to the docs here but rather focus on explaining how it works in ECS based on the assumption, users will use the template.

and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
(what Kibana visualizations are built on).

In some cases, only one type of indexing makes sense for a field.
By default, unless your index mapping or index template specifies otherwise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing this here, could we link to the ECS docs? We should if possible not explain how Elasticsearch works in ECS.

@webmat
Copy link
Contributor Author

webmat commented Oct 23, 2018

Rebased on top of the autopep8 fix.

@webmat
Copy link
Contributor Author

webmat commented Oct 24, 2018

Since it's been approved yesterday, I'll merge and we can indeed tweak later.

@webmat webmat merged commit f9d5f01 into elastic:master Oct 24, 2018
@webmat webmat deleted the text-indexing-readme branch October 24, 2018 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants