Introduce the new convention for multi-fields text indexing to the README. #140

webmat · 2018-10-19T20:09:38Z

The nested field name is open for discussion. I started with the name *.text
because I think it will be intuitive even for less technical users. If they want
to search the text, they need to target fieldname.text. To me it sounds
less technical than *.analyzed or *.fts.
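To make the proposal concrete, here is a minimal sketch of how queries would target each kind of indexing under the proposed naming. It assumes a hypothetical field `user.name` mapped as `keyword`, with a `user.name.text` sub-field of type `text`. The helpers only build Elasticsearch query DSL bodies as Python dicts. The `term` and `match` query types are standard Elasticsearch queries, but the helper names and the field are illustrative.

```python
import json


def exact_match_query(field: str, value: str) -> dict:
    """Exact match on the canonical field, assumed to be indexed as `keyword`."""
    return {"query": {"term": {field: value}}}


def full_text_query(field: str, text: str) -> dict:
    """Full text search on the `.text` sub-field, assumed to be indexed as `text`."""
    return {"query": {"match": {field + ".text": text}}}


if __name__ == "__main__":
    # `user.name` is a hypothetical field used only for illustration.
    print(json.dumps(exact_match_query("user.name", "Alice Smith")))
    print(json.dumps(full_text_query("user.name", "alice")))
```

Users who only filter or aggregate never need to know about the sub-field. Only full text search requires appending `.text`.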

However, I'm open to discussing this point before merging.

webmat · 2018-10-19T20:15:10Z

If there's ever a change or concept introduced by ECS that may rain 🔥 on us, it's this new convention 😄

I'd like the opinions of a few people on this one please :-)

I think doing the nesting differently (with full text search as the nested field, instead of keyword as the nested field) is something we don't have a choice about, if we want to avoid future breaking changes as much as possible. It's probably the most controversial part, though.

Then there's the question of "what should we name the nested field?", which is less critical. I think *.text is the least technical word, so that's what I went with in this initial writing of the documentation. But I'm open to suggestions.
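The reversal described above can be sketched as two mapping fragments, built here as Python dicts. `es_default_mapping` mirrors what Elasticsearch's dynamic mapping produces for a string by default: `text` at the field name, plus a `keyword` sub-field named `keyword` with `ignore_above: 256`. `ecs_mapping` is an illustrative take on the proposed convention, with `keyword` at the field name and an optional `text` sub-field. Both helper names are hypothetical. The sketch shows why the reversal avoids breaking changes: opting into full text search later only adds a sub-field, so the canonical field's type, which existing queries and visualizations rely on, stays the same.

```python
def es_default_mapping() -> dict:
    """Elasticsearch's default dynamic mapping for a string field."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def ecs_mapping(full_text: bool = False) -> dict:
    """Proposed convention (illustrative): `keyword` at the canonical name,
    with an optional `.text` sub-field for full text search."""
    mapping = {"type": "keyword"}
    if full_text:
        mapping["fields"] = {"text": {"type": "text"}}
    return mapping


before = ecs_mapping()
after = ecs_mapping(full_text=True)

# Opting into full text search leaves the canonical field's type untouched.
assert before["type"] == after["type"] == "keyword"
print(after)
```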

webmat · 2018-10-19T20:17:53Z

Note that the build fails because of an unrelated change introduced in #139

ruflin · 2018-10-22T11:52:23Z

README.md

 and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
 (what Kibana visualizations are built on).

-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping specifies otherwise, ElasticSearch indexes


Elasticsearch

ruflin · 2018-10-22T11:53:01Z

README.md

 and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
 (what Kibana visualizations are built on).

-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping specifies otherwise, ElasticSearch indexes
+text field as `text` at the canonical field name, and indexes as second time


Note here: the ECS template specifies keyword as the default: https://github.com/elastic/ecs/blob/master/template.json#L13

Yes. In our case, the index template does specify otherwise. Perhaps I should indeed reword this a bit, to avoid confusion.

I was referring to the broader use of Elasticsearch, not to using it with the ECS index template.
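The distinction between plain Elasticsearch and the ECS template can be sketched as a toy model of dynamic template resolution. `ES_DEFAULT_STRING_MAPPING` is Elasticsearch's default mapping for a new string field. `ECS_DYNAMIC_TEMPLATES` is an assumed approximation of the "keyword by default" rule linked above, not a copy of `template.json`. `resolve_string_mapping` is a hypothetical helper: like Elasticsearch, it uses the first dynamic template that matches, but it only models `match_mapping_type`.

```python
from typing import Optional

# Assumed approximation of the ECS template's "keyword by default" rule;
# not copied from template.json.
ECS_DYNAMIC_TEMPLATES = [
    {
        "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": {"type": "keyword"},
        }
    }
]

# Elasticsearch's default dynamic mapping for a new string field.
ES_DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def resolve_string_mapping(dynamic_templates: Optional[list]) -> dict:
    """Toy model of the mapping a new string field would receive.

    Templates are checked in order and the first match wins; only
    `match_mapping_type` is modelled here.
    """
    for entry in dynamic_templates or []:
        (template,) = entry.values()  # each entry holds one named template
        if template.get("match_mapping_type") in ("string", "*"):
            return template["mapping"]
    return ES_DEFAULT_STRING_MAPPING


print(resolve_string_mapping(None))                   # plain Elasticsearch
print(resolve_string_mapping(ECS_DYNAMIC_TEMPLATES))  # with the ECS template
```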

ruflin · 2018-10-22T11:54:24Z

README.md

-to sort them by frequency (that's an aggregation). They can also be long and
-varied enough that full text search can be useful on them.
+* Canonical field: `myfield` is `text`
+* Nested field: `myfield.keyword` is `keyword`


Let's not use "nested field" here; people normally think of the nested datatype in Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Ok I can see how that can cause confusion. Looking at the documentation for multi-fields, there is no specific wording to reference that other field.

Do you have a suggestion on what we should call that nested field?

A "multi-field" sounds to me like it refers to all the fields pointing at the same data, not specifically the nested field.

Instead of "canonical field", I would use "core data type" (from the Elasticsearch docs), and use "multi-field" here. Something like "the field inside the multi-field"?

ruflin · 2018-10-22T11:54:57Z

README.md


-Whenever both types of indexing are helpful, we use multi-fields indexing. The
-convention used is the following:
+For monitoring use cases, we need almost exclusively `keyword` indexing, with


Suggested change

For monitoring use cases, we need almost exclusively `keyword` indexing, with

For monitoring use cases, almost exclusively `keyword` indexing is needed, with

ruflin · 2018-10-22T11:55:31Z

README.md

-convention used is the following:
+For monitoring use cases, we need almost exclusively `keyword` indexing, with
+full text search on very few field fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with only two exceptions).


Suggested change

all text indexing to `keyword` at the top level (with only two exceptions).

all text indexing to `keyword` at the top level (with only a few exceptions).

I'd rather the docs not become outdated just because we add or remove one.

ruflin · 2018-10-22T11:56:03Z

README.md

+full text search on very few field fields. Given this premise, ECS defaults
+all text indexing to `keyword` at the top level (with only two exceptions).
+Any use case that requires full text search indexing on additional fields
+can simply add a nested field for full text search.


It is a multi-field, not a nested field. Best to also link to the multi-field docs here.
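"Simply add a multi-field" can be made concrete: Elasticsearch allows new multi-fields to be added to an existing field through the update mapping API. Below is a minimal sketch that builds such a request body for a dotted field name. `add_text_subfield` is a hypothetical helper, and it assumes the existing field is a plain `keyword`. The existing type is restated because the update must not change it.

```python
def add_text_subfield(dotted_field: str) -> dict:
    """Build an update mapping body that adds a `.text` multi-field to an
    existing `keyword` field (hypothetical helper)."""
    *parents, leaf = dotted_field.split(".")
    body = {
        "properties": {
            leaf: {"type": "keyword", "fields": {"text": {"type": "text"}}}
        }
    }
    # Wrap the leaf in one `properties` level per parent object field.
    for parent in reversed(parents):
        body = {"properties": {parent: body}}
    return body


print(add_text_subfield("user.name"))
```

The body would be sent with `PUT <index>/_mapping` (or `PUT <index>/_mapping/<type>` on 6.x indices that still use mapping types). Documents indexed before the change only get the new sub-field once they are reindexed or updated.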

ruflin · 2018-10-22T11:56:36Z

README.md


-The fields that only make sense as type `keyword` are not named `foo.raw`, the
-plain field (`foo`) will be of type `keyword`, with no nested field.
+#### Exceptions


I don't think we need to document this.

Good catch, forgot to drop that old text.

Sorry, misread that. I thought we still had a reference to .raw fields.

I think it's worthwhile to document the break from this new convention. These are widely used fields and they will behave exactly the reverse of this new convention we're introducing. I think it's helpful to make sure it's clear.

Alternatively, I would make them follow the new convention, but declare them right away as multi-fields. E.g. message is keyword and message.text is text. This would avoid the awkward task of explaining two exceptions, and would let people do fast exact-match filtering on the message field without having to reintroduce message.keyword ;-)

As mentioned before, these are not exceptions for me.

ruflin · 2018-10-22T11:57:07Z

README.md

+
+The only two exceptions to this convention are fields `message` and `error.message`,
+which are indexed for full text search only, with no nested field.
+These two fields don't follow the new convention because they are deemed too big


I think the purpose of message is to be indexed, so even if we did not have it as text today, I think it should be text.

Not sure what you mean by this. message is indexed as text right now.

Are you mixing this up with my comment here #138 (review) 😄 ?

ruflin · 2018-10-22T11:57:50Z

README.md

+These two fields don't follow the new convention because they are deemed too big
+of a breaking change with these two widely used fields in Beats.
+
+Any future field that will be indexed for full text search in ECS will however


Let's skip the Exceptions part here. When we encounter other fields meant for full text search, we'll need to discuss what to do with them; let's not get ahead of ourselves in the docs.

ruflin

I'm good with getting these changes in and revise them later again.

In general, I don't think we should explain Elasticsearch here beyond linking to its docs. We should rather focus on explaining how things work in ECS, on the assumption that users will use the template.

ruflin · 2018-10-23T11:09:54Z

README.md

 and allows for [aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html)
 (what Kibana visualizations are built on).

-In some cases, only one type of indexing makes sense for a field.
+By default, unless your index mapping or index template specifies otherwise


Instead of writing this here, could we link to the ECS docs? If possible, we should not explain how Elasticsearch works in ECS.

webmat · 2018-10-23T19:59:46Z

Rebased on top of the autopep8 fix.

webmat · 2018-10-24T17:39:53Z

Since it was approved yesterday, I'll merge it, and we can indeed tweak it later.


webmat mentioned this pull request Oct 19, 2018 in Getting ECS to 1.0 #115 (closed, 26 tasks)

webmat requested review from ruflin, MikePaquette and robgil October 19, 2018 20:12

ruflin reviewed Oct 22, 2018

webmat force-pushed the text-indexing-readme branch from 1ebc3f2 to 8ece1e5 October 22, 2018 19:36

ruflin approved these changes Oct 23, 2018

webmat force-pushed the text-indexing-readme branch from 8ece1e5 to 8fa1983 October 23, 2018 19:59

webmat mentioned this pull request Oct 24, 2018 in Make cloud.instance.name and kubernetes.instance.name multi-fields #119 (closed)

Mathieu Martin added 3 commits October 24, 2018 13:42

7bc8651 Introduce the new convention for multi-fields text indexing to the README.

535d1c2 Be a little more explicit in the changelog for @ruflin's PR elastic#137

eb69685 feedback.apply

webmat force-pushed the text-indexing-readme branch from 8fa1983 to eb69685 October 24, 2018 17:42

webmat merged commit f9d5f01 into elastic:master Oct 24, 2018

webmat deleted the text-indexing-readme branch October 24, 2018 17:46
