Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Meta] Migrate to ecs@mappings template shipped with Elasticsearch #8542

Closed
5 tasks done
joshdover opened this issue Nov 20, 2023 · 24 comments
Closed
5 tasks done

[Meta] Migrate to ecs@mappings template shipped with Elasticsearch #8542

joshdover opened this issue Nov 20, 2023 · 24 comments
Assignees
Labels
meta Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem]

Comments

@joshdover
Copy link
Contributor

joshdover commented Nov 20, 2023

Background

tl;dr - we plan to change all integrations index templates to reference the ecs@mappings component template as a fallback for any ECS fields that are not explicitly mapped by integrations.

As part of the Logs+ project, the Observability team has been improving the Stack's ability to make logs as searchable as possible out of the box. Part of this effort has included introducing a new ecs@mappings component template included in Elasticsearch by default. This template dynamically maps all ECS fields with appropriate mappings if more specific mappings are not defined by other component templates. Today, these default ECS mappings are applied to any logs-*-* data stream that uses the default logs-*-* index template, and is not in use by our integration data streams.

We want to bring this improvement to integration logs data streams to solve these problems:

  • Cases where an integration developer forgets to add an ECS mapping explicitly to their index and so the field is not searchable
  • Cases where a user appends a new ECS-compatible field with a custom Beats processor, but does not add a mapping
  • Keeping the ECS field mapping definitions in sync between the ECS repo, the ecs@mappings template in Elasticsearch, and integrations that explicitly reference ECS fields using our elastic-package dependencies.

By ensuring that all ECS-compatible fields get indexed and are searchable by default and with a single source of truth for the mapping definitions, we can solve these problems for users and integration developers by not requiring them to ever explicitly map ECS fields. Integration developers also will not need to update ECS versions in their packages anymore, relying the ecs@mappings template to always be up to date and in sync with the Stack version.

Phases

Step 1: Enable ecs@mappings on all logs data streams

The goal of this step is to:

  • Enable new integrations to use the ecs@mappings template instead of explicitly defined mappings in integration's fields definitions
  • Enable existing integrations to gracefully map any ECS fields that are not already explicitly mapped by the integration's field definitions

The proposal is to enable this by default for all integrations logs data streams managed by Fleet starting in some stack version (depending on when we land the change). On this Stack upgrade, existing data stream index templates will be automatically updated by Fleet to start referencing the ecs@mappings template, at lower precedence than the @package and @custom component templates.

We considered a more cautious approach and making this opt-in by each integration, but are leaning towards not doing this for the following reasons:

  • As far as we can tell, this is a non-breaking change and will not result in any dropped data or existing index fields being index differently
    • All fields use ignore_malformed to avoid dropping documents
    • Fields explicitly defined by the integration or users via the @custom template will override the ecs@mappings and continue to be mapped the same as before
  • It's more effort for both the Fleet Platform team and integration developers to support an opt-in feature
  • It's another option we have to deprecate

Step 2: Add support for index.query.default_fields on fields defined by ecs@mappings

In order to support migrating existing integrations to the new ecs@mappings fields, we need to change how Fleet installs integrations and configured the index.query.default_fields setting so that searches in Kibana that don't specify a field name also will search against fields mapped by ecs@mappings.

Today, Fleet will automatically populate the index.query.default_fields setting to the list of fields explicitly defined by an integration's field definitions. This needs to change to allow the default again of default_fields: ["*"] without introducing maxClauseCount errors, see discussion and possible solutions here: elastic/elasticsearch#102378

Following these Elasticsearch improvements, we will need to introduce package-spec and Fleet changes to change how we specify the default_field setting. This should probably be opt-in to give integration developers a chance to test their changes, but we should consider whether or not this is necessary if there are no breaking changes in the ES change that gets introduced.

Step 3: Enable ECS system tests in elastic-package

We should enable integration developers to run a standard set of system tests against their data streams with documents of ECS data to ensure that any mapping overrides of ECS fields specified by the integration are always compatible with ECS.

@P1llus can you link to our existing test suite for the legacy dynamic templates?

Step 4: Remove explicit ECS mappings from existing integrations

At this point, existing integrations can start removing explicit ECS mappings and rely on the source of truth ecs@mappings template shipped with the Stack version.
For metrics datastream we need to resolve #8623 before removing ecs.yml

Tasks

  1. Team:Fleet
    gsantoro
  2. 4 of 4
    zmoog
  3. Team:Integrations
    zmoog
  4. Team:Search
    gsantoro
  5. Team:Search
    flash1293 gsantoro
@joshdover joshdover added meta Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem] labels Nov 20, 2023
@ruflin
Copy link
Member

ruflin commented Nov 21, 2023

The proposal is to enable this by default for all integrations logs data streams managed by Fleet starting in some stack version (depending on when we land the change).

I'm proposing to enable this not only on logs data streams but all data streams for integrations.

@jsoriano
Copy link
Member

The proposal is to enable this by default for all integrations logs data streams managed by Fleet starting in some stack version (depending on when we land the change).

I'm proposing to enable this not only on logs data streams but all data streams for integrations.

+1 to enable this for all data streams. Maybe we can split this in two phases, but I think it would be confusing to have ECS mappings imported automatically in some data streams but not others.

Enable ECS system tests in elastic-package

We already have some code that loads ECS mappings for validation of documents in tests. It is loaded now under certain conditions, basically for packages that use import_mappings. We could add another condition to add it also for stacks that include this mapping.

@joshdover
Copy link
Contributor Author

Sure we can do other data streams as well. Metrics seems obvious, but should we do the same for traces? cc @axw

We already have some code that loads ECS mappings for validation of documents in tests. It is loaded now under certain conditions, basically for packages that use import_mappings. We could add another condition to add it also for stacks that include this mapping.

Makes sense to me 👍

There were no major concerns raised from the email that was sent out and the TTL has passed. @jsoriano @jen-huang I think we can move this into the Fleet & Integrations backlog for prioritization and scoping for step 1. Further steps are blocked by Elasticsearch changes at this time.

@ishleenk17
Copy link
Contributor

@joshdover @ruflin : I agree this should be done for metrics datastream as well.
But for metrics datastream, when we come to a point of removal of the ecs.yml file, we need to ensure that the fields which are dimensions are taken care of in the ecs mappings being done at ES. Eg: In IIS's ecs mapping
Is this something already being taken care of ES ?

@axw
Copy link
Member

axw commented Nov 28, 2023

Sure we can do other data streams as well. Metrics seems obvious, but should we do the same for traces? cc @axw

We have done this in the new Elasticsearch apm-data plugin: https://github.com/elastic/elasticsearch/blob/9551fcef4024dc6bfe63ff6e60ff63652306a10c/x-pack/plugin/apm-data/src/main/resources/index-templates/traces-apm%40template.yaml#L18

That's just for traces-apm-* at the moment. In the future it should apply to traces-*, and at that point it would probably make sense to move the template to the core plugin alongside the templates for logs-* and metrics-*.

I don't think there's any point in changing Fleet to do this for traces-* now. We will be relying on the base templates for dynamic mapping, and the APM integration package will become input-only.

@felixbarny
Copy link
Member

But for metrics datastream, when we come to a point of removal of the ecs.yml file, we need to ensure that the fields which are dimensions are taken care of in the ecs mappings being done at ES. Eg: In IIS's ecs mapping
Is this something already being taken care of ES ?

I don't think that we'll add dimension definition to ecs@mappings. Metric integrations would then either need to explicitly define fields that are dimensions or rely on elastic/elasticsearch#98384.

@ruflin
Copy link
Member

ruflin commented Nov 28, 2023

I don't think there's any point in changing Fleet to do this for traces-* now.

I don't know the code that would do it, but if we end up with if (type == "traces") { don't do it } I suggest we added it for traces too, it will just not have any additional effect but we have simpler code and cosistency.

ecs fields and dimensions

There is a third option here that we potentially introduce an additional component template with the most common dimensions. This component template is then linked for tsdb indices by Fleet. As soon as elastic/elasticsearch#98384 lands, this becomes obsolete. @ishleenk17 It might be worth to create a separate Github issue to discuss this in more details and also link it in the description.

@axw
Copy link
Member

axw commented Nov 28, 2023

I don't think there's any point in changing Fleet to do this for traces-* now.

I don't know the code that would do it, but if we end up with if (type == "traces") { don't do it } I suggest we added it for traces too, it will just not have any additional effect but we have simpler code and cosistency.

Fair. It shouldn't make a functional difference if it is added, so if it makes the code simpler then that's fine.

@ishleenk17
Copy link
Contributor

It might be worth to create a separate Github issue to discuss this in more details and also link it in the description.

I have created this ticket for tracking and added to description as well.

@muthu-mps
Copy link
Contributor

@joshdover - As this changes is going to remove the dependency of updating ECS versions.

@joshdover
Copy link
Contributor Author

@muthu-mps this hasn't been implemented yet and we're not ready for integrations to yet move to this pattern. It will also require bumping the constraints.kibana.version to which version we ship this change in, which may or may not be acceptable for a given integration.

gsantoro added a commit to elastic/kibana that referenced this issue Jan 31, 2024
## Summary

I added a reference to the component template `ecs@mappings` when
building any index template for an integration.

The same behaviour is valid for both logs and metrics index templates.

Linked to elastic/integrations#8542

Close #174905

### Checklist

Delete any items that are not applicable to this PR.

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
@ebeahan
Copy link
Member

ebeahan commented Feb 8, 2024

Question for folks involved in the effort here: which group or team owns keeping the ecs@mappings template up to date in Elasticsearch?

@felixbarny
Copy link
Member

We've set up an automated test that checks that the ecs@mappings component template maps all ECS fields to the correct field type. If fails, it sends out an email and notifies some slack channels. See also https://github.com/elastic/elasticsearch/blob/main/.buildkite/pipelines/ecs-dynamic-template-tests.yml.

@andrewkroh
Copy link
Member

Great. It looks like this test pulls from ECS main, generates fake data based on the declared type for each field, then I assume verifies that it gets the right mapping. 👍

https://github.com/elastic/elasticsearch/blob/fbceb7fb93e58d07be8592530c1f6d9fc19ddb13/x-pack/plugin/stack/src/javaRestTest/java/org/elasticsearch/xpack/stack/EcsDynamicTemplatesIT.java#L58

CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this issue Feb 15, 2024
## Summary

I added a reference to the component template `ecs@mappings` when
building any index template for an integration.

The same behaviour is valid for both logs and metrics index templates.

Linked to elastic/integrations#8542

Close elastic#174905

### Checklist

Delete any items that are not applicable to this PR.

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
@gsantoro
Copy link
Contributor

gsantoro commented Feb 21, 2024

I would like to provide a summary of what's the current status:

  • Step 0: ecs@mappings component template is installed by Elasticsearch and referenced by logs-*-* index template
  • Step 1: ecs@mappings component template is referenced by any integration (both logs and metrics) starting from 8.13.0
  • Step 2: we removed default_fields: ["message"] from logs@mappings component template. Not having that default_fields is like having default_fields: ["*"]. Tests has been added.
  • Step 2: with Lower field limit in integrations #177447 we will lower total_fields.limit to 1k in Fleet code for all integrations. If there are more fields than that limit, ignore_dynamic_beyond_limit will make sure those fields will be ignored, added to _ignored and we won't hit the maxClauseCount limit
  • Step 2: with Remove code to calculate default_fields in Fleet #177605 we want to remove from Fleet the code that calculate default_fields from the fields defined in the integrations. Removing this field entirely will default to default_fields: ["*"]
  • Step 3: we tested the current behaviour after upgrading to 8.13.0 that you get both "legacy" (from integrations) and new dynamic mappings from ecs@mappings.
  • Step 3: with Enable ECS system tests in elastic-package (Step 3) #1649 we want to document how to remove ecs.yml from an integration to instead use ecs@mappings. We will need to fix package-spec and elastic-package integration tests.
  • Step 4: we will ask integrations devs to remove references to ecs mappings from the integrations

Hey @felixbarny, @ruflin , @zmoog, can you please let me know if I missed anything or if I got something wrong

@gsantoro
Copy link
Contributor

A couple of questions still left:

  • in issue 99872 we talk about removing default_field: ["message"]. I can't find it in 8.13.0-SNAPSHOT cluster at logs@settings. Did we remove it already?
  • in Step 2 we say we will need to introduce package-spec and Fleet changes to change how we specify the default_field setting. I am not sure what changes are involved in package-spec. I'll have a look and maybe add an issue later here. Does anyone have more info about this part?

@felixbarny or @ruflin

@ruflin
Copy link
Member

ruflin commented Feb 22, 2024

@felixbarny
Copy link
Member

we will need to introduce package-spec and Fleet changes to change how we specify the default_field setting

In essence, there should be a flag for integrations to opt-in or opt-out of the generation of the default_field. When disabled, the default field stetting should not be set, or set to "*" (my preference is to not set and instead rely on the defaults).

@gsantoro
Copy link
Contributor

@felixbarny I might have missed the discussion about introducing a flag to opt-in or opt-out of the default_field behaviour in Fleet. Was this already discussed and approved?

While I would generally agree to introduce a flag to be on the safe side, it would make the implementation a bit more complicated, it won't solve the problem of adding the dynamic fields from ecs@mappings, and it will not be aligned with the default behaviour in Elasticsearch.

Also, while looking at the code, I noticed a hard limit of 1024 fields in Kibana that I wasn't aware of.

https://github.com/elastic/kibana/blob/3ef768aa45601bc3b089107eeadb2f965e743487/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/default_settings.ts#L53

This is a static value that doesn't depend on total_fields.limit or any other variable. Also we don't sort the fields in any way, so we can't assume that those will be the first 1024.

For all those reasons, I would lean toward removing the field calculation altogether from the Kibana code.

@ruflin
Copy link
Member

ruflin commented Feb 23, 2024

Interesting find in the Kibana code. Does it mean that even if integrations have more then 1024 fields, these were never added, at least on the query side? If yes, I agree we can just go with the new setting.

There is still the total field limit meaning that even though by default users didn't search on these fields, these were still indexed up to 10k fields. I would like to see this field limit also be changed to 1024 (aka removed) by default and if an integration needs more fields, overwrites it. I wonder if this is already possible today through the Elasticsearch settings in the dataset so no additional changes on our end would be needed. This would also allow users to overwrite dynamic field limit if needed.

@gsantoro
Copy link
Contributor

@ruflin , I was surprised as well but that seems to be the behaviour. Truncate the default fields to 1024 no matter what.

About lowering the total field limit I have created Lower field limit in integrations #177447

@felixbarny
Copy link
Member

The limit in Fleet is in place to avoid hitting the max clause limit in Elasticsearch. The max clause limit depends on the hardware configuration the ES nodes are running on and it's not impacted by the max total fields limit.

I'd also really like to get rid of the default_field generation in Fleet but there's a risk that for some integrations (like input integrations with an unknown number of fields), this could lead to hitting the max clause limit in ES. Maybe the risk is small enough, though. If we thing the risk is too high to be tolerable in a minor version, we should only remove setting the default_field in Fleet for integrations where we're also lowering the total field limit.

fkanout pushed a commit to fkanout/kibana that referenced this issue Mar 4, 2024
## Summary

I added a reference to the component template `ecs@mappings` when
building any index template for an integration.

The same behaviour is valid for both logs and metrics index templates.

Linked to elastic/integrations#8542

Close elastic#174905

### Checklist

Delete any items that are not applicable to this PR.

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
@ishleenk17
Copy link
Contributor

@gsantoro @felixbarny
I see this task as completed, so can the Integrations also consider not having ecs.yml mappings now? Will it be taken care by default?
For Metrics Integrations, I suppose that's not true looking at this comment.

In summary, this might imply that there is no change for integrations regarding ECS mappings; we will continue to include them as we currently do?

@lalit-satapathy @tommyers-elastic

@ruflin
Copy link
Member

ruflin commented Mar 27, 2024

@zmoog will provide documentation shortly and also run a session on the migration details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem]
Projects
None yet
Development

No branches or pull requests

10 participants