[Composable templates] Improve support for customizing settings and mappings for groups of data streams #92426

Closed
joshdover opened this issue Dec 16, 2022 · 14 comments
Assignees
Labels
:Data Management/Data streams Data streams and their lifecycles :Data Management/Indices APIs APIs to create and manage indices and templates >enhancement Team:Data Management Meta label for data/management team

Comments

@joshdover
Contributor

joshdover commented Dec 16, 2022

Why

Integration users want to be able to define mappings and index settings that get applied to different groups of indices, at different levels of granularity:

  • global (eg *)
  • per-type (eg. logs-*)
  • per-package (eg. *-nginx.*-*)
  • per-dataset (eg. logs-nginx.access-*) - we support this today
  • per-dataset-per-namespace (eg. logs-nginx.access-foo)
  • per-namespace (eg. *-*-foo)
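
To make these levels concrete, here is a small Python sketch that checks a data stream name against each of the example patterns above. It uses `fnmatch`-style globbing as a stand-in, so Elasticsearch's own pattern matching may differ in detail. The `LEVELS` table and the `matching_levels` helper are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase

# Granularity levels and example patterns, copied from the list above.
LEVELS = {
    "global": "*",
    "per-type": "logs-*",
    "per-package": "*-nginx.*-*",
    "per-dataset": "logs-nginx.access-*",
    "per-dataset-per-namespace": "logs-nginx.access-foo",
    "per-namespace": "*-*-foo",
}


def matching_levels(data_stream: str) -> list[str]:
    """Return the levels whose example pattern matches the data stream name."""
    return [
        level
        for level, pattern in LEVELS.items()
        if fnmatchcase(data_stream, pattern)
    ]


# A data stream in the "foo" namespace is covered by every level at once.
print(matching_levels("logs-nginx.access-foo"))
# An unrelated data stream is only covered by the global pattern.
print(matching_levels("metrics-system.cpu-default"))
```

A single data stream can fall under all six levels, which is why a dedicated template per level multiplies quickly.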

Today, Fleet will install integrations with an empty <type>-<dataset>@custom component template that users can use to override any index setting or mapping defined by the package. This only solves one of the desired levels of granularity, and adding additional component templates at each possible level will lead to an explosion in the number of templates Fleet creates, the vast majority of which will be empty, creating a poor UX in the template UIs and APIs.

We want to find another solution that would not lead to this explosion of templates.

Proposed solution: Make composable template existence optional

Today, if you try to create an index template whose composed_of field references a component template that does not exist, Elasticsearch will return a 400 error. This solution would add a new option to allow some component templates to not exist when the index template is created. This would allow the user or Fleet to create index templates that reference component templates for different levels of granularity, which will only be applied once the component template actually exists. For instance (just a demonstration, the precedence here is still debatable):

PUT _index_template/logs-nginx.access-foo?ignore_missing=true
{
  "index_patterns": ["logs-nginx.access-*"],
  "template": {
    
  },
  "priority": 250,
  "composed_of": [
    "logs-nginx.access@package",
    "global@custom",
    "logs@custom",
    "global-foo@custom",
    "logs-nginx@custom",
    "logs-nginx.access@custom",
    "logs-nginx.access-foo@custom",
    ".fleet_globals-1",
    ".fleet_agent_id_verification-1"
  ]
}
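
The naming convention in this request can be sketched in Python. The helper below is hypothetical (it is not part of Fleet or Elasticsearch) and simply reproduces the order shown above. Splitting the dataset `nginx.access` into a package part and a remainder is an assumption made for the example, and the precedence is still up for debate.

```python
def composed_of_for(type_: str, package: str, dataset: str, namespace: str) -> list[str]:
    """Build the composed_of list from the example request, in the same order.

    Hypothetical helper: the names follow the convention used in the request
    above; nothing here is an existing Fleet or Elasticsearch API.
    """
    stream = f"{type_}-{package}.{dataset}"  # e.g. logs-nginx.access
    return [
        f"{stream}@package",
        "global@custom",
        f"{type_}@custom",
        f"global-{namespace}@custom",
        f"{type_}-{package}@custom",
        f"{stream}@custom",
        f"{stream}-{namespace}@custom",
        ".fleet_globals-1",
        ".fleet_agent_id_verification-1",
    ]


names = composed_of_for("logs", "nginx", "access", "foo")
for name in names:
    print(name)
```

Most of these names would refer to templates that do not exist yet, which is exactly the case the new option is meant to tolerate.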

Pros:

  • Relatively small change for Elasticsearch that unblocks many use cases
  • The precedence rules are well defined by the model and we can rely upon them when building integrations and improving the functionality of package installation and upgrades.
  • Does not reverse the gains we made with composable templates over template merging which were hard to understand and debug

Cons:

  • If a user doesn't agree with the precedence rules between these groups that we've set, there's no way to override them without changing an asset managed by Fleet (the index template).
  • Namespace-level granularity still requires that Fleet create a new index template each time a user configures an integration to ship to a new namespace. In some cases, this will be dynamic ([Fleet] Dynamic data stream namespaces kibana#134971) and not known to Fleet ahead of time, so the namespace-specific index template will not yet exist and creating a component template with the name logs-nginx.access-foo@custom will not do anything.
  • Relies on a naming convention which may be brittle
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@ruflin
Member

ruflin commented Dec 19, 2022

I did play around a bit with the Elasticsearch code. Commenting out the following lines has the effect that validation does not happen:

https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java#L514-L519

I suggest we take inspiration from #87354 and add a config option ignore_missing_component_template: {boolean} which would then look like the following:

PUT _index_template/logs-nginx.access-foo
{
  "index_patterns": ["logs-nginx.access-*"],
  "priority": 250,
  "ignore_missing_component_template": true,
  "composed_of": [
    "logs-nginx.access@package",
    ...
  ]
}

Happy to open a pull request with the proposed change in case the team agrees.

ruflin added a commit to ruflin/elasticsearch that referenced this issue Dec 19, 2022
As described in elastic#92426, a config option to ignore missing component templates is needed. This introduces the config option `ignore_missing_component_templates`. If set to true, missing component templates are ignored. If set to false or not set at all, the existing behaviour applies.

This is currently a draft PR as it only contains the functionality. It still needs tests and docs. Goal is to get an initial set of opinions on it.
@ruflin
Member

ruflin commented Dec 19, 2022

I put up a draft PR with the change implemented: #92436. It is still missing tests, docs, etc., and I'm sure I missed a few bits that also need to be changed, but so far it seems to be working.

@joshdover
Contributor Author

I like having it as a property of the template rather than a request-time parameter. From the Fleet side, I'm +1 on this direction.

@ruflin
Member

ruflin commented Dec 30, 2022

@dakrone Any feedback on the proposed implementation plan for this? #92436

@dakrone
Member

dakrone commented Dec 30, 2022

@ruflin I am working on a set of proposals for us to discuss and decide between. Today and Monday are holidays in the US, so we can discuss sometime next week.

@dakrone dakrone self-assigned this Dec 30, 2022
@dakrone
Member

dakrone commented Jan 5, 2023

We discussed some possible solutions for this. I'll put the discussed options here:

Option 1 — keep things the way they are

Always an option, we could keep the template stuff the same way it currently works. Instead, we could potentially invest time into making templates easier to manage, improving the UX without adding additional leniency.

Pros
  • Easiest to implement.
  • Adds the least amount of leniency to Elasticsearch.
  • May encourage us to improve the user experience of managing many templates.
Cons
  • Managing a large number of component templates is currently cumbersome.

Option 2 — total leniency

This is the option the Fleet team suggested in the linked GitHub issue. In this option the existence of component templates is not required. If a component template does not exist at template or index creation time, it will be silently ignored.

Pros
  • Simple to implement.
  • Leniency generally makes it easier for users to get started.
Cons
  • Adds a large amount of leniency; typos in a component name will go unnoticed until index creation time.
  • Deleting a component template causes action at a distance: unintended consequences for an unknown number of composable index templates.

Option 3 — template-specific leniency

In this option (proposed PR), rather than making all component template existence lenient, we would allow the user to opt-in to leniency with a setting put on the composable index template itself. An example of what this could look like is (ignore the naming):

PUT /_index_template/my_template
{
  "index_patterns": ["logs-*-*"],
  "data_stream": {},
  "composed_of": ["component-1", "thing@custom"],
  "ignore_missing_component_templates": true
}

If this setting is set to 'true', then any missing component template will be ignored for this index template only. Templates without this setting, or with the setting set to false, would still require the existence of all of their component templates.

Pros
  • Leniency is only added for a single template, instead of all templates.
Cons
  • Same downsides as option 2, but limited to a single index template instead of all templates.

Option 4 — name-based leniency

For this option, rather than leniency for all component templates, Elasticsearch would codify the component template naming scheme along with its leniency. This means that any component template ending with "@custom" would leniently be ignored if it does not exist. In the following example:

PUT /_index_template/my_template
{
  "index_patterns": ["logs-*-*"],
  "data_stream": {},
  "composed_of": ["component-1", "logs@custom", "global@custom"]
}

The two component templates ending in "@custom", "logs@custom" and "global@custom", may optionally exist. If they do not exist, then they will be leniently ignored.

Pros
  • No need to opt in or configure an index template specially for the leniency.
  • Leniency is limited to a subset of component templates, so typos will still be caught for other templates.
Cons
  • Leniency is not self-documenting, which may be confusing to users who don't understand why component templates must exist unless they end in "@custom".
  • Codifying the @custom suffix means we will be stuck with it for a long time.
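
A minimal Python sketch of what name-based leniency could look like, assuming the set of existing component templates is known at validation time. The function name and the `CUSTOM_SUFFIX` constant are hypothetical and are not taken from the Elasticsearch code base.

```python
CUSTOM_SUFFIX = "@custom"  # the suffix Elasticsearch would have to hardcode


def missing_required(composed_of: list[str], existing: set[str]) -> list[str]:
    """Return referenced component templates that are missing and not lenient.

    A missing template is tolerated only if its name ends with the suffix.
    """
    return [
        name
        for name in composed_of
        if name not in existing and not name.endswith(CUSTOM_SUFFIX)
    ]


existing = {"component-1"}

# Both "@custom" templates are missing, but neither counts as an error.
print(missing_required(["component-1", "logs@custom", "global@custom"], existing))

# A typo in a regular component template name is still caught.
print(missing_required(["component-2", "logs@custom"], existing))
```

The rule lives entirely in the suffix check, which is why the index template itself gives no hint that some of its components are optional.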

Option 5 — template-specific and name-based leniency

This is a combination of options 3 and 4, to reduce the leniency further. For this proposal the non-existence of component templates ending with "@custom" will be treated leniently only if the template setting is set to opt in to the lenient behavior. For example:

PUT /_index_template/my_template
{
  "index_patterns": ["logs-*-*"],
  "data_stream": {},
  "composed_of": ["component-1", "logs@custom", "global@custom"],
  "ignore_missing_custom_component_templates": true
}

The existence of "logs@custom" and "global@custom" will be treated leniently because the "ignore_missing_custom_component_templates" (naming to be determined later) setting has been set to "true".

Pros
  • The strictest we can be while still making changes to allow leniency.
  • Users are aware of the behavior because the new setting is present.
  • Leniency is opt-in rather than implicit.
Cons
  • Requires updating templates with the new setting.
  • Codifies the @custom suffix as the standard for this behavior.
  • The setting name may not indicate what it does, meaning the user will need to read the documentation to understand its use.

Option 6 — specify which templates can be optional

For this option, instead of either hardcoding the name of the templates that can be missing, or adding leniency for all component templates, the user/package specifies which component templates are allowed to be missing. The list could either contain concrete names or support rudimentary wildcards (so "*@custom" could be allowed).

PUT /_index_template/my_template
{
  "index_patterns": ["logs-*-*"],
  "data_stream": {},
  "composed_of": ["component-1", "logs@custom", "global@custom"],
  "ignore_missing_component_templates": ["missing-1", "*@custom"]
}

The existence of "logs@custom" and "global@custom" will be treated leniently because all "*@custom" component templates are allowed to be missing.

Pros
  • Explicitly shows the user which component templates are allowed to be missing.
  • Users are aware of the behavior because the new setting is present.
  • Leniency is opt-in rather than implicit.
  • Does not necessitate hardcoding the @custom suffix into ES.
Cons
  • Requires updating templates with the new setting.
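
A rough Python sketch of the wildcard variant of this option, using `fnmatch`-style globbing as a stand-in for the "rudimentary wildcards" mentioned above. The function name is hypothetical, and how Elasticsearch would actually match patterns is not specified here.

```python
from fnmatch import fnmatchcase


def missing_required(composed_of, existing, ignore_missing):
    """Return names that are missing and not covered by any ignore entry.

    Each ignore entry is either a concrete name or a simple wildcard pattern.
    """
    return [
        name
        for name in composed_of
        if name not in existing
        and not any(fnmatchcase(name, pattern) for pattern in ignore_missing)
    ]


composed_of = ["component-1", "logs@custom", "global@custom"]
ignore_missing = ["missing-1", "*@custom"]

# component-1 exists and the "@custom" templates match "*@custom": no error.
print(missing_required(composed_of, {"component-1"}, ignore_missing))

# component-1 is missing and matches no ignore entry: it is reported.
print(missing_required(composed_of, set(), ignore_missing))
```

Unlike name-based leniency, the tolerated names are visible in the index template, and a package can choose a different convention without a change to Elasticsearch.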


After discussing this on the Data Management team, we decided that we like option 6 the best.

@dakrone
Member

dakrone commented Jan 5, 2023

@ruflin are you still interested in the implementation for this? (i.e., the implementation of option 6, which is only slightly different than your existing PR) If you are, great! If not let me know and I can see where it will fit into our existing work.

@ruflin
Member

ruflin commented Jan 9, 2023

++ on option 6. Happy to take on the implementation on my end.

I suggest implementing option 6 in two steps:

  1. Exact match of component templates in ignore_missing_component_templates, no support for *
  2. Add support for * if there is actually a need for it

@dakrone Is that ok for you and team?

@joshdover
Contributor Author

++ to option 6. Updating the existing templates will not necessarily be an issue for Fleet's use case. We already have a versioning scheme on package installations to be able to detect when we need to reinstall templates due to a change like this.

@dakrone
Member

dakrone commented Jan 9, 2023

@ruflin that implementation plan sounds great to me, thanks for taking it on.

@joshdover
Contributor Author

@hop-dev @kpollich FYI this change is coming in which should help us implement some of the customization APIs we've been interested in.

@ruflin
Member

ruflin commented Jan 16, 2023

#92436 is now in pretty good shape. Functionality-wise it all works as expected, but the code likely still needs a bit of cleanup. Would be great to get some initial feedback.

dakrone added a commit that referenced this issue Jan 31, 2023
This change introduces the configuration option `ignore_missing_component_templates` as discussed in #92426. [Option 6](#92426 (comment)) was implemented with a slight adjustment: no patterns are allowed.

## Implementation

During the creation of an index template, Elasticsearch checks that every component template in the list exists. This check is extended to skip any component templates listed under `ignore_missing_component_templates`. An index template that skips the check for the component template `logs-foo@custom` looks as follows:


```
PUT _index_template/logs-foo
{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```
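
The extended check can be sketched in a few lines of Python. This is an illustration of the exact-match rule described above, not the actual Java implementation; the function name and the error text are made up for the example.

```python
def validate_index_template(template: dict, existing: set[str]) -> None:
    """Raise ValueError if a referenced component template is missing and is
    not listed by exact name in ignore_missing_component_templates."""
    ignored = set(template.get("ignore_missing_component_templates", []))
    missing = [
        name
        for name in template.get("composed_of", [])
        if name not in existing and name not in ignored
    ]
    if missing:
        # Illustrative message only, not the real Elasticsearch error text.
        raise ValueError(f"component templates {missing} do not exist")


template = {
    "index_patterns": ["logs-foo-*"],
    "data_stream": {},
    "composed_of": ["logs-foo@package", "logs-foo@custom"],
    "ignore_missing_component_templates": ["logs-foo@custom"],
    "priority": 500,
}

# logs-foo@package exists and logs-foo@custom is ignorable: accepted.
validate_index_template(template, existing={"logs-foo@package"})

# logs-foo@package is missing and not ignorable: rejected.
try:
    validate_index_template(template, existing=set())
except ValueError as err:
    print(err)
```

Because names are compared exactly, an entry such as `*@custom` would only ever match a template literally named `*@custom`.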

The component template `logs-foo@package` has to exist before the index template is created. The optional component template `logs-foo@custom` can be added later with:

```
PUT _component_template/logs-foo@custom
{
  "template": {
    "mappings": {
      "properties": {
        "host.ip": {
          "type": "ip"
        }
      }
    }
  }
}
```

## Testing

For manual testing, several scenarios can be run. To simplify testing, the commands are given in `.http` file format. A clean cluster is expected before each test run.

### New behaviour, missing component template

With the new config option, it must be possible to create an index template with a missing component template without getting an error:

```
### Add logs-foo@package component template

PUT http://localhost:9200/
    _component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream

PUT http://localhost:9200/
    _data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist

GET http://localhost:9200/
    logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```

Verify that all templates were created and that the data stream mappings are correct.
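
As a rough guide to what the resulting mappings should contain, the following Python sketch merges the `properties` of the component templates in `composed_of` order and skips names that do not exist. It is a simplified, shallow model (the real merge in Elasticsearch handles far more than top-level properties), and `compose_properties` is a hypothetical helper.

```python
def compose_properties(composed_of, component_templates):
    """Merge mapping properties in composed_of order; later entries win.

    Names absent from component_templates are skipped, which models a
    component template that is allowed to be missing.
    """
    merged = {}
    for name in composed_of:
        template = component_templates.get(name)
        if template is None:
            continue
        merged.update(template["template"]["mappings"]["properties"])
    return merged


package = {"template": {"mappings": {"properties": {"host.name": {"type": "keyword"}}}}}
custom = {"template": {"mappings": {"properties": {"host.ip": {"type": "ip"}}}}}
composed_of = ["logs-foo@package", "logs-foo@custom"]

# Only the package template exists: the mappings contain host.name only.
print(compose_properties(composed_of, {"logs-foo@package": package}))

# Both templates exist: host.ip from the custom template shows up as well.
print(compose_properties(composed_of, {"logs-foo@package": package,
                                       "logs-foo@custom": custom}))
```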

### Old behaviour, with all component templates

In the following, a component template is marked as optional but already exists. Verify that it shows up in the mappings:

```
### Add logs-foo@package component template

PUT http://localhost:9200/
    _component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo@custom component template

PUT http://localhost:9200/
    _component_template/logs-foo@custom
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.ip": {
          "type": "ip"
        }
      }
    }
  }
}

### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream

PUT http://localhost:9200/
    _data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist

GET http://localhost:9200/
    logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```

### Check old behaviour

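Ensure that the old behaviour still applies when a missing component template is not part of `ignore_missing_component_templates`. On a clean cluster, `logs-foo@package` does not exist and is not listed, so the request should be rejected: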

```
### Add logs-foo index template

PUT http://localhost:9200/
    _index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@dakrone
Member

dakrone commented Jan 31, 2023

This has been resolved by #92436, so I'm closing this issue as fixed.

@dakrone dakrone closed this as completed Jan 31, 2023
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this issue Jan 31, 2023