Allow inputs / module to define a target index. #13255

Closed
7 tasks done
ph opened this issue Aug 15, 2019 · 17 comments

Comments

@ph
Contributor

ph commented Aug 15, 2019

We are currently working on the agent, and we require that inputs (or modules, depending on the Beat) allow a user to define the index to which the fetched data must be sent.

So the high-level requirement is to add a new setting that lets users specify the index they want to target; the Elasticsearch output should take care of reading that value and routing the events to the appropriate destination.

I think the configuration for Metricbeat and Filebeat would look like this:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/message
    index: mycustomindex
  - type: log
    index: anotherindex
metricbeat.modules:
  - module: aws
    index: awsevents
  - module: docker
    index: dockerevents

Tasks:

@urso

urso commented Aug 20, 2019

The Elasticsearch output can pick the index name from (beat.Event).Meta["index"], if set.

We can either require the inputs/modules to explicitly set Meta["index"], or add a setting to beat.ClientConfig to set the index name dynamically (via Meta["index"]). Meta["index"] becomes the @metadata.index field in the spool file, or when publishing to another output.
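
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the libbeat implementation: the `Event` type and the `selectIndex` helper are hypothetical stand-ins, and only the `Meta["index"]` key comes from the discussion.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a Beats event; only the metadata
// map matters for this sketch.
type Event struct {
	Meta map[string]interface{}
}

// selectIndex returns the per-event index from Meta["index"] when it is a
// non-empty string, and otherwise falls back to the output's default index.
func selectIndex(ev Event, defaultIndex string) string {
	if v, ok := ev.Meta["index"]; ok {
		if s, ok := v.(string); ok && s != "" {
			return s
		}
	}
	return defaultIndex
}

func main() {
	withOverride := Event{Meta: map[string]interface{}{"index": "mycustomindex"}}
	plain := Event{} // no metadata set by the input

	fmt.Println(selectIndex(withOverride, "filebeat-default"))
	fmt.Println(selectIndex(plain, "filebeat-default"))
}
```

Falling back to the output default keeps events from inputs without an override flowing to the usual index.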

Currently, the index settings in the Beats outputs allow users to configure dynamic format strings. These allow users to set the index name based on event contents (this is used for the timestamps).
Besides full dynamic expansion, we have places with 'Scoped' string expansion (Beats scope, like agent version and name).

If users run with ILM, then we want them to configure and use a write alias. But with standalone agent or without ILM, what should the index name be?

Some options for the index setting:

  • static string only
  • Scoped only (allow user to set agent version)
  • Scoped + timestamp (allow user to configure daily indices, but don't allow users to access Beats event fields)
  • Fully dynamic (like the outputs index/indices settings).

Being able to configure the index name in the input means that we might have indices different from the indices provisioned via Beats index management. Do we want to forbid users to set this setting, but allow it being configured via Fleet only? Or do we want to add support to provision indices via the agent (the idxmgmt package should be usable in isolation)?

@ph
Contributor Author

ph commented Aug 20, 2019

Being able to configure the index name in the input means that we might have indices different from the indices provisioned via Beats index management.

Do we want to forbid users to set this setting, but allow it being configured via Fleet only? Or do we want to add support to provision indices via the agent (the idxmgmt package should be usable in isolation)?

Not sure if we should forbid users from using it without Fleet. My reasoning is that users can do this today by using index format strings in the output and adding fields to each event. Currently, when a user chooses to do this, they are a bit on their own for management: changing templates and index patterns.

Now, in the context of UI driven experience I presume the following is possible:

  • Allow users to define the index that the input will target.
  • UI can check if a template is matching that new index name.
  • Propose steps for ILM or templates management if required.

cc @ruflin @mattapperson @michalpristas for awareness

@ruflin
Member

ruflin commented Aug 21, 2019

What we support at the input level for index options does not require the full feature set we offer at the global level. I would be happy if the first version supported only pure static strings inside the input. I would hope this simplifies the implementation.

@urso

urso commented Aug 21, 2019

Implementation-wise, one or the other makes little difference, if any. The question is more: what do we want to allow users to configure? With static strings we have to expect either a write_alias always, ILM always, or some other entity creating daily indices if ILM is not enabled.

Just having a constant string means we don't know which of these applies, yet we don't allow publishing to daily indices directly via Beats. We just hope for the best.

@ph ph assigned faec Sep 5, 2019
@ph
Contributor Author

ph commented Sep 9, 2019

@faec looking at the notes from @urso and @ruflin I would also say we want to have it static in the beginning. I think our assumptions would be to have them use ILM to have the same behavior as daily indices.

@ph
Contributor Author

ph commented Sep 9, 2019

Also, I am looking at the problem from the angle of the checks we want to add to the agent on startup: if we make the index completely dynamic, it would be near impossible to know whether an index has been created up front, before sending any data to ES.

@ruflin I am sure that we would want a version in the index name, no?
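
To illustrate why static names make such a startup check tractable, here is a rough sketch. The `InputConfig` type, the `missingIndices` helper, and the `exists` callback are all hypothetical; a real check would presumably query Elasticsearch rather than a map.

```go
package main

import (
	"fmt"
	"sort"
)

// InputConfig is a hypothetical, trimmed-down input definition.
type InputConfig struct {
	Type  string
	Index string // static index name; empty means "use the output default"
}

// missingIndices returns the sorted, de-duplicated static index names for
// which exists reports false. Inputs without an index are skipped.
func missingIndices(inputs []InputConfig, exists func(name string) bool) []string {
	seen := map[string]bool{}
	var missing []string
	for _, in := range inputs {
		if in.Index == "" || seen[in.Index] {
			continue
		}
		seen[in.Index] = true
		if !exists(in.Index) {
			missing = append(missing, in.Index)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	inputs := []InputConfig{
		{Type: "log", Index: "awsevents"},
		{Type: "log", Index: "dockerevents"},
		{Type: "log"}, // no override
	}
	// Stand-in for a real lookup against Elasticsearch.
	known := map[string]bool{"awsevents": true}
	exists := func(name string) bool { return known[name] }

	fmt.Println(missingIndices(inputs, exists))
}
```

Because every name is known from the configuration alone, the check can run before any event is published; a fully dynamic format string would not allow that.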

@ruflin
Member

ruflin commented Sep 10, 2019

I think adding a version to the index name is the responsibility of the user or system that configures it. So if we want apache-2 as index name, this would be directly given in the config. No need for magic on the agent side to figure out a version.

Right now the format of an index heavily depends on the version of the Beat. As in the future it's only inputs and config files which mostly dictate the format, the beat version becomes almost irrelevant.

@faec
Contributor

faec commented Sep 11, 2019

Just met with @ruflin about this

It sounds like the API need is for a single static string per input, which is easy to implement but has the risk of being confusing if people expect all the features that come with other index settings, or for it to interact nicely with other settings instead of just overriding them.

I like the suggestion Steffen mentioned of setting the agent version or some similar suffix rather than the full index string, which would be more broadly useful without growing the configuration complexity too much, but as I understand it from @ruflin that would not address the API need. So I think we should go with a simple static string override, but make sure we frame the configuration so as not to overpromise what it does. One suggestion was to name the configuration field staticIndex, to more clearly connote a single overriding string, and that's probably what I'll do for the initial implementation, but suggestions are welcome on how to best keep the configuration effects simple / clear.

@ph
Contributor Author

ph commented Sep 11, 2019

I agree with only supporting the static use case, so we effectively limit all the corner cases, and also +1 on making it an explicit name. I don't have anything better than static_index for now; we can bring it up in a sync to see if we can find a better name.

@urso

urso commented Sep 26, 2019

Configuring an index per input will not just be some features to make fleet/integrations happy. It will also be a feature available to the standalone Beat itself, that users will very likely use.

For users coming from output.elasticsearch.index, it will be somewhat funny to learn the limitation of the index setting on the input (assuming they even look at the docs). With a static name we force users to always use write aliases + ILM or do "manual" alias updates.
This complicates matters for OSS users, who do not have access to ILM.

This is why, at a minimum, I would like us to support setting the timestamp.

-1 on introducing static_index now, and some alternative index setting later. We should have only one setting.

There are 4 processes operating on the per-input index setting: the actual Beat, the agent, Fleet, and integrations. Even if the setting is somewhat powerful in the Beat itself (name + optional date), the agent, Fleet, and integrations still might impose some limitations before passing it down to the actual Beat. But the latter applications operate in a different environment than OSS Beats users.

@faec
Contributor

faec commented Sep 26, 2019

How are you imagining the setting is specified? My concern with having just an index setting is that people might reasonably expect it to behave like the output index, and it sounds like even the more flexible alternative of allowing an optional date won't do that. Is there a natural way to specify the index (as a string, presumably?) that would allow an optional date without suggesting more capabilities than we want to add right now?

@faec
Contributor

faec commented Sep 26, 2019

Talked to @urso offline about options... it sounds like the best compromise might be to make an index field after all, as a format string with access to limited fields (probably just the date and agent like for templates). This would allow a static setting for the backend API that needs it while still allowing some flexibility for standalone user configurations.

@ph
Contributor Author

ph commented Oct 3, 2019

The plan sounds good to me.

@faec
Contributor

faec commented Dec 19, 2019

The last piece of this was merged this morning, the final 7.x backport should go in later today :-)

@urso

urso commented Dec 23, 2019

@ph @faec Can we close this?

@ph
Contributor Author

ph commented Dec 30, 2019

Looking at all the PRs, which also include the cherry-pick to 7.x, we can close this meta issue.

@shawnz

shawnz commented Jan 16, 2020

It seems like the raw_index metadata field currently only gets set if the index name is overridden by the input/module. But if it's not, raw_index is unset. Is it possible to set a default value, or at least have a hardcoded default like the usual "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"? This would greatly simplify my Logstash configuration. I thought the "output.logstash.index" property might be for this purpose but it doesn't seem to affect the raw_index metadata field.

Thanks, Shawn

@andresrc andresrc added the Team:Integrations label Mar 6, 2020

6 participants