Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow es sending to non datastream #73

Merged
merged 9 commits into from
Feb 16, 2022

Conversation

aspacca
Copy link
Contributor

@aspacca aspacca commented Feb 14, 2022

Breaking change

What does this PR do?

Add support for both index and datstream for sending events do elasticsearch

Why is it important?

Not every users might use datastreams, and they need a way to send to an index or an alias: before this change it was not possible.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Closes #72

Use cases

Screenshots

Logs

@aspacca aspacca self-assigned this Feb 14, 2022
@elasticmachine
Copy link

elasticmachine commented Feb 14, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-02-15T15:50:08.073+0000

  • Duration: 26 min 32 sec

Test stats 🧪

Test Results
Failed 0
Passed 59
Skipped 0
Total 59

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Member

@endorama endorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the change but I would like to discuss the implementation. I think a different option (like index_name) would be preferred over the composite es_index_or_datastream_name. If set, it overrides data stream naming, otherwise we retain the same behaviour.

This would have a set of advantages:

  • not a breaking change (old config would still work)
  • allow to configure the index name if needed
  • in case data streams are used the current config reduce cognitive load for customers (as they have to specify only data set and namespace and not worry about data stream naming constraints).
  • clearer configuration logic: either data streams or override (override supports writing to indexes too)

What to you think?

docs/README-AWS.md Outdated Show resolved Hide resolved
Co-authored-by: endorama <526307+endorama@users.noreply.github.com>
@aspacca
Copy link
Contributor Author

aspacca commented Feb 15, 2022

I'm ok with the change but I would like to discuss the implementation. I think a different option (like index_name) would be preferred over the composite es_index_or_datastream_name. If set, it overrides data stream naming, otherwise we retain the same behaviour.

This would have a set of advantages:

  • not a breaking change (old config would still work)
  • allow to configure the index name if needed
  • in case data streams are used the current config reduce cognitive load for customers (as they have to specify only data set and namespace and not worry about data stream naming constraints).
  • clearer configuration logic: either data streams or override (override supports writing to indexes too)

What to you think?

I discussed the option of an extra config param without having a breaking change with @ravikesarwani

the golden rule we agreed on was "let's make it work with defaults and/or plain config for 90% of the use cases"
that's the reason for a single config
yes, the change will be breaking, being in beta and still having users in the POC phase and not in the production one we decided it is fine to do.
regarding the override and more granular configuration (in this PR you cannot override the single value of the namespace for example), sticking to the golden rule, in a next iteration, an extra/overriding way for configuring the 10% of the use cases that don't fit will be provided. this will properly documented in a separated section of the docs as something like advanced config, for users that need to tweak the setup to their very specific cases

do you think that's reasonable?

@ravikesarwani , do you think we should reconsider the implementation according to @endorama suggestions?

@aspacca
Copy link
Contributor Author

aspacca commented Feb 15, 2022

@endorama I forgot to link the issue #72

@ravikesarwani
Copy link

in case data streams are used the current config reduce cognitive load for customers (as they have to specify only data set and namespace and not worry about data stream naming constraints).

While in theory this may seem like it, in practical (and based on our experience talking to folks in GoDaddy) no one really knows what "dataset" & "namespace" really is (even after giving them all the links for data stream naming conventions) and how we internally create the full data stream name by combining "logs-", dataset & namespace.

Having a single value is desirable and many Elasticsearch users understand index or data stream name. This is still an optional value (for AWS service logs we send it to correct data streams and they don't have to specify this) and they will only use this config when they have created their own index template (in complex use cases) and its easier for them to put the full value or the alias here. Breaking it into many pieces is our internal logic and not something I think we need to expose to our customers via way of multiple config values.

@endorama
Copy link
Member

While in theory this may seem like it, in practical (and based on our experience talking to folks in GoDaddy) no one really knows what "dataset" & "namespace" really is (even after giving them all the links for data stream naming conventions) and how we internally create the full data stream name by combining "logs-", dataset & namespace.

Having a single value is desirable and many Elasticsearch users understand index or data stream name.

@ravikesarwani this is a really interesting comment, thanks for sharing.

Approved, code was already ✔️

@aspacca aspacca merged commit 182bf2f into elastic:main Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve handling of index/data stream routing
4 participants