5.0 Default pipelines #21101
I've been against default pipelines because pipelines should only be run on the first ingestion. When you update or overwrite a document, you may not want the default to run. For this reason I prefer pipelines to be specified manually.
With
I agree for a non-logging use case. The ability to enable default pipelines for a logging use case, where document updates never happen, would be very helpful. Further, I would only want to enable default pipelines for certain indices, or at least have the capability to do so. Could it be an index setting? Example uses for logging might be password stripping with the set processor or field truncation using a script processor.
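To make the logging example concrete, here is a minimal sketch, assuming a hypothetical `password` field and a `message` field capped at an arbitrary 1,000 characters. `PIPELINE` mirrors the JSON body one might send to the ingest pipeline API (`set` and `script` are real ingest processors; the inline script key is `source` in 6.x and later, `inline` in 5.x). `apply_locally` is a hypothetical helper that only emulates those two processors in plain Python for illustration. It is not how Elasticsearch executes them.

```python
import copy

MAX_LEN = 1000  # hypothetical truncation limit

# Sketch of a body for PUT _ingest/pipeline/<id>; field names are hypothetical.
PIPELINE = {
    "description": "Mask passwords and truncate long messages (sketch)",
    "processors": [
        # set: overwrite (or create) the password field with a fixed mask
        {"set": {"field": "password", "value": "********"}},
        # script: truncate message when it is longer than params.max characters
        {
            "script": {
                "source": (
                    "if (ctx.message != null && ctx.message.length() > params.max) "
                    "{ ctx.message = ctx.message.substring(0, params.max) }"
                ),
                "params": {"max": MAX_LEN},
            }
        },
    ],
}


def apply_locally(doc: dict) -> dict:
    """Emulate the two processors above on a copy of doc (illustration only)."""
    out = copy.deepcopy(doc)
    # Like the set processor: the field is written whether or not it existed.
    out["password"] = "********"
    msg = out.get("message")
    # Like the script processor: only strings longer than MAX_LEN are cut.
    if isinstance(msg, str) and len(msg) > MAX_LEN:
        out["message"] = msg[:MAX_LEN]
    return out
```

Because the thread is about logs that are never updated, running such a pipeline on every write is harmless here, which is exactly why a per-index default would suit this case.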
+1 to allowing default pipelines for indices.
+1, it would be very useful.
+1 for this. It helps users who were affected by the removal of the `_timestamp` field.
Another argument in favor of being able to specify a default pipeline: AWS allows feeding an Elasticsearch instance from an Amazon Kinesis Firehose stream. However, the document `_id` is set by the Firehose stream, and Firehose also controls the command used to send the data to the Elasticsearch instance, so it is not possible to add the `?pipeline` query parameter.
+1, especially as an option block in index templates; I think that would be the perfect spot for it.
+1, having this in the index template would be very useful.
@zoellner I stumbled upon this issue while looking for the exact same option. Did you happen to find a way around this?
@whiteboardmonk No, I've since stopped using Firehose streams because of this issue.
+1, with `_timestamp` replacement as the use case.
Are there any updates on this?
+1 for a default pipeline (for the `_timestamp` use case).
+1
Being able to specify a default pipeline, perhaps in an index template, would be extremely useful in our case, where we don't have control over the bulk PUT. We use fluentd and its Elasticsearch plugin, and I don't believe there is a way to specify a pipeline through its output configuration.
+1 for a default pipeline.
+1, to add another real-world use case: we have a tracing implementation that persists to Elasticsearch. It records timestamps in microseconds instead of milliseconds, so adding a field that does the conversion at index time would be extremely helpful. We have no control over the trace collector's PUT requests to ES and therefore no way to configure a pipeline via query parameters. :/
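The conversion described here amounts to an integer division by 1,000 in an ingest `script` processor. A minimal sketch, assuming hypothetical field names `start_us` and `start_ms`: `SCRIPT_PROCESSOR` is the processor body as a Python dict, and the hypothetical `convert_locally` emulates it in plain Python for illustration only.

```python
import copy

# Body of an ingest "script" processor; field names are hypothetical.
SCRIPT_PROCESSOR = {
    "script": {
        "source": "if (ctx.start_us != null) { ctx.start_ms = ctx.start_us / 1000 }"
    }
}


def convert_locally(doc: dict) -> dict:
    """Emulate the processor above: add start_ms derived from start_us."""
    out = copy.deepcopy(doc)
    us = out.get("start_us")
    if us is not None:
        us = int(us)
        # Painless integer division truncates toward zero, while Python's //
        # floors, so negative values are handled explicitly to match.
        out["start_ms"] = us // 1000 if us >= 0 else -((-us) // 1000)
    return out
```

With a default pipeline on the index, this conversion would run even though the collector's requests cannot carry a `?pipeline` parameter.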
@clintongormley Good point. Perhaps an index-wide setting for a default pipeline would work, with parameters controlling which operations the default applies to (by operations I mean index, update, and so on).
I think I could get behind the following:
It seems like this could easily be a malformed request. For this corner case (someone wanting to get around the default pipeline), one could create a dummy pipeline that does nothing and specify it explicitly here. Specifying an empty-string pipeline could then return an error.
Or, have a specially named value called
+1, we could certainly use this functionality as well. Has there been an update from ES on whether this is on the roadmap? I can't find one.
Is this supported in ES 6.0?
+1, I like the ingest pipeline because it decouples me from any pre-processing of my logs at the source.
+1
One more use case for default pipelines: custom validation or post-processing of Kibana objects, if a pipeline were introduced on the `.kibana` index.
@clintongormley Why do you want to restrict something optional?
Pinging @elastic/es-core-infra
+1
+1
+1
+1
+1
+1, requested by a student in Engineer II training. Use case: data validation. Question: this discussion considers adding a pipeline to index settings. As an alternative, could a default pipeline be specified on an alias? That alias would be used for first ingest, while subsequent updates or overwrites would go directly to the index, or through another alias with no pipeline (or a different one).
* Add `default_pipeline` index setting
* Empty string pipeline argument is interpreted as no pipeline
* closes elastic#21101
* INGEST: Enable default pipelines
* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes #21101
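The behavior described in the final commit message can be summarized as a small resolution rule. The sketch below is a plain-Python model, not Elasticsearch code: an explicit `pipeline` request parameter wins, the value `_none` disables ingest processing, and otherwise the index's `index.default_pipeline` setting applies (its own default value is `_none`, meaning no default). `resolve_pipeline`, `SETTINGS_BODY`, and the pipeline name `logs-cleanup` are hypothetical.

```python
from typing import Optional

NONE_PIPELINE = "_none"  # special value meaning "run no pipeline"

# Hypothetical settings body for creating an index (or an index template)
# whose documents go through a pipeline named "logs-cleanup" by default.
SETTINGS_BODY = {"settings": {"index.default_pipeline": "logs-cleanup"}}


def resolve_pipeline(request_pipeline: Optional[str], index_settings: dict) -> Optional[str]:
    """Return the pipeline to run for one write, or None if no pipeline runs."""
    if request_pipeline is not None:
        # An explicit ?pipeline= parameter overrides the index default,
        # and "_none" opts this single request out of any pipeline.
        return None if request_pipeline == NONE_PIPELINE else request_pipeline
    # No parameter given: fall back to the index-level default, if any.
    default = index_settings.get("index.default_pipeline", NONE_PIPELINE)
    return None if default == NONE_PIPELINE else default
```

This also covers the concern raised at the top of the thread: a client that updates or overwrites a document and does not want the default to run again can pass `pipeline=_none` on that request.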
The more I use pipelines, the more useful they would become if I could specify a list of pipelines that run automatically at the type or index level.
This would also save the overhead of specifying which pipeline to use in the large share of use cases where the pipeline never changes, and it would make the API easier to use.
One open question: if I specify a list of pipelines on an index and also specify a pipeline on a request, would the two be merged into one list, or would only the specified pipeline run?
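The open question above can be made concrete with two candidate policies. This is purely a hypothetical model (the function `pipelines_to_run` and its `policy` argument are invented for illustration): under `replace`, only the pipelines named on the request run; under `merge`, the index defaults run first, followed by any requested pipelines not already in the list.

```python
from typing import List, Optional


def pipelines_to_run(
    index_defaults: List[str],
    requested: Optional[List[str]],
    policy: str = "replace",
) -> List[str]:
    """Hypothetical model of combining index-level and request-level pipelines."""
    if requested is None:
        # Nothing on the request: the index defaults apply unchanged.
        return list(index_defaults)
    if policy == "replace":
        # The request fully overrides the index-level list.
        return list(requested)
    if policy == "merge":
        # Defaults first, then requested pipelines, skipping duplicates.
        merged = list(index_defaults)
        for name in requested:
            if name not in merged:
                merged.append(name)
        return merged
    raise ValueError(f"unknown policy: {policy!r}")
```

`replace` keeps a request in full control (useful for the update and overwrite cases discussed above), while `merge` guarantees the index defaults always run; which one is preferable depends on whether the defaults are meant as a convenience or as a rule.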