5.0 Default pipelines #21101

niemyjski · 2016-10-24T19:56:23Z

The more I use pipelines the more useful they would become if I could specify a list of pipelines that automatically get run on a type or index level.

This would also save the overhead of specifying which pipeline to use, since in a huge percentage of use cases the pipeline will never change. It would also make the API easier to use.

One open question: if I specify a list of pipelines on an index and also specify a pipeline on the request, would the result be a merged list, or would only the specified pipeline run?

clintongormley · 2016-11-05T16:56:36Z

> The more I use pipelines the more useful they would become if I could specify a list of pipelines that automatically get run on a type or index level.

I've been against default pipelines because pipelines should only be run on the first ingestion. When you update or overwrite a document, you may not want the default to run. For this reason I prefer pipelines to be manually specified.

djschny · 2016-11-12T01:49:59Z

With _timestamp removed, if a user wants to add their own timestamp field, a pipeline processor is really the only way to do it. Forcing all clients to specify the same pipeline (or include it in theirs) is problematic. I reached for this within the first 30 minutes of using pipelines and feel it would be very helpful.
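
As a rough illustration of the _timestamp replacement described here, an ingest pipeline with a single `set` processor can copy the ingest time into a document field. The pipeline name `add-timestamp` and the field name `indexed_at` are made up for this example; `{{_ingest.timestamp}}` is the ingest metadata value exposed to processors. The sketch only builds the request body and does not send it.

```python
import json

# Hypothetical names, chosen only for this sketch.
PIPELINE_ID = "add-timestamp"
TIMESTAMP_FIELD = "indexed_at"


def build_timestamp_pipeline(field: str) -> dict:
    """Return an ingest pipeline body that stamps each document with the ingest time."""
    return {
        "description": "Replacement for the removed _timestamp field",
        "processors": [
            # The set processor writes the templated value into the given field.
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


body = build_timestamp_pipeline(TIMESTAMP_FIELD)
# The body would be sent with: PUT _ingest/pipeline/<PIPELINE_ID>
print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(body, indent=2))
```

Without a default pipeline, every client still has to pass `?pipeline=add-timestamp` on each request, which is the problem raised in this comment.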

inqueue · 2017-02-09T16:16:57Z

> When you update or overwrite a document, you may not want the default to run.

I agree for a non-logging use case. The ability to enable default pipelines for a logging use case would be very helpful where document updates are non-existent. Further, I would only want to enable default pipelines for certain indices or have the capability to do so. Could it be an index setting?

Example uses for logging might be password stripping with the set processor or field truncation using a script processor.

tipuban · 2017-03-07T11:54:09Z

+1 to allowing default pipelines for indices.
Ideally you would also be able to specify on what type of operation the pipeline would be run: insert-time, update or both.

cristimagda · 2017-03-07T12:04:54Z

> Ideally you would also be able to specify on what type of operation the pipeline would be run: insert-time, update or both.

+1 It would be very useful.

marius-dr · 2017-04-19T12:51:06Z

+1 for this. It helps users who were affected by the removal of the _timestamp field.

zoellner · 2017-05-17T00:13:53Z

Another argument in favor of being able to specify a default pipeline:
There are scenarios where the PUT command (and some other document preprocessing) is outside the control of the ES operator.

AWS allows feeding an Elasticsearch instance from an Amazon Kinesis Firehose stream. However, the document _id is set by the Firehose stream. Firehose also controls the command that is used to send the data to the Elasticsearch instance, i.e. it is not possible to add the ?pipeline query parameter.
With a default ingest pipeline (based on index/type, ideally specified altogether in the index template) one could set the _id through a preprocessor based on the document _source.

titzi · 2017-05-31T09:53:23Z

+1 especially as an option block in index templates, I guess this would be a perfect spot for it.

whiteboardmonk · 2017-06-08T07:09:07Z

+1 having this in the index template will be very useful

> AWS allows feeding an Elasticsearch instance from an Amazon Kinesis Firehose stream. However, the document _id is set by the Firehose stream. Firehose also controls the command that is used to send the data to the Elasticsearch instance, i.e. it is not possible to add the ?pipeline query parameter.
> With a default ingest pipeline (based on index/type, ideally specified altogether in the index template) one could set the _id through a preprocessor based on the document _source.

@zoellner I stumbled upon this issue while looking for the exact same option. Did you happen to figure out a way around this?

zoellner · 2017-06-08T16:43:59Z

@whiteboardmonk no, I've since stopped using Firehose Streams because of this issue.

wpongra · 2017-08-07T14:01:37Z

+1, _timestamp-replacement as the use-case

niemyjski · 2017-08-10T15:07:43Z

Are there any updates on this?

mr-mos · 2017-08-22T09:42:45Z

+1, for the default pipeline (regarding the use-case for _timestamp)

redx177 · 2017-08-29T12:40:26Z

+1
I do understand @clintongormley's argument against it, and it should be documented that a default pipeline will be executed for first ingestion as well as for updates. But having the choice between specifying it at the index level or per request gives the flexibility to use whatever is more appropriate for the current job.

chs-bnet · 2017-09-06T21:08:09Z

Being able to specify a default pipeline, perhaps in an index template, would be extremely useful for our case, where we don't have control over the bulk put. We are using fluentd and its elasticsearch plugin, and I don't believe there is a way for us to specify a pipeline using its output language.

zfanswer · 2017-09-08T06:08:52Z

+1 for default pipeline.
The setting should probably live on the index side, not on the pipeline.
Also add an option to skip the pipeline, like ?skip_pipeline=true, for some interfaces (e.g. reindex) and special cases. That may avoid the case @clintongormley mentioned at the beginning.

de-robat · 2017-09-20T11:38:52Z

+1, to add another real-world use case: we have a tracing implementation that persists to Elasticsearch. It keeps track of timestamps in microseconds instead of milliseconds. Adding an additional field that does the conversion while indexing would be extremely helpful. There is no way to control the trace collector's PUT requests to ES and therefore no way to configure a pipeline via query params :/

dandrestor · 2017-10-05T14:18:28Z

@clintongormley Good point. Perhaps a good idea would be having an index-wide setting for a default pipeline, with some parameters controlling for which operations the default applies? (By operations I mean index or update or whatever.)

clintongormley · 2017-10-09T12:26:53Z

I think I could get behind the following:

* an index setting which specifies the default pipeline to use for index or create operations only
* update operations would not use the default pipeline
* specifying `?pipeline=foo` in an index request would result in the foo pipeline being applied instead of the default pipeline
* specifying `?pipeline=` in an index request would result in no pipeline being applied
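
The four rules above can be sketched as a small resolution function. This is only an illustration of the proposal, not the Elasticsearch implementation; the function name, the operation labels, and the use of `None` for "parameter absent" are assumptions made for the example. The proposal does not say what happens to an explicit pipeline on non-index operations, so the sketch simply passes it through.

```python
from typing import Optional

# Operations that, under the proposal, pick up the index's default pipeline.
DEFAULTED_OPS = {"index", "create"}


def resolve_pipeline(op_type: str,
                     request_pipeline: Optional[str],
                     default_pipeline: Optional[str]) -> Optional[str]:
    """Return the pipeline to run, or None for no pipeline.

    request_pipeline is None when ?pipeline was not given at all,
    and "" when the request said ?pipeline= explicitly.
    """
    if request_pipeline is not None:
        # An explicit value always wins; the empty string means "no pipeline".
        return request_pipeline or None
    if op_type in DEFAULTED_OPS:
        return default_pipeline
    # Updates (and anything else) never fall back to the default.
    return None


print(resolve_pipeline("index", None, "logs"))
print(resolve_pipeline("update", None, "logs"))
print(resolve_pipeline("index", "foo", "logs"))
print(resolve_pipeline("index", "", "logs"))
```

Distinguishing "absent" from "empty" is what makes the fourth rule possible; a client library that drops empty query parameters would silently fall back to the default.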

rjernst · 2017-10-09T17:46:29Z

> specifying `?pipeline=` in an index request would result in no pipeline being applied

It seems like this could easily be a malformed request. For this corner case (that someone wants to get around the default pipeline), one could create a dummy pipeline that does nothing and specify that explicitly here? Then specifying pipeline with an empty string can return an error?

rjernst · 2017-10-09T17:47:16Z

Or, have a specially named value called _none?

stevenwall · 2017-12-12T16:42:09Z

+1

We sure could use this functionality as well.

Has there been an update yet from ES on whether this is on the roadmap? I'm not finding one.

zfanswer · 2017-12-14T02:32:47Z

Is this supported in ES 6.0?

kafis · 2018-03-01T12:57:44Z

+1

I like the ingest pipeline, as it decouples me from any pre-processing of my logs at the source.
But if I can't enable it by default, I am still thrown back to manipulating my sources (which I don't necessarily have control over) to use a specific ingest pipeline.

prasadkhandagale · 2018-03-07T09:52:53Z

+1

sergii-sakharov · 2018-03-08T16:11:55Z

One more use case for default pipelines could be custom validation/postprocessing of Kibana objects, by introducing a pipeline on the .kibana index.

SebC99 · 2018-03-15T07:14:55Z

@clintongormley Why do you want to restrict something optional?
I mean, letting the default pipeline be used in updates could be very useful for calculated fields (in our case, suggest fields for completion), while those who want to use a pipeline only at index time could still do it manually with index parameters.
Or we could specify a default one for index and a default one for update?
Again, as using a default pipeline would not be mandatory, I believe there's no point in restricting it.
My 2 cents :)

elasticmachine · 2018-03-15T15:19:17Z

Pinging @elastic/es-core-infra

kunna-ujet · 2018-03-18T09:25:23Z

+1

vanntomm · 2018-05-04T12:59:24Z

+1

trippd6 · 2018-05-10T14:29:08Z

+1

lukeplausin · 2018-05-22T15:54:37Z

+1

romanpierson · 2018-06-15T21:35:32Z

+1

djptek · 2018-07-11T13:21:45Z

+1

Requested by a student in Engineer II training. Use case: data validation.

Q. This ^^discussion considers adding a pipeline to index settings. As an alternative, could a default pipeline be specified in an alias, which could be exposed for first ingest while allowing subsequent update or overwrite directly via the index or via an alternative alias using no (or a different) pipeline?


clintongormley added the discuss and :Data Management/Ingest Node labels Nov 5, 2016

talevy added the help wanted adoptme label Mar 15, 2018

talevy added the :Data Management/Ingest Node label and removed the :Data Management/Ingest Node and discuss labels Mar 15, 2018

talevy added the >feature label Mar 15, 2018

original-brownbear self-assigned this Jul 20, 2018

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jul 23, 2018

INGEST: Enable default pipelines

c502262

* Add `default_pipeline` index setting
* Empty string pipeline argument is interpreted as no pipeline
* closes elastic#21101

original-brownbear mentioned this issue Jul 23, 2018

INGEST: Enable default pipelines #32286

Merged

original-brownbear closed this as completed in #32286 Aug 2, 2018

original-brownbear added a commit that referenced this issue Aug 2, 2018

INGEST: Enable default pipelines (#32286)

be31cc6

* INGEST: Enable default pipelines
* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes #21101

original-brownbear mentioned this issue Aug 2, 2018

INGEST: Enable default pipelines (#32286) #32591

Merged

original-brownbear added a commit that referenced this issue Aug 2, 2018

INGEST: Enable default pipelines (#32286) (#32591)

bed5e4a

* INGEST: Enable default pipelines
* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes #21101
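
Going by the commit message above, the merged change adds a `default_pipeline` index setting and treats `_none` as "no pipeline". Below is a sketch of what a client might send, assuming the setting is addressed as `index.default_pipeline` and that `_none` is passed through the existing `?pipeline` query parameter. The index name `logs-2018`, the pipeline name `add-timestamp`, and the `_doc` path segment are invented for the example.

```python
from typing import Optional
from urllib.parse import urlencode

NO_PIPELINE = "_none"  # value named in the commit message


def default_pipeline_settings(pipeline: str) -> dict:
    """Settings body that would attach a default pipeline to an index."""
    return {"index": {"default_pipeline": pipeline}}


def index_doc_path(index: str, doc_id: str, pipeline: Optional[str] = None) -> str:
    """Request path for indexing one document, with an optional ?pipeline override."""
    path = f"/{index}/_doc/{doc_id}"
    if pipeline is not None:
        path += "?" + urlencode({"pipeline": pipeline})
    return path


# PUT /logs-2018/_settings with this body sets the default.
print(default_pipeline_settings("add-timestamp"))
# A plain index request would pick up the default pipeline.
print(index_doc_path("logs-2018", "1"))
# Passing _none bypasses the default for this one request.
print(index_doc_path("logs-2018", "1", NO_PIPELINE))
```

The same settings body could also go into an index template, which is where several commenters in this thread wanted it.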
