-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INGEST: Enable default pipelines #32286
INGEST: Enable default pipelines #32286
Conversation
* Add `default_pipeline` index setting * Empty string pipeline argument is interpreted as no pipeline * closes elastic#21101
Pinging @elastic/es-core-infra |
@@ -78,7 +78,7 @@ protected void doRun() throws Exception { | |||
innerExecute(indexRequest, getPipeline(indexRequest.getPipeline())); | |||
//this shouldn't be needed here but we do it for consistency with index api | |||
// which requires it to prevent double execution | |||
indexRequest.setPipeline(null); | |||
indexRequest.setPipeline(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to change this since null
now triggers an empty loop because it causes the default pipeline to be set over and over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on what triggers this? I'm unsure about the original comment, but it seems like that should be fixed first, so that we can have null
as meaning "no pipeline" in the index request. Empty string should be an error. We need some type of literal string (which we disallow from pipeline names) to mean "no pipeline". In the original default pipelines issue, I suggested _none
or something like that. Using empty string is too easy to happen with a malformed request (like accidentally not serializing http params correctly into the request).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst the problem is that I need to somehow differentiate between "no pipeline so set default" (when the request first hits org.elasticsearch.action.bulk.TransportBulkAction#doExecute
) and "no pipeline, period" (when it hits org.elasticsearch.action.bulk.TransportBulkAction#doExecute
and I want to prevent the default to be applied again). Previously double execution here could be prevented by setting null
, but now I used null
to trigger setting the default and ""
to mean that no default should be set (since that is also what I'd see if someone used a url ending in ?pipeline=
).
So better move to _none
for when we want to prevent setting a pipeline and also use the default on ""
(i.e. URLs ending in ?pipeline=
)? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So better move to _none for when we want to prevent setting a pipeline and also use the default on "" (i.e. URLs ending in ?pipeline=)? :)
Sorry I don't understand what you are suggesting here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply:
?pipeline=
means no pipeline is set i.e is equivalent to not ?pipeline
parameter in the uri => apply default pipeline
?pipeline=_none
=> don't run any pipeline
sorry, seems it's too late for my writing skills :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel strongly ?pipeline=
should be an error. It is too easy too have someone mean to override a pipeline, but then get the default (which is the opposite of the problem when using ?pipeline=
to mean don't run any pipeline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst I like it :) Error for ?pipeline=
it is :) => on it
hasIndexRequestsWithPipelines = true; | ||
} | ||
} | ||
} else if (!pipeline.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer == false
instead of !
as the latter is susceptible to being missed when reading through code.
ignore: 404 | ||
|
||
--- | ||
"Test index with default pipeline": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While having rest tests is fine to ensure the param is passed through the rest layer correctly, we should first have unit tests.
@rjernst added unit tests and moved to |
@rjernst ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple comments.
IndexMetaData indexMetaData = indicesMetaData.get(indexRequest.index()); | ||
if (indexMetaData != null) { | ||
String defaultPipeline = IndexSettings.DEFAULT_PIPELINE.get(indexMetaData.getSettings()); | ||
if (defaultPipeline.isEmpty() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be possible if we are validating a non-empty string?
@@ -254,6 +254,9 @@ | |||
public static final Setting<Integer> MAX_REGEX_LENGTH_SETTING = Setting.intSetting("index.max_regex_length", | |||
1000, 1, Property.Dynamic, Property.IndexScope); | |||
|
|||
public static final Setting<String> DEFAULT_PIPELINE = | |||
Setting.simpleString("index.default_pipeline", "", Property.Dynamic, Property.IndexScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add validation to this setting so it is not empty. We can make the default the _none
constant.
continue; | ||
} | ||
String pipeline = indexRequest.getPipeline(); | ||
if (Strings.hasText(pipeline) && IngestService.NOOP_PIPELINE_NAME.equals(pipeline) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is hasText necessary if we have validated this is a non empty string?
@rjernst thanks for spotting that, fixed all the points you mentioned by moving to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @original-brownbear. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks fine.
Jenkins test this |
@rjernst thanks! => merging :) |
* INGEST: Enable default pipelines * Add `default_pipeline` index setting * `_none` is interpreted as no pipeline * closes elastic#21101
* master: HLRC: Move commercial clients from XPackClient (#32596) Add cluster UUID to Cluster Stats API response (#32206) Security: move User to protocol project (#32367) [TEST] Test for shard failures, add debug to testProfileMatchesRegular Minor fix for javadoc (applicable for java 11). (#32573) Painless: Move Some Lookup Logic to PainlessLookup (#32565) TEST: Avoid merges in testSeqNoAndCheckpoints [Rollup] Remove builders from HistoGroupConfig (#32533) Mutes failing SQL string function tests due to #32589 fixed elements in array of produced terms (#32519) INGEST: Enable default pipelines (#32286) Remove cluster state initial customs (#32501) Mutes LicensingDocumentationIT due to #32580 [ML] Remove multiple_bucket_spans (#32496) [ML] Rename JobProvider to JobResultsProvider (#32551) Correct minor typo in explain.asciidoc for HLRC Build: Add elastic maven to repos used by BuildPlugin (#32549) Clarify the error message when a pipeline agg is used in the 'order' parameter. (#32522) Revert "[test] turn on host io cache for opensuse (#32053)" Enable packaging tests on suse boxes [ML] Improve error when no available field exists for rule scope (#32550) [ML] Improve error for functions with limited rule condition support (#32548) Painless: Clean Up PainlessField (#32525) Add @AwaitsFix for #32554 Remove broken @link in Javadoc Scripting: Conditionally use java time api in scripting (#31441) [ML] Fix thread leak when waiting for job flush (#32196) (#32541) Add AwaitsFix to failing test - see #32546 Core: Minor size reduction for AbstractComponent (#32509) SQL: Added support for string manipulating functions with more than one parameter (#32356) [DOCS] Reloadable Secure Settings (#31713) Watcher: Reenable HttpSecretsIntegrationTests#testWebhookAction test (#32456) [Rollup] Remove builders from TermsGroupConfig (#32507) Use hostname instead of IP with SPNEGO test (#32514) Switch x-pack rolling restart to new style Requests (#32339) NETWORKING: Fix Netty Leaks by upgrading to 4.1.28 (#32511) [DOCS] Small fixes in rule configuration page (#32516) Painless: Clean up PainlessMethod (#32476) Build: Remove shadowing from benchmarks (#32475) Docs: Add all JDKs to CONTRIBUTING.md Add licensing enforcement for FIPS mode (#32437) SQL: Add test for handling of partial results (#32474) Mute testFilterCacheStats [ML][DOCS] Fix typo applied_to => applies_to Scripting: Fix painless compiler loader to know about context classes (#32385)
* 6.x: [Kerberos] Use canonical host name (#32588) Cross-cluster search: preserve cluster alias in shard failures (#32608) [TEST] Allow to run in FIPS JVM (#32607) Handle AlreadyClosedException when bumping primary term [Test] Add ckb to the list of unsupported languages (#32611) SCRIPTING: Move Aggregation Scripts to their own context (#32068) (#32629) [TEST] Enhance failure message when bulk updates have failures [ML] Add ML result classes to protocol library (#32587) Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613) [Rollup] Improve ID scheme for rollup documents (#32558) Mutes failing SQL string function tests due to #32589 Suppress Wildfly test in FIPS JVMs (#32543) Add cluster UUID to Cluster Stats API response (#32206) [ML] Add some ML config classes to protocol library (#32502) [TEST]Split transport verification mode none tests (#32488) [Rollup] Remove builders from DateHistogramGroupConfig (#32555) [ML] Add Detector config classes to protocol library (#32495) [Rollup] Remove builders from MetricConfig (#32536) Fix race between replica reset and primary promotion (#32442) HLRC: Move commercial clients from XPackClient (#32596) Security: move User to protocol project (#32367) Minor fix for javadoc (applicable for java 11). (#32573) Painless: Move Some Lookup Logic to PainlessLookup (#32565) Core: Minor size reduction for AbstractComponent (#32509) INGEST: Enable default pipelines (#32286) (#32591) TEST: Avoid merges in testSeqNoAndCheckpoints [Rollup] Remove builders from HistoGroupConfig (#32533) fixed elements in array of produced terms (#32519) Mutes ReindexFailureTests.searchFailure dues to #28053 Mutes LicensingDocumentationIT due to #32580 Remove the SATA controller from OpenSUSE box [ML] Rename JobProvider to JobResultsProvider (#32551)
default_pipeline
index setting