[Search pipelines] Pass "adhocness" flag to processor factories #8164

msfroh · 2023-06-20T01:25:10Z

Description

A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they would add to the cost of every search request, whereas processors in a named pipeline are reused across requests.

This change passes a configuration flag to a processor factory when it is called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (for example, by throwing an exception instead).
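A minimal Java sketch of the idea, assuming a simplified factory contract. `SearchProcessor`, `SearchProcessorFactory`, `HeavyProcessorFactory`, and the `buffer_size` config key are hypothetical names for illustration, not the OpenSearch API. The factory receives a boolean `adHoc` flag and refuses to build an expensive processor when the pipeline was defined in the search request.

```java
import java.util.Map;

/** Hypothetical processor interface (illustration only, not the OpenSearch API). */
interface SearchProcessor {
    String getType();
}

/** Hypothetical factory contract: the caller says whether the pipeline is ad hoc. */
interface SearchProcessorFactory {
    SearchProcessor create(Map<String, Object> config, boolean adHoc);
}

/** Stand-in for a processor that holds an expensive resource. */
final class HeavyProcessor implements SearchProcessor {
    private final byte[] buffer; // placeholder for the "expensive" allocation

    HeavyProcessor(int bufferSize) {
        this.buffer = new byte[bufferSize];
    }

    @Override
    public String getType() {
        return "heavy";
    }

    int bufferSize() {
        return buffer.length;
    }
}

final class HeavyProcessorFactory implements SearchProcessorFactory {
    @Override
    public SearchProcessor create(Map<String, Object> config, boolean adHoc) {
        if (adHoc) {
            // An ad hoc pipeline is rebuilt for every search request, so refuse
            // instead of paying the allocation cost on each request.
            throw new IllegalArgumentException(
                "processor [heavy] is not allowed in an ad hoc search pipeline; use a named pipeline");
        }
        // Named pipeline: the processor is reused, so the allocation is amortized.
        Object size = config.getOrDefault("buffer_size", 1024);
        return new HeavyProcessor(((Number) size).intValue());
    }
}
```

Throwing from the factory surfaces the problem to the caller as a request error rather than silently degrading the processor.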

Related Issues

Resolves #8163

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff
- Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2023-06-20T01:46:35Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/17977/
CommitID: b0deccb
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-06-20T01:54:47Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/17978/
CommitID: d51d2fa
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-06-20T18:22:03Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/18018/
CommitID: c591ca3
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-06-20T19:07:58Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu

URL: https://build.ci.opensearch.org/job/gradle-check/18023/
CommitID: 4c8a77e
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2023-06-20T19:09:06Z

Codecov Report

Merging #8164 (b391748) into main (c1c23b4) will increase coverage by 0.15%.
The diff coverage is 79.06%.

```diff
@@             Coverage Diff              @@
##               main    #8164      +/-   ##
============================================
+ Coverage     70.90%   71.05%   +0.15%
- Complexity    56903    56994      +91
============================================
  Files          4758     4758
  Lines        269225   269236      +11
  Branches      39407    39406       -1
============================================
+ Hits         190881   191299     +418
+ Misses        62256    61842     -414
- Partials      16088    16095       +7
```

Impacted Files	Coverage Δ
.../pipeline/common/RenameFieldResponseProcessor.java	`94.59% <ø> (ø)`
...search/pipeline/common/ScriptRequestProcessor.java	`30.00% <0.00%> (-5.30%)`	⬇️
...eline/common/SearchPipelineCommonModulePlugin.java	`0.00% <ø> (ø)`
...pensearch/search/pipeline/PipelineWithMetrics.java	`90.65% <88.88%> (-0.44%)`	⬇️
...h/pipeline/common/FilterQueryRequestProcessor.java	`96.00% <100.00%> (+10.70%)`	⬆️
...a/org/opensearch/plugins/SearchPipelinePlugin.java	`80.00% <100.00%> (+80.00%)`	⬆️
...java/org/opensearch/search/pipeline/Processor.java	`100.00% <100.00%> (ø)`
...nsearch/search/pipeline/SearchPipelineService.java	`84.54% <100.00%> (ø)`

... and 463 files with indirect coverage changes

github-actions · 2023-06-20T23:26:27Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu

URL: https://build.ci.opensearch.org/job/gradle-check/18053/
CommitID: 2e873c1
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2023-07-02T05:54:14Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      3 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication

URL: https://build.ci.opensearch.org/job/gradle-check/18879/
CommitID: 502c703
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Signed-off-by: Michael Froh <froh@amazon.com>

@dblock

Thanks to @dblock for the suggestion to pass the pipeline creation source in a way that accounts for possible future pipeline sources (and lets us distinguish between actual named pipeline creation and the validation create() that executes before we write a pipeline definition to cluster state). Signed-off-by: Michael Froh <froh@amazon.com>

Signed-off-by: Michael Froh <froh@amazon.com>
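The enum-based version can be sketched as follows, under stated assumptions: the `PipelineSource` constants, `ModelProcessorFactory`, and the `model_id` key are hypothetical and may not match what was merged. An enum lets a factory distinguish three cases that a single boolean cannot: an ad hoc pipeline in a search request, a named pipeline being stored, and the throwaway validation `create()` that runs before the definition is written to cluster state.

```java
import java.util.Map;

/** Hypothetical creation sources, modeled on the commit message above. */
enum PipelineSource {
    SEARCH_REQUEST,    // ad hoc pipeline defined in the search request body
    UPDATE_PIPELINE,   // named pipeline being created or updated (PUT)
    VALIDATE_PIPELINE  // named pipeline validated before the cluster-state write
}

/** Records whether the expensive resource was actually loaded. */
final class ModelProcessor {
    final String modelId;
    final boolean resourcesLoaded;

    ModelProcessor(String modelId, boolean resourcesLoaded) {
        this.modelId = modelId;
        this.resourcesLoaded = resourcesLoaded;
    }
}

final class ModelProcessorFactory {
    ModelProcessor create(Map<String, Object> config, PipelineSource source) {
        Object id = config.get("model_id");
        if (!(id instanceof String)) {
            throw new IllegalArgumentException("[model_id] is required");
        }
        String modelId = (String) id;
        switch (source) {
            case SEARCH_REQUEST:
                // Would be rebuilt on every request: reject it.
                throw new IllegalArgumentException("[model] processor is not allowed in ad hoc pipelines");
            case VALIDATE_PIPELINE:
                // Config has been checked; skip the load because this instance is discarded.
                return new ModelProcessor(modelId, false);
            case UPDATE_PIPELINE:
                // The processor will be reused across requests, so the load is worth it.
                return new ModelProcessor(modelId, true);
            default:
                throw new IllegalStateException("unknown pipeline source: " + source);
        }
    }
}
```

The `default` branch keeps the switch exhaustive if a future source is added to the enum but not yet handled here.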

github-actions · 2023-07-06T09:41:08Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/19247/
CommitID: b391748

macohen · 2023-07-06T17:06:14Z

@dblock, is this change acceptable? Looking to merge for the 2.9 release.

opensearch-trigger-bot · 2023-07-06T21:10:41Z

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

```bash
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8164-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 431b2464324a76e60fb446567504eae846f7b120
# Push it to GitHub
git push --set-upstream origin backport/backport-8164-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x
```

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8164-to-2.x.

dblock · 2023-07-06T21:26:47Z

Looks like it will need a manual backport, @msfroh

mingshl · 2023-07-07T01:02:10Z

Tried to help backport this, but it looks like it's blocked by #7283. We need #7283 to get into 2.x first, then we can backport this one.

msfroh · 2023-07-07T09:56:58Z

> Tried to help backport this, but it looks like it's blocked by #7283. We need #7283 to get into 2.x first, then we can backport this one.

Looks like the backport (#8512) was just merged a few hours ago.

@dblock

…search-project#8164) * [Search pipelines] Pass "adhocness" flag to processor factories A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Signed-off-by: Michael Froh <froh@amazon.com> * Pass pipeline creation source as enum Thanks to @dblock for the suggestion to pass the pipeline creation source in a way that accounts for possible future pipeline sources (and lets us distinguish between actual named pipeline creation and the validation create() that executes before we write a pipeline definition to cluster state). Signed-off-by: Michael Froh <froh@amazon.com> * Move PipelineSource into PipelineContext and explicitly pass to create Signed-off-by: Michael Froh <froh@amazon.com> * Fix formatting on merge conflict resolution Signed-off-by: Michael Froh <froh@amazon.com> --------- Signed-off-by: Michael Froh <froh@amazon.com> (cherry picked from commit 431b246)

msfroh · 2023-07-07T10:54:17Z

Backport PR: #8522

… (#8522) * [Search pipelines] Pass "adhocness" flag to processor factories A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Backport from commit 431b246 --------- Signed-off-by: Michael Froh <froh@amazon.com> (cherry picked from commit 431b246)

@dblock

…search-project#8164) * [Search pipelines] Pass "adhocness" flag to processor factories A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Signed-off-by: Michael Froh <froh@amazon.com> * Pass pipeline creation source as enum Thanks to @dblock for the suggestion to pass the pipeline creation source in a way that accounts for possible future pipeline sources (and lets us distinguish between actual named pipeline creation and the validation create() that executes before we write a pipeline definition to cluster state). Signed-off-by: Michael Froh <froh@amazon.com> * Move PipelineSource into PipelineContext and explicitly pass to create Signed-off-by: Michael Froh <froh@amazon.com> * Fix formatting on merge conflict resolution Signed-off-by: Michael Froh <froh@amazon.com> --------- Signed-off-by: Michael Froh <froh@amazon.com>

@dblock

…search-project#8164) * [Search pipelines] Pass "adhocness" flag to processor factories A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Signed-off-by: Michael Froh <froh@amazon.com> * Pass pipeline creation source as enum Thanks to @dblock for the suggestion to pass the pipeline creation source in a way that accounts for possible future pipeline sources (and lets us distinguish between actual named pipeline creation and the validation create() that executes before we write a pipeline definition to cluster state). Signed-off-by: Michael Froh <froh@amazon.com> * Move PipelineSource into PipelineContext and explicitly pass to create Signed-off-by: Michael Froh <froh@amazon.com> * Fix formatting on merge conflict resolution Signed-off-by: Michael Froh <froh@amazon.com> --------- Signed-off-by: Michael Froh <froh@amazon.com> Signed-off-by: sahil buddharaju <sahilbud@amazon.com>

@dblock

…search-project#8164) * [Search pipelines] Pass "adhocness" flag to processor factories A named search pipeline may be created with a PUT request, while an "anonymous" or "ad hoc" search pipeline can be defined in the search request source. In the latter case, we don't want to create any "resource-heavy" processors, since they're potentially increasing the cost of every search request, whereas named pipeline processors get reused. This change passes a configuration flag to a processor factory if it's being called as part of an ad hoc pipeline. The factory can use that information to avoid allocating expensive resources (maybe by throwing an exception instead). Signed-off-by: Michael Froh <froh@amazon.com> * Pass pipeline creation source as enum Thanks to @dblock for the suggestion to pass the pipeline creation source in a way that accounts for possible future pipeline sources (and lets us distinguish between actual named pipeline creation and the validation create() that executes before we write a pipeline definition to cluster state). Signed-off-by: Michael Froh <froh@amazon.com> * Move PipelineSource into PipelineContext and explicitly pass to create Signed-off-by: Michael Froh <froh@amazon.com> * Fix formatting on merge conflict resolution Signed-off-by: Michael Froh <froh@amazon.com> --------- Signed-off-by: Michael Froh <froh@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>

msfroh force-pushed the adhocness_flag branch from b0deccb to d51d2fa June 20, 2023 01:26

msfroh force-pushed the adhocness_flag branch from d51d2fa to c591ca3 June 20, 2023 17:53

msfroh force-pushed the adhocness_flag branch from c591ca3 to 4c8a77e June 20, 2023 18:29

msfroh force-pushed the adhocness_flag branch from 4c8a77e to 2e873c1 June 20, 2023 22:50

msfroh marked this pull request as ready for review June 21, 2023 01:35

msfroh requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, mch2, nknize, owaiskazi19, Rishikesh1159, ryanbogan, saratvemulapalli, shwetathareja, dreamer-89 and tlfeng as code owners June 21, 2023 01:35

msfroh force-pushed the adhocness_flag branch from e01d8a1 to 502c703 July 2, 2023 05:21

macohen assigned msfroh Jul 3, 2023

mingshl added the v2.9.0 'Issues and PRs related to version v2.9.0' label Jul 3, 2023

mingshl requested a review from dblock July 3, 2023 17:43

msfroh added 4 commits July 6, 2023 09:09

Move PipelineSource into PipelineContext and explicitly pass to create

b25f677

Signed-off-by: Michael Froh <froh@amazon.com>
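Wrapping the source in a context object that is passed explicitly to `create` keeps the factory signature stable if more context is needed later. A rough sketch, assuming hypothetical `PipelineContext`, `ProcessorFactory`, and `PipelineBuilder` types (processors are plain strings to keep it short); the real OpenSearch signatures may differ:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical creation-source enum (illustrative, not the real API). */
enum PipelineSource { SEARCH_REQUEST, UPDATE_PIPELINE, VALIDATE_PIPELINE }

/**
 * Hypothetical context handed to every factory. New fields can be added here
 * later without changing the create(...) signature that plugin factories implement.
 */
final class PipelineContext {
    private final PipelineSource pipelineSource;

    PipelineContext(PipelineSource pipelineSource) {
        this.pipelineSource = pipelineSource;
    }

    PipelineSource getPipelineSource() {
        return pipelineSource;
    }
}

/** Hypothetical factory contract: the context is an explicit create() argument. */
interface ProcessorFactory {
    String create(Map<String, Object> config, PipelineContext context);
}

final class PipelineBuilder {
    /** Builds every processor in a definition, passing the same context to each factory. */
    static List<String> build(List<String> types, Map<String, ProcessorFactory> factories,
                              PipelineContext context) {
        List<String> processors = new ArrayList<>();
        for (String type : types) {
            ProcessorFactory factory = factories.get(type);
            if (factory == null) {
                throw new IllegalArgumentException("unknown processor type [" + type + "]");
            }
            // Per-processor config is omitted in this sketch; only the context matters here.
            processors.add(factory.create(Map.of(), context));
        }
        return processors;
    }
}
```

Factories that don't care about the source simply ignore the argument, while an expensive one can check `getPipelineSource()` and reject `SEARCH_REQUEST`.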

Fix formatting on merge conflict resolution

b391748

Signed-off-by: Michael Froh <froh@amazon.com>

msfroh force-pushed the adhocness_flag branch from 502c703 to b391748 July 6, 2023 09:09

dblock approved these changes Jul 6, 2023

dblock merged commit 431b246 into opensearch-project:main Jul 6, 2023

dblock added the backport 2.x Backport to 2.x branch label Jul 6, 2023

mingshl mentioned this pull request Jul 10, 2023

[Search pipelines] Pass "adhocness" flag to processor factories (#8164) #8522

Merged

msfroh mentioned this pull request Sep 1, 2023

[Search Pipelines] Add request-scoped state shared between processors #9405

Merged