Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wait_for_completion parameter to resize, open, and forcemerge APIs (#6228) #6434

Merged

Conversation

gaobinlong
Copy link
Collaborator

@gaobinlong gaobinlong commented Feb 22, 2023

Description

Relates to #6228 .
The main changes of this PR are:

  1. Add wait_for_completion query parameter to shrink/split/clone/open/forcemerge APIs, when the parameter is set to false, the API will return a task immediately and then run the long running operation in background, when the long running operation completes the result of the generated task will be stored into the .tasks index. The parameter wait_for_completion defaults to true, so it will not impact the old behavior of the API which will block until the operation completes or timeout. Additionally, except forcemerge API, a parameter named task_execution_timeout is added to shrink/split/clone/open APIs, that's because for these long running operations, we have to wait the shards of the index specified by users to be STARTED, but if the shards cannot be assigned then the generated task will not finish, so we must have a timeout parameter to control the task's max execution time, when the timeout expires, the task's result will be stored into the .tasks index. The task_execution_timeout parameter defaults to 1h.
  2. For shrink/split/clone APIs, move some request validation in advance so that users can get validation errors immediately when they call these APIs with the parameter wait_for_completion setting to false, if we don't do so, users have to check the generated task's status by _tasks API and then they can see the validation errors.
  3. Fix a bug found in shrink/split/clone API, the query parameter timeout and cluster_manager_timeout take no effect, that's because the value of the both parameters are not passed to TargetIndexRequest, so the value user specified are lost. Maybe it's hard to test both of the two parameters so we don't have this test case.
  4. Add some unit tests and yaml rest tests for the changes.

Issues Resolved

#6228

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.test.rest.ClientYamlTestSuiteIT.test {p0=indices.stats/10_index/Index - blank}
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@gaobinlong
Copy link
Collaborator Author

gaobinlong commented Feb 28, 2023

@andrross can you help to take a look at this PR?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled

@gaobinlong
Copy link
Collaborator Author

The failed test
org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled is a flaky test and there is an issue opened.

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I honestly don't know this code particularly well. @reta Do you have a few minutes to give it a second look?

CHANGELOG.md Outdated
@@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Support for HTTP/2 (server-side) ([#3847](https://github.com/opensearch-project/OpenSearch/pull/3847))
- Add getter for path field in NestedQueryBuilder ([#4636](https://github.com/opensearch-project/OpenSearch/pull/4636))
- Allow mmap to use new JDK-19 preview APIs in Apache Lucene 9.4+ ([#5151](https://github.com/opensearch-project/OpenSearch/pull/5151))
- Add wait_for_completion parameter to resize&open&forcemerge APIs ([#6434](https://github.com/opensearch-project/OpenSearch/pull/6434))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to backport this to 2.x? It seems backwards compatible to me since it is just adding features to the APIs, unless I'm missing something. If the intention is to backport, then the changelog entry should go in the [Unreleased 2.x] section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, nitpick to add spaces and probably use commas instead of multiple ampersands: "resize, open, and forcemerge"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we want to backport this PR to 2.x, I've changed the location and modified the description according to your suggestion, really appreciate it, and could you help to add a backport 2.x label?

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@gaobinlong gaobinlong changed the title Add wait_for_completion parameter to resize&open&forcemerge APIs (#6228) Add wait_for_completion parameter to resize, open, and forcemerge APIs (#6228) Mar 10, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -119,9 +120,29 @@ public ActionRequestValidationException validate() {
if (targetIndexRequest.settings().getByPrefix("index.sort.").isEmpty() == false) {
validationException = addValidationError("can't override index sort when resizing an index", validationException);
}
if (IndexMetadata.INDEX_ROUTING_PARTITION_SIZE_SETTING.exists(targetIndexRequest.settings())) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrross, do you think this will be a problem? The code change here, moves some simple validations from the transport layer to the rest layer, so that users can get the validation errors immediately when they call the resize APIs asynchronously(set wait_for_completion to false), will also change some validation errors' content a little when users call the resize APIs synchronously, for example, the old error is

{ "error" : { "root_cause" : [ { "type" : "illegal_argument_exception", "reason" : "cannot provide a routing partition size value when resizing an index" } ], "type" : "illegal_argument_exception", "reason" : "cannot provide a routing partition size value when resizing an index" }, "status" : 400 },

the new error is

{ "error":{ "root_cause":[ { "type":"action_request_validation_exception", "reason":"Validation Failed: 1: cannot provide a routing partition size value when resizing an index;" } ], "type":"action_request_validation_exception", "reason":"Validation Failed: 1: cannot provide a routing partition size value when resizing an index;" }, "status":400 },

the type and reason inside the error are changed, and that's the only impact on the existing function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me. The HTTP status code is the same, and from the Java side ActionRequestValidationException is a subclass of IllegalArgumentException.

@andrross andrross added the backport 2.x Backport to 2.x branch label Mar 10, 2023
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@gaobinlong
Copy link
Collaborator Author

@reta could you help to take a second look at this PR?

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@gaobinlong
Copy link
Collaborator Author

@andrross it seems that reta doesn't have much time to review this PR, could you help to have someone else to review this? We want to incorporate this PR into release 2.7 and some other functionality depends on this change, so it would be fine if we merge this PR in advance.

@andrross andrross merged commit 3fec567 into opensearch-project:main Mar 27, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6434-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 3fec56756851f23c169f545282c3a44da4f43423
# Push it to GitHub
git push --set-upstream origin backport/backport-6434-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-6434-to-2.x.

gaobinlong added a commit to gaobinlong/OpenSearch that referenced this pull request Mar 28, 2023
opensearch-project#6434)

* Add wait_for_completion parameter to resize&open&forcemerge APIs (opensearch-project#6228)

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* change header of new file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 3fec567)

Modify the yaml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
andrross added a commit that referenced this pull request Mar 29, 2023
… forcemerge APIs (#6855)

* Add wait_for_completion parameter to resize, open, and forcemerge APIs (#6434)

* Add wait_for_completion parameter to resize&open&forcemerge APIs (#6228)

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* change header of new file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 3fec567)

Modify the yaml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify package name

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this pull request Apr 5, 2023
opensearch-project#6434)

* Add wait_for_completion parameter to resize&open&forcemerge APIs (opensearch-project#6228)

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* change header of new file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Valentin Mitrofanov <mitrofmep@gmail.com>
@reta reta mentioned this pull request Jul 17, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants