Support batch ingestion in bulk API (#12457) #13306

Merged
merged 18 commits into from
Apr 30, 2024
Changes from 11 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Streaming Indexing] Ensure support of the new transport by security plugin ([#13174](https://github.com/opensearch-project/OpenSearch/pull/13174))
- Add cluster setting to dynamically configure the buckets for filter rewrite optimization. ([#13179](https://github.com/opensearch-project/OpenSearch/pull/13179))
- [Remote Store] Add capability of doing refresh as determined by the translog ([#12992](https://github.com/opensearch-project/OpenSearch/pull/12992))
- Add support for batch ingestion capability ([#12457](https://github.com/opensearch-project/OpenSearch/issues/12457))

### Dependencies
- Bump `org.apache.commons:commons-configuration2` from 2.10.0 to 2.10.1 ([#12896](https://github.com/opensearch-project/OpenSearch/pull/12896))
Original file line number Diff line number Diff line change
@@ -167,3 +167,43 @@ teardown:
        index: test_index
        id: test_id3
  - match: { _source: {"f1": "v2", "f2": 47, "field1": "value1"}}

---
"Test bulk API with batch enabled":
  - skip:
      version: " - 2.13.99"
      reason: "Added in 2.14.0"

  - do:
      bulk:
        refresh: true
        batch_ingestion_option: "enabled"
        maximum_batch_size: 2
        pipeline: "pipeline1"
        body:
          - '{"index": {"_index": "test_index", "_id": "test_id1"}}'
          - '{"text": "text1"}'
          - '{"index": {"_index": "test_index", "_id": "test_id2"}}'
          - '{"text": "text2"}'
          - '{"index": {"_index": "test_index", "_id": "test_id3"}}'
          - '{"text": "text3"}'
          - '{"index": {"_index": "test_index", "_id": "test_id4"}}'
          - '{"text": "text4"}'
          - '{"index": {"_index": "test_index", "_id": "test_id5", "pipeline": "pipeline2"}}'
          - '{"text": "text5"}'
          - '{"index": {"_index": "test_index", "_id": "test_id6", "pipeline": "pipeline2"}}'
          - '{"text": "text6"}'

  - match: { errors: false }

  - do:
      get:
        index: test_index
        id: test_id5
  - match: { _source: {"text": "text5", "field2": "value2"}}

  - do:
      get:
        index: test_index
        id: test_id3
  - match: { _source: { "text": "text3", "field1": "value1" } }
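
Note on the test above: it sends six documents with `maximum_batch_size: 2`, four under the request-level `pipeline1` and two that override it with `pipeline2`. The sketch below shows one plausible way such a request could be split into batches: group by effective pipeline, then cut each group into chunks. It is an illustration under stated assumptions, not the batching code in this PR; `BatchPlanner`, `Doc`, `plan` and `testDocs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from this PR.
final class BatchPlanner {

    /** A document id together with the pipeline that should process it. */
    static final class Doc {
        final String id;
        final String pipeline;

        Doc(String id, String pipeline) {
            this.id = id;
            this.pipeline = pipeline;
        }
    }

    /**
     * Groups document ids by pipeline in first-seen order, then cuts each
     * group into chunks of at most maxBatchSize ids.
     */
    static Map<String, List<List<String>>> plan(List<Doc> docs, int maxBatchSize) {
        int size = Math.max(maxBatchSize, 1); // sizes below 1 fall back to 1
        Map<String, List<List<String>>> batches = new LinkedHashMap<>();
        for (Doc doc : docs) {
            List<List<String>> chunks = batches.computeIfAbsent(doc.pipeline, key -> new ArrayList<>());
            // Open a new chunk when there is none yet or the last one is full.
            if (chunks.isEmpty() || chunks.get(chunks.size() - 1).size() == size) {
                chunks.add(new ArrayList<>());
            }
            chunks.get(chunks.size() - 1).add(doc.id);
        }
        return batches;
    }

    /** The six documents from the YAML test: four on pipeline1, two on pipeline2. */
    static List<Doc> testDocs() {
        return List.of(
            new Doc("test_id1", "pipeline1"),
            new Doc("test_id2", "pipeline1"),
            new Doc("test_id3", "pipeline1"),
            new Doc("test_id4", "pipeline1"),
            new Doc("test_id5", "pipeline2"),
            new Doc("test_id6", "pipeline2")
        );
    }

    public static void main(String[] args) {
        System.out.println(plan(testDocs(), 2));
    }
}
```

Under this grouping the request yields two batches for `pipeline1` and one for `pipeline2`, which is consistent with the `field1` and `field2` assertions in the test.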
Original file line number Diff line number Diff line change
@@ -74,6 +74,14 @@
"require_alias": {
"type": "boolean",
"description": "Sets require_alias for all incoming documents. Defaults to unset (false)"
},
"maximum_batch_size": {
"type": "int",
"description": "Sets the maximum number of documents per batch. Values below 1 are treated as 1. Defaults to 1."
},
"batch_ingestion_option": {
"type": "string",
"description": "Enables batch ingestion when set to enabled. Accepted values: enabled, disabled. Defaults to disabled."
}
},
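
For reviewers who want to try the two new parameters by hand, here is a minimal sketch of how a request line and NDJSON body could be assembled. The parameter names are the ones in this revision of the spec and `/_bulk` is the standard bulk endpoint; `BulkBatchRequestSketch`, `path` and `body` are hypothetical helpers, and no HTTP client is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; only the parameter names and the _bulk endpoint come from OpenSearch.
final class BulkBatchRequestSketch {

    /** Builds the request path with the two new parameters plus a default pipeline. */
    static String path(String batchIngestionOption, int maximumBatchSize, String pipeline) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("batch_ingestion_option", batchIngestionOption);
        params.put("maximum_batch_size", Integer.toString(maximumBatchSize));
        params.put("pipeline", pipeline);
        // Values here are plain ASCII tokens, so this sketch applies no URL encoding.
        return "/_bulk?" + params.entrySet().stream()
            .map(entry -> entry.getKey() + "=" + entry.getValue())
            .collect(Collectors.joining("&"));
    }

    /** Builds a newline-delimited body: one action line and one source line per document. */
    static String body(String index, String... texts) {
        StringBuilder ndjson = new StringBuilder();
        for (int i = 0; i < texts.length; i++) {
            ndjson.append("{\"index\": {\"_index\": \"").append(index)
                .append("\", \"_id\": \"test_id").append(i + 1).append("\"}}\n");
            ndjson.append("{\"text\": \"").append(texts[i]).append("\"}\n");
        }
        return ndjson.toString();
    }

    public static void main(String[] args) {
        System.out.println("POST " + path("enabled", 2, "pipeline1"));
        System.out.print(body("test_index", "text1", "text2"));
    }
}
```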
"body":{
server/src/main/java/org/opensearch/action/bulk/BatchIngestionOption.java
@@ -0,0 +1,42 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.action.bulk;

import org.opensearch.common.annotation.PublicApi;

/**
* An enum for batch ingestion option.
*/
@PublicApi(since = "2.14.0")
public enum BatchIngestionOption {
NONE("disabled"),
ENABLED("enabled");

private final String value;

public String getValue() {
return this.value;
}

BatchIngestionOption(String value) {
this.value = value;
}

static BatchIngestionOption from(String value) {
if (value == null || value.isBlank()) {
return NONE;
}
for (BatchIngestionOption option : values()) {
if (option.getValue().equals(value)) {
return option;
}
}
throw new IllegalArgumentException("Unknown value for batch ingestion option: [" + value + "].");

}
}
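
The parsing rules in `from` are easy to misread, so here is a standalone copy that can be run outside the OpenSearch build. `BatchOptionSketch` and `BatchOptionDemo` are hypothetical names; the logic mirrors the enum above. Note that matching is case-sensitive, so `"Enabled"` takes the unknown-value branch.

```java
// Standalone demo; the real enum lives in org.opensearch.action.bulk.
final class BatchOptionDemo {
    public static void main(String[] args) {
        System.out.println(BatchOptionSketch.from(null));      // null falls back to NONE
        System.out.println(BatchOptionSketch.from("enabled")); // exact match
        try {
            BatchOptionSketch.from("Enabled");                 // wrong case is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Hypothetical copy of the parsing logic for experimentation.
enum BatchOptionSketch {
    NONE("disabled"),
    ENABLED("enabled");

    private final String value;

    BatchOptionSketch(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }

    static BatchOptionSketch from(String value) {
        // Missing or blank input means the feature is off.
        if (value == null || value.isBlank()) {
            return NONE;
        }
        for (BatchOptionSketch option : values()) {
            if (option.getValue().equals(value)) {
                return option;
            }
        }
        throw new IllegalArgumentException("Unknown value for batch ingestion option: [" + value + "].");
    }
}
```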
server/src/main/java/org/opensearch/action/bulk/BulkRequest.java
@@ -34,6 +34,7 @@

import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;
import org.opensearch.Version;
import org.opensearch.action.ActionRequest;
import org.opensearch.action.ActionRequestValidationException;
import org.opensearch.action.CompositeIndicesRequest;
@@ -80,7 +81,7 @@
private static final long SHALLOW_SIZE = RamUsageEstimator.shallowSizeOfInstance(BulkRequest.class);

private static final int REQUEST_OVERHEAD = 50;

private static final Version MINIMAL_VERSION_SUPPORT_BATCH = Version.V_2_14_0;
/**
* Requests that are part of this request. It is only possible to add things that are both {@link ActionRequest}s and
* {@link WriteRequest}s to this but java doesn't support syntax to declare that everything in the array has both types so we declare
@@ -96,6 +97,8 @@
private String globalRouting;
private String globalIndex;
private Boolean globalRequireAlias;
private BatchIngestionOption batchIngestionOption = BatchIngestionOption.NONE;
private int maximumBatchSize = 1;

private long sizeInBytes = 0;

@@ -107,6 +110,10 @@
requests.addAll(in.readList(i -> DocWriteRequest.readDocumentRequest(null, i)));
refreshPolicy = RefreshPolicy.readFrom(in);
timeout = in.readTimeValue();
if (in.getVersion().onOrAfter(MINIMAL_VERSION_SUPPORT_BATCH)) {
batchIngestionOption = in.readEnum(BatchIngestionOption.class);
maximumBatchSize = in.readInt();

}
}

public BulkRequest(@Nullable String globalIndex) {
@@ -346,6 +353,46 @@
return this;
}

/**
* Parses the string form of the batch option into a {@link BatchIngestionOption} and sets it on this request.
* @param batchOption the option as a string from the request; null or blank means {@link BatchIngestionOption#NONE}
* @return this {@link BulkRequest}
*/
public BulkRequest batchIngestionOption(String batchOption) {
this.batchIngestionOption = BatchIngestionOption.from(batchOption);
return this;
}

/**
* Get batch ingestion option
* @return {@link BatchIngestionOption}
*/
public BatchIngestionOption batchIngestionOption() {
return this.batchIngestionOption;
}

/**
* Sets the maximum batch size. Values less than 1 are clamped to 1.
* @param size maximum batch size from input
* @return this {@link BulkRequest}
*/
public BulkRequest maximumBatchSize(int size) {
this.maximumBatchSize = Math.max(size, 1);
return this;
}

/**
* Get maximum batch size
* @return maximum batch size
*/
public int maximumBatchSize() {
return this.maximumBatchSize;
}
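
A small standalone sketch of the setter's clamping behaviour, for anyone checking edge cases: zero and negative sizes end up as 1 without an error. `BulkOptionsSketch` is a hypothetical stand-in for the relevant part of `BulkRequest`.

```java
// Hypothetical stand-in for the batch-size accessors on BulkRequest.
final class BulkOptionsSketch {
    private int maximumBatchSize = 1; // same default as the field in the diff

    /** Fluent setter: anything below 1 is clamped to 1. */
    BulkOptionsSketch maximumBatchSize(int size) {
        this.maximumBatchSize = Math.max(size, 1);
        return this;
    }

    int maximumBatchSize() {
        return maximumBatchSize;
    }

    public static void main(String[] args) {
        System.out.println(new BulkOptionsSketch().maximumBatchSize(0).maximumBatchSize());
        System.out.println(new BulkOptionsSketch().maximumBatchSize(-5).maximumBatchSize());
        System.out.println(new BulkOptionsSketch().maximumBatchSize(3).maximumBatchSize());
    }
}
```

Because invalid sizes are clamped rather than rejected, a caller passing `0` gets the same behaviour as a caller passing `1`.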

/**
* Note for internal callers (NOT high level rest client),
* the global parameter setting is ignored when used with:
@@ -453,6 +500,10 @@
out.writeCollection(requests, DocWriteRequest::writeDocumentRequest);
refreshPolicy.writeTo(out);
out.writeTimeValue(timeout);
if (out.getVersion().onOrAfter(MINIMAL_VERSION_SUPPORT_BATCH)) {
out.writeEnum(batchIngestionOption);
out.writeInt(maximumBatchSize);

}
}

@Override
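
Both the read and the write path are guarded by `MINIMAL_VERSION_SUPPORT_BATCH`. The sketch below illustrates why the two guards have to agree, using JDK data streams as stand-ins for `StreamInput` and `StreamOutput`. Everything here is an assumption for illustration: `VersionGatedWire`, the integer version ids, and the use of a boolean in place of the enum.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch of version-gated wire fields; not OpenSearch's stream classes.
final class VersionGatedWire {
    static final int V_2_13 = 21300; // made-up integer ids for two releases
    static final int V_2_14 = 21400;

    /** Writes the pre-existing field, then the new fields only if the peer understands them. */
    static byte[] write(int peerVersion, int timeoutMillis, boolean batchEnabled, int maximumBatchSize) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(timeoutMillis);
            if (peerVersion >= V_2_14) {
                out.writeBoolean(batchEnabled);
                out.writeInt(maximumBatchSize);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns {timeoutMillis, batchEnabled ? 1 : 0, maximumBatchSize}. */
    static int[] read(int peerVersion, byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int timeoutMillis = in.readInt();
            int batchEnabled = 0;     // defaults mirror the field initialisers in the diff
            int maximumBatchSize = 1;
            if (peerVersion >= V_2_14) {
                batchEnabled = in.readBoolean() ? 1 : 0;
                maximumBatchSize = in.readInt();
            }
            return new int[] { timeoutMillis, batchEnabled, maximumBatchSize };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] toOldNode = write(V_2_13, 1000, true, 2);
        byte[] toNewNode = write(V_2_14, 1000, true, 2);
        System.out.println(toOldNode.length + " bytes -> " + Arrays.toString(read(V_2_13, toOldNode)));
        System.out.println(toNewNode.length + " bytes -> " + Arrays.toString(read(V_2_14, toNewNode)));
    }
}
```

An older peer never sees the extra bytes and falls back to the defaults, so in this sketch a mixed-version exchange silently behaves as if batching were disabled.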
Original file line number Diff line number Diff line change
@@ -923,7 +923,8 @@ public boolean isForceExecution() {
}
},
bulkRequestModifier::markItemAsDropped,
- executorName
+ executorName,
+ original
);
}

Original file line number Diff line number Diff line change
@@ -37,6 +37,14 @@ public void before() {
current.incrementAndGet();
}

/**
* Invoked before the given operation begins on multiple items at the same time.
* @param n number of items
*/
public void beforeN(int n) {
current.addAndGet(n);
}

/**
* Invoked upon completion (success or failure) of the given operation
* @param currentTime elapsed time of the operation
@@ -46,13 +54,35 @@ public void after(long currentTime) {
time.inc(currentTime);
}

/**
* Invoked upon completion (success or failure) of the given operation for multiple items.
* @param n number of items completed
* @param currentTime elapsed time of the operation
*/
public void afterN(int n, long currentTime) {
current.addAndGet(-n);
for (int i = 0; i < n; ++i) {
time.inc(currentTime);
}
}

/**
* Invoked upon failure of the operation.
*/
public void failed() {
failed.inc();
}

/**
* Invoked upon failure of the operation on multiple items.
* @param n number of items that failed
*/
public void failedN(int n) {
failed.inc(n);
}

public void add(OperationMetrics other) {
// Don't try copying over current, since in-flight requests will be linked to the existing metrics instance.
failed.inc(other.failed.count());
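
A note on the `N` variants: `afterN` records `currentTime` once per item, so a batch of `n` items that took `t` contributes `n` samples of `t`. Assuming `time.inc` adds one sample per call, the summed time grows by `n * t` rather than `t`. The sketch below reproduces that accounting with JDK types only; `BatchMetricsSketch` and its accessors are hypothetical stand-ins for `OperationMetrics` and the OpenSearch metric classes.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for OperationMetrics, built on JDK types only.
final class BatchMetricsSketch {
    private final AtomicLong current = new AtomicLong();
    private final LongAdder timeSum = new LongAdder();
    private final LongAdder timeCount = new LongAdder();
    private final LongAdder failed = new LongAdder();

    /** Marks n items as in flight. */
    void beforeN(int n) {
        current.addAndGet(n);
    }

    /** Marks n items as done; each item is recorded as one sample of the same elapsed time. */
    void afterN(int n, long elapsed) {
        current.addAndGet(-n);
        timeCount.add(n);
        timeSum.add((long) n * elapsed);
    }

    /** Counts n failures in one step. */
    void failedN(int n) {
        failed.add(n);
    }

    long current() { return current.get(); }
    long timeSum() { return timeSum.sum(); }
    long timeCount() { return timeCount.sum(); }
    long failed() { return failed.sum(); }

    public static void main(String[] args) {
        BatchMetricsSketch metrics = new BatchMetricsSketch();
        metrics.beforeN(3);
        metrics.afterN(3, 10);
        metrics.failedN(2);
        System.out.println("current=" + metrics.current()
            + " count=" + metrics.timeCount()
            + " sum=" + metrics.timeSum()
            + " failed=" + metrics.failed());
    }
}
```

The per-item mean stays at `t`, which keeps averages comparable with the non-batched path, but the total no longer reflects wall-clock time spent in the processor.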