[Feature][Connector-V2][ES] Support dsl filter #4130

Merged on Mar 16, 2023 (9 commits)
43 changes: 26 additions & 17 deletions docs/en/connector-v2/source/Elasticsearch.md
Expand Up @@ -19,23 +19,24 @@ support version >= 2.x and < 8.x.

## Options

| name | type | required | default value |
|-------------------------|---------|----------|---------------|
| hosts | array | yes | - |
| username | string | no | - |
| password | string | no | - |
| index | string | yes | - |
| source | array | no | - |
| scroll_time | string | no | 1m |
| scroll_size | int | no | 100 |
| schema | | no | - |
| tls_verify_certificate | boolean | no | true |
| tls_verify_hostnames | boolean | no | true |
| tls_keystore_path | string | no | - |
| tls_keystore_password | string | no | - |
| tls_truststore_path | string | no | - |
| tls_truststore_password | string | no | - |
| common-options | | no | - |
| name | type | required | default value |
|-------------------------|---------|----------|-------------------|
| hosts | array | yes | - |
| username | string | no | - |
| password | string | no | - |
| index | string | yes | - |
| source | array | no | - |
| query | json | no | {"match_all": {}} |
| scroll_time | string | no | 1m |
| scroll_size | int | no | 100 |
| schema | | no | - |
| tls_verify_certificate | boolean | no | true |
| tls_verify_hostnames | boolean | no | true |
| tls_keystore_path | string | no | - |
| tls_keystore_password | string | no | - |
| tls_truststore_path | string | no | - |
| tls_truststore_password | string | no | - |
| common-options | | no | - |

### hosts [array]

Expand All @@ -59,6 +60,11 @@ The fields of index.
You can get the document ID by specifying the field `_id`. If you sink `_id` to another index, you need to specify an alias for `_id` because of an Elasticsearch limitation.
If you don't configure `source`, you must configure `schema`.

### query [json]

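An Elasticsearch query DSL clause, written as a JSON object.
Use it to restrict which documents are read. The default, `{"match_all": {}}`, reads every document in the index.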
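
For readers working on the connector side, the sketch below shows how the two `query` values used in this PR (the `match_all` default and an inclusive `range` filter) look as nested Java maps, which is the shape the option takes once parsed. The class and method names (`QueryDslExample`, `matchAll`, `range`) are hypothetical and exist only for illustration; they are not part of the connector.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class QueryDslExample {

    /** Mirrors the documented default: {"match_all": {}}. */
    static Map<String, Object> matchAll() {
        return Collections.singletonMap("match_all", new HashMap<String, Object>());
    }

    /** Builds {"range": {field: {"gte": gte, "lte": lte}}}; both bounds are inclusive. */
    static Map<String, Object> range(String field, long gte, long lte) {
        Map<String, Object> bounds = new HashMap<>();
        bounds.put("gte", gte);
        bounds.put("lte", lte);

        Map<String, Object> byField = new HashMap<>();
        byField.put(field, bounds);

        Map<String, Object> query = new HashMap<>();
        query.put("range", byField);
        return query;
    }

    public static void main(String[] args) {
        System.out.println(matchAll());
        System.out.println(range("firstPacket", 1669225429990L, 1669225429990L));
    }
}
```

Note that a range whose `gte` and `lte` are equal, as in the docs example, matches only documents whose field equals that exact value.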

### scroll_time [String]

Amount of time Elasticsearch will keep the search context alive for scroll requests.
Expand Down Expand Up @@ -109,6 +115,7 @@ Elasticsearch {
hosts = ["localhost:9200"]
index = "seatunnel-*"
source = ["_id","name","age"]
query = {"range":{"firstPacket":{"gte":1669225429990,"lte":1669225429990}}}
}
```

Expand Down Expand Up @@ -136,6 +143,7 @@ Elasticsearch {
c_timestamp = timestamp
}
}
query = {"range":{"firstPacket":{"gte":1669225429990,"lte":1669225429990}}}
}
```

Expand Down Expand Up @@ -188,4 +196,5 @@ source {

- Add Elasticsearch Source Connector
- [Feature] Support https protocol & compatible with opensearch ([3997](https://github.com/apache/incubator-seatunnel/pull/3997))
- [Feature] Support DSL

1 change: 1 addition & 0 deletions release-note.md
Expand Up @@ -13,6 +13,7 @@
- [ALL]Add FieldMapper Transform #3781
### Connectors
- [Elasticsearch] Support https protocol & compatible with opensearch
- [Elasticsearch] Support DSL
- [Hbase] Add hbase sink connector #4049
### Formats
- [Canal]Support read canal format message #3950
Expand Up @@ -298,10 +298,12 @@ public void close() {
* @param scrollSize fetch documents count in one request
*/
public ScrollResult searchByScroll(
String index, List<String> source, String scrollTime, int scrollSize) {
String index,
List<String> source,
Map<String, Object> query,
String scrollTime,
int scrollSize) {
Map<String, Object> param = new HashMap<>();
Map<String, Object> query = new HashMap<>();
query.put("match_all", new HashMap<String, String>());
param.put("query", query);
param.put("_source", source);
param.put("sort", new String[] {"_doc"});
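
The change above replaces a hard-coded `match_all` with a caller-supplied query. A minimal, self-contained sketch of that request-body assembly follows. `ScrollRequestBodySketch` and `buildBody` are hypothetical names, and placing the page size under `"size"` is an assumption, since the hunk is cut off before the rest of the method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the body-building part of searchByScroll.
class ScrollRequestBodySketch {

    /** Assembles the search body from a caller-supplied query rather than a fixed match_all. */
    static Map<String, Object> buildBody(
            List<String> source, Map<String, Object> query, int scrollSize) {
        Map<String, Object> param = new HashMap<>();
        param.put("query", query);
        param.put("_source", source);
        // Sorting by _doc is what the diff shows; it is the cheapest order for scrolling.
        param.put("sort", new String[] {"_doc"});
        // Assumption: the page size is sent as "size"; this line is not visible in the hunk.
        param.put("size", scrollSize);
        return param;
    }

    public static void main(String[] args) {
        Map<String, Object> matchAll =
                Collections.singletonMap("match_all", new HashMap<String, Object>());
        Map<String, Object> body = buildBody(Arrays.asList("_id", "name", "age"), matchAll, 100);
        System.out.println(body.keySet());
    }
}
```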
Expand Up @@ -20,7 +20,10 @@
import org.apache.seatunnel.api.configuration.Option;
import org.apache.seatunnel.api.configuration.Options;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SourceConfig {

Expand Down Expand Up @@ -51,4 +54,12 @@ public class SourceConfig {
.defaultValue(100)
.withDescription(
"Maximum number of hits to be returned with each Elasticsearch scroll request");

public static final Option<Map> QUERY =
Options.key("query")
.objectType(Map.class)
.defaultValue(
Collections.singletonMap("match_all", new HashMap<String, String>()))
.withDescription(
"Elasticsearch query language. You can control the range of data read");
}
Expand Up @@ -22,12 +22,14 @@

import java.io.Serializable;
import java.util.List;
import java.util.Map;

@Data
@AllArgsConstructor
public class SourceIndexInfo implements Serializable {
private String index;
private List<String> source;
private Map<String, Object> query;
private String scrollTime;
private int scrollSize;
}
Expand Up @@ -35,6 +35,7 @@
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.EsClusterConnectionConfig.TLS_VERIFY_HOSTNAME;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.EsClusterConnectionConfig.USERNAME;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.SourceConfig.INDEX;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.SourceConfig.QUERY;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.SourceConfig.SCROLL_SIZE;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.SourceConfig.SCROLL_TIME;
import static org.apache.seatunnel.connectors.seatunnel.elasticsearch.config.SourceConfig.SOURCE;
Expand All @@ -55,6 +56,7 @@ public OptionRule optionRule() {
PASSWORD,
SCROLL_TIME,
SCROLL_SIZE,
QUERY,
TLS_VERIFY_CERTIFICATE,
TLS_VERIFY_HOSTNAME,
TLS_KEY_STORE_PATH,
Expand Up @@ -83,6 +83,7 @@ public void pollNext(Collector<SeaTunnelRow> output) throws Exception {
esRestClient.searchByScroll(
sourceIndexInfo.getIndex(),
sourceIndexInfo.getSource(),
sourceIndexInfo.getQuery(),
sourceIndexInfo.getScrollTime(),
sourceIndexInfo.getScrollSize());
outputFromScrollResult(scrollResult, sourceIndexInfo.getSource(), output);
Expand Up @@ -149,6 +149,10 @@ private List<ElasticsearchSourceSplit> getElasticsearchSplit() {
if (pluginConfig.hasPath(SourceConfig.SCROLL_SIZE.key())) {
scrollSize = pluginConfig.getInt(SourceConfig.SCROLL_SIZE.key());
}
Map query = SourceConfig.QUERY.defaultValue();
if (pluginConfig.hasPath(SourceConfig.QUERY.key())) {
query = (Map) pluginConfig.getAnyRef(SourceConfig.QUERY.key());
}

List<IndexDocsCount> indexDocsCounts =
esRestClient.getIndexDocsCount(pluginConfig.getString(SourceConfig.INDEX.key()));
Expand All @@ -162,7 +166,11 @@ private List<ElasticsearchSourceSplit> getElasticsearchSplit() {
new ElasticsearchSourceSplit(
String.valueOf(indexDocsCount.getIndex().hashCode()),
new SourceIndexInfo(
indexDocsCount.getIndex(), source, scrollTime, scrollSize)));
indexDocsCount.getIndex(),
source,
query,
scrollTime,
scrollSize)));
}
return splits;
}
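
The split builder above falls back to the option's default when `query` is absent and otherwise casts whatever the config holds to a `Map`. The sketch below illustrates the same fallback with a plain `Map` standing in for the plugin config. The names `QueryOptionFallbackSketch` and `resolveQuery` are hypothetical, and the explicit type check is an addition for illustration that the PR code does not contain.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; the real code reads from a Typesafe Config object.
class QueryOptionFallbackSketch {

    static final Map<String, Object> DEFAULT_QUERY =
            Collections.singletonMap("match_all", new HashMap<String, Object>());

    /** Returns the configured query, or the match_all default when "query" is not set. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> resolveQuery(Map<String, Object> pluginConfig) {
        Object raw = pluginConfig.get("query");
        if (raw == null) {
            return DEFAULT_QUERY;
        }
        // Not in the PR: fail with a readable message instead of a bare ClassCastException.
        if (!(raw instanceof Map)) {
            throw new IllegalArgumentException(
                    "'query' must be a JSON object, got: " + raw.getClass().getSimpleName());
        }
        return (Map<String, Object>) raw;
    }

    public static void main(String[] args) {
        System.out.println(resolveQuery(new HashMap<>()));
    }
}
```

A check like this would turn a misconfigured `query = "..."` string into a clear configuration error at split-enumeration time; whether that is worth adding to the connector is a design choice left to the maintainers.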
Expand Up @@ -18,6 +18,7 @@
package org.apache.seatunnel.e2e.connector.elasticsearch;

import org.apache.seatunnel.shade.com.fasterxml.jackson.core.JsonProcessingException;
import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.JsonNode;
import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.seatunnel.common.utils.JsonUtils;
Expand Down Expand Up @@ -125,8 +126,9 @@ public void testElasticsearch(TestContainer container)
Container.ExecResult execResult =
container.executeJob("/elasticsearch/elasticsearch_source_and_sink.conf");
Assertions.assertEquals(0, execResult.getExitCode());
List<String> sinData = readSinkData();
Assertions.assertIterableEquals(testDataset, sinData);
List<String> sinkData = readSinkData();
// Expected rows are those matching the job's DSL filter: {"range":{"c_int":{"gte":10,"lte":20}}}
Assertions.assertIterableEquals(mapTestDatasetForDSL(), sinkData);
}

private List<String> generateTestDataSet()
Expand Down Expand Up @@ -197,7 +199,15 @@ private List<String> readSinkData() throws InterruptedException {
"c_bytes",
Review comment (Member): please add an e2e test case (test job config).

Reply (Contributor Author): fixed @hailin0

"c_date",
"c_timestamp");
ScrollResult scrollResult = esRestClient.searchByScroll("st_index2", source, "1m", 1000);
HashMap<String, Object> rangeParam = new HashMap<>();
rangeParam.put("gte", 10);
rangeParam.put("lte", 20);
HashMap<String, Object> range = new HashMap<>();
range.put("c_int", rangeParam);
Map<String, Object> query = new HashMap<>();
query.put("range", range);
ScrollResult scrollResult =
esRestClient.searchByScroll("st_index2", source, query, "1m", 1000);
scrollResult
.getDocs()
.forEach(
Expand All @@ -216,6 +226,21 @@ private List<String> readSinkData() throws InterruptedException {
return docs;
}

private List<String> mapTestDatasetForDSL() {
return testDataset.stream()
.map(JsonUtils::parseObject)
.filter(
node -> {
if (node.hasNonNull("c_int")) {
int cInt = node.get("c_int").asInt();
return cInt >= 10 && cInt <= 20;
}
return false;
})
.map(JsonNode::toString)
.collect(Collectors.toList());
}
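
The helper above computes the expected sink contents by applying the same inclusive `c_int` range on the client side. A dependency-free sketch of that check, using plain maps instead of Jackson `JsonNode`, is shown here. `RangeFilterSketch` and `filterByRange` are hypothetical names, and treating any non-numeric value as excluded is an assumption that only approximates `hasNonNull` plus `asInt`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the expected-result filter used by the e2e test.
class RangeFilterSketch {

    /** Keeps documents whose numeric field lies in [gte, lte]; missing or null values are dropped. */
    static List<Map<String, Object>> filterByRange(
            List<Map<String, Object>> docs, String field, int gte, int lte) {
        return docs.stream()
                .filter(
                        doc -> {
                            Object value = doc.get(field);
                            if (!(value instanceof Number)) {
                                return false; // absent, null, or non-numeric
                            }
                            int n = ((Number) value).intValue();
                            return n >= gte && n <= lte;
                        })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Integer v : new Integer[] {9, 10, 20, 21, null}) {
            docs.add(Collections.<String, Object>singletonMap("c_int", v));
        }
        System.out.println(filterByRange(docs, "c_int", 10, 20).size());
    }
}
```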

@AfterEach
@Override
public void tearDown() {
Expand Up @@ -36,6 +36,7 @@ source {
tls_verify_hostname = false

index = "st_index"
query = {"range":{"c_int":{"gte":10,"lte":20}}}
schema = {
fields {
c_map = "map<string, tinyint>"