[Segment Replication + Remote Store] GA performance test #8874
Labels
distributed framework
enhancement
Enhancement or improvement to existing feature or request
Indexing:Replication
Issues and PRs related to core replication framework eg segrep
Storage
Issues and PRs relating to data and metadata storage
v2.10.0
Comments
tlfeng added the enhancement, distributed framework, and v2.10.0 labels on Jul 25, 2023
Bukhtawar added the Indexing:Replication and Storage labels on Jul 27, 2023
Found 3 kinds of S3 throttling symptoms when running tests with 10x larger shard size. S3 throttling during translog upload is tracked in issue #7390, and throttling during segment upload is tracked in issue #7389.
This issue tracks the performance testing defined in issue #8109, which needs to be completed before general availability of remote storage with segment replication.
Metrics that should be captured in addition to what OSB reports:
All clusters should have 3 dedicated cluster manager nodes.
Small cluster = ~3 nodes
Large cluster = ~10 nodes
Use m5.xlarge node type for consistency.
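Once a cluster is deployed, the topology above (3 dedicated cluster-manager nodes plus the data nodes) can be sanity-checked with the `_cat/nodes` API before a run starts. A minimal sketch; the `localhost:9200` endpoint is a placeholder, not an actual test cluster, and the role-letter convention (`m` for cluster manager, `d` for data) follows the standard `_cat/nodes` output:

```python
# Hedged sketch: verify the deployed cluster's topology before benchmarking.
# The endpoint below is a placeholder, not the real test cluster.
import json
import urllib.request


def cat_nodes_url(host: str) -> str:
    """Build a _cat/nodes URL listing each node's name and role string."""
    return f"http://{host}/_cat/nodes?format=json&h=name,node.role"


def count_roles(nodes: list) -> dict:
    """Count cluster-manager ('m') and data ('d') nodes from _cat/nodes rows."""
    counts = {"cluster_manager": 0, "data": 0}
    for node in nodes:
        roles = node.get("node.role", "")
        if "m" in roles:
            counts["cluster_manager"] += 1
        if "d" in roles:
            counts["data"] += 1
    return counts


if __name__ == "__main__":
    host = "localhost:9200"  # placeholder endpoint
    with urllib.request.urlopen(cat_nodes_url(host)) as resp:
        print(count_roles(json.load(resp)))  # expect 3 cluster managers
```

For the small-cluster scenarios the data count should be ~3, and ~10 for the large ones, with exactly 3 cluster-manager nodes in every case.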
Test Scenario:
Based on the comment #8109 (comment)
Below are the commands used to run the benchmark tests, taking the first test scenario as an example:
The command to deploy CDK application of OpenSearch cluster:
```sh
cdk deploy "*" --require-approval never \
  -c securityDisabled=true -c minDistribution=true -c region=us-west-2 \
  -c distributionUrl='https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.10.0/latest/linux/x64/tar/builds/opensearch/dist/opensearch-min-2.10.0-linux-x64.tar.gz' \
  -c cpuArch='x64' -c singleNodeCluster=false \
  -c dataNodeCount=10 -c dataNodeStorage=$((100+$(shuf -i 1-20 -n 1))) \
  -c distVersion=2.10.0 -c serverAccessType=prefixList -c restrictServerAccessTo=pl-f8a64391 \
  -c additionalConfig='{ "opensearch.experimental.feature.segment_replication_experimental.enabled": true, "cluster.indices.replication.strategy": "SEGMENT", "opensearch.experimental.feature.remote_store.enabled": true, "s3.client.default.endpoint": "s3.us-west-2.amazonaws.com" }' \
  -c vpcId='vpc-0648c0d077c3ea997' -c securityGroupId='sg-0d1ace406e4977a79' -c suffix='10nodes-500gb-1016' \
  -c use50PercentHeap=true -c enableRemoteStore=true -c dataInstanceType=m5.xlarge \
  -c storageVolumeType=gp3
```
The command to generate workload data:
```sh
expand-data-corpus.py --corpus-size 100 --output-file-suffix 100gb
```
The command to run benchmark for the test scenario:
```sh
opensearch-benchmark execute-test --workload=http_logs \
  --pipeline=benchmark-only --target-hosts=opens-clust-FB9PYML03F8V-7a9f77cc6decb7eb.elb.us-west-2.amazonaws.com:80 \
  --workload-params='{"index_settings":{"number_of_shards": 10, "number_of_replicas": 9 }, "generated_corpus": "t"}' \
  --include-tasks=delete-index,create-index,check-cluster-health,index-append \
  --telemetry=node-stats,segment-replication-stats \
  --user-tag="replication_type:segment,remote_store:enabled,node_count:10,shard_count:10,replica_count:9,shard_size_in_gb:5"
```
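Beyond the `segment-replication-stats` telemetry that OSB collects, replication lag can also be sampled directly from the `_cat/segment_replication` API while the test runs. A hedged sketch; the host, the index pattern, and reliance on the `bytes_behind` column are assumptions based on the public API, not details taken from this issue:

```python
# Hedged sketch: sample segment replication lag during a benchmark run
# via the _cat/segment_replication API. Host and index are placeholders.
import json
import urllib.request


def segrep_url(host: str, index: str) -> str:
    """Build a _cat/segment_replication URL for the given index pattern."""
    return f"http://{host}/_cat/segment_replication/{index}?format=json"


def max_lag_bytes(rows: list) -> int:
    """Return the largest per-replica bytes_behind value, or 0 if no rows."""
    return max((int(r.get("bytes_behind", 0)) for r in rows), default=0)


if __name__ == "__main__":
    host = "localhost:9200"  # placeholder endpoint
    with urllib.request.urlopen(segrep_url(host, "logs-*")) as resp:
        print(max_lag_bytes(json.load(resp)))
```

Sampling this in a loop alongside the OSB run gives a time series of replica lag that can be correlated with the S3 throttling symptoms noted above.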