Add expand data corpus instructions #8807
base: main
Signed-off-by: Archer <naarcha@amazon.com>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
# Expanding the data corpus of a workload

This tutorial shows you how to use the [`expand-data-corpus.py`](https://github.com/opensearch-project/opensearch-benchmark/blob/main/scripts/expand-data-corpus.py) script to increase the size of the data corpus for an OpenSearch Benchmark workload. This can help when running the `http_logs` workload at a larger scale, for instance, against clusters containing multiple data nodes.
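The thread does not show how `expand-data-corpus.py` works internally. As a rough illustration of the general idea only, the following Python sketch expands a JSON Lines corpus by writing several passes over the source documents and shifting each document's `@timestamp` on every pass so that the copies do not overlap in time. The function name `expand_corpus`, its parameters, and the assumption that each document carries an integer epoch-seconds `@timestamp` field are hypothetical; this is not the script's actual logic or command-line interface.

```python
import json
from pathlib import Path


def expand_corpus(src: Path, dst: Path, copies: int, offset_seconds: int) -> int:
    """Write `copies` passes over `src` to `dst`, shifting @timestamp per pass.

    Hypothetical sketch, not the logic of expand-data-corpus.py.
    Assumes `src` is JSON Lines and every document has an integer
    epoch-seconds "@timestamp" field. Returns the number of documents written.
    """
    written = 0
    with dst.open("w", encoding="utf-8") as out:
        for copy in range(copies):
            shift = copy * offset_seconds
            # Re-read the source on each pass so memory use stays flat,
            # which matters for multi-gigabyte corpora.
            with src.open("r", encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue  # skip blank lines
                    doc = json.loads(line)
                    doc["@timestamp"] += shift
                    out.write(json.dumps(doc) + "\n")
                    written += 1
    return written


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "documents.json"
        src.write_text(
            '{"@timestamp": 893964617, "status": 200}\n'
            '{"@timestamp": 893964653, "status": 304}\n'
        )
        dst = Path(tmp) / "documents-expanded.json"
        print(expand_corpus(src, dst, copies=3, offset_seconds=86400))  # prints 6
```

Streaming the source once per pass, rather than loading it into memory, keeps the sketch usable on large corpora at the cost of reading the input `copies` times.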
Nit: We can simplify the last sentence: "This is helpful when running time-series workloads like `http_logs` against a large-scale OpenSearch cluster."
_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Would recommend getting feedback from @gkamat, as he has more experience with this and might have additional comments.
To use this tutorial, make sure you fulfill the following prerequisites:

1. Python 3.x or later is installed.
2. The `http_logs` workload data corpus is already on your load generation host where OpenSearch Benchmark is running.
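To confirm the second prerequisite before running the script, a quick check like the following Python sketch can help. It assumes that OpenSearch Benchmark keeps downloaded corpora under `~/.benchmark/benchmarks/data/http_logs` and that the decompressed corpus files end in `.json`; both are assumptions about a default installation, and the helper name `find_corpus_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: default OpenSearch Benchmark data directory for http_logs.
# Adjust this path if your installation stores corpora elsewhere.
DEFAULT_CORPUS_DIR = Path.home() / ".benchmark" / "benchmarks" / "data" / "http_logs"


def find_corpus_files(corpus_dir: Path = DEFAULT_CORPUS_DIR) -> List[Path]:
    """Return the uncompressed *.json corpus files in corpus_dir, sorted by name.

    An empty list means the corpus is missing, or has only been downloaded
    in compressed form (for example, *.json.bz2), on this host.
    """
    if not corpus_dir.is_dir():
        return []
    return sorted(p for p in corpus_dir.glob("*.json") if p.is_file())
```

Calling `find_corpus_files()` on the load generation host and getting an empty list suggests that the workload has not yet been run there (so its corpus was never downloaded) or that the files live in a different directory.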
corpus is already available in your load generation host where OSB is running.
@Naarcha-AWS Please see my comments and changes and tag me for approval once addressed. Thanks!
@@ -0,0 +1,83 @@
---
layout: default
title: Expand data corpus
Should this be "Expanding a data corpus"?
grand_parent: User guide
---

# Expanding the data corpus of a workload
"Expanding a workload data corpus"?
To use this tutorial, make sure you fulfill the following prerequisites:

1. Python 3.x or later is installed.
2. The `http_logs` workload data corpus is already on your load generation host where OpenSearch Benchmark is running.
Something like "is already stored on the load generation host running OpenSearch Benchmark"?
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
@Naarcha-AWS LGTM!
Fixes opensearch-project/opensearch-benchmark#672
Checklist
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.