Add expand data corpus instructions #8807

Open

Naarcha-AWS wants to merge 16 commits into main from expand-data-corpus

+83 −0

Collaborator

Naarcha-AWS commented Nov 25, 2024

Fixes opensearch-project/opensearch-benchmark#672

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


          Add expand data corpus instructions

b3fb0ed

Signed-off-by: Archer <naarcha@amazon.com>

Naarcha-AWS added the 3 - Tech review, benchmark, and backport 2.18 labels

Naarcha-AWS self-assigned this

Naarcha-AWS requested review from kolchfa-aws, vagimeli, AMoo-Miki, natebower, dlvenable and epugh as code owners

November 25, 2024 19:10

github-actions bot commented Nov 25, 2024

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

IanHoang reviewed

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md Outdated


		# Expanding the data corpus of a workload

		This tutorial shows you how to use the [`expand-data-corpus.py`](https://github.com/opensearch-project/opensearch-benchmark/blob/main/scripts/expand-data-corpus.py) script to increase the size of the data corpus for an OpenSearch Benchmark workload. This can help assist in running the `http_logs` Benchmark with a larger scale, for instance, with clusters containing multiple data nodes.

Contributor

IanHoang Nov 25, 2024

Nit: We can simplify the last sentence:

This is helpful when running time-series workloads like http_logs against a large-scale OpenSearch cluster.


          Apply suggestions from code review

e575773

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Apply suggestions from code review

56a3c75

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

IanHoang suggested changes

Contributor

IanHoang left a comment

I'd recommend getting feedback from @gkamat, as he has more experience with this and might have additional comments.


          Merge branch 'main' into expand-data-corpus

729185a

gkamat suggested changes

Naarcha-AWS added 3 commits

December 3, 2024 10:12


          Apply suggestions from code review

3f7fda9

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Update _benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

b8210d1

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Merge branch 'main' into expand-data-corpus

b5cc889


          Apply suggestions from code review

2824c85

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Apply suggestions from code review

e0d6e1c

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Apply suggestions from code review

4be8fb8

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

gkamat suggested changes

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md Outdated

+To use this tutorial, make sure you fulfill the following prerequisites:
+. Python 3.x or greater installed.
+. The `http_logs` workload data corpus already in your load generation host where benchmark is running.

Contributor

gkamat Dec 10, 2024

corpus is already available in your load generation host where OSB is running.


          Apply suggestions from code review

1383d5a

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Apply suggestions from code review

dbfabf0

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

gkamat approved these changes

Naarcha-AWS added 4 - Doc review and removed 3 - Tech review labels


          Merge branch 'main' into expand-data-corpus

baf4e23

natebower requested changes

Collaborator

natebower left a comment

@Naarcha-AWS Please see my comments and changes and tag me for approval once addressed. Thanks!

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md Outdated

@@ -0,0 +1,83 @@
+---
+layout: default
+title: Expand data corpus

Collaborator

natebower Dec 13, 2024

Should this be "Expanding a data corpus"?

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md Outdated

+grand_parent: User guide
+---
+# Expanding the data corpus of a workload

Collaborator

natebower Dec 13, 2024

"Expanding a workload data corpus"?

_benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md Outdated

+To use this tutorial, make sure you fulfill the following prerequisites:
+. Python 3.x or greater installed.
+. The `http_logs` workload data corpus is already in your load generation host where OpenSearch Benchmark is running.

Collaborator

natebower Dec 13, 2024

Something like "is already stored on the load generation host running OpenSearch Benchmark"?
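
For readers following this thread, the two prerequisites quoted in the diff above can be checked with a short script. This is only a minimal sketch and is not part of the PR: the function name `check_prerequisites` is hypothetical, and the default path `~/.benchmark/benchmarks/data/http_logs` is an assumption about where OpenSearch Benchmark stores the downloaded `http_logs` corpus on the load generation host; adjust it if your data directory differs.

```python
import sys
from pathlib import Path

# Assumption: the directory where OpenSearch Benchmark keeps the http_logs corpus.
DEFAULT_CORPUS_DIR = Path.home() / ".benchmark" / "benchmarks" / "data" / "http_logs"


def check_prerequisites(corpus_dir=DEFAULT_CORPUS_DIR):
    """Return a list of unmet prerequisites; an empty list means both checks passed."""
    problems = []
    # Prerequisite 1: Python 3.x or greater.
    if sys.version_info < (3,):
        problems.append("Python 3.x or greater is required")
    # Prerequisite 2: the corpus directory exists and contains at least one entry.
    corpus_dir = Path(corpus_dir)
    if not corpus_dir.is_dir() or not any(corpus_dir.iterdir()):
        problems.append("no corpus files found in {}".format(corpus_dir))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The check only confirms that the directory is non-empty; it does not validate the corpus files themselves.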

Naarcha-AWS and others added 3 commits

December 13, 2024 10:56


          Apply suggestions from code review

0f522c9

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>


          Update _benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

fd2f4a9

Signed-off-by: Nathan Bower <nbower@amazon.com>


          Update _benchmark/user-guide/optimizing-benchmarks/expand-data-corpus.md

2aa64e8

Signed-off-by: Nathan Bower <nbower@amazon.com>

natebower approved these changes

Collaborator

natebower left a comment

@Naarcha-AWS LGTM!

Reviewers

IanHoang: requested changes

natebower: approved these changes

gkamat: approved these changes

kolchfa-aws: awaiting requested review (code owner)

vagimeli: awaiting requested review

AMoo-Miki: awaiting requested review (code owner)

dlvenable: awaiting requested review (code owner)

epugh: awaiting requested review (code owner)

Labels

4 - Doc review, backport 2.18, benchmark