Update README.md to include additional information on expand-data-corpus.py beta feature (#92) #93

Merged (2 commits) on Aug 9, 2023

IanHoang (Collaborator) commented Aug 3, 2023

Description

This PR adds a sentence explaining which directories expand-data-corpus.py affects.

Issues Resolved

#92

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…pus.py beta feature (opensearch-project#92)

Signed-off-by: Ian Hoang <hoangia@amazon.com>
@IanHoang IanHoang requested a review from gkamat as a code owner August 3, 2023 19:55
ayushav12 commented

Let's add some more detail to the documentation about the provision for defining private workloads and how they differ from the "default" workloads.

IanHoang (Collaborator, Author) commented Aug 4, 2023

I'll also note in the README.md that there is already a parameter for specifying the repository in which to make the changes.

Usage:

    Use the -h option to view script usage and options.

options:
  -h, --help            show this help message and exit
  -w WORKLOAD, --workload WORKLOAD
                        workload name, default: http_logs
  -r WORKLOAD_REPOSITORY, --workload-repository WORKLOAD_REPOSITORY
                        workload name, default: default
  -c CORPUS_SIZE, --corpus-size CORPUS_SIZE
                        size of corpus to generate in GB
  -o OUTPUT_FILE_SUFFIX, --output-file-suffix OUTPUT_FILE_SUFFIX
                        suffix for output file name, documents-SUFFIX.json, default: generated
  -f INPUT_FILE, --input-file INPUT_FILE
                        [EXPERT] input file name, default: documents-241998.json
  -n NUMBER_OF_DOCS, --number-of-docs NUMBER_OF_DOCS
                        [EXPERT] number of documents to generate
  -i INTERVAL, --interval INTERVAL
                        [EXPERT] interval between consecutive timestamps, use a negative number to specify multiple docs per timestamp
  -t START_TIMESTAMP, --start-timestamp START_TIMESTAMP
                        [EXPERT] start timestamp, default: 893964618
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        [EXPERT] batch size per OSB client thread, default: 50000
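As an illustration of the options listed above, here is a minimal Python sketch that assembles a command line for the script using only the documented `-w`, `-r`, `-c` and `-o` flags. The helper name `build_expand_command`, the `python3` launcher, the script location, and the example size and suffix are assumptions made for this sketch; the output file name follows the `documents-SUFFIX.json` pattern from the help text.

```python
import shlex


def build_expand_command(script="expand-data-corpus.py", workload="http_logs",
                         repository="default", corpus_size_gb=100,
                         suffix="generated"):
    """Return (argv, expected output file name) for one run of the script.

    Only flags shown in the help text are used. The script path and the
    python3 launcher are assumptions; adjust them for your checkout.
    """
    argv = [
        "python3", script,
        "-w", workload,             # workload name
        "-r", repository,           # workload repository
        "-c", str(corpus_size_gb),  # size of corpus to generate in GB
        "-o", suffix,               # suffix for the output file name
    ]
    return argv, f"documents-{suffix}.json"


# Hypothetical values: a 100 GB corpus with the suffix "100gb".
argv, output_name = build_expand_command(corpus_size_gb=100, suffix="100gb")
print(shlex.join(argv))
print(output_name)
```

The printed command can be pasted into a shell, or the `argv` list can be passed to `subprocess.run` once the script path is known to be correct.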

@@ -64,6 +64,7 @@ Once a corpus has been generated, it can be used for a test by supplying the fol

If there are multiple generated corpora files, they are all used concurrently. Ingestion of the generated and the default corpora are mutually exclusive in any single OSB run. Once ingested, however, queries packaged with this workload will operate on the entire loaded data set.

To see how this feature changes http_logs workload, visit the `~/.benchmark/benchmarks/workloads/default/http_logs/` directory and the `~/.benchmark/data/http_logs` directory.
Review comment from a collaborator:

nit: "the" http_logs workload
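For readers who want to check the two directories named in the added README line, the following Python sketch lists their contents. The helper name `list_workload_files` is hypothetical, and generalizing the paths with `workload` and `repository` parameters is an assumption; the source only names the `default/http_logs` locations under `~/.benchmark`.

```python
from pathlib import Path


def list_workload_files(workload="http_logs", repository="default", root=None):
    """List file names in the workload and data directories.

    The defaults mirror the paths in the README line. A missing directory
    yields an empty list instead of raising an error.
    """
    root = Path(root) if root is not None else Path.home() / ".benchmark"
    dirs = {
        "workload": root / "benchmarks" / "workloads" / repository / workload,
        "data": root / "data" / workload,
    }
    return {
        label: sorted(p.name for p in d.iterdir()) if d.is_dir() else []
        for label, d in dirs.items()
    }


for label, names in list_workload_files().items():
    print(label, names)
```

Running this before and after generating a corpus is one way to see which files the script adds or changes.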

Signed-off-by: Ian Hoang <hoangia@amazon.com>
IanHoang (Collaborator, Author) commented Aug 9, 2023

> Let's add some more detail to the documentation about the provision for defining private workloads and how they differ from the "default" workloads.

@ayushav12 this is a good idea. However, we can add this to the official documentation, since this README pertains to the official workloads (default workloads) that come with OSB.

@IanHoang IanHoang merged commit 4d07ee5 into opensearch-project:main Aug 9, 2023
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 10, 2023
…pus.py beta feature (opensearch-project#92) (opensearch-project#93)

* Update README.md to include additional information on expand-data-corpus.py beta feature (opensearch-project#92)

Signed-off-by: Ian Hoang <hoangia@amazon.com>

* Revised README to better specify expand-data-corpus.py feature

Signed-off-by: Ian Hoang <hoangia@amazon.com>

---------

Signed-off-by: Ian Hoang <hoangia@amazon.com>
Co-authored-by: Ian Hoang <hoangia@amazon.com>
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 10, 2023
gkamat pushed a commit that referenced this pull request Aug 11, 2023
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Aug 13, 2023
gkamat pushed a commit that referenced this pull request Aug 14, 2023
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
gkamat pushed a commit to gkamat/opensearch-benchmark-workloads that referenced this pull request Nov 18, 2023
gkamat pushed a commit that referenced this pull request Nov 20, 2023
3 participants