
Create-Workload Improvements: Write Test Procedures and Operations into Separate Directories and Files #397

Conversation

AkshathRaghav
Contributor

Description

Changes the way create-workload creates its files: test procedures and operations are now written into separate directories and files.

Issues Resolved

#376

Testing

  • Tested creating a workload
  • Tested executing the test against it
╰─$ opensearch-benchmark create-workload \
--workload=flights \
--target-host="https://127.0.0.1:9200" \
--output-path=~/Workbench/Workloads \
--indices=opensearch_dashboards_sample_data_flights \
--number-of-docs opensearch_dashboards_sample_data_flights:2500 \
--client-options="timeout:300,use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"


   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] Connected to OpenSearch cluster [opensearch-node1] version [2.11.0].


[INFO] Workload flights has been created. Run it with: opensearch-benchmark --workload-path=/home/aksha/Workbench/Workloads/flights

-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
╰─$ opensearch-benchmark execute-test \   
--pipeline="benchmark-only" \
--workload-path="/home/aksha/Workbench/Workloads/flights" \
--target-host="https://127.0.0.1:9200" \
--client-options="timeout:300,use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] Executing test with workload [flights], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [2.11.0].

[WARNING] merges_total_time is 3351 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 5146 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 13438 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 531 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |   Task |       Value |   Unit |
|---------------------------------------------------------------:|-------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |        |   0.0567833 |    min |
|             Min cumulative indexing time across primary shards |        |           0 |    min |
|          Median cumulative indexing time across primary shards |        | 0.000141667 |    min |
|             Max cumulative indexing time across primary shards |        |     0.05535 |    min |
|            Cumulative indexing throttle time of primary shards |        |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |        |           0 |    min |
| Median cumulative indexing throttle time across primary shards |        |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |        |           0 |    min |
|                        Cumulative merge time of primary shards |        |   0.0559833 |    min |
|                       Cumulative merge count of primary shards |        |         112 |        |
|                Min cumulative merge time across primary shards |        |           0 |    min |
|             Median cumulative merge time across primary shards |        |           0 |    min |
|                Max cumulative merge time across primary shards |        |   0.0559833 |    min |
|               Cumulative merge throttle time of primary shards |        |           0 |    min |
|       Min cumulative merge throttle time across primary shards |        |           0 |    min |
|    Median cumulative merge throttle time across primary shards |        |           0 |    min |
|       Max cumulative merge throttle time across primary shards |        |           0 |    min |
|                      Cumulative refresh time of primary shards |        |     0.21205 |    min |
|                     Cumulative refresh count of primary shards |        |        1613 |        |
|              Min cumulative refresh time across primary shards |        |           0 |    min |
|           Median cumulative refresh time across primary shards |        | 0.000116667 |    min |
|              Max cumulative refresh time across primary shards |        |    0.207283 |    min |
|                        Cumulative flush time of primary shards |        |  0.00643333 |    min |
|                       Cumulative flush count of primary shards |        |          12 |        |
|                Min cumulative flush time across primary shards |        |           0 |    min |
|             Median cumulative flush time across primary shards |        |    0.000375 |    min |
|                Max cumulative flush time across primary shards |        |  0.00306667 |    min |
|                                        Total Young Gen GC time |        |           0 |      s |
|                                       Total Young Gen GC count |        |           0 |        |
|                                          Total Old Gen GC time |        |           0 |      s |
|                                         Total Old Gen GC count |        |           0 |        |
|                                                     Store size |        |  0.00329801 |     GB |
|                                                  Translog size |        |  0.00471612 |     GB |
|                                         Heap used for segments |        |           0 |     MB |
|                                       Heap used for doc values |        |           0 |     MB |
|                                            Heap used for terms |        |           0 |     MB |
|                                            Heap used for norms |        |           0 |     MB |
|                                           Heap used for points |        |           0 |     MB |
|                                    Heap used for stored fields |        |           0 |     MB |
|                                                  Segment count |        |          17 |        |


--------------------------------
[INFO] SUCCESS (took 19 seconds)
--------------------------------

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
@IanHoang
Collaborator

IanHoang commented Oct 18, 2023

The sample run with the newly created workload indicates that something is wrong: it only runs the delete-index, create-index, and cluster-health operations, and neither ingestion nor the default search query populates the report at the end.

EDIT: This is not a bug; the default test procedure that OSB is running is incorrectly formatted, so the default ingestion and search operations never run.

@@ -0,0 +1,22 @@
{
"operation": {
@IanHoang IanHoang Oct 18, 2023

The operations file includes extraneous fields for each operation.
Ingest operations should include, at minimum, the name, operation-type, bulk-size, and ingest-percentage fields.

      "name": "index",
      "operation-type": "bulk",
      "bulk-size": {{bulk_size | default(10000)}},
      "ingest-percentage": {{ingest_percentage | default(100)}}
    }

Each search operation should include just the name, operation-type, index, and body fields.

        "name": "default",
        "operation-type": "search",
        "index": {{ indices | map(attribute='name') | list | join(',') | tojson }},
        "body": {
          "query": {
            "match_all": {}
          }
        }

Fields like search_clients and bulk_indexing_clients belong in the test_procedures file. For reference, see the nyc_taxis workload's operations file: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/operations/default.json
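As a concrete illustration of the minimal shapes described above, here is a sketch in plain Python dicts (field names and the 10000/100 defaults come from this review thread; the index name is the sample flights index used in this PR's testing):

```python
import json

# Minimal ingest operation: only the four fields the review asks for.
# Values mirror the Jinja defaults shown in the template snippet above.
ingest_op = {
    "name": "index",
    "operation-type": "bulk",
    "bulk-size": 10000,
    "ingest-percentage": 100,
}

# Minimal search operation: just name, operation-type, index, and body.
search_op = {
    "name": "default",
    "operation-type": "search",
    "index": "opensearch_dashboards_sample_data_flights",
    "body": {"query": {"match_all": {}}},
}

print(json.dumps({"operations": [ingest_op, search_op]}, indent=2))
```

Note that neither dict carries a clients field; per the comment above, client counts belong in the test_procedures file.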

"clients": {{bulk_indexing_clients | default(8)}}
},{% endraw %}
{
"operation": {

For custom operations, we need to remove the default operation. This should only be included in the default operations file. Users might not want a match_all query included in their workload if they are already providing their own queries.

{
"operation": {
"name": "{{query.name}}",
"operation-type": "{{query['operation-type']}}",

We just need to specify the name field, not operation-type, index, and body, since the operations are already defined in the operations/default.json file. Instead, we should add parameters that the user can set, such as warmup-iterations, iterations, and search_clients. Use this search operation as a reference: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/test_procedures/default.json#L56-L69
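A schedule entry along those lines might look like this sketch (the parameter names come from the comment above; the numeric defaults are only illustrative, not taken from the referenced file):

```python
import json

# Hypothetical test-procedure schedule entry: the operation is referenced by
# name only, and run-time parameters live here rather than in operations/.
schedule_entry = {
    "operation": "default",
    "warmup-iterations": 50,   # illustrative value
    "iterations": 100,         # illustrative value
    "clients": 1,              # would map to a search_clients workload variable
}

print(json.dumps(schedule_entry, indent=2))
```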

"retry-until-success": true
}
}
{% endraw -%}

This is why the test only runs delete-index, create-index, and cluster-health. It needs the default ingestion and search operations.

def get_doc_outpath(outdir, name, suffix=""):
return os.path.join(outdir, f"{name}-documents{suffix}.json")
def get_doc_outpath(outdir, suffix=""):
return os.path.join(outdir, f"documents{suffix}.json")

Why was name removed?
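For context, here is what the two versions of get_doc_outpath in the diff above return (a quick sketch; the old/new suffixes are added here only to let both coexist):

```python
import os

def get_doc_outpath_old(outdir, name, suffix=""):
    # Original: the document file is prefixed with the index name.
    return os.path.join(outdir, f"{name}-documents{suffix}.json")

def get_doc_outpath_new(outdir, suffix=""):
    # PR version: the index-name prefix is dropped entirely.
    return os.path.join(outdir, f"documents{suffix}.json")

print(get_doc_outpath_old("out", "flights"))  # e.g. out/flights-documents.json on POSIX
print(get_doc_outpath_new("out"))             # e.g. out/documents.json on POSIX
```

Dropping the prefix means documents from different indices can no longer be distinguished by filename, which is presumably why the reviewer is asking.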

@@ -107,14 +109,16 @@ def extract(client, outdir, index_pattern):

index_obj = extract_index_mapping_and_settings(client, index_pattern)
for index, details in index_obj.items():
filename = f"{index}.json"
filename = f"index.json"

Why did you simplify this to just index.json?


# use logger.info to print indices and corpora

Will we be implementing a logger.info to do this?

extract_template(templates_path, "base-workload.json.j2"),
template_vars,
)

if custom_queries:

Recommend moving this to a single function in the same module to reduce boilerplate code.

@IanHoang IanHoang left a comment

Left some comments; please also address the CI failures. You can run make lint before publishing the PR to ensure that you are adhering to lint standards.

Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
@AkshathRaghav
Contributor Author

New output with the changes:

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] Preparing file offset table for [/home/aksha/Workbench/Workloads/flights/opensearch_dashboards_sample_data_flights-documents.json] ... [OK]
[INFO] Executing test with workload [flights], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [2.11.0].

[WARNING] merges_total_time is 364 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 914 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 1597 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 1306 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running index-append                                                           [100% done]
Running refresh-after-index                                                    [100% done]
Running default                                                                [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |         Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |              |      0.0128 |    min |
|             Min cumulative indexing time across primary shards |              |           0 |    min |
|          Median cumulative indexing time across primary shards |              |           0 |    min |
|             Max cumulative indexing time across primary shards |              |  0.00773333 |    min |
|            Cumulative indexing throttle time of primary shards |              |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |              |           0 |    min |
| Median cumulative indexing throttle time across primary shards |              |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |              |           0 |    min |
|                        Cumulative merge time of primary shards |              |  0.00671667 |    min |
|                       Cumulative merge count of primary shards |              |           9 |        |
|                Min cumulative merge time across primary shards |              |           0 |    min |
|             Median cumulative merge time across primary shards |              |           0 |    min |
|                Max cumulative merge time across primary shards |              |  0.00671667 |    min |
|               Cumulative merge throttle time of primary shards |              |           0 |    min |
|       Min cumulative merge throttle time across primary shards |              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |              |           0 |    min |
|                      Cumulative refresh time of primary shards |              |   0.0267333 |    min |
|                     Cumulative refresh count of primary shards |              |         222 |        |
|              Min cumulative refresh time across primary shards |              |           0 |    min |
|           Median cumulative refresh time across primary shards |              | 8.33333e-06 |    min |
|              Max cumulative refresh time across primary shards |              |   0.0237167 |    min |
|                        Cumulative flush time of primary shards |              |   0.0200833 |    min |
|                       Cumulative flush count of primary shards |              |          21 |        |
|                Min cumulative flush time across primary shards |              |           0 |    min |
|             Median cumulative flush time across primary shards |              |           0 |    min |
|                Max cumulative flush time across primary shards |              |      0.0144 |    min |
|                                        Total Young Gen GC time |              |       0.013 |      s |
|                                       Total Young Gen GC count |              |           2 |        |
|                                          Total Old Gen GC time |              |           0 |      s |
|                                         Total Old Gen GC count |              |           0 |        |
|                                                     Store size |              |  0.00538529 |     GB |
|                                                  Translog size |              |  0.00403114 |     GB |
|                                         Heap used for segments |              |           0 |     MB |
|                                       Heap used for doc values |              |           0 |     MB |
|                                            Heap used for terms |              |           0 |     MB |
|                                            Heap used for norms |              |           0 |     MB |
|                                           Heap used for points |              |           0 |     MB |
|                                    Heap used for stored fields |              |           0 |     MB |
|                                                  Segment count |              |          38 |        |
|                                                 Min Throughput | index-append |     20895.7 | docs/s |
|                                                Mean Throughput | index-append |     20895.7 | docs/s |
|                                              Median Throughput | index-append |     20895.7 | docs/s |
|                                                 Max Throughput | index-append |     20895.7 | docs/s |
|                                        50th percentile latency | index-append |     94.1311 |     ms |
|                                       100th percentile latency | index-append |      107.81 |     ms |
|                                   50th percentile service time | index-append |     94.1311 |     ms |
|                                  100th percentile service time | index-append |      107.81 |     ms |
|                                                     error rate | index-append |           0 |      % |
|                                                 Min Throughput |      default |       78.98 |  ops/s |
|                                                Mean Throughput |      default |       78.98 |  ops/s |
|                                              Median Throughput |      default |       78.98 |  ops/s |
|                                                 Max Throughput |      default |       78.98 |  ops/s |
|                                       100th percentile latency |      default |     12.2618 |     ms |
|                                  100th percentile service time |      default |     12.2618 |     ms |
|                                                     error rate |      default |           0 |      % |


--------------------------------
[INFO] SUCCESS (took 42 seconds)
--------------------------------

Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
test_procedures_path = os.path.join(output_path, "test_procedures")

try:
shutil.rmtree(output_path)

To help readers understand this section quickly, would recommend inserting a logging statement here such as:

logger.info(f"Removing existing workload [{workload_name}] in path [{output_path}]")

Comment on lines +210 to +213
def write_template(output_path, template_file):
template = extract_template(templates_path, template_file + ".json.j2")
with open(output_path, "w") as f:
f.write(template.render(template_vars))

Why did you choose to encapsulate write_template() but not extract_template()?

For readability, I'm wondering whether this would be better condensed into a single function called extract_and_write_template(), or whether write_template() should be moved out of render_templates() and placed next to extract_template().
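One possible shape for the suggested consolidation, as a self-contained sketch (the function name extract_and_write_template and its signature are the reviewer's suggestion, not code from this PR; str.format stands in for the Jinja2 rendering done by the real module):

```python
import os
import tempfile

def extract_and_write_template(templates_path, template_file, output_path, template_vars):
    # Combines extract_template() + write_template(): read the template,
    # render it, and write the result in one call. The real code renders with
    # Jinja2; str.format keeps this sketch dependency-free.
    with open(os.path.join(templates_path, template_file + ".json.j2")) as f:
        template = f.read()
    rendered = template.format(**template_vars)
    with open(output_path, "w") as f:
        f.write(rendered)
    return rendered

# Usage sketch with a throwaway template on disk:
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "base-workload.json.j2"), "w") as f:
    f.write('{{"version": "{workload_version}"}}')
out = extract_and_write_template(
    tmp, "base-workload", os.path.join(tmp, "out.json"),
    {"workload_version": "2"},
)
print(out)
```

Either way, keeping the read/render/write steps in one place next to extract_template() would make render_templates() easier to follow.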

"operation": "refresh"
},
{
"name": "search-after-index",

We should still include basic workload parameters such as warmup-iterations, iterations, target-throughput, and clients. See https://github.com/rishabh6788/opensearch-benchmark-workloads/blob/main/http_logs/test_procedures/default.json#L60-L71 for reference. It doesn't have to be identical to this, since we're trying to consolidate the format, as discussed in opensearch-project/opensearch-benchmark-workloads#120

Comment on lines +4 to +5
"bulk-size": {{bulk_size}},
"ingest-percentage": {{ingest_percentage}}

We should still include the same Jinja defaults as the default-operations.json.j2 file.

"match_all": {}
}
}
}

We need to include a way to add custom queries. Some users pass the --custom-queries parameter in OSB with a JSON file containing the queries they plan to run; we need to add those here dynamically.
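A sketch of what that dynamic insertion could look like, tying this comment to the earlier one about not forcing a match_all query on users who bring their own queries (the helper name and the replace-the-default policy are assumptions, not code from this PR):

```python
import json

DEFAULT_SEARCH_OP = {
    "name": "default",
    "operation-type": "search",
    "body": {"query": {"match_all": {}}},
}

def build_search_operations(custom_queries=None):
    # Hypothetical helper: with no --custom-queries input, fall back to the
    # default match_all operation; otherwise use the user's queries verbatim
    # so no unwanted match_all query ends up in their workload.
    if not custom_queries:
        return [DEFAULT_SEARCH_OP]
    return list(custom_queries)

custom = [{"name": "by-carrier", "operation-type": "search",
           "body": {"query": {"term": {"Carrier": "OpenSearch Air"}}}}]
print(json.dumps(build_search_operations(custom), indent=2))
```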

@IanHoang IanHoang changed the title Create-Workload Improvements Create-Workload Improvements: Write Test Procedures and Operations into Separate Directories and Files Nov 13, 2023
@IanHoang IanHoang closed this Jan 24, 2024