-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create-Workload Improvements: Write Test Procedures and Operations into Separate Directories and Files #397
Conversation
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
The sample run with the newly-created workload indicates that there's something wrong since it only runs through the EDIT: There's not a bug but the default test procedure that OSB is running is incorrectly formatted and not running default ingestion and search operations. |
@@ -0,0 +1,22 @@ | |||
{ | |||
"operation": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operations file includes extraneous fields for each operations.
Ingest operations should include at minimum name
, operation-type
, bulk-size
, and ingest-percentage
fields.
"name": "index",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(10000)}},
"ingest-percentage": {{ingest_percentage | default(100)}}
}
Each search operation should just include a json name
, operation-type
, index
, and body
fields.
"name": "default",
"operation-type": "search",
"index": {{ indices | map(attribute='name') | list | join(',') | tojson }},
"body": {
"query": {
"match_all": {}
}
}
Fields like search_clients
or bulk_indexing_clients
belong in test_procedures file. For reference, see NYC_Taxis workload's operations file: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/operations/default.json
"clients": {{bulk_indexing_clients | default(8)}} | ||
},{% endraw %} | ||
{ | ||
"operation": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For custom operations, we need to remove the default
operation. This should only be included in the default operations file. Users might not want a match_all query included in their workload if they are already providing their own queries.
{ | ||
"operation": { | ||
"name": "{{query.name}}", | ||
"operation-type": "{{query['operation-type']}}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just need to specify the name
field and not the operation-type
, index
, and body
since the operations are already defined in the operations/default.json
directory / file. Instead, we should add parameters that the user can insert such as warmup-iterations
, iterations
, and search_clients
. Use this search operation as reference: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/test_procedures/default.json#L56-L69
"retry-until-success": true | ||
} | ||
} | ||
{% endraw -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why the test is only running delete-index
, create-index
, and cluster-health
. It needs default ingestion operation and search operation.
def get_doc_outpath(outdir, name, suffix=""): | ||
return os.path.join(outdir, f"{name}-documents{suffix}.json") | ||
def get_doc_outpath(outdir, suffix=""): | ||
return os.path.join(outdir, f"documents{suffix}.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was name
removed?
@@ -107,14 +109,16 @@ def extract(client, outdir, index_pattern): | |||
|
|||
index_obj = extract_index_mapping_and_settings(client, index_pattern) | |||
for index, details in index_obj.items(): | |||
filename = f"{index}.json" | |||
filename = f"index.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you simplify this to just be named as index.json
?
|
||
# user logger.info and print indices and corpora |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be implementing a logger.info to do this?
extract_template(templates_path, "base-workload.json.j2"), | ||
template_vars, | ||
) | ||
|
||
if custom_queries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend moving this to a single function in the same module to reduce boilerplate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments and please address the CI failures. You can run make lint
before publishing PR to ensure that you adhering to lint standards.
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
New output with the changes:
|
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
Signed-off-by: AkshathRaghav <akshathraghav.r@gmail.com>
test_procedures_path = os.path.join(output_path, "test_procedures") | ||
|
||
try: | ||
shutil.rmtree(output_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help readers understand this section quickly, would recommend inserting a logging statement here such as:
logger.info(f"Removing existing workload [{workload_name}] in path [{output_path}]")
def write_template(output_path, template_file): | ||
template = extract_template(templates_path, template_file + ".json.j2") | ||
with open(output_path, "w") as f: | ||
f.write(template.render(template_vars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose to encapsulate write_template()
but not extract_template()
?
For readability purposes, I'm wondering if this might be better condensed into a single function called extract_and_write_template()
or if write_template
should be moved out of render_templates()
and placed next to extract_template()
"operation": "refresh" | ||
}, | ||
{ | ||
"name": "search-after-index", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still include basic workload parameters such as warmup-iterations
, iterations
, target-throughput
, and `clients. See https://github.com/rishabh6788/opensearch-benchmark-workloads/blob/main/http_logs/test_procedures/default.json#L60-L71 for reference. Doesn't have to be identical to this since we're trying to consolidate this format, as seen in this issue opensearch-project/opensearch-benchmark-workloads#120
"bulk-size": {{bulk_size}}, | ||
"ingest-percentage": {{ingest_percentage}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still include the same Jinja defaults as the default-operations.json.j2
file.
"match_all": {} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to include a way to add custom queries. Some users might use the --custom-queries
parameter in OSB and attach a JSON file containing queries they plan to run. We need to add those here dynamically
Description
Changing the way create-workload creates its files.
Issues Resolved
#376
Testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.