Make query randomization more flexible #712
Open
Description
Controllable query randomization was added in #455, and was limited to randomizing `range` queries with the fields `gte`/`gt`, `lte`/`lt`, and optionally `format`. This is limiting since we might need to randomize other queries for some use cases, like benchmarking the query cache.

This PR lets the user register different query names and parameter names to be randomized for each operation. This is done by adding `registry.register_query_randomization_info(...)` in workload.py. See an example.

In this example we have the operation:
And in workload.py we run `registry.register_query_randomization_info("bbox", "geo_bounding_box", [["top_left"], ["bottom_right"]], [])`.

The first argument, `"bbox"`, is the operation name. The second argument, `"geo_bounding_box"`, is the query type name.

The third argument is a list of lists: `[["top_left"], ["bottom_right"]]`. Each entry in the outer list represents one parameter name that will be randomized. Each entry is itself a list because there may be several versions of the same name that represent roughly the same thing, for example `"gte"` or `"gt"`. In this case there's just one option for each parameter name. At least one version of each parameter name must be present in the original query for it to be randomized.

The last argument is a list of optional parameters. If an optional parameter is present in the dict returned by the random standard value source, it will be put into the randomized version of the query. If it's not in the source, it's ignored. There are no optional parameters in this example, but the typical use case would be `"format"` in a range query.

If nothing is registered, it falls back to the default query randomization info object, which randomizes range queries as was done before this PR. This is equivalent to registering `registry.register_query_randomization_info(<operation_name>, "range", [["gte", "gt"], ["lte", "lt"]], ["format"])`.

The dict returned by the random standard value source should match the parameter names you are trying to randomize. For example, the standard value source for the above example is:
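As a rough sketch only, and not the code from this PR, the following shows the shape a standard value source for the `geo_bounding_box` example might take, together with a simplified version of the substitution rules described above. The names `random_bbox_source` and `randomize_query`, the coordinate ranges, and the `lat`/`lon` object form are all assumptions made for illustration. In particular, writing the new value under whichever version of the name the original query used (for example `gt` rather than `gte`) is a choice made for this sketch, not a statement about how the PR behaves.

```python
import random


def random_bbox_source():
    """Hypothetical standard value source for the "bbox" operation.

    The returned keys match the registered parameter names
    ("top_left" and "bottom_right"). Coordinate ranges are arbitrary.
    """
    top = random.uniform(0.0, 80.0)
    left = random.uniform(-170.0, 0.0)
    return {
        "top_left": {"lat": top, "lon": left},
        "bottom_right": {
            "lat": top - random.uniform(1.0, 10.0),
            "lon": left + random.uniform(1.0, 10.0),
        },
    }


def randomize_query(query, query_name, parameter_name_options, optional_parameters, value_source):
    """Simplified illustration of the randomization rules (not the PR's implementation).

    query: a dict shaped like {query_name: {field_name: {param: value, ...}}}
    parameter_name_options: list of lists, e.g. [["gte", "gt"], ["lte", "lt"]]
    optional_parameters: list of names copied only if the source provides them
    value_source: callable returning a dict of new parameter values
    """
    field_name = next(iter(query[query_name]))
    original = query[query_name][field_name]
    new_values = value_source()

    randomized = {}
    for options in parameter_name_options:
        # At least one version of each parameter name must be in the original query.
        in_original = [name for name in options if name in original]
        if not in_original:
            raise ValueError(f"none of {options} found in the original query")
        # Assumption: accept any version of the name from the source as well.
        in_source = [name for name in options if name in new_values]
        if not in_source:
            raise ValueError(f"value source provided none of {options}")
        randomized[in_original[0]] = new_values[in_source[0]]

    # Optional parameters are copied when present in the source, otherwise ignored.
    for name in optional_parameters:
        if name in new_values:
            randomized[name] = new_values[name]

    return {query_name: {field_name: randomized}}
```

Calling `randomize_query(original_query, "geo_bounding_box", [["top_left"], ["bottom_right"]], [], random_bbox_source)` on a query whose field clause contains both `top_left` and `bottom_right` would return a query of the same shape with new corner values. The same function called with `"range"`, `[["gte", "gt"], ["lte", "lt"]]`, and `["format"]` mirrors the default registration.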
Issues Resolved
#711
Testing
Adds unit tests. Also tested manually with a workload designed for testing the query cache, which uses a non-default QueryRandomizationInfo.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.