Make query randomization more flexible #712

Open
wants to merge 3 commits into
base: main

peteralfonsi
Contributor

Description

Controllable query randomization was added in #455, but it was limited to range queries with the fields gte/gt, lte/lt, and optionally format. This is too restrictive for use cases that need to randomize other query types, such as benchmarking the query cache.

This PR lets the user register, for each operation, the query type and the parameter names to randomize. This is done by calling registry.register_query_randomization_info(...) in workload.py. See an example.

In this example we have the operation:

{
  "name": "bbox", 
  "operation-type": "search", 
  "index": "nyc_taxis",
  "body": { 
    "size": 0,
    "query": {
      "geo_bounding_box": {
        "pickup_location": {
          "top_left": [-74.27, 40.92],
          "bottom_right": [-73.68, 40.49]
        }
      }
    }
  }
}

And in workload.py we run registry.register_query_randomization_info("bbox", "geo_bounding_box", [["top_left"], ["bottom_right"]], []).

The first argument, "bbox", is the operation name. The second argument, "geo_bounding_box", is the query type name.

The third argument is a list of lists: [["top_left"], ["bottom_right"]]. Each entry in the outer list represents one parameter that will be randomized. Each entry is itself a list because a parameter may have several alternative names that mean roughly the same thing, for example "gte" or "gt". In this case there's just one option for each parameter name. At least one alternative of each parameter name must be present in the original query for it to be randomized.

The last argument is a list of optional parameters. If an optional parameter is present in the random standard value source, it will be put into the randomized version of the query. If it's not in the source, it's ignored. There are no optional parameters in this example, but the typical use case would be "format" in a range query.
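The matching rules for the third and fourth arguments can be sketched roughly as follows. This is an illustrative sketch, not the code added in this PR: the helper names `first_present` and `randomize`, the `fare_source` value source, and the `total_amount` field are all hypothetical, and the sketch assumes the source may key a value by any of a parameter's alternative names.

```python
import copy
import random


def first_present(names, mapping):
    """Return the first name in names that is a key of mapping, or None."""
    return next((name for name in names if name in mapping), None)


def randomize(body, query_name, parameter_name_options, optional_parameters, value_source):
    """Hypothetical helper: return a copy of body with the registered parameters replaced."""
    new_body = copy.deepcopy(body)
    fields = new_body.get("query", {}).get(query_name)
    if not isinstance(fields, dict):
        return new_body  # a different query type: nothing to randomize
    for params in fields.values():
        if not isinstance(params, dict):
            continue
        # Each parameter must appear in the original query under one of its alternative names.
        used_names = [first_present(options, params) for options in parameter_name_options]
        if None in used_names:
            continue
        new_values = value_source()
        for options, used_name in zip(parameter_name_options, used_names):
            # Assumption: the source supplies every parameter under one of its alternatives.
            params[used_name] = new_values[first_present(options, new_values)]
        # Optional parameters are written only when the source supplies them.
        for name in optional_parameters:
            if name in new_values:
                params[name] = new_values[name]
    return new_body


def fare_source():
    # Hypothetical value source for a range query.
    low = round(random.uniform(0, 50), 2)
    return {"gte": low, "lte": low + 10}


original = {"query": {"range": {"total_amount": {"gt": 5, "lte": 15}}}}
randomized = randomize(original, "range", [["gte", "gt"], ["lte", "lt"]], ["format"], fare_source)
print(randomized["query"]["range"]["total_amount"])
```

Here the original query uses "gt", so the new lower bound is written back under "gt" even though the hypothetical source returns it as "gte". No "format" is added because the source does not supply one.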

If nothing is registered for an operation, it falls back to the default query randomization info object, which randomizes range queries as was done before this PR. This is equivalent to registering registry.register_query_randomization_info(<operation_name>, "range", [["gte", "gt"], ["lte", "lt"]], ["format"]).

The dict returned by the random standard value source should match the parameter names you are trying to randomize. For example, the standard value source for the above example is:
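As a rough sketch, the two registrations described above might sit together in workload.py like this. `RecordingRegistry` is a hypothetical stand-in so the snippet runs on its own; in a real workload the registry object is supplied by the benchmark tool. The method's parameter names and the operation name "fare_range" are assumptions made for illustration.

```python
class RecordingRegistry:
    """Hypothetical stand-in for the registry passed to register(); it only records calls."""

    def __init__(self):
        self.randomization_info = {}

    def register_query_randomization_info(self, op_name, query_name,
                                          parameter_name_options, optional_parameters):
        # Parameter names are assumed; the PR description only shows positional arguments.
        self.randomization_info[op_name] = (query_name, parameter_name_options, optional_parameters)


def register(registry):
    # Randomize both corners of the geo_bounding_box query in the "bbox" operation.
    registry.register_query_randomization_info(
        "bbox", "geo_bounding_box", [["top_left"], ["bottom_right"]], [])

    # Spelling out the default explicitly for a hypothetical range operation.
    # Per the description, omitting this call should behave the same way.
    registry.register_query_randomization_info(
        "fare_range", "range", [["gte", "gt"], ["lte", "lt"]], ["format"])


registry = RecordingRegistry()
register(registry)
print(registry.randomization_info["bbox"])
```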

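import random

def bounding_box_source():
    # Pick a random top-left corner inside the original bounding box.
    top_longitude = random.uniform(-74.27, -73.68)
    top_latitude = random.uniform(40.49, 40.92)

    # Pick the bottom-right corner east and south of the top-left corner,
    # so the randomized box is always valid and stays inside the original.
    bottom_longitude = random.uniform(top_longitude, -73.68)
    bottom_latitude = random.uniform(40.49, top_latitude)

    return {
        "top_left": [top_longitude, top_latitude],
        "bottom_right": [bottom_longitude, bottom_latitude]
    }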
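One way to sanity-check a source like this is to confirm that each returned box is well formed and stays inside the original bounds. The helper below is a hypothetical sketch and not part of this PR. Its default bounds are taken from the example operation above.

```python
def is_valid_box(box, west=-74.27, north=40.92, east=-73.68, south=40.49):
    """Return True if the box is well formed and lies within the outer bounds."""
    left, top = box["top_left"]
    right, bottom = box["bottom_right"]
    # The top-left corner must be west and north of the bottom-right corner.
    return west <= left <= right <= east and south <= bottom <= top <= north


# The original query's box is valid; a box with swapped corners is not.
print(is_valid_box({"top_left": [-74.27, 40.92], "bottom_right": [-73.68, 40.49]}))  # True
print(is_valid_box({"top_left": [-73.9, 40.6], "bottom_right": [-74.1, 40.8]}))      # False
```

Calling `is_valid_box(bounding_box_source())` in a loop would exercise the same check against randomly generated boxes.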

Issues Resolved

#711

Testing

  • New functionality includes testing

Adds unit tests. Also tested manually with a workload designed for testing the query cache, which uses a non-default QueryRandomizationInfo.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 3 commits December 9, 2024 17:26
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <petealft@amazon.com>