Make query randomization more flexible #712

peteralfonsi · 2024-12-12T18:58:49Z

Description

Controllable query randomization was added in #455, but it could only randomize range queries, using the fields gte/gt, lte/lt, and optionally format. This is limiting, since some use cases, such as benchmarking the query cache, need to randomize other query types.

This PR lets the user register, for each operation, the query name and the parameter names to be randomized. This is done by adding a call to registry.register_query_randomization_info(...) in workload.py. See an example.

In this example we have the operation:

{
  "name": "bbox", 
  "operation-type": "search", 
  "index": "nyc_taxis",
  "body": { 
    "size": 0,
    "query": {
      "geo_bounding_box": {
        "pickup_location": {
          "top_left": [-74.27, 40.92],
          "bottom_right": [-73.68, 40.49]
        }
      }
    }
  }
}

And in workload.py we run registry.register_query_randomization_info("bbox", "geo_bounding_box", [["top_left"], ["bottom_right"]], []).

The first argument, "bbox", is the operation name. The second argument, "geo_bounding_box", is the query type name.

The third argument is a list of lists: [["top_left"], ["bottom_right"]]. Each entry in the outer list represents one parameter name that will be randomized. Each entry is itself a list because a parameter may have several versions of its name that mean roughly the same thing, for example "gte" and "gt". In this case there's just one option for each parameter name. At least one version of each parameter name must be present in the original query for it to be randomized.

The last argument is a list of optional parameters. If an optional parameter is present in the random standard value source, it will be put into the randomized version of the query. If it's not in the source, it's ignored. There are no optional parameters in this example, but the typical use case would be "format" in a range query.
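To make the roles of the third and fourth arguments concrete, here is a minimal sketch of how the registered names could be applied to a query body. It is not the code in this PR: randomize_query is a hypothetical helper, the assumed body shape is {"query": {<query name>: {<field>: {<param>: value}}}}, and writing each value under whichever version of the name the source supplies is an assumption made for illustration.

```python
import copy


def randomize_query(body, query_name, parameter_name_options,
                    optional_parameters, new_values):
    """Hypothetical helper: return a copy of body with registered parameters replaced."""
    randomized = copy.deepcopy(body)
    fields = randomized.get("query", {}).get(query_name)
    if not isinstance(fields, dict):
        return randomized  # Not the registered query type: leave it unchanged.

    for params in fields.values():
        if not isinstance(params, dict):
            continue
        # At least one version of every parameter name must be in the original query.
        if not all(any(name in params for name in options)
                   for options in parameter_name_options):
            continue
        for options in parameter_name_options:
            # Assumption: use the first version of the name that the source supplies.
            supplied = next((n for n in options if n in new_values), None)
            if supplied is None:
                raise ValueError(f"value source supplies none of {options}")
            for name in options:
                params.pop(name, None)  # Drop whichever version the original used.
            params[supplied] = new_values[supplied]
        # Optional parameters are copied only if the source provides them.
        for name in optional_parameters:
            if name in new_values:
                params[name] = new_values[name]
    return randomized
```

With the default registration, a query using "gt" and "lt" and a source returning "gte", "lte", and "format" would come out of this sketch with "gte", "lte", and "format" set; a query missing one of the required parameters is returned unchanged.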

If nothing is registered for an operation, randomization falls back to the default query randomization info object, which randomizes range queries as was done before this PR. This is equivalent to calling registry.register_query_randomization_info(<operation_name>, "range", [["gte", "gt"], ["lte", "lt"]], ["format"]).

The keys of the dict returned by the random standard value source should match the parameter names you are trying to randomize. For example, the standard value source for the above example is:

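import random

def bounding_box_source():
    # Points are [longitude, latitude]. Pick the top-left corner inside the original box.
    top_longitude = random.uniform(-74.27, -73.68)
    top_latitude = random.uniform(40.49, 40.92)

    # The bottom-right corner lies east of (or at) and south of (or at) the top-left corner.
    bottom_longitude = random.uniform(top_longitude, -73.68)
    bottom_latitude = random.uniform(40.49, top_latitude)

    return {
        "top_left": [top_longitude, top_latitude],
        "bottom_right": [bottom_longitude, bottom_latitude]
    }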
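For context, a workload.py that wires the two pieces together might look roughly like the sketch below. The register(registry) entry point and the register_standard_value_source call (with "pickup_location" as the field name) are assumptions based on how #455 registers value sources, so check the signatures against the code; only register_query_randomization_info and its arguments are taken from this description.

```python
import random


def bounding_box_source():
    # Same value source as above: a random box inside the original bounds.
    top_longitude = random.uniform(-74.27, -73.68)
    top_latitude = random.uniform(40.49, 40.92)
    bottom_longitude = random.uniform(top_longitude, -73.68)
    bottom_latitude = random.uniform(40.49, top_latitude)
    return {
        "top_left": [top_longitude, top_latitude],
        "bottom_right": [bottom_longitude, bottom_latitude],
    }


def register(registry):
    # Assumption: the value source is registered per operation and field, as in #455.
    registry.register_standard_value_source(
        "bbox", "pickup_location", bounding_box_source
    )
    # From this PR: randomize "top_left" and "bottom_right" in the
    # geo_bounding_box query of the "bbox" operation, with no optional parameters.
    registry.register_query_randomization_info(
        "bbox", "geo_bounding_box", [["top_left"], ["bottom_right"]], []
    )
```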

Issues Resolved

#711

Testing

New functionality includes testing

Adds unit tests. Also tested manually with a workload designed for testing the query cache, which uses a non-default QueryRandomizationInfo.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Peter Alfonsi added 3 commits December 9, 2024 17:26

Adds logic to randomize non-range queries

9f06639

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Added registration of target_keys_info

0911850

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Made nomenclature clearer

841ab27

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

peteralfonsi requested review from IanHoang, gkamat, beaioun, cgchinmay, rishabh6788, VijayanB and OVI3D0 as code owners December 12, 2024 18:58
