We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation.
Cite: R Malinga Perera, Bastian Oetomo, Benjamin IP Rubinstein, and Renata Borovica-Gajic. DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 600–611. IEEE, 2021.
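To make the "learns the benefits ... through exploration" idea concrete, below is an illustrative one-dimensional LinUCB-style sketch (not the repository's C3-UCB code, which works over multi-dimensional index features): a ridge-regression reward estimate plus an optimism bonus scaled by `alpha`, with `lam` as the regulariser. These two knobs correspond in spirit to the `input_alpha` and `input_lambda` experiment settings described later.

```python
import math

class ScalarLinUCB:
    """Minimal 1-D LinUCB-style learner (illustrative sketch only;
    the paper's C3-UCB operates on multi-dimensional index features)."""

    def __init__(self, alpha=1.0, lam=0.5):
        self.alpha = alpha   # exploration weight (cf. input_alpha)
        self.V = lam         # regularised Gram "matrix" (a scalar here; cf. input_lambda)
        self.b = 0.0         # reward-weighted feature sum

    def score(self, x):
        theta = self.b / self.V                      # ridge-regression estimate
        width = self.alpha * math.sqrt(x * x / self.V)  # confidence bonus
        return theta * x + width                     # optimistic value estimate

    def update(self, x, reward):
        self.V += x * x
        self.b += x * reward

bandit = ScalarLinUCB(alpha=1.0, lam=0.5)
for _ in range(50):               # observe an arm with feature 1.0 paying ~0.9
    bandit.update(1.0, 0.9)
print(bandit.score(1.0))          # estimate near 0.9 plus a shrinking bonus
```

As observations accumulate, the bonus term shrinks and the score converges to the estimated reward, which is how exploration gives way to exploitation.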
- Setting up the environment
  - You need to generate data for your benchmark.
  - You need to generate queries for your benchmark.
  - You need to create a workload file for your benchmark (example workload files can be found in the `resources/workloads` folder). Notice that we have included the predicates and payload of those queries in the workload file.
  - Add DB connection details to `config/db.conf`.
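The workload file is JSON. As a sketch of the step above, the snippet below writes a single-query workload; the field names (`id`, `query_string`, `predicates`, `payload`) and the file name are illustrative assumptions, so check the example files in `resources/workloads` for the authoritative schema.

```python
import json

# Hypothetical workload entry: field names are assumptions, not the
# authoritative schema; see the examples in resources/workloads.
workload = [
    {
        "id": 1,
        "query_string": (
            "SELECT l_orderkey, l_extendedprice FROM lineitem "
            "WHERE l_shipdate <= '1998-09-01'"
        ),
        # columns appearing in filters, per table
        "predicates": {"lineitem": ["l_shipdate"]},
        # columns returned by the query, per table
        "payload": {"lineitem": ["l_orderkey", "l_extendedprice"]},
    }
]

with open("my_workload.json", "w") as f:  # hypothetical file name
    json.dump(workload, f, indent=2)
```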
- Setting up your experiment
  - Our framework allows you to easily set up experiments in `config/exp.conf`.
  - See the examples in `config/exp.conf`.
  - Check the explanation under 'Experiment Config Explained' below.
- Program starting point is `simulation/sim_run_experiment.py`. This is the file that you will be running: simply add the list of experiments you need to run and execute the file.
- Results
  - Results will be saved in a sub-folder under the `experiments` folder; the sub-folder name will be the name you provided for the experiment.
  - Results will include graphs, a CSV of the main results, and a pickle file of all the data.
```
[tpc_h_static_10_MAB]
reps = 1
rounds = 25
hyp_rounds = 0
workload_shifts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
queries_start = [0, 21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504]
queries_end = [21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504, 525]
ta_runs = [1]
ta_workload = optimal
workload_file = \resources\workloads\tpc_h_static_100.json
config_shifts = [0]
config_start = [0]
config_end = [20]
max_memory = 25000
input_alpha = 1
input_lambda = 0.5
time_weight = 5
memory_weight = 0
components = ["MAB"]
mab_versions = ["simulation.sim_c3ucb_vR"]
```
Experiment Config Explained
- `[tpc_h_static_10_MAB]`: name of the experiment
- `reps`: number of times we run the experiment
- `rounds`: number of bandit rounds in the experiment
- `hyp_rounds`: ignore
- `workload_shifts`: places where the workload shifts (even in static we run different templates in rounds, so it is simply 1 to the number of rounds)
- `queries_start`, `queries_end`: for each round we run a set of queries from the workload file, starting from the `queries_start` index up to the `queries_end` index
- `ta_runs`: rounds where we invoke PDTool
- `ta_workload`: `optimal` for static and dynamic shifting, `last_run` for dynamic random
- `workload_file`: location of the workload file
- `config_shifts`: ignore
- `config_start`: ignore
- `config_end`: ignore
- `max_memory`: memory limit in MB
- `input_alpha`: bandit parameter `alpha`
- `input_lambda`: bandit parameter `lambda`
- `time_weight`: ignore
- `memory_weight`: ignore
- `components`: components you need to compare in this experiment (subset of `MAB`, `TA_OPTIMAL`, `NO_INDEX`, ...)
- `mab_versions`: MAB version we are running (name of the MAB file)
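The experiment file uses INI-style sections, so a quick way to sanity-check a section before a run is Python's `configparser`. This sketch parses an inline copy of part of the example above (assuming, as in the example, that list-valued keys are written as Python literals); for the real file you would call `parser.read("config/exp.conf")` instead.

```python
import ast
import configparser

# Inline copy of part of the example section; normally you would read
# config/exp.conf instead of a string.
SAMPLE = """
[tpc_h_static_10_MAB]
reps = 1
rounds = 25
max_memory = 25000
input_alpha = 1
input_lambda = 0.5
components = ["MAB"]
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
exp = parser["tpc_h_static_10_MAB"]

rounds = exp.getint("rounds")                    # typed accessors for scalars
alpha = exp.getfloat("input_alpha")
components = ast.literal_eval(exp["components"]) # list-valued keys are literals
print(rounds, alpha, components)
```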