LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates - Experiment Datasets and Code Assets
This repository has the source code for the following paper (under review):
Quader, Shaikh, Andres Jaramillo, Sumona Mukhopadhyay, Ghadeer Abuoda, Calisto Zuzarte, David Kalmuk, Marin Litoiu, and Manos Papagelis. "LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates."
This README gives an overview of the repository structure and its contents, followed by instructions for running the code.
The repository is organized into two main folders:
- models
- templates
The templates folder contains code and data for generating learning templates using two distinct approaches:
- Approach 1: Generating templates from query text.
- Approach 2: Generating templates from query plan features.
Templates can therefore be generated from whichever input is available for a workload: the query text or the query plan features.
The models folder is further divided into four subdirectories, each tailored to a specific training task:
- job_query: Contains data and notebooks for training models at the level of individual queries, using queries from the Join Order Benchmark (JOB).
- job_workload: Includes data and notebooks for training models at the workload level (i.e., a batch of queries), using queries from the Join Order Benchmark (JOB).
- tpcds_query: Provides datasets and notebooks for training query-level models on TPC-DS queries.
- tpcds_workload: Contains resources for workload-level model training using TPC-DS queries.
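The layout described above can be checked after cloning with a short script. This is a minimal sketch, not part of the repository: the function name `find_notebooks` is hypothetical, and it assumes the notebooks use the standard `.ipynb` extension and that the four subfolders sit directly under `models`.

```python
from pathlib import Path

# Subfolders of `models` as named in this README.
MODEL_SUBDIRS = ("job_query", "job_workload", "tpcds_query", "tpcds_workload")


def find_notebooks(repo_root):
    """Map each models subfolder to the sorted notebook file names found in it.

    Hypothetical helper: a subfolder that does not exist maps to an empty list.
    """
    root = Path(repo_root)
    found = {}
    for name in MODEL_SUBDIRS:
        folder = root / "models" / name
        if folder.is_dir():
            # Search recursively in case notebooks live in nested folders.
            found[name] = sorted(p.name for p in folder.rglob("*.ipynb"))
        else:
            found[name] = []
    return found


if __name__ == "__main__":
    # Run from the repository root to get a per-folder notebook count.
    for name, notebooks in find_notebooks(".").items():
        print(f"{name}: {len(notebooks)} notebook(s)")
```

An empty list for a subfolder suggests the script was not run from the repository root or the clone is incomplete.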
- Begin by exploring the folders to understand their purpose and content.
- Follow the provided notebooks and datasets in each folder to replicate the template generation and training approaches described. The suggested order for running the code is:
  - First, the templates folder
  - Second, the models folder
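The suggested order can also be expressed as a small sorting helper, for example when listing notebooks before a batch run. This is a sketch under assumptions: `run_order` is a hypothetical name, the paths are repository-relative with forward slashes, and the notebook file names in the usage note are placeholders rather than actual files in this repository.

```python
from pathlib import PurePosixPath

# Run order suggested in this README: templates first, models second.
FOLDER_ORDER = {"templates": 0, "models": 1}


def run_order(notebook_paths):
    """Sort repository-relative notebook paths by the suggested folder order."""

    def key(path):
        parts = PurePosixPath(path).parts
        top = parts[0] if parts else ""
        # Unknown top-level folders sort last; ties fall back to the path itself.
        return (FOLDER_ORDER.get(top, len(FOLDER_ORDER)), path)

    return sorted(notebook_paths, key=key)
```

For instance, `run_order(["models/job_query/a.ipynb", "templates/b.ipynb"])` places the `templates` notebook first.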
To run the Jupyter Notebooks from the LearnedWMP repository, follow these steps:
- Clone the Repository
  git clone https://github.com/shaikhq/learnedwmp.git
  cd learnedwmp
- Set Up a Virtual Environment
  python -m venv .venv
  The recommended Python version is 3.12.3.
- Activate the Virtual Environment
  - On Windows:
    .venv\Scripts\activate
  - On macOS/Linux:
    source .venv/bin/activate
- Install Required Packages
  pip install -r requirements.txt
- Launch Jupyter Notebook
  jupyter notebook
  This command opens the Jupyter Notebook interface in your default web browser.
- Open and Run Notebooks
  In the Jupyter interface, navigate to the notebook you wish to run and click on it to open it. Execute the cells sequentially to run the code.
Note: Ensure that your environment meets all prerequisites specified in the requirements.txt file.
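One way to check the environment before launching the notebooks is sketched below. It is not part of the repository. The helper names are hypothetical, the parser handles only simple `name` or `name==version` style lines (not URLs or editable installs), and it only checks that a package is installed, not that its version satisfies the pinned constraint.

```python
import re
import sys
from importlib import metadata
from pathlib import Path

RECOMMENDED_PYTHON = (3, 12, 3)  # version recommended in this README


def requirement_names(lines):
    """Extract distribution names from simple requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blanks and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def python_matches(current=None, recommended=RECOMMENDED_PYTHON):
    """Compare major.minor of the interpreter with the recommended version."""
    current = sys.version_info[:3] if current is None else current
    return tuple(current[:2]) == tuple(recommended[:2])


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        names = requirement_names(req.read_text().splitlines())
        print("Missing packages:", missing_packages(names))
    print("Python major.minor matches recommendation:", python_matches())
```

Run it from the repository root with the virtual environment activated. An empty "Missing packages" list means every package named in requirements.txt has an installed distribution.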