LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates - Experiment Datasets and Code Assets

This repository has the source code for the following paper (under review):

Quader, Shaikh, Andres Jaramillo, Sumona Mukhopadhyay, Ghadeer Abuoda, Calisto Zuzarte, David Kalmuk, Marin Litoiu, and Manos Papagelis. "LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates."

Repository Overview

This Git repository contains the source code for the research paper referenced in this README. It also provides instructions for running the code included in this repository. Below is an overview of the repository structure and its contents.

Repository Structure

The repository is organized into two main folders:

models
templates

`templates` Folder

This folder contains code and data for generating learning templates using two distinct approaches:

Approach 1: Generating templates from query text.
Approach 2: Generating templates from query plan features.

These approaches enable flexible and efficient creation of templates for various workloads.

`models` Folder

The models folder is further divided into four subdirectories, each tailored for specific training tasks:

job_query:
- Contains data and notebooks training models at the level of individual queries from Join Order Benchmark (JOB) queries.
job_workload:
- Includes data and notebooks for training models at the workload level (i.e., a batch of queries) using queries from Join Order Benchmark (JOB) queries.
tpcds_query:
- Provides datasets and notebooks for training query-level models for TPC-DS queries.
tpcds_workload:
- Contains resources for workload-level model training using TPC-DS queries.

Instructions

Begin by exploring the folders to understand their purpose and content.
Follow the provided notebooks and datasets in each folder to replicate the training and template generation approaches described. The following is the suggested order of running code:

First, templates folder
Second, models folder

Steps to running the notebooks

To run the Jupyter Notebooks from the LearnedWMP repository, follow these steps:

Clone the Repository

git clone https://github.com/shaikhq/learnedwmp.git
cd learnedwmp

Set Up a Virtual Environment
```
python -m venv .venv
```

The recommended python level is python 3.12.3.

Activate the Virtual Environment
- On Windows:
```
.venv\Scripts\activate
```
- On macOS/Linux:
```
source .venv/bin/activate
```
Install Required Packages
```
pip install -r requirements.txt
```
Launch Jupyter Notebook
```
jupyter notebook
```
This command will open the Jupyter Notebook interface in your default web browser.
Open and Run Notebooks In the Jupyter interface, navigate to the notebook you wish to run and click on it to open. Execute the cells sequentially to run the code.

Note: Ensure that your environment meets all prerequisites specified in the requirements.txt file.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
models		models
templates		templates
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates - Experiment Datasets and Code Assets

Repository Overview

Repository Structure

`templates` Folder

`models` Folder

Instructions

Steps to running the notebooks

About

Releases

Packages

Languages

shaikhq/learnedwmp

Folders and files

Latest commit

History

Repository files navigation

LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates - Experiment Datasets and Code Assets

Repository Overview

Repository Structure

templates Folder

models Folder

Instructions

Steps to running the notebooks

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`templates` Folder

`models` Folder

Packages