PostBOUND

PostBOUND is a framework for studying query optimization algorithms for (relational) database systems. It provides tools to easily implement prototypes of new optimization algorithms and to compare them in a transparent and reproducible way. It is implemented as a Python tool that takes an input query, applies a user-configured optimization pipeline to that query and produces an annotated output query that enforces the selected query plan during execution on a native database system. This repository provides the actual Python implementation of the PostBOUND framework (located in the postbound directory), along with a number of utilities that automate some of the tedious parts of evaluating an optimization algorithm on a specific benchmark. Those utilities are mostly focused on setting up different popular database management systems and loading commonly used databases and benchmarks for them.

Getting Started

Tip

PostBOUND can also as an all-in-one Docker container. See Dockerfile for details.

All package requirements can be installed from the requirements.txt file. To use PostBOUND in different projects, it can also be build and installed as a local package using pip. This is generally the recommended way to go and can be automated using the tools/setup-py-venv.sh script.

In addition to the Python packages, PostBOUND also needs a database connection in order to optimize and execute queries. Currently, PostgreSQL and MySQL are supported, with the MySQL features being a bit more limited due to restrictions of the system. The root directory of the PostBOUND repository contains setup utilities for some database systems, databases and workloads.

The best way to familiarize yourself with PostBOUND is to study the examples and the documentation of the used classes and functions. A high-level documentation is also being worked on, but still subject to change and not entirely up-to-date. Therefore, the Python documentation of the source code is more extensive. Consult it for the specifics of how to use a specific feature and take a look at the examples to get an idea of when to use it and which features are available. The best starting point for the in-code documentation is the __init__.py file in the postbound source directory.

We also published a paper¹ which explains the concepts that motivated the initial versions of PostBOUND. Notice however, that at the time of its publication the framework had a much more limited scope and was heavily expanded since then. More specifically, PostBOUND is no longer limited to upper bound-driven optimization strategies and much more independent of specific database systems.

Example

The following snippet gives a glimpse of the different parts of the framework and how they can interact. The specific example implements the UES upper-bound optimization algorithm² to obtain an optimized join order for the queries of the Join Order Benchmark³ and applies them to a Postgres database instance.

##
## Step 0: imports
##

import postbound as pb
from postbound.optimizer import presets

##
## Step 1: System setup
##
postgres_instance = pb.postgres.connect()
presets.apply_standard_system_options()
job_workload = pb.workloads.job()
ues_settings = presets.fetch("ues")

##
## Step 2: Optimization pipeline setup
##
optimization_pipeline = pb.TwoStageOptimizationPipeline(postgres_instance)
optimization_pipeline.load_settings(ues_settings)
optimization_pipeline.build()

##
## Step 3: Query optimization
##
input_query = job_workload["1a"]
optimized_query = optimization_pipeline.optimize_query(input_query)

##
## Step 4: Query execution
##
query_result = postgres_instance.execute_query(optimized_query)
print(query_result)

This examples can be almost executed as-is, there is only one setup step missing: in Step 1, PostBOUND is asked to connect to a Postgres database, but no information is provided on how this connection should be obtained. By default, PostBOUND reads this information from a hidden config file. In the case of Postgres, this file is called .psycopg_connection and it has to be placed in the same directory from which the code is executed. The Postgres file has to contain a connection string that can be used to establish a database connection, such as dbname=<my db> user=<my user> host=localhost. Consult the documentation on the database systems for more info on what information is required and how it should be stored. As a final note, the database setup utilities that are shipped with PostBOUND also contain scripts that automatically generate valid connect files for the system. These files than only need to be moved to the correct location.

Repo Structure

The repository is structured as follows. The postbound directory contains the actual source code, all other folders are concerned with "supporting" aspects (which are important nevertheless.). Almost all of the subdirectories contain further READMEs that explain their purpose and structure in more detail.

Folder	Description
`postbound`	Contains the source code of the PostBOUND framework
`docs`	contains the high-level documentation as well as infrastructure to export the source code documentation
`examples`	contains general examples for typical usage scenarios. These should be run from the root directory, e.g. as `python3 -m examples.example-01-basic-workflow`
`tests`	contains the unit tests and integration tests for the framework implementatino. These should also be run from the root directory, e.g. as `python3 -m unittest tests`
`db-support`	Contains utilities to setup instances of the respective database systems and contain system-specific scripts to import popular benchmarks for them
`workloads`	Contains the raw SQL queries of some popular benchmarks
`tools`	Provides different other utilities that are not directly concerned with specific database systems, but rather with common problems encoutered when benchmarking query optimizers

Package Structure

The postbound directory contains the actual source code of the framework. On a high-level, the PostBOUND framework is structured as follows:

Package	Description
`optimizer`	provides the different optimization strategies, interfaces and some pre-defined algorithms
`qal`	provides the query abstraction used throughout PostBOUND, as well as logic to parse and transform query instances
`db`	contains all parts of PostBOUND that concern database interaction. That includes retrieving data from different database systems, as well as generating optimized queries to execute on the database system
`experiments`	provides tools to conveniently load benchmarks and to measure their execution time for different optimization settings
`util`	contains algorithms and types that do not belong to specific parts of PostBOUND and are more general in nature
`vis`	contains utilities to visualize different concepts in query optimization (join orders, join graphs, query execution plans, ...)

The actual optimization pipelines is defined in the postbound module at the package root. Depending on the specific use-case, different pipelines are available.

Dockerfile

You can install your own PostBOUND library along with a local Postgres instance using the Dockerfile. See an example below. Depending on your specific use case, databases for different workloads can be imported. The image can also directly run an instance of pg_lab instead of vanilla Postgres. pg_lab provides advanced hinting support (e.g. for materialization or cardinality hints for base tables) and offers additional extension points for the query optimizer (e.g. hooks for the different cost functions).

The Dockerfile can be considered a blueprint of the target system design for PostBOUND, i.e. including a local database server. You can adjust the build process with the following arguments:

Argument	Allowed values	Description
`TIMEZONE`	Any valid timezone identifier, defaults to `UTC`	Timezone of the Docker container (and hence the Postgres server). It is probably best to just use the value of `cat /etc/timezone`
`USERNAME`	Any valid UNIX username, defaults to `postbound`.	The username within the Docker container. This will also be the Postgres user.
`SETUP_IMDB`	`true` or `false` (default)	Whether an IMDB instance should be created as part of the Postgres setup.
`SETUP_STATS`	`true` or `false` (default)	Whether a Stats instance should be created as part of the Postgres setup.
`SETUP_STACK`	`true` or `false` (default)	Whether a Stack instance should be created as part of the Postgres setup.
`OPTIMIZE_PG_CONFIG`	`true` or `false` (default)	Whether the Postgres configuration parameters should be automatically set based on your hardware platform. Rules are based on PGTune by le0pard.
`PG_DISK_TYPE`	`SSD` (default) or `HDD`	In case the Postgres server is automatically configured (see `OPTIMIZE_PG_CONFIG`) this indicates the kind of storage for the actual database. In turn, this influences the relative cost of sequential access and index-based access for the query optimizer.
`USE_PGLAB`	`true` or `false` (default)	Whether to initialize a pg_lab server instead of a normal Postgres server. pg_lab provides advanced hinting capabilities and offers additional extension points for the query optimizer.

The PostBOUND source code is located at /postbound. If pg_lab is being used, the corresponding files are located at /pg_lab. The container automatically exposes the Postgres port 5432 and provides a volume mountpoint at /postbound/public. This mountpoint can be used to easily get experiment scripts into the container and to export results back out again.

Example setup:

$ docker build \
    --build-arg TIMEZONE=$(cat /etc/timezone) \
    --build-arg USE_PGLAB=true \
    --build-arg OPTIMIZE_PG_CONFIG=true \
    --build-arg SETUP_STATS=true \
    -t postbound \
    .
$ docker run -dt \
    --shm-size 4G \
    --name postbound \
    --volume $PWD/postbound-docker:/postbound/public \
    --expose 5432 \
    postbound
$ docker exec -it postbound /bin/bash

Literature

Bergmann et al.: "PostBOUND: PostgreSQL with Upper Bound SPJ Query Optimization", BTW'2023 (paper) ↩
Hertzschuch et al.: "Simplicity Done Right for Join Ordering", CIDR'21 (paper, GitHub) ↩
Leis et al.: "How Good are Query Optimizers, Really?", PVLDB'15 (paper) ↩

Name		Name	Last commit message	Last commit date
Latest commit History 1,369 Commits
db-support		db-support
docs		docs
examples		examples
postbound		postbound
tests		tests
tools		tools
workloads		workloads
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE.txt		LICENSE.txt
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PostBOUND

Getting Started

Example

Repo Structure

Package Structure

Dockerfile

Literature

About

Languages

License

rbergm/PostBOUND

Folders and files

Latest commit

History

Repository files navigation

PostBOUND

Getting Started

Example

Repo Structure

Package Structure

Dockerfile

Literature

Footnotes

About

Resources

License

Stars

Watchers

Forks

Languages