This repository contains code, embeddings of SQL queries, models, and result artifacts from the preliminary work of LLMSteer. Analysis was done using python version 3.11.4
. See requirements.txt
for list of dependencies. Queries were executed using PostgreSQL version 16.1
.
Abstract: Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for query optimization. Specifically, we show that a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems. Although we only present some preliminary results, an LLM-powered query optimizer could provide significant benefits, both in terms of performance and simplicity.
Data used in this work is accessible here and can be unzipped with the following command: tar --zstd -xvf llmsteer_data.tar.zst
The list of hints used in this work originate from Bao and can be found in the online appendix here
Full paper can be found on arXiv here
You can find me at peterai.me