You should use this installation guide if you are planning on setting up a SUQL-powered agent. Downloading from source makes it easy to modify few-shot prompts and quickly set up a working agent. If you are planning on using SUQL in a larger codebase and only need an entry point to the SUQL compiler, we recommend using install_pip.md instead.
To get started, run:
git clone https://github.com/stanford-oval/suql.git
Installing PostgreSQL database
-
Follow the instruction there to install a postgreSQL database. For instance, if you are installing on Ubuntu, then follow section
PostgreSQL Apt Repository
at https://www.postgresql.org/download/linux/ubuntu/. -
After that, the SUQL compiler needs to make use of python functions within postgreSQL. This is done via the
postgresql-plpython3
language. If you are using Ubuntu, simply runsudo apt-get install postgresql-plpython3-<your_psql_version>
. -
Then, in your database's command line (incurred via
psql <your_database_name>
), doCREATE EXTENSION plpython3u;
. This loads this language into the current db.
-
Install dependencies via
conda create -n suql python=<any one of 3.8/3.9/3.10/3.11>; conda activate suql; pip install -r requirements.txt
; -
Install
faiss
byconda install -c conda-forge faiss-gpu
. We recommend the GPU version for accelerated indexing speed. -
Run
python -m spacy download en_core_web_sm
;
Here is a breakdown of what you need to do to set up SUQL on your domain:
-
In the command line for your database (e.g.
psql restaurants
), copy and paste all content undercustom_functions.sql
. This will define the necessary free text functions (includinganswer
andsummary
) under your PostgreSQL database.- Note that these functions are making calls to a local address, by default 8500, defined in
src/suql/free_text_fcns_server.py
(see step 5 below). If you change this address make sure to also change the address below.
- Note that these functions are making calls to a local address, by default 8500, defined in
-
The user-facing parts of this system should only require SELECT privilege (it would be safe to grant only SELECT privilege for GPT-generated queries). By default, this user is named
select_user
with passwordselect_user
inpostgresql_connection.py
.- If you are ok with this user name + login, run the following code in your
psql
command line to create this role:
- If you are ok with this user name + login, run the following code in your
CREATE ROLE select_user WITH PASSWORD 'select_user';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO select_user;
ALTER ROLE select_user LOGIN;
- If you wish to choose another credential:
- This user also appears once in
src/suql/sql_free_text_support/execute_free_text_sql.py
under functionsuql_execute
of the SUQL compiler, so that intermediate tables created can be queried. Please also change it to match your user name. - This user also appears in
src/suql/faiss_embedding.py
, please make sure to also change there.
- This user also appears once in
- A few parts of the SUQL compiler require the permission to CREATE a temporary table. These temporary tables are immediately deleted after the compiler finishes processing a query. By default, this user is named
creator_role
with passwordcreator_role
. Similar to above:- If you are ok with this user name + login, run the following code in your
psql
command line to create this role:
- If you are ok with this user name + login, run the following code in your
CREATE ROLE creator_role WITH PASSWORD 'creator_role';
GRANT CREATE ON DATABASE <your-db-name> TO creator_role;
GRANT CREATE ON SCHEMA public TO creator_role;
ALTER ROLE creator_role LOGIN;
- If you wish to choose another credential:
- You would need to create a user with the privilege to CREATE tables under your database, and change
creator_role
to match that user and password. - This user appears once in
src/suql/sql_free_text_support/execute_free_text_sql.py
under functionsuql_execute
of the SUQL compiler.
- You would need to create a user with the privilege to CREATE tables under your database, and change
In all future terminals, run export PYTHONPATH="${PYTHONPATH}:<path to and including src/suql">"
, e.g. export PYTHONPATH="${PYTHONPATH}:~/suql/src/suql"
.
- Set up an embedding server for the SUQL compiler to query. Go to
src/suql/faiss_embedding.py
, and modify the lines
embedding_store.add(
table_name="restaurants",
primary_key_field_name="_id",
free_text_field_name="reviews",
db_name="restaurants",
user="select_user",
password="select_user"
)
under if __name__ == "__main__":
to match your database with its column names. Then, run python suql/faiss_embedding.py
under the src
folder.
- For instance, this line instructs the SUQL compiler to set up an embedding server for the
restaurants
database, which has_id
column as the unique row identifier, for thepopular_dishes
column (such column need to be of typeTEXT
orTEXT[]
, or other fixed-length strings/list of strings) under tablerestaurants
. This is executed with user privilegeuser="select_user"
andpassword="select_user"
; - By default, this will be set up on port 8501, which is then called by
src/suql/execute_free_text_sql.py
. In case you need to use another port, please change both addresses. - Check API documentation on more details, including options to disable caching.
- Set up the backend server for the
answer
,summary
functions. In a separate terminal, first set up your LLM API key environment variable following the litellm provider doc (e.g., for OpenAI, runexport OPENAI_API_KEY=[your OpenAI API key here]
). Then, runpython suql/free_text_fcns_server.py
under thesrc
folder.- As you probably noticed, the code in
custom_functions.sql
is just making queries to this server, which handles the LLM API calls. If you changed the address incustom_functions.sql
, then also update the address underif __name__ == "__main__":
.
- As you probably noticed, the code in
We are very close to a fully-working LLM-powered agent!
-
Write a semantic parser prompt that asks LLMs to generate SUQL. Look at
src/suql/prompts/parser_suql.prompt
for an example on how this is done on restaurants.- You should change the schema declaration and few-shot examples to match your domain. Make sure to include examples on how to use the
answer
function; - Feel free to incorporate other prompting techniques from text2sql advances.
- You should change the schema declaration and few-shot examples to match your domain. Make sure to include examples on how to use the
-
In many cases, it is ideal to limit what kind of user query would your agent respond to. There is a classifier to determine whether a user utterance requires database access. See
src/suql/prompts/if_db_classification.prompt
for an example on how this is done on restaurants.- If you decide to keep this, then modify the examples to match your domain;
- If you decide to delete this, then simply set the line
enable_classifier=True
to beenable_classifier=False
.
-
In a separate terminal from the two servers above, set up your LLM API key environment variable following the litellm provider doc (e.g., for OpenAI, run
export OPENAI_API_KEY=[your OpenAI API key here]
). Test withpython suql/agent.py
under thesrc
folder. You should be able to interact with your agent on your CLI!