This repository contains the code for replicating the experiments in the paper "Composing SMArt Data Services through Large Language Models".
- Create a virtual environment and install the dependencies:
  ```bash
  conda create -n pyllm python=3.9
  conda activate pyllm
  pip install -r requirements.txt
  ```
- Create a `.env` file in the root directory of the project and add the following line:
  ```
  OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
  ```
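  How the key is consumed depends on the repository's code; a minimal sketch of loading it, assuming the `python-dotenv` and `openai` packages are available (an illustration, not necessarily how `main.py` does it):
  ```python
  # Minimal sketch (assumption): load the API key from the .env file and create an OpenAI client.
  import os
  from dotenv import load_dotenv
  from openai import OpenAI

  load_dotenv()  # reads OPENAI_API_KEY from .env in the project root
  client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
  ```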
- Define the query in the JSON file. As an example:
  ```json
  "q5": {
      "query": "Please provide a table for the upcoming 30 cardboard pieces processed by the diecutter with ID 7, detailing (i) how many cardboard pieces are defect-free and (ii) how many contain defects."
  }
  ```
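  The query can then be looked up by its key; a small illustrative sketch, assuming a hypothetical `queries.json` file with the structure above (the actual file name and loading logic are defined by the repository):
  ```python
  # Illustrative only: read the natural-language query for a given key.
  # "queries.json" is an assumed file name, not necessarily the repository's actual path.
  import json

  def load_query(key: str, path: str = "queries.json") -> str:
      with open(path, "r") as f:
          return json.load(f)[key]["query"]

  print(load_query("q5"))
  ```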
- In the main file, specify the `<query_number>` to be executed. As an example:
  ```python
  ...
  if __name__ == "__main__":
      q = "q5"
      ...
  ```
- Run the LLM:
  ```bash
  cd src
  python main.py
  ```
- The LLM will generate a `temp_pipeline.py` file containing the Python pipeline that leverages the proper data services to produce the requested information. Given the example, the LLM will generate a result with the following schema:
  ```
  +----+--------------------+---------------------+
  |    |   no_defects_count |   with_errors_count |
  |----+--------------------+---------------------|
  |  0 |                 17 |                  13 |
  +----+--------------------+---------------------+
  ```
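  For intuition only, a generated pipeline for the example query might resemble the sketch below; the data service it calls is stubbed here, since the real data services are provided by the repository:
  ```python
  # Hypothetical sketch of a generated pipeline; get_next_pieces stands in for a real data service.
  import random
  import pandas as pd

  def get_next_pieces(diecutter_id: int, count: int) -> list[dict]:
      """Stub for a data service returning the next pieces processed by the given diecutter."""
      return [{"diecutter_id": diecutter_id, "defective": random.random() < 0.4} for _ in range(count)]

  pieces = get_next_pieces(diecutter_id=7, count=30)
  no_defects_count = sum(1 for p in pieces if not p["defective"])
  with_errors_count = len(pieces) - no_defects_count

  print(pd.DataFrame([{"no_defects_count": no_defects_count, "with_errors_count": with_errors_count}]))
  ```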
To run the experiments, execute the following commands:
```bash
cd src
python run_evaluation.py
```
The script will create different `.csv` files in the `evaluation` folder containing the results of the run and the computed metrics.
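The generated CSVs can be inspected with pandas; a short sketch, using an illustrative file name since the actual output names depend on the run:
```python
# Sketch: load one of the evaluation CSVs (the file name below is illustrative, not a fixed repository path).
import pandas as pd

results = pd.read_csv("evaluation/results.csv")
print(results.head())
```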
The `evaluation` folder contains the results of the experiments:
- COSMADS:
- COSMADS (w/o similar pipelines):
- COSMADS (w/o pipelines):
- GitHub Copilot: