Reproducible materials for the RAG decomposition blog post, which shows how breaking tables down into better chunk formats improves retrieval.
Make sure Poetry is installed, then install the project dependencies:
poetry install
Run run_experiment.sh:
cd experiment
poetry run ./run_experiment.sh
The script generates the chunks for all three approaches (decomposition, raw parsing with PyPDF, and reverse engineering), computes their embeddings, and produces the final comparison plot.
Refer to the contents of run_experiment.sh to run parts of this process separately.
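To give a rough idea of what comparing chunk formats by embedding similarity can look like, here is a minimal sketch. It is not the code used in this repository: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the chunks, the query, and all function names are hypothetical. The scores it produces are illustrative only and say nothing about the results in the blog post.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_score(query: str, chunks: list[str]) -> float:
    """Highest similarity between the query and any chunk of one approach."""
    q = embed(query)
    return max((cosine(q, embed(c)) for c in chunks), default=0.0)


# Hypothetical chunks built from the same table, one list per approach.
chunks_by_approach = {
    "decomposition": ["The revenue of Acme in 2023 was 10 million."],
    "raw_parsing": ["Company Year Revenue Acme 2023 10 Beta 2023 7"],
}

query = "What was the revenue of Acme in 2023?"
scores = {
    name: best_score(query, chunks)
    for name, chunks in chunks_by_approach.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

A real run would replace `embed` with calls to an actual embedding model and feed in the chunks generated by the experiment script.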