This is the official implementation of CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks.
This framework gives you the ability to verify LLM answers, especially for intricate open-ended tasks such as consolidation, summarization, and knowledge extraction. CheckEmbed implements verification by running the LLMs' answers through an embedding model and comparing the corresponding answer-level embeddings. Reducing a complex textual answer to a single embedding facilitates a straightforward, fast, and meaningful verification, while showcasing significant improvements in accuracy, cost-effectiveness, and runtime performance compared to existing token-, sentence-, and fact-level schemes such as BERTScore or SelfCheckGPT.
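The core idea can be illustrated with a small sketch: once each answer is reduced to a single embedding vector, agreement between answers becomes a simple pairwise similarity check. The snippet below uses cosine similarity over toy vectors; it is a conceptual illustration only, not the CheckEmbed API, and the vector values are made up (real embeddings come from an embedding model).

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy answer-level embeddings (hypothetical values for illustration).
emb_answer_1 = [0.9, 0.1, 0.3]    # embedding of LLM answer 1
emb_answer_2 = [0.85, 0.15, 0.35] # embedding of a semantically similar answer
emb_unrelated = [0.1, 0.9, -0.4]  # embedding of an unrelated answer

print(cosine_similarity(emb_answer_1, emb_answer_2))   # high score: answers agree
print(cosine_similarity(emb_answer_1, emb_unrelated))  # low score: answers diverge
```

Consistently high pairwise similarity across independently sampled answers suggests a reliable solution; low similarity flags potential hallucination.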
In order to use this framework, you need a working installation of Python 3.8 or newer.
Before running either of the following two installation methods, make sure to activate your Python environment (if any).
If you are a user and you just want to use CheckEmbed, you can install it from source:
git clone https://github.com/spcl/CheckEmbed.git
cd CheckEmbed
pip install .
# If you want to use a CUDA GPU, install the CUDA extras as well:
pip install ".[cuda]"
If you are a developer and you want to modify the code, you can install it in editable mode from source:
git clone https://github.com/spcl/CheckEmbed.git
cd CheckEmbed
pip install -e .
# If you want to use a CUDA GPU, install the CUDA extras as well:
pip install -e ".[cuda]"
In order to use parts of the framework, you need to have access to an LLM and/or an embedding model.
Please follow the instructions in the READMEs of the respective modules to configure the LLMs and embedding models of your choice.
Please create a copy of config_template.json named config.json in the CheckEmbed directory and update its details according to your needs.
The paper gives a high-level overview of the framework and its components. To understand the framework in more detail, read the documentation of the individual modules. The Scheduler module is especially important for getting the most out of the framework, as is the Operation module for interpreting the results.
The examples directory contains several examples of use cases that can be solved using the framework, including the ones presented in the paper.
It is a great starting point for learning how to use the framework to solve real problems.
Each example contains a README.md file with instructions on how to run it and experiment with it.
You can run the experiments from the paper by following the instructions in the examples directory. However, if you just want to inspect and replot the results, you can use the paper directory.
If you find this repository valuable, please give it a star! Got any questions or feedback? Feel free to reach out and open an issue. Using this in your work? Please reference us using the provided citation:
@misc{besta2024checkembed,
title = {{CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks}},
author = {Besta, Maciej and Paleari, Lorenzo and Kubicek, Ales and Nyczyk, Piotr and Gerstenberger, Robert and Iff, Patrick and Lehmann, Tomasz and Niewiadomski, Hubert and Hoefler, Torsten},
year = 2024,
month = Jun,
eprinttype = {arXiv},
eprint = {2406.02524}
}