0 parents
commit 71d307c
Showing 21 changed files with 2,854 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# Vim | ||
*.swp | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# mac | ||
.DS_Store | ||
|
||
# Other | ||
.vscode | ||
*.tsv | ||
*.pt | ||
gpt*.txt | ||
*.env | ||
local/ | ||
local_* | ||
build/ | ||
*.egg-info/ | ||
!/data/*.json | ||
/dist/ | ||
checklist.md | ||
finetuning_ckpts/ | ||
* copy* | ||
.idea | ||
assertion.log | ||
*.log | ||
*.db |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
The MIT License | ||
|
||
Copyright (c) Parth Sarthi | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in | ||
all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
THE SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
<p align="center"> | ||
<img align="center" src="raptor.jpg" width="1000px" /> | ||
</p> | ||
<p align="left"> | ||
|
||
## RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | ||
|
||
**RAPTOR** introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models. | ||
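
As a rough illustration of what "a recursive tree structure" can look like, the sketch below builds one bottom-up: leaf nodes hold text chunks, and each higher layer holds summaries of groups of nodes from the layer beneath it. This is a toy under stated assumptions, not the repository's implementation: `Node`, `build_tree`, and `first_words` are hypothetical names, adjacent nodes are grouped in fixed-size batches as a stand-in for whatever grouping the paper uses, and the "summarizer" simply keeps the first few words.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    text: str   # chunk text for a leaf, summary text for an inner node
    level: int  # 0 for leaves, increasing toward the root
    children: List["Node"] = field(default_factory=list)


def build_tree(chunks: List[str],
               summarize: Callable[[str], str],
               group_size: int = 2) -> Node:
    """Summarize groups of nodes layer by layer until a single root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    layer = [Node(text=chunk, level=0) for chunk in chunks]
    level = 0
    while len(layer) > 1:
        level += 1
        parents = []
        # Assumption: adjacent grouping stands in for a real grouping step.
        for start in range(0, len(layer), group_size):
            group = layer[start:start + group_size]
            summary = summarize(" ".join(node.text for node in group))
            parents.append(Node(text=summary, level=level, children=group))
        layer = parents
    return layer[0]


def first_words(text: str, limit: int = 8) -> str:
    """Placeholder summarizer: keep only the first few words."""
    return " ".join(text.split()[:limit])


root = build_tree(["a b c", "d e f", "g h i", "j k l", "m n o"], first_words)
```

Retrieval can then consider nodes at any level, so a query may match either a detailed leaf chunk or a higher-level summary.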
|
||
|
||
|
||
For detailed methodologies and implementations, refer to the original paper: | ||
|
||
- [RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval](https://arxiv.org/abs/2401.18059) | ||
|
||
[![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-sm.svg)](https://huggingface.co/papers/2401.18059) | ||
|
||
## Installation | ||
|
||
Before using RAPTOR, ensure Python 3.8+ is installed. Clone the RAPTOR repository and install necessary dependencies: | ||
|
||
```bash | ||
git clone https://github.com/parthsarthi03/RAPTOR.git | ||
cd RAPTOR | ||
pip install -r requirements.txt | ||
``` | ||
|
||
## Basic Usage | ||
|
||
To get started with RAPTOR, follow these steps: | ||
|
||
### Setting Up RAPTOR | ||
|
||
First, set your OpenAI API key and initialize the RAPTOR configuration: | ||
|
||
```python | ||
import os | ||
os.environ["OPENAI_API_KEY"] = "your-openai-api-key" | ||
|
||
from raptor import RetrievalAugmentation | ||
|
||
# Initialize with default configuration. For advanced configurations, check the documentation. [WIP] | ||
RA = RetrievalAugmentation() | ||
``` | ||
|
||
### Adding Documents to the Tree | ||
|
||
Add your text documents to RAPTOR for indexing: | ||
|
||
```python | ||
with open('sample.txt', 'r') as file: | |
    text = file.read() | |
RA.add_documents(text) | |
``` | ||
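
The snippet above passes a single string to `add_documents`. If your corpus is spread over several files, one option is to join them into one string first. The helper below is a minimal sketch: `load_corpus` and the `"docs"` directory are hypothetical names, not part of RAPTOR.

```python
from pathlib import Path


def load_corpus(directory: str, pattern: str = "*.txt") -> str:
    """Concatenate all matching files, sorted by name, into one string."""
    paths = sorted(Path(directory).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    # Separate documents with a blank line so their text does not run together.
    return "\n\n".join(path.read_text(encoding="utf-8") for path in paths)


# Usage (assumes RA was created as shown earlier; "docs" is a placeholder):
# text = load_corpus("docs")
# RA.add_documents(text)
```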
|
||
### Answering Questions | ||
|
||
You can now use RAPTOR to answer questions based on the indexed documents: | ||
|
||
```python | ||
question = "How did Cinderella reach her happy ending?" | ||
answer = RA.answer_question(question=question) | ||
print("Answer: ", answer) | ||
``` | ||
|
||
### Saving and Loading the Tree | ||
|
||
Save the constructed tree to a specified path: | ||
|
||
```python | ||
SAVE_PATH = "Demo/cinderella" | |
RA.save(SAVE_PATH) | ||
``` | ||
|
||
Load the saved tree back into RAPTOR: | ||
|
||
```python | ||
RA = RetrievalAugmentation(tree=SAVE_PATH) | ||
answer = RA.answer_question(question=question) | ||
``` | ||
|
||
|
||
### Extending RAPTOR with other Models | ||
|
||
RAPTOR is designed to be flexible and allows you to integrate any models for summarization, question-answering (QA), and embedding generation. Here is how to extend RAPTOR with your own models: | ||
|
||
#### Custom Summarization Model | ||
|
||
If you wish to use a different language model for summarization, you can do so by extending the `BaseSummarizationModel` class. Implement the `summarize` method to integrate your custom summarization logic: | ||
|
||
```python | ||
from raptor import BaseSummarizationModel | ||
|
||
class CustomSummarizationModel(BaseSummarizationModel): | |
    def __init__(self): | |
        # Initialize your model here | |
        pass | |
|
||
    def summarize(self, context, max_tokens=150): | |
        # Implement your summarization logic here | |
        # Return the summary as a string | |
        summary = "Your summary here" | |
        return summary | |
``` | ||
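
To make the template concrete, here is a minimal sketch of a summarizer that needs no external model: it keeps leading sentences until a word budget is used up. The `BaseSummarizationModel` defined here is a local stand-in so the snippet runs without RAPTOR installed (in a real project you would import it from `raptor` as shown above), `LeadSentenceSummarizer` is a hypothetical name, and counting whitespace-separated words is only an approximation of `max_tokens`.

```python
from abc import ABC, abstractmethod


class BaseSummarizationModel(ABC):
    """Local stand-in for raptor.BaseSummarizationModel (assumption)."""

    @abstractmethod
    def summarize(self, context, max_tokens=150):
        ...


class LeadSentenceSummarizer(BaseSummarizationModel):
    """Keeps leading sentences while the running word count fits the budget."""

    def summarize(self, context, max_tokens=150):
        kept, used = [], 0
        for sentence in context.replace("\n", " ").split(". "):
            words = sentence.split()
            if not words:
                continue
            if used + len(words) > max_tokens:
                break  # the next sentence would exceed the budget
            kept.append(sentence.strip().rstrip("."))
            used += len(words)
        return (". ".join(kept) + ".") if kept else ""


summary = LeadSentenceSummarizer().summarize(
    "One two three. Four five. Six seven eight nine.", max_tokens=5
)
```

A model-backed summarizer would follow the same shape, replacing the loop with a call to the model of your choice.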
|
||
#### Custom QA Model | ||
|
||
For custom QA models, extend the `BaseQAModel` class and implement the `answer_question` method. This method should return the best answer found by your model given a context and a question: | ||
|
||
```python | ||
from raptor import BaseQAModel | ||
|
||
class CustomQAModel(BaseQAModel): | |
    def __init__(self): | |
        # Initialize your model here | |
        pass | |
|
||
    def answer_question(self, context, question): | |
        # Implement your QA logic here | |
        # Return the answer as a string | |
        answer = "Your answer here" | |
        return answer | |
``` | ||
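
As an illustration of the interface, the sketch below answers a question by returning the context sentence that shares the most words with it. It is a deliberately simple baseline, not a language model: `BaseQAModel` is redefined locally as a stand-in so the snippet runs on its own, and `KeywordOverlapQAModel` is a hypothetical name.

```python
import re
from abc import ABC, abstractmethod


class BaseQAModel(ABC):
    """Local stand-in for raptor.BaseQAModel (assumption)."""

    @abstractmethod
    def answer_question(self, context, question):
        ...


class KeywordOverlapQAModel(BaseQAModel):
    """Returns the context sentence with the largest word overlap."""

    def answer_question(self, context, question):
        question_words = set(re.findall(r"\w+", question.lower()))
        # Split on whitespace that follows sentence-ending punctuation.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
        if not sentences:
            return ""

        def overlap(sentence):
            return len(question_words & set(re.findall(r"\w+", sentence.lower())))

        # max() keeps the earliest sentence when several tie.
        return max(sentences, key=overlap)


qa = KeywordOverlapQAModel()
answer = qa.answer_question(
    context="The prince held a ball. Cinderella lost her glass slipper. The clock struck midnight.",
    question="Who lost a slipper?",
)
```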
|
||
#### Custom Embedding Model | ||
|
||
To use a different embedding model, extend the `BaseEmbeddingModel` class. Implement the `create_embedding` method, which should return a vector representation of the input text: | ||
|
||
```python | ||
from raptor import BaseEmbeddingModel | ||
|
||
class CustomEmbeddingModel(BaseEmbeddingModel): | |
    def __init__(self, embedding_dim=768): | |
        # Initialize your model here; 768 is only a placeholder dimension | |
        self.embedding_dim = embedding_dim | |
|
||
    def create_embedding(self, text): | |
        # Implement your embedding logic here | |
        # Return the embedding as a numpy array or a list of floats | |
        embedding = [0.0] * self.embedding_dim # Replace with actual embedding logic | |
        return embedding | |
``` | ||
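
For a version that returns non-trivial vectors without any model download, the sketch below hashes each word into a fixed number of buckets and L2-normalizes the counts. It is meant for wiring tests, not retrieval quality. `BaseEmbeddingModel` is redefined locally as a stand-in so the snippet runs on its own, and `HashingEmbeddingModel` and its default dimension are hypothetical choices.

```python
import hashlib
import math
from abc import ABC, abstractmethod


class BaseEmbeddingModel(ABC):
    """Local stand-in for raptor.BaseEmbeddingModel (assumption)."""

    @abstractmethod
    def create_embedding(self, text):
        ...


class HashingEmbeddingModel(BaseEmbeddingModel):
    """Bag-of-words hashing embedding with unit-length output."""

    def __init__(self, embedding_dim=64):
        self.embedding_dim = embedding_dim

    def create_embedding(self, text):
        vector = [0.0] * self.embedding_dim
        for token in text.lower().split():
            # hashlib gives the same bucket on every run, unlike built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.embedding_dim
            vector[index] += 1.0
        norm = math.sqrt(sum(value * value for value in vector))
        # Empty input has no tokens, so return the zero vector unchanged.
        return [value / norm for value in vector] if norm else vector


model = HashingEmbeddingModel(embedding_dim=16)
embedding = model.create_embedding("glass slipper glass")
```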
|
||
#### Integrating Custom Models with RAPTOR | ||
|
||
After implementing your custom models, integrate them with RAPTOR as follows: | ||
|
||
```python | ||
from raptor import RetrievalAugmentation, RetrievalAugmentationConfig | ||
|
||
# Initialize your custom models | ||
custom_summarizer = CustomSummarizationModel() | ||
custom_qa = CustomQAModel() | ||
custom_embedding = CustomEmbeddingModel() | ||
|
||
# Create a config with your custom models | ||
custom_config = RetrievalAugmentationConfig( | |
    summarization_model=custom_summarizer, | |
    qa_model=custom_qa, | |
    embedding_model=custom_embedding | |
) | |
|
||
# Initialize RAPTOR with your custom config | ||
RA = RetrievalAugmentation(config=custom_config) | ||
``` | ||
|
||
Check out `demo.ipynb` for examples of how to specify your own summarization/QA models, such as Llama/Mistral/Gemma, and embedding models, such as SBERT, for use with RAPTOR. | |
|
||
Note: More examples and ways to configure RAPTOR are forthcoming. Advanced usage and additional features will be provided in the documentation and repository updates. | ||
|
||
## Contributing | ||
|
||
RAPTOR is an open-source project, and contributions are welcome. Whether you're fixing bugs, adding new features, or improving documentation, your help is appreciated. | ||
|
||
## License | ||
|
||
RAPTOR is released under the MIT License. See the LICENSE file in the repository for full details. | ||
|
||
## Citation | ||
|
||
If RAPTOR assists in your research, please cite it as follows: | ||
|
||
```bibtex | ||
@inproceedings{sarthi2024raptor, | ||
title={RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval}, | ||
author={Sarthi, Parth and Abdullah, Salman and Tuli, Aditi and Khanna, Shubh and Goldie, Anna and Manning, Christopher D.}, | ||
booktitle={International Conference on Learning Representations (ICLR)}, | ||
year={2024} | ||
} | ||
``` | ||
|
||
Stay tuned for more examples, configuration guides, and updates. |