Skip to content

Commit

Permalink
initial commit
Browse files Browse the repository at this point in the history
  • Loading branch information
parthsarthi03 committed Feb 27, 2024
0 parents commit 71d307c
Show file tree
Hide file tree
Showing 21 changed files with 2,854 additions and 0 deletions.
33 changes: 33 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Vim
*.swp

# Jupyter Notebook
.ipynb_checkpoints

# mac
.DS_Store

# Other
.vscode
*.tsv
*.pt
gpt*.txt
*.env
local/
local_*
build/
*.egg-info/
!/data/*.json
/dist/
checklist.md
finetuning_ckpts/
* copy*
.idea
assertion.log
*.log
*.db
21 changes: 21 additions & 0 deletions LICENSE.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) Parth Sarthi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
192 changes: 192 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,192 @@
<p align="center">
<img align="center" src="raptor.jpg" width="1000px" />
</p>
<p align="left">

## RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

**RAPTOR** introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models.



For detailed methodologies and implementations, refer to the original paper:

- [RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval](https://arxiv.org/abs/2401.18059)

[![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-sm.svg)](https://huggingface.co/papers/2401.18059)

## Installation

Before using RAPTOR, ensure Python 3.8+ is installed. Clone the RAPTOR repository and install necessary dependencies:

```bash
git clone https://github.com/parthsarthi03/RAPTOR.git
cd RAPTOR
pip install -r requirements.txt
```

## Basic Usage

To get started with RAPTOR, follow these steps:

### Setting Up RAPTOR

First, set your OpenAI API key and initialize the RAPTOR configuration:

```python
import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

from raptor import RetrievalAugmentation

# Initialize with default configuration. For advanced configurations, check the documentation. [WIP]
RA = RetrievalAugmentation()
```

### Adding Documents to the Tree

Add your text documents to RAPTOR for indexing:

```python
with open('sample.txt', 'r') as file:
text = file.read()
RA.add_documents(text)
```

### Answering Questions

You can now use RAPTOR to answer questions based on the indexed documents:

```python
question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
```

### Saving and Loading the Tree

Save the constructed tree to a specified path:

```python
SAVE_PATH = "Demo/cinderalla"
RA.save(SAVE_PATH)
```

Load the saved tree back into RAPTOR:

```python
RA = RetrievalAugmentation(tree=SAVE_PATH)
answer = RA.answer_question(question=question)
```


### Extending RAPTOR with other Models

RAPTOR is designed to be flexible and allows you to integrate any models for summarization, question-answering (QA), and embedding generation. Here is how to extend RAPTOR with your own models:

#### Custom Summarization Model

If you wish to use a different language model for summarization, you can do so by extending the `BaseSummarizationModel` class. Implement the `summarize` method to integrate your custom summarization logic:

```python
from raptor import BaseSummarizationModel

class CustomSummarizationModel(BaseSummarizationModel):
def __init__(self):
# Initialize your model here
pass

def summarize(self, context, max_tokens=150):
# Implement your summarization logic here
# Return the summary as a string
summary = "Your summary here"
return summary
```

#### Custom QA Model

For custom QA models, extend the `BaseQAModel` class and implement the `answer_question` method. This method should return the best answer found by your model given a context and a question:

```python
from raptor import BaseQAModel

class CustomQAModel(BaseQAModel):
def __init__(self):
# Initialize your model here
pass

def answer_question(self, context, question):
# Implement your QA logic here
# Return the answer as a string
answer = "Your answer here"
return answer
```

#### Custom Embedding Model

To use a different embedding model, extend the `BaseEmbeddingModel` class. Implement the `create_embedding` method, which should return a vector representation of the input text:

```python
from raptor import BaseEmbeddingModel

class CustomEmbeddingModel(BaseEmbeddingModel):
def __init__(self):
# Initialize your model here
pass

def create_embedding(self, text):
# Implement your embedding logic here
# Return the embedding as a numpy array or a list of floats
embedding = [0.0] * embedding_dim # Replace with actual embedding logic
return embedding
```

#### Integrating Custom Models with RAPTOR

After implementing your custom models, integrate them with RAPTOR as follows:

```python
from raptor import RetrievalAugmentation, RetrievalAugmentationConfig

# Initialize your custom models
custom_summarizer = CustomSummarizationModel()
custom_qa = CustomQAModel()
custom_embedding = CustomEmbeddingModel()

# Create a config with your custom models
custom_config = RetrievalAugmentationConfig(
summarization_model=custom_summarizer,
qa_model=custom_qa,
embedding_model=custom_embedding
)

# Initialize RAPTOR with your custom config
RA = RetrievalAugmentation(config=custom_config)
```

Check out `demo.ipynb` for examples on how to specify your own summarization/QA models, such as Llama/Mistral/Gemma, and Embedding Models such as SBERT, for use with RAPTOR.

Note: More examples and ways to configure RAPTOR are forthcoming. Advanced usage and additional features will be provided in the documentation and repository updates.

## Contributing

RAPTOR is an open-source project, and contributions are welcome. Whether you're fixing bugs, adding new features, or improving documentation, your help is appreciated.

## License

RAPTOR is released under the MIT License. See the LICENSE file in the repository for full details.

## Citation

If RAPTOR assists in your research, please cite it as follows:

```bibtex
@inproceedings{sarthi2024raptor,
title={RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval},
author={Sarthi, Parth and Abdullah, Salman and Tuli, Aditi and Khanna, Shubh and Goldie, Anna and Manning, Christopher D.},
booktitle={International Conference on Learning Representations (ICLR)},
year={2024}
}
```

Stay tuned for more examples, configuration guides, and updates.
Loading

0 comments on commit 71d307c

Please sign in to comment.