This repository contains the code for the paper "Transformers are Multi-State RNNs" by Matanel Oren*, Michael Hassid*, Yossi Adi, and Roy Schwartz.
First, set up the environment:
pip install transformers==4.36.2 sentencepiece
git clone https://github.com/schwartz-lab-NLP/TOVA.git
Next, use the following example code (currently supports LLaMA and Mistral only):
from transformers import AutoTokenizer, AutoModelForCausalLM
from TOVA import TOVACache, enable_tova_caching
tokenizer = AutoTokenizer.from_pretrained("your_model")
model = AutoModelForCausalLM.from_pretrained("your_model")
prompt = "Enter your prompt here"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
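# Enable TOVA caching on the model
enable_tova_caching(model)
# Maximum number of token states kept in the cache
multi_state_size = 512
cache = TOVACache(multi_state_size)
# Generate with the TOVA cache in place of the default key-value cache
output = model.generate(input_ids, past_key_values=cache)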
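To give a feel for what `multi_state_size` controls, here is a toy sketch of the eviction idea described in the paper: once the cache holds more token states than the budget allows, the state that receives the lowest attention weight from the current query is dropped. This is an illustration only and not the code in this repository. The function name `tova_step` and its arguments are hypothetical, and the scores are assumed to be already averaged over attention heads.

```python
from typing import List


def tova_step(kept: List[int], scores: List[float], multi_state_size: int) -> List[int]:
    """Return the token positions to keep after one decoding step (toy sketch).

    kept:   positions currently in the cache, including the newest token.
    scores: attention weight from the current query to each kept position
            (assumed already averaged over heads), in the same order as `kept`.
    """
    if len(kept) != len(scores):
        raise ValueError("kept and scores must have the same length")
    if len(kept) <= multi_state_size:
        return list(kept)  # still within budget, nothing to evict
    # Evict the single position with the lowest attention weight.
    drop = min(range(len(kept)), key=lambda i: scores[i])
    return [pos for i, pos in enumerate(kept) if i != drop]


# Toy example: a budget of 3 states with 4 tokens in the cache.
kept = tova_step([0, 1, 2, 3], [0.4, 0.05, 0.25, 0.3], multi_state_size=3)
print(kept)  # [0, 2, 3]
```

In this sketch a larger `multi_state_size` keeps more context at a higher memory cost, while a smaller one evicts tokens sooner.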
@misc{oren2024transformers,
title={Transformers are Multi-State {RNNs}},
author={Matanel Oren and Michael Hassid and Yossi Adi and Roy Schwartz},
year={2024},
note = {{arXiv}:2401.06104},
url = {https://arxiv.org/abs/2401.06104},
}