Avengers!

This repo serves as a workshop for experimenting with Transformer alternatives that use a constant-size cache and thus improve inference efficiency.
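
As a rough illustration of why a constant-size cache matters (hypothetical layer counts and dimensions, not the configurations trained here), the sketch below compares the memory of a standard Transformer KV cache, which grows linearly with sequence length, against a fixed-size recurrent state:

  # Illustrative back-of-envelope comparison; all dimensions are hypothetical.
  def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_el=2):
      # Keys and values are cached per layer, per KV head, per token.
      return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el

  def constant_state_bytes(n_layers=32, d_model=4096, d_state=128, bytes_per_el=2):
      # A state-space layer carries a fixed-size state regardless of sequence length.
      return n_layers * d_model * d_state * bytes_per_el

  for seq_len in (4_096, 32_768, 131_072):
      print(f"seq_len={seq_len:>7}: "
            f"KV cache ~ {kv_cache_bytes(seq_len) / 2**30:.2f} GiB, "
            f"constant state ~ {constant_state_bytes() / 2**30:.2f} GiB")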

Teaming

We are developing this in the open using data available in the community, with the primary data sources being Dolma and FineWeb-Edu. The checkpoints will also be made available to the community for further study. So far, we have had initial discussions with Minjia Zhang and Charith Mendis from UIUC, Tri Dao from Princeton, and Albert Gu from CMU. If you are interested, please reach out to Raghu Ganti (rganti@us.ibm.com).

Roadmap

We have put together a brief roadmap for studying these architectures and welcome suggestions, which we will take into account when we run the jobs.

Access

For access to the model checkpoints and WANDB trackers, please open an issue.

Status

  • 07/20/2024: The 3B model has completed 2T tokens; a 9.5B model is now training on the same data mix
  • 07/08/2024: Currently training Mamba-2 3B to 2T tokens on Dolma data

Models

  • Llama-3 (Transformers baseline)
  • Mamba-2 (with support for hybrid Mamba: a mix of Mamba layers and attention layers)
  • Telescoping Cache Llama-3 (constant-cache-size implementation of Transformers)

Installation

We leverage the following repos for our experiments:

Models

Mamba: provides the model implementation for Mamba-2:

  git clone https://github.com/Dao-AILab/causal-conv1d.git
  cd causal-conv1d && pip install -e . && cd ..
  git clone https://github.com/state-spaces/mamba.git
  cd mamba && pip install -e . && cd ..
  pip install flash-attn --no-build-isolation
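
To check that the Mamba install works, here is a minimal sketch along the lines of the upstream mamba README (it assumes a CUDA-capable GPU, and the dimensions are illustrative):

  import torch
  from mamba_ssm import Mamba2  # provided by the state-spaces/mamba install above

  batch, length, dim = 2, 64, 256
  x = torch.randn(batch, length, dim, device="cuda")

  # A single Mamba-2 block; d_model must match the input feature dimension.
  block = Mamba2(d_model=dim, d_state=64, d_conv=4, expand=2).to("cuda")

  y = block(x)
  assert y.shape == x.shape  # the block is sequence-to-sequence and shape-preserving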

Foundation-Model-Stack: provides the model implementation for Llama-3.

  git clone https://github.com/foundation-model-stack/foundation-model-stack.git
  cd foundation-model-stack && pip install -e . && cd ..
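
As a quick sanity check of the fms install, something along the following lines should work; note that the get_model entry point and the "llama"/"7b" architecture and variant names shown here are assumptions based on the fms documentation and may differ between versions:

  # Sanity-check sketch for the fms install. The get_model call and the
  # "llama"/"7b" names are assumptions taken from the fms documentation;
  # adjust to whatever your installed fms version exposes.
  from fms.models import get_model

  # Without a checkpoint path this is expected to build a randomly initialized
  # model, which is enough to confirm the package imports and builds cleanly.
  model = get_model("llama", "7b")
  print(sum(p.numel() for p in model.parameters()), "parameters")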

Foundation-Model-Stack-Sandbox: provides the model implementation for Telescoping Cache Llama-3 (temporary location; to be moved into the main Foundation-Model-Stack).

  git clone -b telescoping-cache https://github.com/daviswer/foundation-model-stack-sandbox.git
  cd foundation-model-stack-sandbox && pip install -e . && cd ..

Training Stack

fms-fsdp: provides the training stack used to train all of the models above.

Run

Modify train_llama.sh, train_mamba.sh, or train_tele.sh to run the corresponding training job.

Models available -- COMING SOON!