Welcome to ProtBench! This library makes benchmarking protein language models easy and modular. You can add new models and datasets, plug in downstream models, and extract embeddings and save them to disk for reuse, so you can focus on what matters most: advancing your research.
- Ease of Use: Simple and intuitive API for benchmarking protein language models.
- Modular Design: Easily add new models and datasets for benchmarking.
- Downstream Models: Integrate and benchmark downstream models (currently only ConvBERT is supported).
- LoRA Integration: Use LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning during benchmarking.
- Embedding Extraction: Extract embeddings and save them to disk for later use.
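To illustrate why LoRA is efficient, here is a minimal, library-independent sketch of the idea: the pretrained weight `W` stays frozen and only a low-rank update `(alpha / r) * B @ A` is trained. This is not ProtBench's API. The `LoRALinear` class, its parameters, and the `matvec` helper are hypothetical names used only for illustration, and the layer sizes are arbitrary.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight, shape (d_out, d_in); random here for illustration.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A has shape (rank, d_in), B has shape (d_out, rank).
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)

    def frozen_params(self):
        return sum(len(row) for row in self.W)


layer = LoRALinear(d_in=64, d_out=64, rank=4)
print(layer.trainable_params(), layer.frozen_params())  # prints: 512 4096
```

With rank 4 on a 64 x 64 layer, only 512 of the 4,608 total parameters are trained. The saving grows with layer size, since the trainable count scales with `rank * (d_in + d_out)` rather than `d_in * d_out`.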
To install the library, clone the repository and install it in editable mode with pip:
git clone git@github.com:Proteinea/protbench.git
cd protbench
pip install -e .
Here are some simple example scripts to get you started:
- ESM2: protbench/examples/train_with_convbert_esm2.py
- Ankh: protbench/examples/train_with_convbert.py
- ESM2 with LoRA: protbench/examples/train_with_lora_esm2.py
- Ankh with LoRA: protbench/examples/train_with_lora.py
Will be added soon.
We welcome contributions from the community. If you'd like to contribute, please fork the repository and submit a pull request.