OLMo-core

Building blocks for OLMo modeling and training

Examples || Docs || PyPI || Beaker Images || License || Changelog

Installation

First, install PyTorch according to the instructions specific to your operating system. Then install OLMo-core from PyPI with:

pip install ai2-olmo-core
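To confirm that the install succeeded, you can query the installed distribution's metadata. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of OLMo-core.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is an illustrative helper, not an OLMo-core API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "ai2-olmo-core" is the PyPI name used in the pip command above.
found = installed_version("ai2-olmo-core")
print(found if found is not None else "ai2-olmo-core is not installed")
```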

Official training scripts

Official training scripts for various model sizes can be found in src/scripts/train/. Throughput numbers are reported below.

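| Model size | Context length | Script | Throughput [1] | MFU |
|------------|----------------|--------|----------------|-----|
| 1B | 4K | OLMo-1B.py | 45-47K TPS | 39-41% |
| 7B | 4K | OLMo-7B.py | 9.7-10K TPS | 47-48% |
| 13B | 4K | OLMo-13B.py | 4.4-4.6K TPS | 41-42% |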
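For readers who want to relate the throughput and MFU (model FLOPs utilization) columns, here is a rough sketch of the usual back-of-the-envelope calculation. It rests on assumptions that are not stated in this README: the common approximation of 6 FLOPs per parameter per training token (which ignores attention FLOPs, so it tends to underestimate), a nominal parameter count, and a peak of about 989 TFLOP/s for dense bf16 on an H100. The way the reported MFU figures were actually computed may differ.

```python
def approx_mfu(n_params, tokens_per_second_per_device, peak_flops_per_device):
    """Estimate MFU as achieved training FLOP/s divided by peak FLOP/s.

    Assumes ~6 FLOPs per parameter per token (forward + backward) and
    ignores the attention term, so this is a lower-bound style estimate.
    """
    flops_per_token = 6 * n_params
    achieved_flops_per_second = flops_per_token * tokens_per_second_per_device
    return achieved_flops_per_second / peak_flops_per_device


# Assumed values for illustration only: a nominal 7e9 parameters,
# 10K TPS per device from the table, and an assumed H100 bf16 peak.
H100_BF16_PEAK_FLOPS = 989e12  # assumption; depends on the exact GPU variant
estimate = approx_mfu(7e9, 10_000, H100_BF16_PEAK_FLOPS)
print(f"approximate MFU: {estimate:.1%}")
```

Because the attention FLOPs are left out, this estimate should land somewhat below an MFU figure that accounts for them.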

Development

After cloning OLMo-core and setting up a Python virtual environment, install the codebase from source with:

pip install -e .[all]

The Python library source code is located in src/olmo_core. The corresponding tests are located in src/test. The library docs are located in docs. You can build the docs locally with make docs.

Code checks:

  • We use pytest to run tests. You can run all tests with pytest -v src/test. You can also point pytest at a specific test file to run it individually.
  • We use isort and black for code formatting. Ideally you should integrate these into your editor, but you can also run them manually or configure them with a pre-commit hook. To validate that all files are formatted correctly, run make style-check.
  • We use ruff as our primary linter. You can run it with make lint-check.
  • We use mypy as our type checker. You can run it with make type-check.
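The checks listed above can also be run in one pass. The following is a minimal sketch, assuming it is invoked from the repository root with the development dependencies installed; the `run_checks` helper and the `CHECKS` list are hypothetical conveniences, not part of the repository. Only the commands named in the list above are used.

```python
import subprocess

# Commands taken from the code-check list above.
CHECKS = [
    ["pytest", "-v", "src/test"],
    ["make", "style-check"],
    ["make", "lint-check"],
    ["make", "type-check"],
]


def run_checks(run=subprocess.run):
    """Run every check in order and return the commands that failed.

    `run` is injectable so the helper can be exercised without
    actually spawning processes.
    """
    failed = []
    for cmd in CHECKS:
        result = run(cmd)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


# Usage (from the repository root):
#   failures = run_checks()
#   print("all checks passed" if not failures else f"failed: {failures}")
```

Running every check instead of stopping at the first failure gives a full picture of what needs fixing before a commit.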

Footnotes

  1. Throughput is reported in tokens per second (TPS) per device, measured on a cluster of H100 GPUs.