Refactor into monorepo structure #230

Merged 4 commits, Feb 8, 2024
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,4 +1,4 @@
[submodule "llm-foundry"]
-path = llm-foundry
+path = LUMI/llm-foundry
url = https://github.com/rlrs/llm-foundry
branch = lumi
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion llm-foundry
Submodule llm-foundry deleted from f89ce6
4 changes: 0 additions & 4 deletions src/dfm/__init__.py

This file was deleted.

Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
2 changes: 0 additions & 2 deletions src/dfm/test_sample.py

This file was deleted.

12 changes: 7 additions & 5 deletions scripts/lumi/README.md → training/README.md
@@ -1,12 +1,14 @@
# Model training on LUMI

## Dataset preparation

From a jsonl file (such as da-gigaword), something like `python scripts/data/convert_dataset_json.py --path /path/to/da-gigaword.jsonl.tar.gz --out_root ./da-gigaword-mds --concat_tokens 4096 --tokenizer mistralai/Mistral-7B-v0.1 --test_size 0.02` will generate the necessary Mosaic streaming dataset. Takes ~2 hours for da-gigaword, which is a bit slow. When done, copy this folder to LUMI scratch and configure data path in the training YAML, e.g. `scripts/lumi/yamls/continue-mistral-7b.yaml`.

## LUMI setup and training

1. SSH into LUMI
-3. Enter project: `cd /scratch/project_465000670/danish-foundation-models`
-2. Enter container: `singularity run --cleanenv --bind /scratch/project_465000670/ /project/project_465000670/pytorch_rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1.sif`
-5. Set up virtual environment: `./scripts/lumi/make_venv.sh`
-6. Exit container
-7. Run training: `./scripts/lumi/continue_mistral_mosaic.sh`
+2. Enter project: `cd /scratch/project_465000670/danish-foundation-models`
+3. Enter container: `singularity run --cleanenv --bind /scratch/project_465000670/ /project/project_465000670/pytorch_rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1.sif`
+4. Set up virtual environment: `./scripts/lumi/make_venv.sh`
+5. Exit container
+6. Run training: `./scripts/lumi/continue_mistral_mosaic.sh`
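
The dataset preparation command quoted in the README above is long enough to mistype easily. As a minimal sketch, it could be assembled from named parameters in Python. The helper name `build_convert_command` is hypothetical and not part of this repository; the script path and flags are copied from the README text and may need adjusting after the move to the monorepo layout.

```python
import shlex


def build_convert_command(
    path,
    out_root,
    concat_tokens=4096,
    tokenizer="mistralai/Mistral-7B-v0.1",
    test_size=0.02,
):
    """Return the argv list for the conversion script named in the README.

    Hypothetical helper: only the script path and flag names come from the
    README; the defaults mirror the example values shown there.
    """
    return [
        "python",
        "scripts/data/convert_dataset_json.py",
        "--path", path,
        "--out_root", out_root,
        "--concat_tokens", str(concat_tokens),
        "--tokenizer", tokenizer,
        "--test_size", str(test_size),
    ]


# Reproduce the da-gigaword example from the README as a shell-safe string.
cmd = build_convert_command(
    "/path/to/da-gigaword.jsonl.tar.gz",
    "./da-gigaword-mds",
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems if the input path contains spaces.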
File renamed without changes.
File renamed without changes.
File renamed without changes.