As a prerequisite, make sure you have ducttape and (mini)conda installed.
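A quick way to check that both prerequisites are available (this assumes they are already on your `PATH`):

```bash
# verify that both prerequisites can be found on the PATH
command -v ducttape || echo "ducttape not found"
command -v conda    || echo "conda not found"
```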
First, clone this repository and its submodules:

```bash
git clone --recurse-submodules git@github.com:CoderPat/croissant-llm-training.git
```
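If you already cloned the repository without `--recurse-submodules`, you can fetch the submodules afterwards:

```bash
git submodule update --init --recursive
```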
Then, to create a new conda environment with all the necessary dependencies, run the following commands:

```bash
export CONDA_HOME="/path/to/(mini)conda3"
bash setup/conda.sh
```
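Afterwards, you can confirm that the environment was created (the environment name `towerllm-env` is the one used in the activation command later in this section):

```bash
conda env list | grep towerllm-env
```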
The core experimentation and training pipelines rely on ducttape and are defined in `main.tape`. Configuration files for different models and datasets are defined in `configs/`.
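For readers unfamiliar with ducttape, a task in a `.tape` file declares its inputs, outputs, and parameters, followed by a shell body. The sketch below is purely illustrative: the task name, script, and variables (`tokenize_data`, `scripts/tokenize.py`, `raw_corpus`, `vocab_size`) are hypothetical and do not reproduce the actual contents of `main.tape`.

```
# Hypothetical example of ducttape task syntax (not taken from main.tape):
# "<" declares inputs, ">" declares outputs, "::" declares parameters,
# and the body is a shell script run in the task's own output directory.
task tokenize_data
    < raw_corpus=$raw_corpus
    > tokenized
    :: vocab_size=32000
{
    python scripts/tokenize.py --input $raw_corpus --output $tokenized --vocab-size $vocab_size
}
```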
Start by creating a configuration with user-dependent variables (like the output folder) in a `configs/*_uservars.conf` file associated with your chosen `.tconf`. For example, for the `configs/croissant_llm.tconf` configuration, create a `configs/croissant_llm_uservars.conf` file with the following content:
```
global {
    ducttape_output=/path/to/output
    repo=/path/to/croissant-llm-training

    (...)

    # use a simple shell submitter
    # we are forced to explicitly set the submitter parameters
    # to make it compatible with other submitters (ie the slurm submitter)
    submitter=shell
    dump_account=none
    dump_partition=none

    (...)
}
```
We provide a template with the user variables we used on JeanZay.
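On a Slurm-managed cluster (such as JeanZay), the same submitter-related variables would instead point at the Slurm submitter and a real account and partition. The values below are placeholders, not the settings we actually used:

```
# hypothetical Slurm settings -- replace with your own account and partition
submitter=slurm
dump_account=your_account
dump_partition=your_partition
```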
Then, you can run one of the pipelines specified in `main.tape` by running ducttape with the corresponding configuration file:

```bash
conda activate towerllm-env
ducttape main.tape -C configs/croissant_llm.conf
```
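If `main.tape` defines multiple plans, a specific one can be selected with ducttape's `-p` flag, and `-y` skips the interactive confirmation prompt. The plan name below is a placeholder:

```bash
conda activate towerllm-env
ducttape main.tape -C configs/croissant_llm.conf -p YourPlanName -y
```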