bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - Lots of information about our work environment which helps evaluate, plan and get things done
  • Experiments - many experiments are being done. Documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We also have READMEs for specific aspects; see the corresponding sub-directories.

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
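
If perl is not available, here is a minimal Python sketch of the same polling loop (it assumes the third-party requests library is installed and that the server reports content-length); the perl one-liner above remains the reference version:

# poll the remote log and print only the newly appended bytes
import sys
import time

import requests  # assumption: installed in the environment

url = sys.argv[1]   # e.g. https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
seen = 0            # number of bytes already printed

while True:
    # ask the server for the current size of the remote log file
    head = requests.head(url, allow_redirects=True)
    size = int(head.headers.get("content-length", 0))
    if size > seen:
        # fetch only the bytes we have not printed yet
        part = requests.get(url, headers={"Range": f"bytes={seen}-{size}"}, allow_redirects=True)
        sys.stdout.write(part.text)
        sys.stdout.flush()
        seen = size
    time.sleep(300)  # the hub copy is refreshed roughly once an hour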

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Size                  1B3                    760M   350M   125M
C4 + low warmup       a                      b      -      c
OSCAR + low warmup    f                      -      -      -
C4 + high warmup      e                      -      -      -
OSCAR + high warmup   d (current baseline)   g      h      i
Pile + high warmup    m                      j      k      l

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9

Train 11

This is the current main training.

tr11-176B-ml

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt

About

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
