bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - Lots of information about our work environment which helps evaluate, plan and get things done
  • Experiments - many experiments are being done. Documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We also have READMEs for specific aspects; see the corresponding sub-directories.

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
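
If perl is not available, here is a minimal Python sketch of the same polling loop (it assumes the third-party requests library is installed and that the server reports content-length); the perl one-liner above remains the reference version:

# poll the remote log and print only the newly appended bytes
import sys
import time

import requests  # assumption: installed in the environment

url = sys.argv[1]   # e.g. https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
seen = 0            # number of bytes already printed

while True:
    # ask the server for the current size of the remote log file
    head = requests.head(url, allow_redirects=True)
    size = int(head.headers.get("content-length", 0))
    if size > seen:
        # fetch only the bytes we have not printed yet
        part = requests.get(url, headers={"Range": f"bytes={seen}-{size}"}, allow_redirects=True)
        sys.stdout.write(part.text)
        sys.stdout.flush()
        seen = size
    time.sleep(300)  # the hub copy is refreshed roughly once an hour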

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Size                  1B3                    760M   350M   125M
C4 + low warmup       a                      b      -      c
OSCAR + low warmup    f                      -      -      -
C4 + high warmup      e                      -      -      -
OSCAR + high warmup   d (current baseline)   g      h      i
Pile + high warmup    m                      j      k      l

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9

Train 11

This is the current main training.

tr11-176B-ml

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt

About

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
