[Deepspeed] add many more models to the model zoo test #12695
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very nice work, thanks a lot @stas00!
Nice work @stas00, have you tested Perceiver with DeepSpeed?
Would be glad to do that, @sameeravithana. In order to do that, what I need is a Trainer-based example script that I can test with. As you can see from this map: transformers/tests/deepspeed/test_model_zoo.py Lines 231 to 270 in 4a419d4
I have each model tested by one of the HF Trainer examples. Is there one that can be used with Perceiver?
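For readers unfamiliar with how that map is used, here is a minimal, hypothetical sketch of the idea (not the actual contents of tests/deepspeed/test_model_zoo.py): each tiny model is paired with one HF Trainer example script, and a parametrized test launches that script under the `deepspeed` launcher with a ZeRO config. The model names, script paths, and flags below are illustrative; real invocations need task-specific arguments as well.

```python
# Illustrative sketch only -- not the real test_model_zoo.py.
import itertools
import subprocess

import pytest

# Hypothetical model -> example-script mapping (names for illustration only).
model_to_script = {
    "hf-internal-testing/tiny-random-bert": "examples/pytorch/text-classification/run_glue.py",
    "hf-internal-testing/tiny-random-t5": "examples/pytorch/translation/run_translation.py",
}

zero_stages = ["zero2", "zero3"]


@pytest.mark.parametrize(
    "model_name, script, stage",
    [(m, s, z) for (m, s), z in itertools.product(model_to_script.items(), zero_stages)],
)
def test_model_with_deepspeed(model_name, script, stage, tmp_path):
    # Launch the example script under the deepspeed launcher with the matching ZeRO config.
    # (A real invocation also needs task-specific args such as --task_name or data files.)
    cmd = [
        "deepspeed", "--num_gpus", "1", script,
        "--model_name_or_path", model_name,
        "--output_dir", str(tmp_path),
        "--do_train", "--max_steps", "2",
        "--deepspeed", f"tests/deepspeed/ds_config_{stage}.json",
    ]
    # The real harness captures and inspects the output; here we only require exit code 0.
    subprocess.run(cmd, check=True)
```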
…2695)
* model zoo take 2
* add deberta
* new param for zero2
* doc update
* doc update
* add layoutlm
* bump deepspeed
* add deberta-v2, funnel, longformer
* new models
* style
* add t5_v1
* update TAPAS status
* reorg problematic models
* move doc to another PR
* style
* fix checkpoint check test
* making progress on more models running
* cleanup
* new version
* cleanup
This PR continues the work of figuring out how to make various models work with Deepspeed (a lot of the fixes happen on the Deepspeed side). Most models just work out of the box, so there are no fixes to add here; the main purpose of this PR is to test as many models as possible.
Thanks to @LysandreJik for creating the tiny test models for many of the HF models!
Some models I couldn't cover for a variety of reasons unrelated to Deepspeed (missing tokenizers, missing tiny models, or missing example scripts to exercise them), but their status is documented in the script. Over time more will be tested.
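For context, here is a minimal sketch of what "testing a model with Deepspeed via an HF Trainer example" boils down to, passing a ZeRO stage 2 config to the Trainer. The config values, model name, and toy dataset are placeholders, not the ones used in the test suite, and in practice this is launched via the `deepspeed` launcher.

```python
# Illustrative sketch: a tiny model trained through the HF Trainer with a ZeRO-2
# Deepspeed config passed as a dict (placeholder values, not the test-suite config).
# Typically run under the deepspeed launcher, e.g.: deepspeed this_script.py
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",  # "auto" values are filled in by the Trainer
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 2},
}

model_name = "hf-internal-testing/tiny-random-bert"  # one of the tiny test models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# A toy dataset, just enough to exercise a couple of training steps.
texts = ["deepspeed test sentence"] * 8
train_ds = Dataset.from_dict(dict(tokenizer(texts, truncation=True))).add_column("labels", [0] * 8)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    max_steps=2,
    deepspeed=ds_config,  # the Trainer accepts a config dict or a path to a json file
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```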
Blocking events (all resolved):
- #13665 (fixes positional embeddings: m2m_100 and others)