[Improvement] accelerate T5 model conversion and fix bloom model on multi-process #447

lanking520 · 2023-02-11T00:18:41Z

For example, Flan-T5-xxl (49GB) conversion time is reduced from 11 min to 4 min on a 64-core CPU with 4 concurrent processes.

If you run the Bloom conversion script with the following arguments:

-i bigscience/bloomz-3b -o /tmp/ft_model3/ -tp 1 -p 4 -dt fp32

you will reproduce the bus error. This PR includes a fix for it.

lanking520 · 2023-02-11T00:31:21Z

@byshiue

byshiue · 2023-02-13T02:31:02Z

examples/pytorch/gpt/utils/huggingface_bloom_convert.py

-def convert_and_save_parameter(config: PretrainedConfig,
-                               name: str,
-                               param: torch.nn.Parameter,
+def convert_and_save_parameter(param_name: str,


You changed the API, but didn't update the call on line 333.

I changed that in the following commit. But maybe we can just remove the if/else statement and always use star_async. A single process still works under this condition.

I would assume that nearly every machine, even a laptop, now has at least 4 cores, so the single-core case is uncommon.
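To illustrate the suggestion above, here is a minimal sketch, assuming "star_async" refers to `multiprocessing.Pool.starmap_async` from the Python standard library. It is not the code from this PR: the body of `convert_and_save_parameter`, the `convert_all` helper, and its `pool_factory` parameter are hypothetical stand-ins, and the real function takes other arguments and writes weight files. The point is only that a pool of size 1 goes through the same code path as a larger pool, so no separate single-process branch is needed.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool  # same API, handy for quick local checks


def convert_and_save_parameter(param_name: str, value: float) -> str:
    # Hypothetical stand-in: the real script converts a tensor and saves it to disk.
    return f"{param_name}:{value * 2}"


def convert_all(named_values, processes: int, pool_factory=multiprocessing.Pool):
    """Dispatch every (name, value) pair via starmap_async, with no branch for processes == 1."""
    with pool_factory(processes) as pool:
        async_result = pool.starmap_async(convert_and_save_parameter, named_values)
        # get() blocks until all tasks finish and re-raises any worker exception.
        return async_result.get()


if __name__ == "__main__":
    weights = [("layer0.weight", 1.0), ("layer1.weight", 2.5)]
    print(convert_all(weights, processes=1))  # single worker, same code path
    print(convert_all(weights, processes=4))  # four workers
```

Calling `get()` on the async result before the pool is torn down matters here: leaving the `with` block terminates the pool, so any work that has not finished by then would be lost.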

accelerate T5 model conversion on large models

6cfd70f

fix the bloom error

bced77b

lanking520 changed the title ~~[Improvement] accelerate T5 model conversion on large models~~ [Improvement] accelerate T5 model conversion and fix bloom model on multi-process Feb 11, 2023

rohithkrn approved these changes Feb 11, 2023

byshiue reviewed Feb 13, 2023

convert to use the same setup

a573be9

byshiue merged commit 9b6d718 into NVIDIA:main Feb 13, 2023

Chris113113 mentioned this pull request Feb 22, 2023

Missing weights when converting T5? #464

Closed
