CLI: convert sharded PT models #17959
Conversation
The documentation is not available anymore as the PR was closed or merged.
for path in pytorch_checkpoint_path:
    pt_path = os.path.abspath(path)
    logger.info(f"Loading PyTorch weights from {pt_path}")
    pt_state_dict.update(torch.load(pt_path, map_location="cpu"))
That's super nice 👍🏻
That is a nice first step, but ideally, we'd want to convert the shards one by one to avoid using too much RAM and be able to convert LLMs checkpoints without needing a battle station.
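A minimal, framework-agnostic sketch of the shard-by-shard idea suggested here (plain dicts stand in for torch state dicts, and all names are hypothetical, not the CLI's actual code):

```python
# Hedged sketch: convert one shard at a time instead of merging every
# shard into a single state dict first, so peak memory stays at roughly
# one shard rather than the whole model.

def convert_shard(shard):
    # Placeholder for the real per-shard PT -> TF weight conversion.
    return dict(shard)

def convert_sharded_checkpoint(shard_loaders):
    """Load, convert, and discard one shard at a time.

    `shard_loaders` is an iterable of zero-argument callables, each
    returning one shard's state dict (e.g. a closure over torch.load).
    """
    converted = {}
    for load in shard_loaders:
        shard = load()                 # materialize a single shard
        converted.update(convert_shard(shard))
        del shard                      # drop it before loading the next
    return converted
```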
haha yes, I had to spin up a machine with >100GB of RAM to convert the RegNets 😬
LGTM thanks for working on that!
LGTM to me too!
BTW could we add 2 tests,
TF shards -> PT probably won't work, but I will add the test for PT shards -> TF 👍
Nice improvements, thanks!
elif from_pt and os.path.isfile(os.path.join(pretrained_model_name_or_path, WEIGHTS_INDEX_NAME)):
    # Load from a sharded PyTorch checkpoint
    archive_file = os.path.join(pretrained_model_name_or_path, WEIGHTS_INDEX_NAME)
    is_sharded = True
Nice addition, maybe we should also support loading from a remote sharded checkpoint with from_pt=True? (It should be its own PR if we decide to support this.)
* sharded conversion; add flag to control max hidden error
* better hidden name matching
* Add test: load TF from PT shards
* fix test (PT data must be local)
What does this PR do?
This PR adds a major upgrade and a minor change to the pt-to-tf CLI.

Major upgrade: we can now convert sharded PT models. It updates how from_pt loading works so that it can load from shards, and updates how the pt-to-tf CLI stores the models, so it uses sharding capabilities when needed.

Minor change: adds a flag to control the maximum admissible hidden layer error. It is relatively common to find models where the outputs from the PT and TF models are nearly the same, but the hidden layers have a larger mismatch. This flag allows us to temporarily increase the admissible error if the model seems to be behaving properly (for instance, all RegNet models had a hidden layer difference between 1e-4 and 1e-2, but the outputs were behaving properly).
Example of sharded TF model PR, using the updated tools: https://huggingface.co/facebook/regnet-y-10b-seer-in1k/discussions/1