Loading sharded model in `tf` from pytorch checkpoints #17537

ArthurZucker · 2022-06-03T07:14:03Z

System Info

- `transformers` version: 4.20.0.dev0
- Platform: macOS-12.4-arm64-arm-64bit
- Python version: 3.9.12
- Huggingface_hub version: 0.6.0
- PyTorch version (GPU?): 1.13.0.dev20220521 (False)
- Tensorflow version (GPU?): 2.9.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.4.2 (cpu)
- Jax version: 0.3.6
- JaxLib version: 0.3.5

Who can help?

@LysandreJik I am not sure who to ping on that 😅

Loading a large model from the Hub in TensorFlow (with `from_pt=True`) fails when the PyTorch checkpoint is sharded.

Reproduction

>>> from transformers import TFOPTModel
>>> tf_model = TFOPTModel.from_pretrained("facebook/opt-13b", from_pt=True)

Traceback (most recent call last):
  File "/home/arthur_huggingface_co/transformers/src/transformers/modeling_tf_utils.py", line 1789, in from_pretrained
    resolved_archive_file = cached_path(
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 486, in get_from_cache
    _raise_for_status(r)
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 409, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/facebook/opt-13b/resolve/main/pytorch_model.bin

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/arthur_huggingface_co/transformers/src/transformers/modeling_tf_utils.py", line 1833, in from_pretrained
    raise EnvironmentError(
OSError: facebook/opt-13b does not appear to have a file named pytorch_model.bin.
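The 404 occurs because a sharded repository has no single `pytorch_model.bin`. Assuming the standard `transformers` sharding layout, it instead stores several shard files plus an index file, `pytorch_model.bin.index.json`, whose `weight_map` maps each parameter name to the shard that holds it. A minimal sketch of reading such an index (the helper `group_params_by_shard` and the toy parameter and shard names are hypothetical, for illustration only):

```python
import json
from collections import defaultdict


def group_params_by_shard(index_json: str) -> dict:
    """Map each shard filename to the sorted list of parameter names it stores.

    `index_json` is the text of a sharded-checkpoint index file, assumed to
    follow the layout {"metadata": {...}, "weight_map": {param: shard_file}}.
    """
    index = json.loads(index_json)
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Toy index with made-up parameter and shard names (illustration only).
example_index = json.dumps({
    "metadata": {"total_size": 1024},
    "weight_map": {
        "decoder.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
        "decoder.layers.0.fc1.weight": "pytorch_model-00001-of-00002.bin",
        "decoder.layers.1.fc1.weight": "pytorch_model-00002-of-00002.bin",
    },
})

print(group_params_by_shard(example_index))
```

A loader that only asks for `pytorch_model.bin` never looks at this index, which is why the request above ends in `EntryNotFoundError`.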

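As a workaround, the weights can be converted by first re-saving the PyTorch checkpoint as a single, unsharded file and then loading that in TensorFlow:

from transformers import OPTModel, TFOPTModel

path = "facebook/opt-13b"
pt_model = OPTModel.from_pretrained(path)
# A very large max_shard_size writes one unsharded pytorch_model.bin into the local directory `path`
pt_model.save_pretrained(path, max_shard_size="1000GB")
# The local directory `path` is now found before the Hub repo, so from_pt=True sees the single file
tf_model = TFOPTModel.from_pretrained(path, from_pt=True)
tf_model.save_pretrained(path, save_config=False)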
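Conceptually, re-saving with a huge `max_shard_size` collapses all shards into one state dict. The merge step can be sketched in plain Python as below. This is an illustrative sketch, not the library's implementation: `merge_shards` is a hypothetical helper, and `load_shard` is an assumed callable that turns a shard filename into a `{param_name: tensor}` dict (for real files it could be something like `lambda f: torch.load(f, map_location="cpu")`).

```python
from typing import Callable, Dict, Mapping


def merge_shards(
    weight_map: Mapping[str, str],
    load_shard: Callable[[str], Dict[str, object]],
) -> Dict[str, object]:
    """Combine every shard listed in `weight_map` into a single state dict."""
    merged: Dict[str, object] = {}
    # Load each distinct shard file once, in a deterministic order.
    for shard_file in sorted(set(weight_map.values())):
        shard = load_shard(shard_file)
        overlap = set(merged) & set(shard)
        if overlap:
            raise ValueError(f"parameters stored in more than one shard: {sorted(overlap)}")
        merged.update(shard)
    # Every parameter named in the index must have been found in some shard.
    missing = set(weight_map) - set(merged)
    if missing:
        raise KeyError(f"parameters listed in the index but absent from shards: {sorted(missing)}")
    return merged


# Fake in-memory shards standing in for real shard files (illustration only).
fake_files = {
    "shard-a.bin": {"w1": 1, "w2": 2},
    "shard-b.bin": {"w3": 3},
}
weight_map = {"w1": "shard-a.bin", "w2": "shard-a.bin", "w3": "shard-b.bin"}
state_dict = merge_shards(weight_map, fake_files.__getitem__)
print(sorted(state_dict))  # → ['w1', 'w2', 'w3']
```

Passing the loader in as a callable keeps the merge logic independent of the framework; the trade-off is that the full model must fit in memory at once, which is exactly what makes this painful for a 13B-parameter checkpoint.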

Expected behavior

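`from_pretrained(..., from_pt=True)` should handle sharded PyTorch checkpoints automatically, doing this conversion in the background instead of raising an error.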
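One way to get this behavior is for the loader to fall back to the index file when `pytorch_model.bin` is missing. A minimal sketch, assuming the file names `transformers` uses for single-file and sharded PyTorch checkpoints; `resolve_pt_checkpoint` is a hypothetical helper, not the library's actual code:

```python
from typing import Iterable, Tuple

# File names used by `transformers` for single-file and sharded PyTorch checkpoints.
WEIGHTS_NAME = "pytorch_model.bin"
WEIGHTS_INDEX_NAME = "pytorch_model.bin.index.json"


def resolve_pt_checkpoint(available_files: Iterable[str]) -> Tuple[str, bool]:
    """Return (file_to_load, is_sharded), preferring the single-file checkpoint."""
    files = set(available_files)
    if WEIGHTS_NAME in files:
        return WEIGHTS_NAME, False
    if WEIGHTS_INDEX_NAME in files:
        # Sharded checkpoint: the caller must read every shard listed in the index.
        return WEIGHTS_INDEX_NAME, True
    raise FileNotFoundError(f"no {WEIGHTS_NAME} or {WEIGHTS_INDEX_NAME} found")


print(resolve_pt_checkpoint(
    ["config.json", "pytorch_model.bin.index.json", "pytorch_model-00001-of-00003.bin"]
))  # → ('pytorch_model.bin.index.json', True)
```

When the index is returned, the loader would then read every shard it lists before converting the weights to TensorFlow, rather than failing with a 404 on `pytorch_model.bin`.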


LysandreJik · 2022-06-03T13:13:33Z

Indeed, nice catch! Putting @sgugger in the loop

LysandreJik · 2022-06-03T13:13:55Z

Simple reproducer:

from transformers import TFBertModel

model = TFBertModel.from_pretrained("sgugger/bert-sharded")

sgugger · 2022-06-03T13:51:30Z

Putting it on my TODO (might take a few weeks as I have more urgent items, and we don't have a good solution on the TF side for large models right now anyway).

github-actions · 2022-07-03T15:01:39Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker · 2022-07-05T13:27:09Z

Fixing this 😄

github-actions · 2022-07-31T15:02:02Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ArthurZucker added the bug label Jun 3, 2022

ArthurZucker mentioned this issue Jul 5, 2022

Load sharded pt model to flax (from the hub ) #18026

Closed

ArthurZucker self-assigned this Jul 7, 2022

ArthurZucker mentioned this issue Aug 2, 2022

Load sharded pt to flax #18419

Merged

github-actions bot closed this as completed Aug 8, 2022
