Loading sharded model in tf from pytorch checkpoints #17537

Closed
4 tasks
ArthurZucker opened this issue Jun 3, 2022 · 6 comments · Fixed by #18419

@ArthurZucker
Collaborator

System Info

- `transformers` version: 4.20.0.dev0
- Platform: macOS-12.4-arm64-arm-64bit
- Python version: 3.9.12
- Huggingface_hub version: 0.6.0
- PyTorch version (GPU?): 1.13.0.dev20220521 (False)
- Tensorflow version (GPU?): 2.9.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.4.2 (cpu)
- Jax version: 0.3.6
- JaxLib version: 0.3.5

Who can help?

@LysandreJik I am not sure who to ping on that 😅

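Loading a big model from the Hub in TensorFlow with `from_pt=True` is impossible if the PyTorch checkpoint is sharded: `from_pretrained` looks for a single `pytorch_model.bin`, which a sharded repository does not contain.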

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

>>> from transformers import TFOPTModel
>>> tf_model = TFOPTModel.from_pretrained("facebook/opt-13b", from_pt=True)
Traceback (most recent call last):
  File "/home/arthur_huggingface_co/transformers/src/transformers/modeling_tf_utils.py", line 1789, in from_pretrained
    resolved_archive_file = cached_path(
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 486, in get_from_cache
    _raise_for_status(r)
  File "/home/arthur_huggingface_co/transformers/src/transformers/utils/hub.py", line 409, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/facebook/opt-13b/resolve/main/pytorch_model.bin

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/arthur_huggingface_co/transformers/src/transformers/modeling_tf_utils.py", line 1833, in from_pretrained
    raise EnvironmentError(
OSError: facebook/opt-13b does not appear to have a file named pytorch_model.bin.
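
For context on the 404 above: a sharded PyTorch checkpoint has no single `pytorch_model.bin`. It ships several shard files plus a `pytorch_model.bin.index.json` whose `weight_map` maps each parameter name to the shard that stores it. The sketch below is only an illustration of that layout. The index contents, the parameter names, and the helpers `shard_files` and `is_sharded` are made up for this example and are not part of `transformers`.

```python
import json

# Hypothetical, trimmed-down index in the layout sharded checkpoints use:
# "weight_map" maps each parameter name to the shard file that stores it.
EXAMPLE_INDEX = json.loads("""
{
  "metadata": {"total_size": 12},
  "weight_map": {
    "embeddings.weight": "pytorch_model-00001-of-00002.bin",
    "encoder.layer.0.weight": "pytorch_model-00001-of-00002.bin",
    "encoder.layer.1.weight": "pytorch_model-00002-of-00002.bin"
  }
}
""")


def shard_files(index):
    """Return the distinct shard file names referenced by the index, sorted."""
    return sorted(set(index["weight_map"].values()))


def is_sharded(repo_files):
    """Guess from a file listing whether a PyTorch checkpoint is sharded."""
    return (
        "pytorch_model.bin" not in repo_files
        and "pytorch_model.bin.index.json" in repo_files
    )


shards = shard_files(EXAMPLE_INDEX)
print(shards)
print(is_sharded(["config.json", "pytorch_model.bin.index.json"] + shards))
```

A loader that only ever requests `pytorch_model.bin` never reaches the index file, which matches the `EntryNotFoundError` in the traceback.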

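As a workaround, the weights can be converted by re-saving the PyTorch checkpoint locally as a single file and then loading that with `from_pt=True`:

from transformers import OPTModel, TFOPTModel

path = "facebook/opt-13b"
# Load the sharded PyTorch checkpoint from the Hub.
pt_model = OPTModel.from_pretrained(path)
# Re-save it locally; the huge shard size forces a single pytorch_model.bin.
pt_model.save_pretrained(path, max_shard_size="1000GB")
# Load the unsharded local checkpoint into TensorFlow and save the TF weights.
tf_model = TFOPTModel.from_pretrained(path, from_pt=True)
tf_model.save_pretrained(path, save_config=False)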

Expected behavior

Could `from_pretrained(..., from_pt=True)` handle sharded PyTorch checkpoints automatically in the background?
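
One way to picture the requested behavior, as a rough sketch only: read the index of the sharded checkpoint, load each shard in turn, and collect the weights before handing them to the TensorFlow model. Everything here is an assumption for illustration. `merge_shards` and `load_shard` are hypothetical names, plain Python lists stand in for tensors, and this is not a description of how the eventual fix works.

```python
def merge_shards(index, load_shard):
    """Collect the weights of every shard listed in index["weight_map"].

    load_shard is a hypothetical callable that takes a shard file name and
    returns a {parameter_name: value} dict (a stand-in for loading the file).
    """
    weight_map = index["weight_map"]
    merged = {}
    # Visit each shard once, in a stable order.
    for shard_name in sorted(set(weight_map.values())):
        for name, value in load_shard(shard_name).items():
            # Reject weights that the index does not assign to this shard.
            if weight_map.get(name) != shard_name:
                raise ValueError(f"{name!r} is not mapped to {shard_name!r}")
            merged[name] = value
    # Every parameter named in the index must have been found in some shard.
    missing = sorted(set(weight_map) - set(merged))
    if missing:
        raise KeyError(f"weights missing from shards: {missing}")
    return merged


# Made-up two-shard checkpoint used only to exercise the function.
index = {"weight_map": {"a.weight": "shard-1.bin", "b.weight": "shard-2.bin"}}
fake_shards = {
    "shard-1.bin": {"a.weight": [1.0, 2.0]},
    "shard-2.bin": {"b.weight": [3.0]},
}
state_dict = merge_shards(index, fake_shards.__getitem__)
print(sorted(state_dict))
```

Merging everything into one dict keeps the sketch short but holds all weights in memory at once. For models the size of `facebook/opt-13b`, converting and releasing one shard at a time would probably be preferable.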
@LysandreJik
Member

Indeed, nice catch! Putting @sgugger in the loop

@LysandreJik
Member

Simple reproducer:

from transformers import TFBertModel

model = TFBertModel.from_pretrained("sgugger/bert-sharded")

@sgugger
Collaborator

sgugger commented Jun 3, 2022

Putting it on my TODO (might take a few weeks as I have more urgent items, and we don't have a good solution on the TF side for large models right now anyway).

@github-actions

github-actions bot commented Jul 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator Author

Fixing this 😄

ArthurZucker self-assigned this Jul 7, 2022

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
