
Convert T5x models to PyTorch #15464

Closed
peregilk opened this issue Feb 1, 2022 · 27 comments

@peregilk
Contributor

peregilk commented Feb 1, 2022

🚀 Feature request

Google's new Flax implementation of T5, called T5X, creates models/checkpoints in a custom format.

The config is stored in .gin files, and the current T5 conversion scripts, such as the ByT5 conversion script, do not work with them.

Would it be possible to create a script for converting the T5X checkpoints/models?

@patrickvonplaten
@anton-l

@github-actions

github-actions bot commented Mar 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@patrickvonplaten
Contributor

I think @stefan-it has a working script :-)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Apr 7, 2022
@stefan-it stefan-it reopened this Apr 7, 2022
@stefan-it stefan-it self-assigned this Apr 7, 2022
@dirkgr
Contributor

dirkgr commented May 11, 2022

@stefan-it, can you share that script?

@stefan-it
Collaborator

Hi @dirkgr, the script was merged into the current main branch of Transformers in #16853 and is available here:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_flax.py :)

@StephennFernandes

@stefan-it , hey, could you please tell me how exactly the conversion script works?

I tried running the conversion script, and it seems the config file in T5X is in .gin format while the script expects a config file in .json format.

Because of this I'm stuck converting my T5X model to HF.

Could you please show me how it's done and provide some details?

@stefan-it
Collaborator

Hi @StephennFernandes , could you please try the steps mentioned in the corresponding PR:

#16853 (comment)

The config file needs to be in JSON format, yes :)
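For anyone hitting the same .gin vs .json mismatch, here is a minimal sketch of writing such a JSON config by hand. The hyperparameter values below are assumptions based on the public t5-v1_1-base config; adjust them to match your own model.

```python
import json

# Hyperparameter values are assumptions matching the public t5-v1_1-base
# config; change them to whatever your .gin file describes.
config = {
    "architectures": ["T5ForConditionalGeneration"],
    "model_type": "t5",
    "d_model": 768,
    "d_ff": 2048,
    "d_kv": 64,
    "num_layers": 12,
    "num_decoder_layers": 12,
    "num_heads": 12,
    "vocab_size": 32128,
    "feed_forward_proj": "gated-gelu",
    "dropout_rate": 0.1,
    "layer_norm_epsilon": 1e-6,
    "tie_word_embeddings": False,
}

# Write the config.json the conversion script expects.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```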

@stefan-it
Collaborator

If you get any errors, please post them here, so we can try to find a solution 🤗

@StephennFernandes

@stefan-it , thanks for replying. I followed the steps as instructed in #16853 and tried converting my pretrained t5_1_1_base model to Hugging Face.

But I get the following error:

/home/stephen/anaconda3/lib/python3.9/site-packages/jax/_src/tree_util.py:188: FutureWarning: jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() instead as a drop-in replacement.
  warnings.warn('jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() '
Traceback (most recent call last):
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x_convert_to_hf.py", line 234, in <module>
    convert_t5x_checkpoint_to_flax(args.t5x_checkpoint_path, args.config_name, args.flax_dump_folder_path)
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x_convert_to_hf.py", line 27, in convert_t5x_checkpoint_to_flax
    t5x_model = checkpoints.load_t5x_checkpoint(t5x_checkpoint_path)
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x/checkpoints.py", line 1674, in load_t5x_checkpoint
    state_dict = _run_future_tree(future_state_dict)
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x/checkpoints.py", line 162, in _run_future_tree
    leaves = loop.run_until_complete(asyncio.gather(*future_leaves))
  File "/home/stephen/anaconda3/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x/checkpoint_importer.py", line 82, in _get_and_cast
    arr = await self._get_fn()  # pytype: disable=bad-return-type
  File "/home/stephen/Desktop/t5_test_run/t5x/t5x/checkpoints.py", line 1502, in _read_ts
    t = await ts.open(tmp_ts_spec_dict, open=True)
ValueError: Error opening "zarr" driver: Error reading local file "./T5_1_1_base_hindi/checkpoint_100000/state.param_states.decoder.decoder_norm.scale.v/.zarray": Invalid key: "./T5_1_1_base_hindi/checkpoint_100000/state.param_states.decoder.decoder_norm.scale.v/.zarray"

@stefan-it
Collaborator

Hi @StephennFernandes could you try to install:

pip3 install --upgrade tensorstore==0.1.13

The tensorstore package was the reason for that zarr driver error message in my conversion experiments.

@StephennFernandes

@stefan-it , hey, I tried that but it didn't work for me; I still get the same error. I came across this issue in the t5x repo: #452

I am currently using Ubuntu 20.04 with Linux kernel 5.13.0.

@stefan-it
Collaborator

stefan-it commented Jun 20, 2022

Hi @StephennFernandes ,

I think I have a working solution now. I installed everything in a fresh virtual environment, but I got bazel errors (hopefully Google will stop using bazel someday...) when trying to build tensorstore==0.1.13.

What I did then:

pip3 install --upgrade tensorstore

to install the latest version of tensorstore. The conversion script call that does not work looks like this:

python3 convert_t5x_checkpoint_to_flax.py --t5x_checkpoint_path ./t5_1_1_small --config_name ./config_1_1.json --flax_dump_folder_path ./t5x_1_1_exported

But tensorstore is not able to handle the relative path. The trick here is to use the absolute path to the T5X checkpoint. So instead of using ./t5_1_1_small, fetch the absolute path via:

realpath ./t5_1_1_small

this returns something like:

/home/stefan/transformers/src/transformers/models/t5/t5_1_1_small

then use this path for the t5x_checkpoint_path argument.
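The same path resolution can be done from Python before invoking the script; a small sketch, where the checkpoint directory name is a placeholder:

```python
import os

# tensorstore fails on relative checkpoint paths with the zarr "Invalid key"
# error shown above; resolving to an absolute path first (the equivalent of
# `realpath`) avoids it.
t5x_checkpoint_path = os.path.abspath("./t5_1_1_small")  # placeholder directory
print(t5x_checkpoint_path)  # e.g. /home/stefan/.../t5_1_1_small
```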

I hope this works! It worked under my local setup.

(Oh, and in case you get some strange torch.fx import errors, just run pip3 install --upgrade torch --extra-index-url https://download.pytorch.org/whl/cpu to fix them)

@StephennFernandes

StephennFernandes commented Jun 20, 2022

@stefan-it , it worked 🎉 Thanks a ton for all the help 🙏

Actually, I still have a couple of other questions:

  • The current conversion only works on Flax models. Suppose I'd have to fine-tune the model in Hugging Face using PyTorch: is there a way to convert HF Flax models to PyTorch internally, or would I have to first convert the T5X model to PyTorch and then convert it to HF?

  • Also, I am a bit confused about the tokenizer. Did this conversion script also convert the tokenizer? (I don't think the SentencePiece .model file existed in the model dir.) If not, how should I go about converting the tokenizer to Hugging Face?

@peregilk
Contributor Author

@StephennFernandes Here is a link to a convenience script that I am using for creating the PyTorch and TF models.

https://github.com/peregilk/north-t5/blob/main/create_pytorch_tf_and_vocab.py

Do not expect it to run directly, though. It was really not meant for the public. However, it should give you the basic idea of how to load the models and then save them in the correct format.

@StephennFernandes

@peregilk , thanks for sharing. Actually, the link isn't available; I believe it's private. Could you please check and confirm?

@peregilk
Contributor Author

peregilk commented Jun 20, 2022

@StephennFernandes Sorry about that. Now it is public.

As a side note, especially to @patrickvonplaten: wouldn't it be nice to put a wrapper around the great script that @stefan-it has made? A script that also loads the models in Hugging Face and saves them in PyTorch and TF format, as well as creating the necessary tokenizers. Maybe it could even copy over the training logs that are saved in the t5x checkpoint directory. I have done this manually on these models: https://huggingface.co/north/t5_large_NCC. As you can see, the TensorBoard logs from t5x integrate nicely with the Training Metrics in HF.

@patrickvonplaten
Contributor

I think this would indeed be a great idea! Maybe we can open a T5X folder under https://github.com/huggingface/transformers/tree/main/examples/research_projects with lots of functionality for conversion?

@StephennFernandes

StephennFernandes commented Sep 9, 2022

@stefan-it @patrickvonplaten
Hey, were you able to convert the scalable_t5 models?

I have pretrained an mT5-base model (t5x/examples/scalable_t5/mt5/base.gin) using T5X.

But I am unable to convert it to Hugging Face. I tried several Hugging Face config.json files from t5-efficient-base, but none of them worked.

The following is the error I get when converting:

convert_t5x_checkpoint_to_flax(args.t5x_checkpoint_path, args.config_name, args.flax_dump_folder_path)
  File "/home/stephen/Desktop/mt5_finetuning_preliminary_tests/t5x_to_hf.py", line 12, in convert_t5x_checkpoint_to_flax
    split_mlp_wi = "wi_0" in t5x_model["target"]["encoder"]["layers_0"]["mlp"]
KeyError: 'layers_0'
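One possible cause (an assumption, not verified against the scalable_t5 code): the scalable configs use a scanned/stacked layer layout, so the checkpoint tree may contain a single stacked `layers` entry instead of `layers_0` … `layers_N`, which would break the script's lookup. A small helper to inspect the parameter tree of a loaded checkpoint and see which keys are actually present:

```python
def tree_keys(tree, prefix=""):
    """Yield slash-separated paths of all leaves in a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            yield from tree_keys(value, path)
        else:
            yield path

# Toy stand-in for t5x_model["target"]; a real tree comes from
# checkpoints.load_t5x_checkpoint(t5x_checkpoint_path) as in the script.
toy_target = {"encoder": {"layers": {"mlp": {"wi_0": {"kernel": 0}}}}}
print(sorted(tree_keys(toy_target)))
```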

@stefan-it
Collaborator

stefan-it commented Sep 12, 2022

Hi @StephennFernandes ,

Really interesting! I haven't tried it with the scalable T5X models yet (the efficient T5 models that can be found on the Model Hub were converted from TensorFlow checkpoints, because they were trained with the official T5 implementation and not with T5X).

Please give me some time to investigate that :)

@joytianya

Does this script support the conversion of XL or XXL models?

@peregilk
Contributor Author

peregilk commented Dec 1, 2022

@joytianya I have been using this script a lot for converting both XL and XXL models. Works fine.

@joytianya

joytianya commented Dec 1, 2022

@peregilk thank you for your answer.

I tried it and it generated the following files in /content/flan_t5x_xl_exported. Then I used the code below (T5ForConditionalGeneration) to load the directory and got an error (Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found). How do I solve it?

model = T5ForConditionalGeneration.from_pretrained("/content/flan_t5x_xl_exported", from_flax=True)
# Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in 
# directory /content/flan_t5x_xl_exported.

/content/flan_t5x_xl_exported:
model-00001-of-00002.msgpack
model-00002-of-00002.msgpack
model.msgpack.index.json
config.json

@joytianya

@stefan-it
@peregilk
Does the script support converting T5X models into PyTorch?
If not, is there any other solution?

@peregilk
Contributor Author

peregilk commented Dec 1, 2022

@joytianya Try opening the files here: https://huggingface.co/north/t5_xl_NCC. All of these were converted using the script written by @stefan-it. Note that the large PyTorch files are split into multiple smaller files.

@joytianya

@peregilk
Thank you for your reply. I want to convert my fine-tuned model into PyTorch.
In addition, when I use the script to convert T5X to Flax, the XL and XXL models are split into multiple files. Is it possible to avoid the split, or to merge them into a single file?

@peregilk
Contributor Author

peregilk commented Dec 2, 2022

@joytianya I do not think this splitting is really related to the conversion script that @stefan-it wrote. Transformers does this automatically with large files.
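The split threshold can be controlled via the `max_shard_size` parameter of `save_pretrained`, so a single-file export is possible if you have the memory. A sketch using a tiny hypothetical config (in practice you would `from_pretrained` your converted model directory instead of instantiating fresh weights):

```python
from transformers import T5Config, T5ForConditionalGeneration

# Tiny hypothetical config so this runs quickly; a real run would load the
# converted XL/XXL checkpoint directory instead.
config = T5Config(d_model=32, d_ff=64, d_kv=8, num_layers=2,
                  num_decoder_layers=2, num_heads=2, vocab_size=128)
model = T5ForConditionalGeneration(config)

# Raising max_shard_size above the model's size yields a single weights file
# (no *index.json and no numbered shards).
model.save_pretrained("./single_file_export", max_shard_size="100GB")
```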

@joytianya

ok, thank you
