Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Fix default tokenizer for HuggingFace Agents #4508

Merged
merged 1 commit into from
Apr 27, 2022

Conversation

Rebecca-Qian
Copy link
Contributor

@Rebecca-Qian Rebecca-Qian commented Apr 23, 2022

Patch description
Updates the HuggingFace agent to over-ride is_prebuilt to return True, since HuggingFace agents build their own dictionaries.

Using the default tokenizer setting with T5 caused the exception ValueError('Dictionaries should be pre-built before distributed train.')

Testing steps

parlai mp_train -m parlai.agents.hugging_face.t5:T5Agent -mf /tmp/model_file -t jsonfile --jsonfile-datapath $PATH --fp16 true --t5-model-parallel False --ddp-backend zero2 --jsonfile-datatype-extension True

Copy link
Contributor

@klshuster klshuster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for this!!

Copy link
Contributor

@stephenroller stephenroller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay

@Rebecca-Qian Rebecca-Qian merged commit 137afdd into main Apr 27, 2022
@Rebecca-Qian Rebecca-Qian deleted the rebeccaqian/fix_huggingface_tokenizer branch April 27, 2022 19:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants