
Converting a composer seq2seq t5 model throws an exception #754

Open
timsteuer opened this issue Nov 21, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@timsteuer

Environment

  • llm-foundry: latest

To reproduce

Steps to reproduce the behavior:

  1. Train an hf_t5 model.
  2. Download the composer checkpoint.
  3. Try to convert it back to Hugging Face format via scripts/inference/convert_composer_to_hf.py.
  4. The script crashes when trying to load the saved model as AutoModelForCausalLM.
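A minimal sketch of the failure mode in step 4, assuming the script attempts an AutoModelForCausalLM load on a T5 config: T5 is an encoder-decoder model and its config class is not registered with the causal-LM auto class, so the load raises a ValueError. The tiny config below is only so the example runs without downloading weights.

```python
from transformers import AutoModelForCausalLM, T5Config

# Tiny config so this runs locally without downloading any checkpoint.
config = T5Config(d_model=32, d_kv=16, d_ff=64, num_layers=2,
                  num_heads=2, vocab_size=100)

try:
    AutoModelForCausalLM.from_config(config)
except ValueError as err:
    # T5Config is not in the causal-LM mapping, so we land here.
    print(f"load failed: {type(err).__name__}")
```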

Expected behavior

The model is saved as a Hugging Face snapshot without any issue.

Additional context

Locally, I fixed this by simply loading with AutoModel instead of AutoModelForCausalLM.
I assume this is fine.

@timsteuer timsteuer added the bug Something isn't working label Nov 21, 2023
@dakinggg
Collaborator

Ah yes, that script only supports causal LMs right now. A note on your solution: I'm not certain, but AutoModel here may give you a T5Model rather than the T5ForConditionalGeneration you probably want. Worth double-checking that.

@timsteuer
Author

That was an interesting hint.

Just double-checked, and the model was indeed marked as a T5Model and not as a T5ForConditionalGeneration.

So I changed that in the conversion script so that it yields the right config. However, loading the final model via AutoModel still results in a T5Model, even though the config now explicitly states the correct model type.

On the other hand, if I load via AutoModelForSeq2SeqLM, it loads the lm_head. So I guess that is an HF-specific thing and not related to the conversion script per se.
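To make that behavior concrete, here is a small sketch (using a tiny local config and from_config instead of from_pretrained, so nothing is downloaded): AutoModel picks its class from the config's model_type and ignores the architectures field, returning the bare backbone, while AutoModelForSeq2SeqLM returns the model with the lm_head.

```python
from transformers import AutoModel, AutoModelForSeq2SeqLM, T5Config

# Tiny config; architectures explicitly names the seq2seq class.
config = T5Config(d_model=32, d_kv=16, d_ff=64, num_layers=2,
                  num_heads=2, vocab_size=100,
                  architectures=["T5ForConditionalGeneration"])

backbone = AutoModel.from_config(config)           # class chosen by model_type, not architectures
seq2seq = AutoModelForSeq2SeqLM.from_config(config)

print(type(backbone).__name__)       # T5Model (no lm_head)
print(type(seq2seq).__name__)        # T5ForConditionalGeneration
print(hasattr(seq2seq, "lm_head"))   # True
```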

@dakinggg
Collaborator

Yeah, AutoModel generally gives you the backbone model, while AutoModelForXYZ gives you the model with the adaptation/head for XYZ.
