Problems when converting fairseq model to hf format #28174
Comments
cc @ylacombe as well for reference
Hey @upskyy, thanks for opening this issue. This is very clear and in line with #28165, which converts a model from Seamless Communication and fairseq2. We are supposed to have integration tests making sure that the two implementations produce the same results, but they may very well be outdated or specific to certain wav2vec models. Regarding your issues, could you provide the model that you are testing and a script that shows how to replicate the fact that results are different? Regarding issue 1, we'd have to make sure that the case in which […] Regarding issue 2, #28165 adds […] Thanks again!
@ylacombe Thanks for your reply. So should I just wait for the #28165 PR to merge? Thanks :)
I posted a PR; please check it.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.37.0.dev0

Who can help?
@sanchit-gandhi

Information
Tasks: an officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Thanks for releasing this awesome repo.
Issue 1
I am converting a fairseq checkpoint to the Hugging Face format (wav2vec2_conformer). The conversion itself succeeds, but the results are different.
I did some debugging and found a difference from the fairseq implementation: in fairseq, if the convolutional subsampling dimension and the encoder dimension are the same, nn.Linear is not used, but the Hugging Face implementation uses it unconditionally, so randomly initialized weights end up in the model.
fairseq
https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py#L324-L328
huggingface
https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2_conformer/modeling_wav2vec2_conformer.py#L536
I think the fairseq behavior is the correct one.
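To illustrate the difference, here is a minimal sketch of a fairseq-style feature projection that skips the `nn.Linear` when the dimensions already match. This is not the actual transformers or fairseq code; the class name and dimensions are illustrative only.

```python
import torch
import torch.nn as nn


class FeatureProjection(nn.Module):
    """Sketch: only project when conv output dim differs from encoder dim."""

    def __init__(self, conv_dim: int, encoder_dim: int):
        super().__init__()
        # fairseq-style behavior: no Linear layer when dims match, so no
        # randomly initialized weights are introduced during conversion.
        self.projection = (
            nn.Linear(conv_dim, encoder_dim) if conv_dim != encoder_dim else None
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.projection is not None:
            hidden_states = self.projection(hidden_states)
        return hidden_states
```

With equal dimensions the module is a pure pass-through, which matches the checkpoint exactly; the current HF module always applies the projection, so a converted checkpoint picks up untrained weights there.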
Issue 2
Also, fairseq applies layer norm before the input enters the conformer encoder, while the Hugging Face implementation applies layer norm after the conformer encoder, with no option to change this. Could this be handled as an option? I think the results change because of this.
fairseq
https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py#L1230-L1231
huggingface
https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2_conformer/modeling_wav2vec2_conformer.py#L929
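A toy sketch of the two placements discussed above, assuming a flag like `norm_before` (hypothetical name; the stacked `nn.Linear` layers are stand-ins for conformer blocks):

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Sketch: layer norm either before (fairseq) or after (HF) the encoder stack."""

    def __init__(self, dim: int, norm_before: bool):
        super().__init__()
        self.norm_before = norm_before
        self.layer_norm = nn.LayerNorm(dim)
        # Stand-in for a stack of conformer layers.
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.norm_before:
            x = self.layer_norm(x)  # fairseq placement
        for layer in self.layers:
            x = layer(x)
        if not self.norm_before:
            x = self.layer_norm(x)  # current HF placement
        return x
```

Because layer norm does not commute with the encoder layers, the same converted weights produce different outputs under the two placements, which would explain the mismatched results.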
Expected behavior
What do you think about this problem?
If modifications are possible, I can open a PR that includes a conversion script supporting the fairseq extension.