Add Seamless M4T model #25693
Merged
Changes from 37 commits
Commits

250 commits, all authored by ylacombe:

- fb7d0ab first raw commit
- 48da0bf still POC
- 2c493a5 tentative convert script
- ef5106d almost working speech encoder conversion scripts
- d83ea6b intermediate code for encoder/decoders
- f0bc513 add modeling code
- 70661ae first version of speech encoder
- 3874353 make style
- c37b7bd add new adapter layer architecture
- 3ded19b add adapter block
- 0bf81fd add first tentative config
- 4bbf681 add working speech encoder conversion
- e54bdd5 base model convert works now
- 0de52f7 make style
- ca7a980 remove unnecessary classes
- aac2a34 remove unecessary functions
- 3735b07 add modeling code speech encoder
- 66225db rework logics
- 41a826f forward pass of sub components work
- ae3a7e0 add modeling codes
- 4f29e2e some config modifs and modeling code modifs
- 4692f59 save WIP
- 451d11e new edits
- 319333e same output speech encoder
- 5928c72 correct attention mask
- 8bd3a17 correct attention mask
- 0342b69 fix generation
- 09331ac new generation logics
- b20f23b erase comments
- 74d06c1 make style
- 38446d5 fix typo
- 67cf10e add some descriptions
- 8568cfb new state
- 5fed5c0 clean imports
- 66920c1 add tests
- b6a5368 make style
- 6909a02 make beam search and num_return_sequences>1 works
- c96f127 correct edge case issue
- f525f24 correct SeamlessM4TConformerSamePadLayer copied from
- 850990b replace ACT2FN relu by nn.relu
- c8d00ea remove unecessary return variable
- fc031e4 move back a class
- 6ca23e3 change name conformer_attention_mask ->conv_attention_mask
- 8a907ce better nit code
- f9ae3ac add some Copied from statements
- b5a33fc small nits
- ab97f67 small nit in dict.get
- 88d1d76 rename t2u model -> conditionalgeneration
- ffafd66 ongoing refactoring of structure
- 66ded60 update models architecture
- 3fb3100 remove SeamlessM4TMultiModal classes
- bf81144 add tests
- d0310af adapt tests
- 5226aac some non-working code for vocoder
- 4b470ea add seamlessM4T vocoder
- 8bf0e37 remove buggy line
- 1e48bc7 fix some hifigan related bugs
- 42eb3e2 remove hifigan specifc config
- e0d8eb9 change
- ae11f30 add WIP tokenization
- 7fa366d add seamlessM4T working tokenzier
- aef9ac3 update tokenization
- 75099dd add tentative feature extractor
- c97a7a7 Update converting script
- a82f7b3 update working FE
- 9786302 refactor input_values -> input_features
- 837e160 update FE
- 9e2ea89 changes in generation, tokenizer and modeling
- 6a8bd6f make style and add t2u_decoder_input_ids
- c676019 add intermediate outputs for ToSpeech models
- 5894115 add vocoder to speech models
- a9ad3dc update valueerror
- 03915d7 update FE with languages
- 0ebc542 add vocoder convert
- f6d5e7c update config docstrings and names
- 02b2ba4 update generation code and configuration
- 82acf95 remove todos and update config.pad_token_id to generation_config.pad_…
- 7f447b6 move block vocoder
- 75230e4 remove unecessary code and uniformize tospeech code
- e2c4a68 add feature extractor import
- 87ed6bc make style and fix some copies from
- a1cffc2 correct consistency + make fix-copies
- f540155 add processor code
- da17767 remove comments
- ec4b204 add fast tokenizer support
- 4a8c7af correct pad_token_id in M4TModel
- e91c55b correct config
- b6e0bc8 update tests and codes + make style
- 5c1df1f make some suggested correstion - correct comments and change naming
- e92c64e rename some attributes
- 46d6085 rename some attributes
- d26e04e remove unecessary sequential
- f490ac1 remove option to use dur predictor
- 3384612 nit
- 69d5508 refactor hifigan
- c45fe50 replace normalize_mean and normalize_var with do_normalize + save lan…
- 7c0d981 add tests
- c2e3547 Merge branch 'main' into add-S2S-model
- 2d59fa0 change tgt_lang logic
- 7173baa update generation ToSpeech
- f1a38f7 add support import SeamlessM4TProcessor
- 305e16c fix generate
- 067d918 make tests
- c4fb4ce update integration tests, add option to only return text and update t…
- 7d39862 fix wrong function call
- d177e01 update import and convert script
- a85ae94 update integration tests + update repo id
- f662725 correct paths and add first test
- 47c0bc5 update how new attention masks are computed
- 8060aa4 update tests
- cd3878b take first care of batching in vocoder code
- bbb398d add batching with the vocoder
- 808366f add waveform lengths to model outputs
- d96eba5 make style
- aeb1a67 add generate kwargs + forward kwargs of M4TModel
- e62d681 add docstrings forward methods
- 1d68419 reformate docstrings
- ea08dc3 add docstrings t2u model
- 9e8a8b8 add another round of modeling docstrings + reformate speaker_id -> sp…
- 7c65688 make style
- 7779477 fix check_repo
- 7f613ae make style
- b804e3d add seamlessm4t to toctree
- 6af3b28 correct check_config_attributes
- cd9e2b4 write config docstrings + some modifs
- dff8d8f make style
- a046830 add docstrings tokenizer
- 703863a add docstrings to processor, fe and tokenizers
- 02cc3e7 make style
- 8128c66 write first version of model docs
- e08c86f fix FE + correct FE test
- 1bee27d fix tokenizer + add correct integration tests
- 22edd86 fix most tokenization tests
- 22edbb1 make style
- 9087bcf correct most processor test
- da31ddb add generation tests and fix num_return_sequences > 1
- a2d4f7f correct integration tests -still one left
- 548e79a make style
- 31a8ea9 correct position embedding
- 5d6caba change numbeams to 1
- b9deb48 refactor some modeling code and correct one test
- 43b92cd make style
- 1d35ba4 correct typo
- ad1e476 refactor intermediate fnn
- b5967c1 refactor feedforward conformer
- 0f2682d make style
- a1d9238 remove comments
- 95aefed make style
- 872789f fix tokenizer tests
- f50ff49 make style
- b0ee7e1 correct processor tests
- 61e880a make style
- 95e8c85 correct S2TT integration
- 8220a9e Apply suggestions from Sanchit code review
- 816559d correct typo
- 60b8755 replace torch.nn->nn + make style
- 286960b change Output naming (waveforms -> waveform) and ordering
- 411d5bd nit renaming and formating
- c8afa46 remove return None when not necessary
- 8c407b1 refactor SeamlessM4TConformerFeedForward
- 25a83ef nit typo
- 771f988 remove almost copied from comments
- 6add43a add a copied from comment and remove an unecessary dropout
- fb85bb4 remove inputs_embeds from speechencoder
- 82123b7 remove backward compatibiliy function
- 7c04630 reformate class docstrings for a few components
- f02a3cb remove unecessary methods
- 7475a9f split over 2 lines smthg hard to read
- 19c5700 make style
- f7724ed replace two steps offset by one step as suggested
- e1ace1a nice typo
- 4effd11 move warnings
- bf52c78 remove useless lines from processor
- d10fb09 make generation non-standard test more robusts
- 5cb8df6 remove torch.inference_mode from tests
- 24038ed split integration tests
- 35951a7 enrich md
- 506fd19 rename control_symbol_vocoder_offset->vocoder_offset
- bfab469 clean convert file
- 4fc1f0f remove tgt_lang and src_lang from FE
- 415f674 change generate docstring of ToText models
- f69314c update generate docstring of tospeech models
- 1d4ce12 unify how to deal withtext_decoder_input_ids
- dde7de0 add default spkr_id
- d6994c3 unify tgt_lang for t2u_model
- 46efba8 simplify tgt_lang verification
- 8b82f20 remove a todo
- a0e00a6 change config docstring
- 4ead78c make style
- ada4824 simplify t2u_tgt_lang_id
- a0897f1 make style
- 5b2367d enrich/correct comments
- eb597c9 enrich .md
- c7ec3ce correct typo in docstrings
- 1af4ee1 add torchaudio dependency
- 4138711 Merge branch 'huggingface:main' into add-S2S-model
- ded425c update tokenizer
- a527ed0 make style and fix copies
- 39a8265 modify SeamlessM4TConverter with new tokenizer behaviour
- d0f82f4 make style
- 57b5ad4 correct small typo docs
- 3785ebe fix import
- d094293 update docs and add requirement to tests
- 6b41584 Merge branch 'main' into add-S2S-model
- 273dd9e add convert_fairseq2_to_hf in utils/not_doctested.txt
- a10ff31 Merge branch 'huggingface:main' into add-S2S-model
- faae35d update FE
- 4e7ea18 fix imports and make style
- d9a35a3 remove torchaudio in FE test
- ce126eb add seamless_m4t.md to utils/not_doctested.txt
- cb4ccf7 nits and change the way docstring dataset is loaded
- 0a1bdd4 move checkpoints from ylacombe/ to facebook/ orga
- 63a01ad refactor warning/error to be in the 119 line width limit
- b1f375b round overly precised floats
- a28f6a2 add stereo audio behaviour
- b32bcd2 refactor .md and make style
- e9cb1a4 enrich docs with more precised architecture description
- 1b310fc readd undocumented models
- c4b70fd Merge branch 'main' into add-S2S-model
- 0772b68 make fix-copies
- 9c47abd apply some suggestions
- 782c8e3 Apply suggestions from code review
- 4257721 correct bug from previous commit
- 102a448 refactor a parameter allowing to clean the code + some small nits
- fe9ceca clean tokenizer
- a68ff89 make style and fix
- cc4fbfb make style
- 15c5bce clean tokenizers arguments
- 071532f add precisions for some tests
- 789f421 move docs from not_tested to slow
- 48b3488 modify tokenizer according to last comments
- ebee245 add copied from statements in tests
- 87a5886 correct convert script
- e4685cb Merge branch 'huggingface:main' into add-S2S-model
- cad5136 correct parameter docstring style
- a4f437d correct tokenization
- c367cb9 correct multi gpus
- b137431 make style
- 8c7f5a4 clean modeling code
- 22aca15 make style
- b0e2626 add copied from statements
- bec7235 add copied statements
- 14c4d4a add support with ASR pipeline
- 121187a remove file added inadvertently
- 0563778 fix docstrings seamlessM4TModel
- 7620fd6 add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional mark…
- 79dda0a Merge branch 'huggingface:main' into add-S2S-model
- e65cf14 add seamlessm4t to assisted generation ignored models
- 8682fc1 Merge branch 'huggingface:main' into add-S2S-model
- 6369fd6 Merge branch 'huggingface:main' into add-S2S-model
```diff
@@ -0,0 +1,74 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SeamlessM4T
+
+## Overview
+
+The SeamlessM4T model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>. <INSERT SHORT SUMMARY HERE>
+
+The abstract from the paper is the following:
+
+*<INSERT PAPER ABSTRACT HERE>*
+
+Tips:
+
+<INSERT TIPS ABOUT MODEL HERE>
+
+This model was contributed by [INSERT YOUR HF USERNAME HERE](<https://huggingface.co/<INSERT YOUR HF USERNAME HERE>). The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+
+## SeamlessM4TConfig
+
+[[autodoc]] SeamlessM4TConfig
+
+
+## SeamlessM4TTokenizer
+
+[[autodoc]] SeamlessM4TTokenizer
+    - build_inputs_with_special_tokens
+    - get_special_tokens_mask
+    - create_token_type_ids_from_sequences
+    - save_vocabulary
+
+
+## SeamlessM4TTokenizerFast
+
+[[autodoc]] SeamlessM4TTokenizerFast
+
+
+## SeamlessM4TModel
+
+[[autodoc]] SeamlessM4TModel
+    - forward
+
+
+## SeamlessM4TForTextToSpeech
+
+[[autodoc]] SeamlessM4TForTextToSpeech
+    - forward
+
+
+## SeamlessM4TForSpeechToSpeech
+
+[[autodoc]] SeamlessM4TForSpeechToSpeech
+    - forward
+
+
+## SeamlessM4TForTextToText
+
+[[autodoc]] transformers.SeamlessM4TForTextToText
+    - forward
+
+## SeamlessM4TForSpeechToText
+
+[[autodoc]] transformers.SeamlessM4TForSpeechToText
+    - forward
```
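The doc page documents four task-specific classes whose names encode the input and output modality (text or speech), alongside `SeamlessM4TModel`. As a small illustration of that naming scheme, here is a hypothetical lookup helper; `TASK_TO_CLASS` and `pick_seamless_class` are not part of transformers, and `SeamlessM4TModel` is deliberately left out because the page does not spell out which modality pairs it covers.

```python
from typing import Dict, Tuple

# Hypothetical table: class names come from the autodoc sections, the mapping is illustrative only.
TASK_TO_CLASS: Dict[Tuple[str, str], str] = {
    ("text", "text"): "SeamlessM4TForTextToText",
    ("speech", "text"): "SeamlessM4TForSpeechToText",
    ("text", "speech"): "SeamlessM4TForTextToSpeech",
    ("speech", "speech"): "SeamlessM4TForSpeechToSpeech",
}


def pick_seamless_class(input_modality: str, output_modality: str) -> str:
    """Hypothetical helper: return the task-specific class name for an (input, output) modality pair."""
    key = (input_modality.lower(), output_modality.lower())
    try:
        return TASK_TO_CLASS[key]
    except KeyError:
        raise ValueError(f"unsupported modality pair {key}; use 'text' or 'speech'") from None
```

For example, `pick_seamless_class("speech", "text")` returns `"SeamlessM4TForSpeechToText"`, matching the last autodoc section.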
```diff
@@ -171,6 +171,7 @@
     roformer,
     rwkv,
     sam,
+    seamless_m4t,
     segformer,
     sew,
     sew_d,
```
```diff
@@ -0,0 +1,81 @@
+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import TYPE_CHECKING
+
+from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
+
+
+_import_structure = {
+    "configuration_seamless_m4t": ["SEAMLESS_M4T_PRETRAINED_CONFIG_ARCHIVE_MAP", "SeamlessM4TConfig"],
+    "tokenization_seamless_m4t": ["SeamlessM4TTokenizer"],
+}
+
+try:
+    if not is_tokenizers_available():
+        raise OptionalDependencyNotAvailable()
+except OptionalDependencyNotAvailable:
+    pass
+else:
+    _import_structure["tokenization_seamless_m4t_fast"] = ["SeamlessM4TTokenizerFast"]
+
+try:
+    if not is_torch_available():
+        raise OptionalDependencyNotAvailable()
+except OptionalDependencyNotAvailable:
+    pass
+else:
+    _import_structure["modeling_seamless_m4t"] = [
+        "SEAMLESS_M4T_PRETRAINED_MODEL_ARCHIVE_LIST",
+        "SeamlessM4TForTextToSpeech",
+        "SeamlessM4TForSpeechToSpeech",
+        "SeamlessM4TForTextToText",
+        "SeamlessM4TForSpeechToText",
+        "SeamlessM4TModel",
+        "SeamlessM4TPreTrainedModel",
+    ]
+
+
+if TYPE_CHECKING:
+    from .configuration_seamless_m4t import SEAMLESS_M4T_PRETRAINED_CONFIG_ARCHIVE_MAP, SeamlessM4TConfig
+    from .tokenization_seamless_m4t import SeamlessM4TTokenizer
+
+    try:
+        if not is_tokenizers_available():
+            raise OptionalDependencyNotAvailable()
+    except OptionalDependencyNotAvailable:
+        pass
+    else:
+        from .tokenization_seamless_m4t_fast import SeamlessM4TTokenizerFast
+
+    try:
+        if not is_torch_available():
+            raise OptionalDependencyNotAvailable()
+    except OptionalDependencyNotAvailable:
+        pass
+    else:
+        from .modeling_seamless_m4t import (
+            SEAMLESS_M4T_PRETRAINED_MODEL_ARCHIVE_LIST,
+            SeamlessM4TForSpeechToSpeech,
+            SeamlessM4TForSpeechToText,
+            SeamlessM4TForTextToSpeech,
+            SeamlessM4TForTextToText,
+            SeamlessM4TModel,
+            SeamlessM4TPreTrainedModel,
+        )
+
+
+else:
+    import sys
+
+    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
```
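The `__init__.py` above follows the library's lazy-import pattern: `_import_structure` maps submodules to the names they export, entries for optional backends are added only when `is_tokenizers_available()` or `is_torch_available()` succeeds, and at runtime the package module is replaced by a `_LazyModule`. The sketch below is a simplified, standard-library-only approximation of that idea, not transformers' `_LazyModule`; the helper `is_available`, the class `LazyModule`, the package name `hypothetical_optional_backend`, and the use of `json`/`math` as stand-in submodules are all assumptions for illustration.

```python
import importlib
import importlib.util
import types
from typing import Dict, List


def is_available(package: str) -> bool:
    """Rough stand-in for helpers like is_torch_available(): only checks that the package can be found."""
    return importlib.util.find_spec(package) is not None


class LazyModule(types.ModuleType):
    """Simplified lazy module: a submodule is imported only when one of its names is first accessed."""

    def __init__(self, name: str, import_structure: Dict[str, List[str]]):
        super().__init__(name)
        # Invert {submodule: [names]} into {name: submodule} for direct lookup.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = sorted(self._name_to_module)

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal lookup fails, i.e. before the name is cached.
        mapping = self.__dict__.get("_name_to_module", {})
        if name not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(mapping[name]), name)
        setattr(self, name, value)  # cache so later lookups bypass __getattr__
        return value


# Mirror the try/except/else guards: unconditional entries first, optional ones only if importable.
import_structure: Dict[str, List[str]] = {"json": ["dumps", "loads"]}
if is_available("math"):
    import_structure["math"] = ["sqrt"]
if is_available("hypothetical_optional_backend"):  # hypothetical package, normally not installed
    import_structure["hypothetical_optional_backend"] = ["SomeModel"]

lazy = LazyModule("demo_package", import_structure)
```

Accessing `lazy.sqrt` triggers the import of `math` only at that moment, while `SomeModel` is simply absent because its backend was not found. In the real package the instance is installed with `sys.modules[__name__] = ...`, which is what makes `from package import Name` resolve through the lazy object; the sketch skips that step and reads attributes directly. The `TYPE_CHECKING` branch exists so static type checkers still see ordinary eager imports.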
super nice!