Add SpeechEncoderDecoder & Speech2Text2 #13186

Merged
Changes from 51 commits
69 commits
f7197df
fix_torch_device_generate_test
patrickvonplaten May 19, 2021
5f70018
remove @
patrickvonplaten May 19, 2021
f6d1d34
X::qxX
May 21, 2021
02e5b56
Merge branch 'master' of https://github.com/huggingface/transformers
May 21, 2021
2ade5e3
Merge branch 'master' of https://github.com/huggingface/transformers
patrickvonplaten May 25, 2021
12dc58e
:wqa:Merge branch 'master' of https://github.com/huggingface/transfor…
May 25, 2021
b18ef83
Merge branch 'master' of https://github.com/huggingface/transformers
May 25, 2021
a941700
Merge branch 'master' of https://github.com/huggingface/transformers
patrickvonplaten May 26, 2021
34adbd9
Merge branch 'master' of https://github.com/huggingface/transformers
May 28, 2021
00967fa
Merge branch 'master' of https://github.com/huggingface/transformers
Jun 3, 2021
aea2f96
Merge branch 'master' of https://github.com/huggingface/transformers
Jun 3, 2021
b947a5c
Merge branch 'master' of https://github.com/huggingface/transformers
patrickvonplaten Jun 9, 2021
71079aa
Merge branch 'master' of https://github.com/huggingface/transformers
Jun 13, 2021
75eaa47
:wMerge branch 'master' of https://github.com/huggingface/transformers
Jun 24, 2021
11b5a4c
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 1, 2021
2ba6f5c
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 1, 2021
f956ea1
:wqa: Merge branch 'master' of https://github.com/huggingface/transfo…
Jul 5, 2021
8c4570c
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 8, 2021
a5cea4b
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 9, 2021
14ab959
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 13, 2021
e871407
Merge branch 'master' of https://github.com/huggingface/transformers
Jul 15, 2021
98c2245
Merge branch 'master' of https://github.com/huggingface/transformers
Aug 5, 2021
5bc06a5
Merge branch 'master' of https://github.com/huggingface/transformers
Aug 6, 2021
dfda5e2
Merge branch 'master' of https://github.com/huggingface/transformers
patrickvonplaten Aug 12, 2021
3d03cc3
Merge branch 'master' of https://github.com/patrickvonplaten/transfor…
patrickvonplaten Aug 12, 2021
14b0b94
Merge branch 'master' of https://github.com/huggingface/transformers
patrickvonplaten Aug 15, 2021
4c6354e
up
Aug 19, 2021
5a075c9
correct some bugs
Aug 19, 2021
19106d1
correct model
Aug 20, 2021
ddbb5ae
finish speech2text extension
patrickvonplaten Aug 20, 2021
a80511c
up
patrickvonplaten Aug 24, 2021
3bfc3a9
up
patrickvonplaten Aug 24, 2021
7ae54e4
up
patrickvonplaten Aug 24, 2021
5c85e11
up
patrickvonplaten Aug 24, 2021
a127254
Update utils/custom_init_isort.py
patrickvonplaten Aug 24, 2021
d04fe22
up
patrickvonplaten Aug 24, 2021
e24a659
up
patrickvonplaten Aug 24, 2021
03f83fb
update with tokenizer
patrickvonplaten Aug 25, 2021
7dad652
correct old tok
patrickvonplaten Aug 25, 2021
88fce43
correct old tok
patrickvonplaten Aug 25, 2021
782462e
fix bug
patrickvonplaten Aug 25, 2021
54ccc61
up
patrickvonplaten Aug 25, 2021
ea839ee
up
patrickvonplaten Aug 25, 2021
b3312dc
merge conflict
patrickvonplaten Aug 26, 2021
f772b2b
merge conflict
patrickvonplaten Aug 26, 2021
557dbdd
add more tests
patrickvonplaten Aug 26, 2021
e6352a6
up
patrickvonplaten Aug 26, 2021
3421ae8
fix docs
patrickvonplaten Aug 26, 2021
ef58aaf
Merge branch 'master' of https://github.com/huggingface/transformers …
patrickvonplaten Aug 26, 2021
3017421
up
patrickvonplaten Aug 26, 2021
571b50b
Merge branch 'master' of https://github.com/huggingface/transformers …
patrickvonplaten Aug 26, 2021
e584248
fix some more tests
patrickvonplaten Aug 26, 2021
dcb9a38
add better config
patrickvonplaten Aug 26, 2021
f25a8c4
correct some more things
patrickvonplaten Aug 26, 2021
e9e6efd
fix tests
patrickvonplaten Aug 26, 2021
348138f
Merge branch 'master' of https://github.com/huggingface/transformers …
patrickvonplaten Aug 26, 2021
aedd676
improve docs
patrickvonplaten Aug 26, 2021
7185f7e
Apply suggestions from code review
patrickvonplaten Aug 26, 2021
f23e257
Apply suggestions from code review
patrickvonplaten Aug 26, 2021
745da1b
final fixes
patrickvonplaten Aug 26, 2021
b54eaf6
Merge branch 'add_s2t_wav2vec2' of https://github.com/patrickvonplate…
patrickvonplaten Aug 26, 2021
b30bcaa
finalize
patrickvonplaten Aug 26, 2021
3ab86f5
Apply suggestions from code review
patrickvonplaten Sep 1, 2021
5fbe61a
apply suggestions Lysandre and Sylvain
patrickvonplaten Sep 1, 2021
5fd96f6
merge conflict
patrickvonplaten Sep 1, 2021
652dbd8
apply nicos suggestions
patrickvonplaten Sep 1, 2021
ef9e0f7
Merge branch 'master' into add_s2t_wav2vec2
patrickvonplaten Sep 1, 2021
e39a2ac
upload everything
patrickvonplaten Sep 1, 2021
ae346c9
finish
patrickvonplaten Sep 1, 2021
4 changes: 4 additions & 0 deletions docs/source/index.rst
@@ -416,6 +416,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
@@ -575,7 +577,9 @@ Flax), PyTorch, and/or TensorFlow.
model_doc/retribert
model_doc/roberta
model_doc/roformer
model_doc/speechencoderdecoder
model_doc/speech_to_text
model_doc/speech_to_text_2
model_doc/splinter
model_doc/squeezebert
model_doc/t5
123 changes: 123 additions & 0 deletions docs/source/model_doc/speech_to_text_2.rst
@@ -0,0 +1,123 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

Speech2Text2
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Speech2Text2 model is used together with :doc:`Wav2Vec2 <wav2vec2>` for Speech Translation models proposed in
`Large-Scale Self- and Semi-Supervised Learning for Speech Translation <https://arxiv.org/abs/2104.06678>`__ by
Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.

Speech2Text2 is a *decoder-only* transformer model that can be used with any speech *encoder-only* model, such as
:doc:`Wav2Vec2 <wav2vec2>` or :doc:`HuBERT <hubert>`, for Speech-to-Text tasks. Please refer to the
:doc:`SpeechEncoderDecoder <speechencoderdecoder>` class on how to combine Speech2Text2 with any speech *encoder-only*
model.

This model was contributed by `Patrick von Platen <https://huggingface.co/patrickvonplaten>`__.

The original code can be found `here
<https://github.com/pytorch/fairseq/blob/1f7ef9ed1e1061f8c7f88f8b94c7186834398690/fairseq/models/wav2vec/wav2vec2_asr.py#L266>`__.


Tips:

- Speech2Text2 achieves state-of-the-art results on the CoVoST Speech Translation dataset. For more information, see
the `official models <https://huggingface.co/models?other=speech2text2>`__.
- Speech2Text2 is always used within the :doc:`SpeechEncoderDecoder <speechencoderdecoder>` framework.
- Speech2Text2's tokenizer currently only supports inference, but not training.

Inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Speech2Text2's :class:`~transformers.SpeechEncoderDecoderModel` model accepts raw waveform input values from speech and
makes use of :func:`~transformers.generation_utils.GenerationMixin.generate` to translate the input speech
autoregressively to the target language.

The :class:`~transformers.Wav2Vec2FeatureExtractor` class is responsible for preprocessing the input speech and
:class:`~transformers.Speech2Text2Tokenizer` decodes the generated target tokens to the target string. The
:class:`~transformers.Speech2Text2Processor` wraps :class:`~transformers.Wav2Vec2FeatureExtractor` and
:class:`~transformers.Speech2Text2Tokenizer` into a single instance to both extract the input features and decode the
predicted token ids.
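
To make "autoregressively" concrete, here is a minimal, library-free sketch of greedy decoding: each step scores the next token given everything generated so far, appends the best one, and stops at an end-of-sequence token. The names ``greedy_decode`` and ``toy_next_token_scores`` are hypothetical and not part of ``transformers``. In the real model the decoder additionally attends to the speech encoder's outputs, and ``generate`` supports other strategies such as beam search.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_length: int = 10,
) -> List[int]:
    """Generate token ids one at a time, always picking the highest-scoring next token."""
    tokens = [bos_id]
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)
        # Greedy choice: index of the largest score.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_next_token_scores(tokens: Sequence[int]) -> List[float]:
    """Toy stand-in for a decoder: prefers the id after the last one, capped at id 3."""
    scores = [0.0, 0.0, 0.0, 0.0]
    scores[min(tokens[-1] + 1, 3)] = 1.0
    return scores


generated = greedy_decode(toy_next_token_scores, bos_id=0, eos_id=3)
print(generated)  # [0, 1, 2, 3]
```

The loop stops either when the end-of-sequence id is produced or when ``max_length`` is reached, whichever comes first.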

- Step-by-step Speech Translation

.. code-block::

>>> import torch
>>> from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel
>>> from datasets import load_dataset
>>> import soundfile as sf

>>> model = SpeechEncoderDecoderModel.from_pretrained("facebook/s2t-wav2vec2-large-en-de")
>>> processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-de")

>>> def map_to_array(batch):
...     speech, _ = sf.read(batch["file"])
...     batch["speech"] = speech
...     return batch

>>> ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
>>> ds = ds.map(map_to_array)

>>> inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")
>>> generated_ids = model.generate(input_ids=inputs["input_values"], attention_mask=inputs["attention_mask"])

>>> transcription = processor.batch_decode(generated_ids)


- Speech Translation via Pipelines

The automatic speech recognition pipeline can also be used to translate speech in just a couple of lines of code:

.. code-block::

>>> from datasets import load_dataset
>>> from transformers import pipeline

>>> librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
>>> asr = pipeline("automatic-speech-recognition", model="facebook/s2t-wav2vec2-large-en-de", feature_extractor="facebook/s2t-wav2vec2-large-en-de")

>>> translation_de = asr(librispeech_en[0]["file"])


See the `model hub <https://huggingface.co/models?filter=speech2text2>`__ to look for Speech2Text2 checkpoints.


Speech2Text2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Speech2Text2Config
:members:


Speech2Text2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Speech2Text2Tokenizer
:members: batch_decode, decode, save_vocabulary


Speech2Text2Processor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Speech2Text2Processor
:members: __call__, from_pretrained, save_pretrained, batch_decode, decode, as_target_processor


Speech2Text2ForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Speech2Text2ForCausalLM
:members: forward
33 changes: 33 additions & 0 deletions docs/source/model_doc/speechencoderdecoder.rst
@@ -0,0 +1,33 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

Speech Encoder Decoder Models
-----------------------------------------------------------------------------------------------------------------------

The :class:`~transformers.SpeechEncoderDecoderModel` can be used to initialize a speech-sequence-to-text-sequence model
with any pretrained speech autoencoding model as the encoder (*e.g.* :doc:`Wav2Vec2 <wav2vec2>`, :doc:`Hubert <hubert>`)
and any pretrained autoregressive model as the decoder.

The effectiveness of initializing speech-sequence-to-text-sequence models with pretrained checkpoints for speech
recognition and speech translation has been shown, for example, in `Large-Scale Self- and Semi-Supervised Learning for
Speech Translation <https://arxiv.org/abs/2104.06678>`__ by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael
Auli, and Alexis Conneau.

An example of how to use a :class:`~transformers.SpeechEncoderDecoderModel` for inference can be seen in
:doc:`Speech2Text2 <speech_to_text_2>`.


SpeechEncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SpeechEncoderDecoderModel
:members: forward, from_encoder_decoder_pretrained
16 changes: 16 additions & 0 deletions src/transformers/__init__.py
@@ -233,6 +233,12 @@
"SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP",
"Speech2TextConfig",
],
"models.speech_to_text_2": [
"SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP",
"Speech2Text2Config",
"Speech2Text2Processor",
"Speech2Text2Tokenizer",
],
"models.splinter": ["SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP", "SplinterConfig", "SplinterTokenizer"],
"models.squeezebert": ["SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "SqueezeBertConfig", "SqueezeBertTokenizer"],
"models.t5": ["T5_PRETRAINED_CONFIG_ARCHIVE_MAP", "T5Config"],
@@ -1040,6 +1046,7 @@
"load_tf_weights_in_roformer",
]
)
_import_structure["models.speech_encoder_decoder"] = ["SpeechEncoderDecoderModel"]
_import_structure["models.speech_to_text"].extend(
[
"SPEECH_TO_TEXT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1048,6 +1055,7 @@
"Speech2TextPreTrainedModel",
]
)
_import_structure["models.speech_to_text_2"].extend(["Speech2Text2ForCausalLM", "Speech2Text2PreTrainedModel"])
_import_structure["models.splinter"].extend(
[
"SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1926,6 +1934,12 @@
from .models.roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig, RobertaTokenizer
from .models.roformer import ROFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, RoFormerConfig, RoFormerTokenizer
from .models.speech_to_text import SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP, Speech2TextConfig
from .models.speech_to_text_2 import (
SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP,
Speech2Text2Config,
Speech2Text2Processor,
Speech2Text2Tokenizer,
)
from .models.splinter import SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP, SplinterConfig, SplinterTokenizer
from .models.squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig, SqueezeBertTokenizer
from .models.t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
@@ -2610,12 +2624,14 @@
RoFormerPreTrainedModel,
load_tf_weights_in_roformer,
)
from .models.speech_encoder_decoder import SpeechEncoderDecoderModel
from .models.speech_to_text import (
SPEECH_TO_TEXT_PRETRAINED_MODEL_ARCHIVE_LIST,
Speech2TextForConditionalGeneration,
Speech2TextModel,
Speech2TextPreTrainedModel,
)
from .models.speech_to_text_2 import Speech2Text2ForCausalLM, Speech2Text2PreTrainedModel
from .models.splinter import (
SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST,
SplinterForQuestionAnswering,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -38,6 +38,7 @@
("detr", "DetrConfig"),
("gpt_neo", "GPTNeoConfig"),
("big_bird", "BigBirdConfig"),
("speech_to_text_2", "Speech2Text2Config"),
("speech_to_text", "Speech2TextConfig"),
("vit", "ViTConfig"),
("wav2vec2", "Wav2Vec2Config"),
@@ -109,6 +110,7 @@
("big_bird", "BIG_BIRD_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("megatron-bert", "MEGATRON_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("speech_to_text", "SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("speech_to_text_2", "SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("vit", "VIT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("wav2vec2", "WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("m2m_100", "M2M_100_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -168,6 +170,7 @@
("detr", "DETR"),
("gpt_neo", "GPT Neo"),
("big_bird", "BigBird"),
("speech_to_text_2", "Speech2Text2"),
("speech_to_text", "Speech2Text"),
("vit", "ViT"),
("wav2vec2", "Wav2Vec2"),
6 changes: 3 additions & 3 deletions src/transformers/models/auto/feature_extraction_auto.py
@@ -156,11 +156,11 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):

model_type = config_class_to_model_type(type(config).__name__)

- if model_type is not None:
-     return FEATURE_EXTRACTOR_MAPPING[type(config)].from_dict(config_dict, **kwargs)
- elif "feature_extractor_type" in config_dict:
+ if "feature_extractor_type" in config_dict:
      feature_extractor_class = feature_extractor_class_from_name(config_dict["feature_extractor_type"])
      return feature_extractor_class.from_dict(config_dict, **kwargs)
+ elif model_type is not None:
+     return FEATURE_EXTRACTOR_MAPPING[type(config)].from_dict(config_dict, **kwargs)

raise ValueError(
f"Unrecognized feature extractor in {pretrained_model_name_or_path}. Should have a `feature_extractor_type` key in "
Expand Down
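
The reordering in this hunk changes which rule wins when both apply: an explicit `feature_extractor_type` key in the config now takes precedence over the mapping derived from the model type. A minimal sketch of that precedence follows, using plain dictionaries as hypothetical stand-ins for the real lookup tables. The registry names and string values are illustrative assumptions, not `transformers` APIs.

```python
# Hypothetical registries; the real code resolves classes, not strings.
EXTRACTOR_BY_CLASS_NAME = {"Wav2Vec2FeatureExtractor": "extractor chosen by explicit name"}
EXTRACTOR_BY_MODEL_TYPE = {"speech_to_text": "extractor chosen by model type"}


def resolve_feature_extractor(config_dict, model_type=None):
    """Mirror the new branch order: explicit key first, model type second."""
    if "feature_extractor_type" in config_dict:
        return EXTRACTOR_BY_CLASS_NAME[config_dict["feature_extractor_type"]]
    elif model_type is not None:
        return EXTRACTOR_BY_MODEL_TYPE[model_type]
    raise ValueError("Unrecognized feature extractor.")


# Both rules apply: the explicit key wins under the new ordering.
both = resolve_feature_extractor(
    {"feature_extractor_type": "Wav2Vec2FeatureExtractor"}, model_type="speech_to_text"
)
# Only the model type is known: fall back to the mapping.
fallback = resolve_feature_extractor({}, model_type="speech_to_text")
```

Under the old ordering the first call would have returned the model-type entry instead, ignoring the explicit key.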
1 change: 1 addition & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -208,6 +208,7 @@
("blenderbot", "BlenderbotForCausalLM"),
("blenderbot-small", "BlenderbotSmallForCausalLM"),
("megatron-bert", "MegatronBertForCausalLM"),
("speech_to_text_2", "Speech2Text2ForCausalLM"),
]
)

3 changes: 2 additions & 1 deletion src/transformers/models/auto/tokenization_auto.py
@@ -152,6 +152,7 @@
("rag", ("RagTokenizer", None)),
("xlm-prophetnet", ("XLMProphetNetTokenizer" if is_sentencepiece_available() else None, None)),
("speech_to_text", ("Speech2TextTokenizer" if is_sentencepiece_available() else None, None)),
("speech_to_text_2", ("Speech2Text2Tokenizer", None)),
("m2m_100", ("M2M100Tokenizer" if is_sentencepiece_available() else None, None)),
("prophetnet", ("ProphetNetTokenizer", None)),
("mpnet", ("MPNetTokenizer", "MPNetTokenizerFast" if is_tokenizers_available() else None)),
@@ -233,7 +234,7 @@ def tokenizer_class_from_name(class_name: str):
module_name = "openai"

module = importlib.import_module(f".{module_name.replace('-', '_')}", "transformers.models")
- return getattr(module, class_name)
+ return getattr(module, class_name, None)
Contributor Author

getattr(module, class_name) raises an error if the attribute is not defined and no None default is given (cc @sgugger)

Collaborator

Yes, but it should at this stage: module should have class_name as an attribute. With this change you will silence an error that should be raised.

Contributor Author

In this case the problem was that Speech2Text2Tokenizer has no fast tokenizer. TOKENIZER_MAPPING_NAMES.items() is then iterated looking for Speech2Text2TokenizerFast, which leaves module_name set to the last module in the list, and the lookup on that module throws an error. So I think we need the None default for such cases, no?
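
The trade-off discussed in this thread can be reproduced without `transformers`. The sketch below uses a `types.SimpleNamespace` as a hypothetical stand-in for a model sub-module that defines only a slow tokenizer; `lookup_strict` and `lookup_lenient` are illustrative names, not library functions.

```python
import types

# Hypothetical stand-in for a module that exports a slow tokenizer but no fast one.
fake_module = types.SimpleNamespace(
    Speech2Text2Tokenizer=type("Speech2Text2Tokenizer", (), {})
)


def lookup_strict(module, class_name):
    """Two-argument getattr: raises AttributeError when the name is missing."""
    return getattr(module, class_name)


def lookup_lenient(module, class_name):
    """Three-argument getattr: returns None when the name is missing."""
    return getattr(module, class_name, None)


try:
    lookup_strict(fake_module, "Speech2Text2TokenizerFast")
    strict_result = "found"
except AttributeError:
    strict_result = "AttributeError"

lenient_result = lookup_lenient(fake_module, "Speech2Text2TokenizerFast")
```

The lenient form avoids the crash for tokenizers without a fast variant, at the cost the reviewer points out: a genuinely misspelled or missing class name also comes back as `None` instead of failing loudly.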



def get_tokenizer_config(
@@ -53,7 +53,7 @@
general usage and behavior.

Parameters:
- config (:class:`~transformers.T5Config`): Model configuration class with all the parameters of the model.
+ config (:class:`~transformers.PretrainedConfig`): Model configuration class with all the parameters of the model.
Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model
weights.
36 changes: 36 additions & 0 deletions src/transformers/models/speech_encoder_decoder/__init__.py
@@ -0,0 +1,36 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import TYPE_CHECKING

from ...file_utils import _LazyModule, is_torch_available


_import_structure = {}

if is_torch_available():
_import_structure["modeling_speech_encoder_decoder"] = ["SpeechEncoderDecoderModel"]

if TYPE_CHECKING:
if is_torch_available():
from .modeling_speech_encoder_decoder import SpeechEncoderDecoderModel

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
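
The `_import_structure` dictionary above maps submodule names to the names they export, so the heavy import can be deferred until a name is actually used. The following is a simplified sketch of that idea, not the real `_LazyModule`: the class name `LazyModule`, its two-argument constructor, and the use of standard-library modules as targets are all assumptions made for illustration.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Toy lazy module: resolves exported names on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module_name: [attr, ...]} into {attr: module_name}.
        self._attr_to_module = {
            attr: module_name
            for module_name, attrs in import_structure.items()
            for attr in attrs
        }

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. before the name is cached.
        attr_to_module = self.__dict__.get("_attr_to_module", {})
        if attr not in attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(attr_to_module[attr]), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


lazy = LazyModule("lazy_demo", {"json": ["dumps"], "fractions": ["Fraction"]})
cached_before = "dumps" in vars(lazy)   # nothing resolved yet
encoded = lazy.dumps({"a": 1})          # first access triggers the import
cached_after = "dumps" in vars(lazy)    # now stored on the module object
```

Caching the resolved attribute on the module means `__getattr__` runs at most once per name.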