Tf timestamps whisper + update generate support #21334
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
A few suggestions 🙌
Awesome, thanks for the review 🤗
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
lmk when you want to pick this up again :P Meanwhile, shall we add the WIP label, so that the bot doesn't ping us?
Yes! Hahah sorry, maybe next week or 2 weeks from now!
Co-authored-by: Joao Gante <joao@huggingface.co>
Okay! Thanks to @gante's recommendations, the XLA generation works perfectly! The slow timestamp processing test also passes 🥳
Looks great! 🔥
I think we still have the Beam Search XLA Whisper problem -- I'm going to prioritize it now so we can announce this cool feature soon!
Thanks for adding this! Super exciting to have this in the other frameworks ⭐
Just a few small comments asking for checks on the inputs.
```diff
@@ -1000,7 +1000,7 @@ def generate(
         doc_scores=None,
         n_docs=None,
         generation_config=None,
-        logits_processor=TFLogitsProcessorList(),
+        logits_processor: Optional[TFLogitsProcessorList] = TFLogitsProcessorList(),
```
Is it optional in the sense that it can be a `None` value here?
Yes! This follows the same pattern as with torch, where you usually don't pass a logits processor.
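(For illustration, a minimal sketch of the fallback pattern being described; the helper name is hypothetical and not part of the PR:)

```python
from transformers.generation import TFLogitsProcessorList

def resolve_logits_processor(logits_processor=None):
    # Hypothetical helper: treat a None processor the same as "no extra
    # processors", mirroring the torch-side default behaviour.
    return logits_processor if logits_processor is not None else TFLogitsProcessorList()
```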
Code under review:

```python
forced_decoder_ids = []
if task is not None or language is not None:
    if hasattr(generation_config, "language"):
        if generation_config.language in generation_config.lang_to_id.keys():
            language_token = generation_config.language
        elif generation_config.language in TO_LANGUAGE_CODE.keys():
            language_token = f"<|{TO_LANGUAGE_CODE[generation_config.language]}|>"
        else:
            raise ValueError(
                f"Unsupported language: {generation_config.language}. Language should be one of:"
                f" {list(TO_LANGUAGE_CODE.keys()) if generation_config.language in TO_LANGUAGE_CODE.keys() else list(TO_LANGUAGE_CODE.values())}."
            )
        forced_decoder_ids.append((1, generation_config.lang_to_id[language_token]))
    else:
        forced_decoder_ids.append((1, None))  # automatically detect the language

    if hasattr(generation_config, "task"):
        if generation_config.task in TASK_IDS:
            forced_decoder_ids.append((2, generation_config.task_to_id[generation_config.task]))
        else:
            raise ValueError(
                f"The `{generation_config.task}` task is not supported. The task should be one of `{TASK_IDS}`"
            )
    else:
        forced_decoder_ids.append((2, generation_config.task_to_id["transcribe"]))  # defaults to transcribe

    if hasattr(generation_config, "no_timestamps_token_id") and not generation_config.return_timestamps:
        idx = forced_decoder_ids[-1][0] + 1 if forced_decoder_ids else 1
        forced_decoder_ids.append((idx, generation_config.no_timestamps_token_id))

# Legacy code for backward compatibility
elif hasattr(self.config, "forced_decoder_ids") and self.config.forced_decoder_ids is not None:
    forced_decoder_ids = self.config.forced_decoder_ids
elif (
    hasattr(self.generation_config, "forced_decoder_ids")
    and self.generation_config.forced_decoder_ids is not None
):
    forced_decoder_ids = self.generation_config.forced_decoder_ids
```
I would add an extra check on the inputs to prevent this function having silent, unexpected behaviour, specifically `forced_decoder_ids` from the config overriding the `language` or `task` arguments:
Suggested change (the block above, restructured so a legacy `forced_decoder_ids` conflicts loudly with explicit `language`/`task` arguments):

```python
forced_decoder_ids = []
legacy_forced_decoder_ids = []
if hasattr(self.config, "forced_decoder_ids") and self.config.forced_decoder_ids is not None:
    legacy_forced_decoder_ids = self.config.forced_decoder_ids
elif (
    hasattr(self.generation_config, "forced_decoder_ids")
    and self.generation_config.forced_decoder_ids is not None
):
    legacy_forced_decoder_ids = self.generation_config.forced_decoder_ids

if task is not None or language is not None:
    if legacy_forced_decoder_ids:
        raise ValueError(
            "Cannot specify language or task if forced_decoder_ids in model config or generation_config is set. "
            "Please remove forced_decoder_ids from config file(s)"
        )
    if hasattr(generation_config, "language"):
        if generation_config.language in generation_config.lang_to_id.keys():
            language_token = generation_config.language
        elif generation_config.language in TO_LANGUAGE_CODE.keys():
            language_token = f"<|{TO_LANGUAGE_CODE[generation_config.language]}|>"
        else:
            raise ValueError(
                f"Unsupported language: {generation_config.language}. Language should be one of:"
                f" {list(TO_LANGUAGE_CODE.keys()) if generation_config.language in TO_LANGUAGE_CODE.keys() else list(TO_LANGUAGE_CODE.values())}."
            )
        forced_decoder_ids.append((1, generation_config.lang_to_id[language_token]))
    else:
        forced_decoder_ids.append((1, None))  # automatically detect the language

    if hasattr(generation_config, "task"):
        if generation_config.task in TASK_IDS:
            forced_decoder_ids.append((2, generation_config.task_to_id[generation_config.task]))
        else:
            raise ValueError(
                f"The `{generation_config.task}` task is not supported. The task should be one of `{TASK_IDS}`"
            )
    else:
        forced_decoder_ids.append((2, generation_config.task_to_id["transcribe"]))  # defaults to transcribe

    if hasattr(generation_config, "no_timestamps_token_id") and not generation_config.return_timestamps:
        idx = forced_decoder_ids[-1][0] + 1 if forced_decoder_ids else 1
        forced_decoder_ids.append((idx, generation_config.no_timestamps_token_id))

# Legacy code for backward compatibility
elif legacy_forced_decoder_ids:
    forced_decoder_ids = legacy_forced_decoder_ids
```
For backward compatibility reasons, we had to make sure that the `config`'s `forced_decoder_ids` still take priority. TF would probably also suffer from this 😞
Code under review:

```python
self.timestamp_begin = generate_config.no_timestamps_token_id + 1

self.begin_index = len(generate_config.forced_decoder_ids) + 2
if generate_config.forced_decoder_ids[-1][1] == self.no_timestamps_token_id:
```
Is `forced_decoder_ids` always going to be set?
It has to! In `generate` it will always be given a default value, but we can raise an error for easier debugging.
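(A hedged sketch of such a check; the exact message and placement are assumed, not the PR's code:)

```python
from transformers.generation import TFLogitsProcessor

class TFWhisperTimeStampLogitsProcessorSketch(TFLogitsProcessor):
    def __init__(self, generate_config):
        # Assumed defensive check: fail fast with a clear message if
        # generate() did not populate forced_decoder_ids first.
        if getattr(generate_config, "forced_decoder_ids", None) is None:
            raise ValueError(
                "`forced_decoder_ids` must be set on the generation config "
                "before using the timestamp logits processor."
            )
        self.begin_index = len(generate_config.forced_decoder_ids) + 2
```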
Code under review:

```python
forced_decoder_ids = []

if hasattr(generation_config, "is_multilingual") and generation_config.is_multilingual:
    if task is not None or language is not None:
```
Same notes as in `modeling_tf_whisper` about checking the inputs.
Code under review:

```python
    new_scores = tf.tensor_scatter_nd_update(new_scores, indices, updates)
    return new_scores

def _force_tokens(current_tokens):
```
nit: My understanding is that this function only works with a single token (not tokens).
Suggested change:

```diff
-def _force_tokens(current_tokens):
+def _force_token(current_token):
```
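(To make the rename concrete, a standalone sketch of forcing a single token in TF by masking every other logit; the helper signature is assumed, not the PR's exact code:)

```python
import tensorflow as tf

def _force_token(scores, token_id):
    # Set every logit to -inf except the forced token, which gets 0,
    # so sampling/argmax can only pick `token_id`.
    batch_size = tf.shape(scores)[0]
    vocab_size = tf.shape(scores)[1]
    new_scores = tf.fill([batch_size, vocab_size], float("-inf"))
    indices = tf.stack([tf.range(batch_size), tf.fill([batch_size], token_id)], axis=1)
    return tf.tensor_scatter_nd_update(new_scores, indices, tf.zeros([batch_size]))
```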
Code under review:

```python
if generation_config.return_timestamps:
    if logits_processor is not None:
        logits_processor += [TFWhisperTimeStampLogitsProcessor(generation_config)]
    else:
        logits_processor = [TFWhisperTimeStampLogitsProcessor(generation_config)]
```
Should there be a default `logits_processor` set here if `generation_config.return_timestamps` is `False`?
No, if we don't return timestamps we just don't need an additional processor (but we should make sure that the `forced_decoder_ids` has the `no_timestamp_token` forced).
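(A small illustration of that branch; the token ids below are hypothetical placeholders:)

```python
# Hypothetical token ids, for illustration only.
lang_token_id, transcribe_token_id, no_timestamps_token_id = 50259, 50359, 50363
return_timestamps = False

forced_decoder_ids = [(1, lang_token_id), (2, transcribe_token_id)]
if not return_timestamps:
    # Force the no-timestamps token right after the task token instead of
    # adding a timestamp logits processor.
    idx = forced_decoder_ids[-1][0] + 1
    forced_decoder_ids.append((idx, no_timestamps_token_id))

print(forced_decoder_ids)  # [(1, 50259), (2, 50359), (3, 50363)]
```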
Code under review:

```python
class TFWhisperTimeStampLogitsProcessor(TFLogitsProcessor):
    r"""
    Whisper specific Processor. This processor can be used to force a list of tokens. The processor will set their log
```
Could you expand this a little to specify what's special about it being a timestamp logits processor?
Sure, will also update the PyTorch definition to make it clearer!
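(One possible expansion, sketched here with assumed wording rather than the docstring that was eventually merged:)

```python
from transformers.generation import TFLogitsProcessor

class TFWhisperTimeStampLogitsProcessorDoc(TFLogitsProcessor):
    r"""
    Whisper-specific processor that enforces the timestamp grammar Whisper
    was trained with: generation starts with a timestamp, timestamp tokens
    delimit text segments in pairs with non-decreasing values, and when the
    total probability mass on timestamp tokens exceeds that of any single
    text token, a timestamp is forced.
    """
```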
Code under review:

```python
        return scores

    # len(seq) == 1 corresponds to cur_len == self.begin_index + 2
    last_was_timestamp = (cur_len >= self.begin_index + 2) & (input_ids[:, cur_len - 1] >= self.timestamp_begin)
    penultimate_was_timestamp = (cur_len < self.begin_index + 3) | (
```
Just for my understanding: is the `cur_len < self.begin_index + 3` line there because the first timestamp won't have a previous timestamp token prediction?
Yes! I can add a comment for this 😉
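(For concreteness, a runnable toy summary of the masking rule, distilled from the torch counterpart; the names are illustrative, not the PR's:)

```python
def allowed_next(last_was_timestamp: bool, penultimate_was_timestamp: bool) -> str:
    # A closed timestamp pair forces plain text next; an open pair forces
    # another timestamp (or EOS). Positions with no penultimate prediction
    # yet (cur_len < begin_index + 3) are treated as "penultimate was a
    # timestamp", which is exactly the case asked about above.
    if last_was_timestamp and penultimate_was_timestamp:
        return "text only"
    if last_was_timestamp:
        return "timestamp or EOS only"
    return "anything"

print(allowed_next(True, False))  # timestamp or EOS only
```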
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Thanks for your review, will address all of this.
@ArthurZucker I was testing out if I get the timestamps with the TF model with your …
Hey! That’s probably because I haven’t pulled from main for a while and we changed the whisper tokenizer. As you can see, the decoding process is the one failing here.
@ArthurZucker Thanks for the response. I got the issue resolved with …, i.e. changing …
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR updates the TF and FLAX generation code to fix the breaking changes we had. It also adds support for timestamps in TF. Follows #21965
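A hedged usage sketch of what this enables (the checkpoint id and decode kwargs are assumptions, not taken from the PR):

```python
import numpy as np
from transformers import TFWhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Dummy one-second clip at 16 kHz; replace with real audio samples.
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="tf")

generated = model.generate(inputs.input_features, return_timestamps=True)
print(processor.batch_decode(generated, decode_with_timestamps=True))
```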