
Fix ONNX export for causal LM sequence classifiers by removing reverse indexing #28144

Merged: 5 commits merged into huggingface:main from causal_classification_onnx2 on Dec 22, 2023

Conversation

@dwyatte (Contributor) commented on Dec 19, 2023

What does this PR do?

Follow-up to #27450 and another step toward fixing huggingface/optimum#1527. ONNX implements indexing using a combination of its own operators, and when reverse indexing is used (e.g., -1 to indicate the last element of an array), it can produce incorrect results (see PyTorch's ONNX export code). In practice, this can cause the batch dimension to get shuffled.

Causal LM sequence classifiers were previously using -1 to index the last token. Adding sequence_lengths = torch.where(sequence_lengths >= 0, sequence_lengths, input_ids.shape[-1] - 1) effectively removes the reverse indexing, as illustrated in the sketch below.
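For illustration, a minimal, self-contained sketch of the pattern being fixed (toy tensors, not the actual model code):

```python
import torch

# Toy batch: row 0 is right-padded, row 1 fills the full sequence.
pad_token_id = 0
input_ids = torch.tensor([[5, 6, 7, 0, 0],
                          [5, 6, 7, 8, 9]])

# First pad position minus one gives the index of the last real token,
# but for the unpadded row argmax returns 0, so the result is -1: a
# reverse index that the ONNX export can resolve incorrectly.
sequence_lengths = torch.eq(input_ids, pad_token_id).int().argmax(-1) - 1
print(sequence_lengths)  # tensor([ 2, -1])

# The fix: rewrite any negative index as the explicit last position.
sequence_lengths = torch.where(
    sequence_lengths >= 0, sequence_lengths, input_ids.shape[-1] - 1
)
print(sequence_lengths)  # tensor([2, 4])
```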

While this could be fixed in https://github.com/huggingface/optimum by forcing the inputs used to trace the graph to contain a pad token and thereby avoiding reverse indexing, it seems better to fix it in transformers, with the added benefit of bringing the code in line with the TensorFlow implementations of the same logic (e.g., https://github.com/huggingface/transformers/pull/25085/files#diff-7c6fdd54ac4b8ce0c09bb17da15f176d3e5827df39dd8234fd802631e99ef38dR801-R804).

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker, @amyeroberts, @younesbelkada (CC @fxmarty)

@amyeroberts (Collaborator) left a comment:


Thanks for fixing this!

General comment on the indexing assumptions made here

Comment on lines 801 to 803:

sequence_lengths = torch.where(sequence_lengths >= 0, sequence_lengths, input_ids.shape[-1] - 1).to(
    logits.device
)

  • Let's split across lines - it'll use fewer lines overall
  • Can we instead use modulo to convert the negative index to its positive equivalent? The logic at the moment assumes we only ever want to take the final index, and I think it'll be faster than torch.where
Suggested change:

```diff
- sequence_lengths = torch.where(sequence_lengths >= 0, sequence_lengths, input_ids.shape[-1] - 1).to(
-     logits.device
- )
+ sequence_lengths = sequence_lengths % input_ids.shape[-1]
+ sequence_lengths = sequence_lengths.to(logits.device)
```
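For context, a quick check of the modulo equivalence (toy values; PyTorch's % follows Python's sign-of-divisor semantics):

```python
import torch

seq_len = 5
sequence_lengths = torch.tensor([2, -1])

# -1 % 5 == 4, so modulo maps the reverse index to the explicit last
# position while leaving non-negative indices unchanged.
print(sequence_lengths % seq_len)  # tensor([2, 4])
```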

@dwyatte (Contributor, Author) replied:

That should work. Updated all changes

@dwyatte requested a review from amyeroberts on December 19, 2023 18:28
@amyeroberts (Collaborator) left a comment:

LGTM - thanks for iterating!

I'd like to have another approval from either @younesbelkada or @ArthurZucker before merging, as they know the causal LM models well

@ArthurZucker (Collaborator) left a comment:

Thanks both, looks good yep!
More models that use sequence_lengths = (torch.ne(input_ids, self.config.pad_token_id).sum(dim=-1) - 1).to(logits.device) and could benefit from this (not sure if argmax is faster? see the sketch after this list):

  • bloom
  • falcon
  • mpt

let's try to unify this!
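(For reference, a sketch of the modulo-based pattern this PR unifies the models on; it uses argmax over the pad mask rather than ne().sum(), matching the question above, and the tensors are toy stand-ins for the model's input_ids and logits:)

```python
import torch

pad_token_id = 0
input_ids = torch.tensor([[5, 6, 7, 0, 0],
                          [5, 6, 7, 8, 9]])
logits = torch.randn(2, 5, 3)  # (batch, seq_len, num_labels), illustrative

# argmax finds the first pad position; subtracting 1 gives the last real
# token. With no padding present, argmax returns 0 and 0 - 1 = -1, which
# the modulo then maps to the explicit last index (seq_len - 1).
sequence_lengths = torch.eq(input_ids, pad_token_id).int().argmax(-1) - 1
sequence_lengths = sequence_lengths % input_ids.shape[-1]
sequence_lengths = sequence_lengths.to(logits.device)

# Gather one logit vector per batch row at the last-token position.
pooled_logits = logits[torch.arange(input_ids.shape[0]), sequence_lengths]
print(pooled_logits.shape)  # torch.Size([2, 3])
```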

@dwyatte force-pushed the causal_classification_onnx2 branch from 02c782a to 55d4013 on December 20, 2023 16:09
@dwyatte (Contributor, Author) commented on Dec 20, 2023

Thanks @ArthurZucker @amyeroberts. I've unified the Bloom/Falcon/MPT implementations, but doing so triggered what looks to be an unrelated CI failure. Can someone take a look and fix/disable that test if it is indeed unrelated?

FAILED tests/models/seamless_m4t/test_modeling_seamless_m4t.py::SeamlessM4TModelWithTextInputTest::test_retain_grad_hidden_states_attentions - AttributeError: 'NoneType' object has no attribute 'retain_grad'

@amyeroberts (Collaborator) commented:

@dwyatte Yep, that's a flaky test. A patch to skip it in the testing suite was recently merged into main to prevent it from affecting unrelated PRs like this one :) Could you rebase to include recent updates and trigger a new CI run?

@dwyatte force-pushed the causal_classification_onnx2 branch from 55d4013 to b0db02c on December 20, 2023 17:04
@dwyatte (Contributor, Author) commented on Dec 20, 2023

@amyeroberts Hm, b0db02c contains the latest commit on main (224ab70), so I think tests/models/seamless_m4t/test_modeling_seamless_m4t.py::SeamlessM4TModelWithTextInputTest::test_retain_grad_hidden_states_attentions is still broken/flaking there

@amyeroberts (Collaborator) commented:

@dwyatte Hm, that's odd. The test shouldn't even be running, as it's explicitly skipped. Locally, on this branch, do you see the skip condition in test_modeling_seamless_m4t.py?

@dwyatte force-pushed the causal_classification_onnx2 branch from b0db02c to bb55859 on December 20, 2023 18:36
@dwyatte (Contributor, Author) commented on Dec 20, 2023

@amyeroberts I see what's going on -- the failure is on SeamlessM4TModelWithTextInputTest but the explicit skip exists on SeamlessM4TModelWithSpeechInputTest. Let me know if I should add the same skip to SeamlessM4TModelWithTextInputTest on my branch or if you prefer a different fix/PR
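(For reference, such a skip is just a decorator on the test method; a minimal illustrative sketch, not the actual SeamlessM4T test code:)

```python
import unittest


class SeamlessM4TModelWithTextInputTest(unittest.TestCase):
    @unittest.skip(reason="Flaky: some hidden states can be None, so retain_grad fails")
    def test_retain_grad_hidden_states_attentions(self):
        pass
```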

@amyeroberts (Collaborator) commented:

@dwyatte Ah! Gotcha. Yes please, could you open a separate PR to skip the retain grad tests for all the SeamlessM4T models?

@dwyatte force-pushed the causal_classification_onnx2 branch from bb55859 to ebfcd71 on December 21, 2023 16:21
@dwyatte (Contributor, Author) commented on Dec 21, 2023

Ok @amyeroberts @ArthurZucker, after rebasing on the above, this is ready for merging. Thanks both!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts merged commit 548a8f6 into huggingface:main on Dec 22, 2023
19 checks passed
staghado pushed a commit to staghado/transformers that referenced this pull request on Jan 15, 2024:

Fix ONNX export for causal LM sequence classifiers by removing reverse indexing (huggingface#28144)

* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* use modulo instead

* unify modulo-based sequence lengths
@sentialx commented:

Why not just have a shared util for this, instead of repeating the code all over the place?
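(No such shared helper exists in transformers; a hedged sketch of what one might look like, with a hypothetical name:)

```python
from typing import Optional

import torch


def last_token_indices(input_ids: torch.Tensor, pad_token_id: Optional[int]) -> torch.Tensor:
    """Hypothetical helper: index of the last non-pad token per row, using
    only non-negative indices so the graph exports cleanly to ONNX."""
    if pad_token_id is None:
        # No padding defined: take the final position for every row.
        return torch.full((input_ids.shape[0],), input_ids.shape[-1] - 1)
    sequence_lengths = torch.eq(input_ids, pad_token_id).int().argmax(-1) - 1
    return sequence_lengths % input_ids.shape[-1]
```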
