
Remove device parameter from create_extended_attention_mask_for_decoder #16894

Conversation

pbelevich (Contributor)

What does this PR do?

This PR removes the redundant `device` parameter from `create_extended_attention_mask_for_decoder`, which may cause issues if the passed `device` is not equal to `attention_mask.device`; see modeling_utils.py#L610. Explanation: tracing the logic from line 610 back to the method signature:
causal_mask.device == attention_mask.device => seq_ids.device == attention_mask.device => device == attention_mask.device
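
As a rough illustration of that chain, here is a simplified sketch of the method around that line (not verbatim library code): `device` is only used to allocate `seq_ids`, whose device propagates to `causal_mask`, and line 610 multiplies `causal_mask` with `attention_mask`, so the two devices must already agree and `device` can simply be read from `attention_mask`.

# Simplified sketch; assumes `import torch` and input_shape == (batch_size, seq_length)
def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask, device):
    batch_size, seq_length = input_shape
    # only place `device` is used: it decides where `seq_ids` (and hence `causal_mask`) lives
    seq_ids = torch.arange(seq_length, device=device)
    causal_mask = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]
    causal_mask = causal_mask.to(attention_mask.dtype)
    # L610: the elementwise product requires both masks to be on the same device,
    # so `device` must already equal `attention_mask.device`
    extended_attention_mask = causal_mask[:, None, :, :] * attention_mask[:, None, None, :]
    return extended_attention_mask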

@michaelbenayoun

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pbelevich force-pushed the remove_device_from_create_extended_attention_mask_for_decoder branch from 9eaea11 to 5d59df5 on April 22, 2022 15:10
@HuggingFaceDocBuilderDev commented Apr 22, 2022

The documentation is not available anymore as the PR was closed or merged.

@pbelevich marked this pull request as ready for review on April 22, 2022 15:27
@michaelbenayoun (Member)

This seems legit to me; pinging @LysandreJik, @sgugger and @ydshieh to comment on this.

@michaelbenayoun requested review from ydshieh, michaelbenayoun, LysandreJik and sgugger, and removed the request for ydshieh and michaelbenayoun on April 25, 2022 08:41
@ydshieh (Collaborator) commented Apr 25, 2022

LGTM, as it uses the device from the argument attention_mask.

def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask):
    batch_size, seq_length = input_shape
    device = attention_mask.device

Thank you for reducing the potential issue!

(Please wait for approval from sgugger or LysandreJik before merging 🙏)

@sgugger (Collaborator) left a comment

Thanks for your PR. Removing arguments from public methods is a bit of a breaking change (even if I don't expect many users to use those directly), so since it's very easy to avoid that here and raise a proper deprecation warning instead, I would like this added before we merge.

Also, the first two research projects should not be touched.

@@ -152,7 +152,7 @@ def forward(

# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
-extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
+extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)

This example is pinned to Transformers == 3.5.1 so don't make any change there.

@@ -195,7 +195,7 @@ def forward(

# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
-extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
+extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)

Same here.

@@ -195,7 +195,7 @@ def forward(

# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
-extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
+extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)

Same here.

@@ -137,7 +137,7 @@ def embed_sentences_checkpointed(self, input_ids, attention_mask, checkpoint_bat
token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
head_mask = [None] * self.sent_encoder.config.num_hidden_layers
extended_attention_mask: torch.Tensor = self.sent_encoder.get_extended_attention_mask(
-    attention_mask, input_shape, device
+    attention_mask, input_shape

This one can be updated as it's not pinned.

@@ -137,7 +137,7 @@ def embed_sentences_checkpointed(self, input_ids, attention_mask, checkpoint_bat
token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
head_mask = [None] * self.sent_encoder.config.num_hidden_layers
extended_attention_mask: torch.Tensor = self.sent_encoder.get_extended_attention_mask(
-    attention_mask, input_shape, device
+    attention_mask, input_shape

This one can be updated as it's not pinned.

@@ -589,8 +589,9 @@ def invert_attention_mask(self, encoder_attention_mask: Tensor) -> Tensor:

return encoder_extended_attention_mask

-def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask, device):
+def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask):

Removing an argument from a public method like this is a breaking change, so we should continue to accept it with a default of None and raise a deprecation warning if we detect it's not None, telling the user that the argument is not used anymore and will be removed in v5 of Transformers.
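
A minimal sketch of the backward-compatible signature being requested here, assuming the exact warning text and placement are up to the author (the actual change appears in a later diff):

import warnings

def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask, device=None):
    if device is not None:
        warnings.warn(
            "The `device` argument is deprecated and will be removed in v5 of Transformers.",
            FutureWarning,
        )
    # the device is now always derived from the mask itself
    device = attention_mask.device
    # ... rest of the mask construction is unchanged ...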

@@ -589,8 +589,9 @@ def invert_attention_mask(self, encoder_attention_mask: Tensor) -> Tensor:

return encoder_extended_attention_mask

-def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask, device):
+def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask):

Removing an argument from a public method like this is a breaking change, so we should continue to accept it with a default of None and raise a deprecation warning if we detect it's not None, telling the user that the argument is not used anymore and will be removed in v5 of Transformers.

@@ -610,7 +611,7 @@ def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask
extended_attention_mask = causal_mask[:, None, :, :] * attention_mask[:, None, None, :]
return extended_attention_mask

-def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int], device: device) -> Tensor:
+def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int]) -> Tensor:

Same here.

@@ -610,7 +611,7 @@ def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask
extended_attention_mask = causal_mask[:, None, :, :] * attention_mask[:, None, None, :]
return extended_attention_mask

-def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int], device: device) -> Tensor:
+def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int]) -> Tensor:

Same here.

@pbelevich force-pushed the remove_device_from_create_extended_attention_mask_for_decoder branch 3 times, most recently from 1326012 to 994597c on April 29, 2022 15:19
@@ -152,7 +152,7 @@ def forward(

# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
-extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
+extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)

This example is pinned to Transformers == 3.5.1 so don't make any change there.

-def create_extended_attention_mask_for_decoder(input_shape, attention_mask, device):
+def create_extended_attention_mask_for_decoder(self, input_shape, attention_mask, device=None):
+    if device is not None:
+        warnings.warn("`device` is deprecated and will be removed in v5 of Transformers.")

Suggested change
-warnings.warn("`device` is deprecated and will be removed in v5 of Transformers.")
+warnings.warn("The `device` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning)
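
Hypothetical call sites illustrating the effect of the deprecation (names are illustrative only): existing callers that still pass `device` keep working but now see a FutureWarning, while updated callers simply drop the argument.

# old call site: still accepted, emits a FutureWarning
extended_mask = model.get_extended_attention_mask(attention_mask, input_shape, device)
# new call site: the device is taken from `attention_mask` itself
extended_mask = model.get_extended_attention_mask(attention_mask, input_shape)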

Comment on lines 633 to 635
device: (`torch.device`):
**DEPRECATED**. `attention_mask.device` will be used instead in v5 of Transformers.
The device of the input to the model.

Remove the documentation entirely for a deprecated argument.

if not (attention_mask.dim() == 2 and self.config.is_decoder):
    # show warning only if it won't be shown in `create_extended_attention_mask_for_decoder`
    if device is not None:
        warnings.warn("`device` is deprecated and will be removed in v5 of Transformers.")

Suggested change
-warnings.warn("`device` is deprecated and will be removed in v5 of Transformers.")
+warnings.warn("The `device` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning)
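
For context on the guard above, here is a simplified sketch of the dispatch inside `get_extended_attention_mask` (not verbatim library code), under the assumption that the 2D decoder path delegates to `create_extended_attention_mask_for_decoder`, which already emits its own deprecation warning, so warning here as well would print it twice.

# Simplified dispatch sketch
if attention_mask.dim() == 3:
    extended_attention_mask = attention_mask[:, None, :, :]
elif attention_mask.dim() == 2:
    if self.config.is_decoder:
        # delegates; the delegate raises the deprecation warning itself
        extended_attention_mask = self.create_extended_attention_mask_for_decoder(
            input_shape, attention_mask, device
        )
    else:
        extended_attention_mask = attention_mask[:, None, None, :]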

@pbelevich force-pushed the remove_device_from_create_extended_attention_mask_for_decoder branch from 994597c to 91758cf on April 29, 2022 15:31
@pbelevich force-pushed the remove_device_from_create_extended_attention_mask_for_decoder branch from 91758cf to 209647b on April 29, 2022 15:43
@pbelevich (Contributor, Author)

@sgugger thanks for the code review! All comments have been addressed.

@sgugger (Collaborator) commented Apr 29, 2022

Thanks! Pinging @LysandreJik for final review :-)

@LysandreJik (Member) left a comment

LGTM, thanks @pbelevich!

@LysandreJik merged commit 39f8eaf into huggingface:main on May 3, 2022
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request May 3, 2022
nandwalritik pushed a commit to nandwalritik/transformers that referenced this pull request May 4, 2022
Narsil pushed a commit to Narsil/transformers that referenced this pull request May 12, 2022
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022