Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward #23374

Merged
merged 1 commit into from
May 15, 2023

Conversation

ydshieh
Collaborator

@ydshieh ydshieh commented May 15, 2023

What does this PR do?

tests/models/align/test_modeling_align.py::AlignModelTest::test_multi_gpu_data_parallel_forward started failing after we switched to torch+cu118. If I reinstall torch+cu117, it passes again.

This test uses torch.nn.DataParallel, which is not recommended (although it is not deprecated yet). The error comes purely from the CUDA side, which I am not familiar with. Combining the above facts with how this model is used, let's just skip this particular test for AlignModelTest.

(This failing test causes 18 other tests to fail, because it leaves CUDA in a bad state.)

E       RuntimeError: Caught RuntimeError in replica 0 on device 0.
E       Original Traceback (most recent call last):
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
E           output = module(*input, **kwargs)
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
E           return forward_call(*args, **kwargs)
E         File "/transformers/src/transformers/models/align/modeling_align.py", line 1596, in forward
E           vision_outputs = self.vision_model(
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
E           return forward_call(*args, **kwargs)
E         File "/transformers/src/transformers/models/align/modeling_align.py", line 1395, in forward
E           embedding_output = self.embeddings(pixel_values)
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
E           return forward_call(*args, **kwargs)
E         File "/transformers/src/transformers/models/align/modeling_align.py", line 345, in forward
E           features = self.convolution(features)
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
E           return forward_call(*args, **kwargs)
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 463, in forward
E           return self._conv_forward(input, self.weight, self.bias)
E         File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
E           return F.conv2d(input, weight, bias, self.stride,
E       RuntimeError: GET was unable to find an engine to execute this computation
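
For readers unfamiliar with the pattern, skipping one inherited test for a single model class can look roughly like the sketch below. It is a minimal illustration and not the actual diff of this PR: `CommonModelTestMixin` is a hypothetical stand-in for the shared test mixin, the `RuntimeError` only simulates the CUDA failure, and the skip reason text is an assumption.

```python
import unittest


class CommonModelTestMixin:
    """Hypothetical stand-in for a mixin that defines tests shared by all models."""

    def test_multi_gpu_data_parallel_forward(self):
        # The real test would wrap the model in torch.nn.DataParallel and run
        # a forward pass; here we only simulate the failure seen with cu118.
        raise RuntimeError("simulated CUDA failure")


class AlignModelTest(CommonModelTestMixin, unittest.TestCase):
    # Overriding the inherited method and decorating it with unittest.skip
    # disables the test for this class only; other model test classes that
    # use the mixin still run it.
    @unittest.skip(reason="Fails with torch+cu118 (assumed reason text)")
    def test_multi_gpu_data_parallel_forward(self):
        pass
```

Because the override lives in the model-specific test class, the body of the shared test is never executed for this model, so it cannot leave CUDA in a bad state for the tests that run after it.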

@ydshieh ydshieh requested a review from amyeroberts May 15, 2023 14:34
Collaborator

@amyeroberts amyeroberts left a comment

Thanks for resolving!

@ydshieh ydshieh merged commit 8f76dc8 into main May 15, 2023
@ydshieh ydshieh deleted the skip_test_001 branch May 15, 2023 14:47
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
…uggingface#23374)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
…uggingface#23374)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
…uggingface#23374)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>