
[pipeline] Fix str device issue #24396

Merged
merged 10 commits into huggingface:main from add-str-support
Jun 26, 2023

Conversation

younesbelkada (Contributor):

What does this PR do?

Addresses: #24140 (comment)

Currently, passing device="cuda" is not supported when creating a pipeline, because torch.cuda.set_device(self.device) expects the device to have an explicit index. The fix is to create an indexed device when initializing a pipeline with a str device.

Handy reproducible snippet:

from transformers import pipeline

# this works
pipe = pipeline("text-generation", device=0)
pipe("Hello")

# this works
pipe = pipeline("text-generation", device="cuda:0")
pipe("Hello")

# this fails
pipe = pipeline("text-generation", device="cuda")
pipe("Hello")

cc @amyeroberts @Narsil

HuggingFaceDocBuilderDev commented Jun 21, 2023:

The documentation is not available anymore as the PR was closed or merged.

amyeroberts (Collaborator) left a comment:

Thanks for fixing!

Just some more comments on making it a bit more robust. They're suggestions, so it's up to you if you want to add them.

@require_torch_gpu
def test_pipeline_cuda(self):
    pipe = pipeline("text-generation", device="cuda")

amyeroberts (Collaborator) commented:

Could you also add an equivalent test here for "cuda:0" to make sure things still work even if the logic changes upstream?

@@ -793,6 +793,8 @@ def __init__(
             if isinstance(device, torch.device):
                 self.device = device
             elif isinstance(device, str):
+                if device == "cuda":
+                    device = f"cuda:{torch.cuda.current_device()}"
                 self.device = torch.device(device)
             elif device < 0:
                 self.device = torch.device("cpu")
amyeroberts (Collaborator) commented:

I can't comment on the line below, but we could make this if/elif/else check a bit safer by doing:

elif isinstance(device, int):
    self.device = device
else:
    raise ValueError(f"Device type not supported. Got {device}")
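
Putting the diff and this suggestion together, the device-normalization branch might read roughly like this (a sketch, not the PR's final code; the _resolve_device name is hypothetical):

import torch

def _resolve_device(device):
    # sketch combining the indexed-"cuda" fix with the explicit type checks
    if isinstance(device, torch.device):
        return device
    elif isinstance(device, str):
        if device == "cuda":
            device = f"cuda:{torch.cuda.current_device()}"
        return torch.device(device)
    elif isinstance(device, int):
        if device < 0:
            return torch.device("cpu")
        return torch.device(f"cuda:{device}")
    else:
        raise ValueError(f"Device type not supported. Got {device}")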

Narsil (Contributor) commented Jun 21, 2023:

https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html

set_device seems strongly discouraged, so I'm unsure about the current_device() usage.

torch.device("cuda") works, though. What's the issue?

Narsil (Contributor) commented Jun 21, 2023:

Also,

python -c 'from transformers import pipeline; pipe = pipeline(model="gpt2", device="cuda")'

works on main, so I'm not sure what the issue is.

younesbelkada (Contributor, Author) commented Jun 21, 2023:

@Narsil what you shared works on main, but it throws an error as soon as you actually run the pipeline (I attached a reproducible snippet above).

Put differently, this fails on main and this PR fixes it:

python -c 'from transformers import pipeline; pipe = pipeline(model="gpt2", device="cuda"); pipe("hello")'

Narsil (Contributor) commented Jun 21, 2023:

Can we remove the set_device call instead, then? That seems better:

diff --git a/src/transformers/pipelines/base.py b/src/transformers/pipelines/base.py
index 510c07cf5..b5975d081 100644
--- a/src/transformers/pipelines/base.py
+++ b/src/transformers/pipelines/base.py
@@ -901,10 +901,8 @@ class Pipeline(_ScikitCompat):
             with tf.device("/CPU:0" if self.device == -1 else f"/device:GPU:{self.device}"):
                 yield
         else:
-            if self.device.type == "cuda":
-                torch.cuda.set_device(self.device)
-
-            yield
+            with torch.cuda.device(self.device):
+                yield
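
For context on what the context manager buys over a bare set_device call: it restores the previously current device on exit. A minimal sketch (assumes a CUDA-capable machine):

import torch

# torch.cuda.device temporarily switches the current CUDA device and
# restores the previous one when the block exits
with torch.cuda.device(torch.device("cuda:0")):
    x = torch.randn(1, device="cuda")  # allocated on cuda:0
print(torch.cuda.current_device())  # back to whatever it was before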

Narsil (Contributor) commented Jun 21, 2023:

The initial thing does fail, and it seems to be linked to the fact that there are multiple set_device calls happening, causing issues.

Removing it does fix the issue (but the test you added to the test suite isn't failing on main, and since that test is what is supposed to catch the regression, it's what I tried :) ).

younesbelkada (Contributor, Author) commented Jun 21, 2023:

I am happy to revert some of the changes I proposed and add yours, it looks much better. However, I have a few questions:

1. Is it OK to call that context manager if self.device is CPU? I think we need a check on top of that to make sure we're not on CPU, similar to what we had before (see the sketch after this comment).

import torch
device = torch.device("cpu")

with torch.cuda.device(device):
    print(torch.randn(1))

Throws:

    raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: cpu

EDIT: just with torch.device(self.device) seems to work.

2. I am not sure, but I think the with-device context manager is only available since PT 2.0, no?
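
Regarding question 1, a minimal sketch of the kind of guard that avoids entering torch.cuda.device on CPU (the maybe_cuda_context helper is hypothetical, not code from this PR):

import contextlib

import torch

def maybe_cuda_context(device: torch.device):
    # hypothetical guard: torch.cuda.device raises ValueError for non-CUDA
    # devices (as shown above), so only enter it for CUDA devices
    if device.type == "cuda":
        return torch.cuda.device(device)
    return contextlib.nullcontext()

with maybe_cuda_context(torch.device("cpu")):
    print(torch.randn(1))  # runs on CPU without raising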

Narsil (Contributor) commented Jun 21, 2023:

> 2. I am not sure, but I think the with-device context manager is only available since PT 2.0, no?

I don't know; those are very good questions that I don't have the answer to. I just know that set_device is now strongly discouraged, so it's probably the source of our issues.

younesbelkada (Contributor, Author) commented:

Thanks! I can confirm the context manager doesn't work for PT==1.9, which we should still support:

Traceback (most recent call last):
  File "scratch.py", line 203, in <module>
    with torch.device(device):
AttributeError: __enter__

Therefore I just added some changes to ensure backward compatibility with older PT versions. WDYT?

@@ -793,11 +793,16 @@ def __init__(
             if isinstance(device, torch.device):
                 self.device = device
             elif isinstance(device, str):
                 if device == "cuda":
younesbelkada (Contributor, Author) commented:

Maybe?

Suggested change:
-                if device == "cuda":
+                if device == "cuda" and not hasattr(torch.device, "__enter__"):

amyeroberts (Collaborator) commented:

Just double checking, if this condition is true, does the line below run OK?

self.device = torch.device(device)

younesbelkada (Contributor, Author) commented Jun 21, 2023:

Yes, I think so; there shouldn't be an issue, i.e. torch.device(f"cuda:{i}") should work as long as i < n_gpus.

amyeroberts (Collaborator) left a comment:

I think the updates look good 👍

-                torch.cuda.set_device(self.device)
-
-            yield
+            if hasattr(torch.device, "__enter__"):
amyeroberts (Collaborator) commented:

nit: we typically check for compatibility with flags like is_torch_greater_or_equal_than_2_0 in pytorch_utils. It's a bit cleaner than checking for a dunder attribute, and it's clearer to the reader what's being checked.
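
For illustration, such a flag is typically defined along these lines (a sketch; the actual definition in transformers.pytorch_utils may differ):

from packaging import version

import torch

# version gate in the style of is_torch_greater_or_equal_than_2_0;
# base_version drops local/dev suffixes such as "+cu118"
parsed_torch_version = version.parse(version.parse(torch.__version__).base_version)
is_torch_greater_or_equal_than_2_0 = parsed_torch_version >= version.parse("2.0")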

younesbelkada (Contributor, Author) commented:

Perfect, I will install different PT versions and try to track down when the support was added.

younesbelkada (Contributor, Author) commented:

OK, I can confirm it was introduced in PT >= 2.0.0.


younesbelkada (Contributor, Author) commented:

Hi @Narsil
Let me know if the changes all look good to you; happy to address any additional comments you have.

Narsil (Contributor) commented Jun 21, 2023:

May I attempt a different thing?

I think the fix is correct, but I'm wondering if simply relying on the torch.cuda.device context manager could remove the need for the compat layer.

younesbelkada (Contributor, Author) commented:

Sure, yes!

Narsil (Contributor) commented Jun 21, 2023:

I cannot push to your branch, so here is the diff:

diff --git a/src/transformers/pipelines/base.py b/src/transformers/pipelines/base.py
index 626d33a3d..ee117e62a 100644
--- a/src/transformers/pipelines/base.py
+++ b/src/transformers/pipelines/base.py
@@ -50,7 +50,6 @@ if is_torch_available():
     from torch.utils.data import DataLoader, Dataset

     from ..models.auto.modeling_auto import AutoModel
-    from ..pytorch_utils import is_torch_greater_or_equal_than_2_0

     # Re-export for backward compatibility
     from .pt_utils import KeyDataset
@@ -794,16 +793,11 @@ class Pipeline(_ScikitCompat):
             if isinstance(device, torch.device):
                 self.device = device
             elif isinstance(device, str):
-                if device == "cuda" and not is_torch_greater_or_equal_than_2_0:
-                    # for backward compatiblity if using `set_device` and `cuda`
-                    device = f"cuda:{torch.cuda.current_device()}"
                 self.device = torch.device(device)
             elif device < 0:
                 self.device = torch.device("cpu")
-            elif isinstance(device, int):
-                self.device = torch.device(f"cuda:{device}")
             else:
-                raise ValueError(f"Device type not supported. Got {device}")
+                self.device = torch.device(f"cuda:{device}")
         else:
             self.device = device if device is not None else -1
         self.torch_dtype = torch_dtype
@@ -908,13 +902,10 @@ class Pipeline(_ScikitCompat):
             with tf.device("/CPU:0" if self.device == -1 else f"/device:GPU:{self.device}"):
                 yield
         else:
-            if is_torch_greater_or_equal_than_2_0:
-                with torch.device(self.device):
+            if self.device.type == "cuda":
+                with torch.cuda.device(self.device):
                     yield
-            # for backward compatibility
             else:
-                if self.device.type == "cuda":
-                    torch.cuda.set_device(self.device)
                 yield
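
As a standalone illustration, the PyTorch branch of device_placement would then behave roughly like this (a sketch, not the verbatim method; the TensorFlow branch is omitted):

import contextlib

import torch

@contextlib.contextmanager
def device_placement(device: torch.device):
    # only CUDA devices need the context manager; cpu is already the default
    if device.type == "cuda":
        with torch.cuda.device(device):
            yield
    else:
        yield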

Narsil (Contributor) commented Jun 21, 2023:

torch.cuda.device is defined for torch==1.9, so it should work.

And torch.device("cpu")... well, it's the default; there's no need to context-manage it.

younesbelkada (Contributor, Author) commented Jun 26, 2023:

Hi @Narsil
I am not sure if with torch.cuda.device(self.device): is supported for torch<2.0:

https://pytorch.org/tutorials/recipes/recipes/changing_default_device.html

Maybe we should merge this PR for now, also to unblock @thomasw21 & @NouamaneTazi. What do you think?

thomasw21 (Contributor) commented:

I don't think we're blocked by this.

> And torch.device("cpu")... well, it's the default; there's no need to context-manage it.

Not sure of the context of this sentence, but we're overriding the default to cuda, so having a context manager to switch back to cpu makes sense to me.
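
A minimal illustration of that point, assuming PyTorch >= 2.0 (where the default device can be overridden globally):

import torch

torch.set_default_device("cuda")  # override the default device globally
x = torch.randn(2)                # allocated on cuda

with torch.device("cpu"):         # temporarily switch back to cpu
    y = torch.randn(2)            # allocated on cpu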

Narsil (Contributor) commented Jun 26, 2023:

https://pytorch.org/docs/1.9.0/generated/torch.cuda.device.html?highlight=torch%20cuda%20device#torch.cuda.device

It is supported from 1.9.0 onwards, at least according to the docs.

younesbelkada (Contributor, Author) commented:

Great! Agreed with those changes.

@younesbelkada younesbelkada requested a review from sgugger June 26, 2023 10:27
sgugger (Collaborator) left a comment:

Thanks for the fix!

@younesbelkada younesbelkada merged commit 914289a into huggingface:main Jun 26, 2023
@younesbelkada younesbelkada deleted the add-str-support branch June 26, 2023 11:58