Device agnostic testing #25870

Merged · 19 commits · Oct 24, 2023

Conversation

@vvvm23 (Contributor) commented Aug 30, 2023

What does this PR do?

Adds extra capabilities to testing_utils.py to support testing on devices besides cuda and cpu, without having to upstream device-specific changes.

This is done by introducing device agnostic functions that dispatch to backend-specific implementations. Users can register new backends, and the functions they dispatch to, by creating a device specification file and pointing the test suite at it via TRANSFORMERS_TEST_DEVICE_SPEC.

An example specification for a hypothetical CUDA device without support for torch.cuda.empty_cache could look like this:

import torch

# !! Specify additional imports here !!

# Specify the device name (e.g. 'cuda', 'cpu')
DEVICE_NAME = 'cuda2'

# Specify device-specific backends to dispatch to.
# If not specified, will fall back to 'default' in `testing_utils.py`
MANUAL_SEED_FN = torch.cuda.manual_seed
EMPTY_CACHE_FN = None
DEVICE_COUNT_FN = torch.cuda.device_count
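
Assuming the spec above is saved as spec.py (a hypothetical path), the test suite can then be pointed at it like so:

TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python -m pytest tests/models/opt/test_modeling_opt.py

Since the spec is loaded via importlib, the file must be importable from the working directory.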

By default, cpu and cuda backends are available, so default behaviour is unaffected.

We also introduce a new decorator @require_torch_accelerator which can be used to specify that a test needs an accelerator (but not necessarily a CUDA one).
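
As a rough sketch of the intended usage (the test class and body here are illustrative, not taken from the PR):

import unittest

import torch

from transformers.testing_utils import require_torch_accelerator, torch_device


class ExampleAcceleratorTest(unittest.TestCase):
    @require_torch_accelerator
    def test_runs_on_accelerator(self):
        # Skipped on CPU-only setups; `torch_device` may be "cuda" or any
        # custom device registered via `TRANSFORMERS_TEST_DEVICE_SPEC`.
        x = torch.ones(2, 2).to(torch_device)
        self.assertEqual(x.sum().item(), 4.0)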

Crucially, these changes should not change the behaviour of the upstream CI runners. They aim to be as non-intrusive as possible and remain compatible with existing tests.

In this PR, only a subset of tests is updated to use these new features at first. These are:

  • test_modeling_bloom – demonstrating usage of new @require_torch_accelerator
  • test_modeling_codegen – demonstrating usage of a device agnostic function (accelerator_manual_seed)
  • test_modeling_opt – demonstrating another device agnostic function, this time checking whether the current device supports torch.float16 (see the sketch after this list)
  • test_modeling_reformer – decorator version of the above.
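
For the opt case, the pattern is roughly the following (the checkpoint name is for illustration only, and we assume is_torch_fp16_available_on_device is exposed from transformers.utils, where this PR's import-utility changes live):

from transformers import OPTForCausalLM
from transformers.testing_utils import torch_device
from transformers.utils import is_torch_fp16_available_on_device

model = OPTForCausalLM.from_pretrained("facebook/opt-125m").to(torch_device)
if is_torch_fp16_available_on_device(torch_device):
    # Only cast to half precision when the device can actually run fp16
    # kernels, rather than hard-coding `torch_device == "cuda"`.
    model = model.half()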

Related #25654

TODO:

  • Write some documentation on TRANSFORMERS_TEST_DEVICE_SPEC (once we finalise the PR)
  • Additional checks and finding edge cases
  • Verify this PR does indeed have no effect on the Hugging Face CI runners.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ydshieh

@ydshieh ydshieh self-requested a review September 1, 2023 13:57
@ydshieh (Collaborator) commented Sep 1, 2023

Thanks, I will take a look @vvvm23

@ydshieh (Collaborator) commented Sep 1, 2023

@vvvm23 Could you pull the latest main to your local clone and rebase your PR branch on top of your local main? Thanks!
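
For reference, one way to do this from the command line, assuming the upstream remote points at huggingface/transformers and <pr-branch> stands in for the actual branch name:

git fetch upstream
git checkout main && git merge --ff-only upstream/main
git checkout <pr-branch>
git rebase main
git push --force-with-lease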

@vvvm23 (Contributor, Author) commented Sep 6, 2023

I rebased a few days ago, but realised I forgot to ping! Sorry @ydshieh!

@ydshieh (Collaborator) commented Sep 6, 2023

No problem, I will take a look this week 🙏

@ydshieh ydshieh self-assigned this Sep 7, 2023
@ydshieh (Collaborator) left a comment

Hi! Thank you for opening this PR. Overall it's good! I left a few nit comments.

I will have to check the failing tests though.

Have you tried to run the tests with a 3rd party device (with this PR of course)?

Review threads (resolved): src/transformers/utils/import_utils.py, src/transformers/testing_utils.py
@vvvm23 (Contributor, Author) commented Sep 14, 2023

Hey @ydshieh, just writing to let you know that @arsalanu will be picking up this PR on my behalf as I am changing jobs.

Please let me or him know if you have any further review comments 🤗

@ydshieh (Collaborator) commented Sep 19, 2023

@vvvm23 well noted. Thank you for the contribution, and best wishes for your next adventure!

(I was just back from a break, will take a look and ping @arsalanu for necessary changes if any)

@ydshieh ydshieh assigned ydshieh and unassigned ydshieh Sep 19, 2023
@arsalanu (Contributor) commented Oct 2, 2023

Hi, as Alex mentioned, I’ll be taking over this PR. Just wanted to check in on the status of the review, please let me know when you can if there are any comments/further changes you’d like made 🙂

@ydshieh (Collaborator) commented Oct 3, 2023

Hi @arsalanu

Toward the end of src/transformers/testing_utils.py: I think it would be less confusing if we use backend_ instead of accelerator_ (and similarly for ACCELERATOR_)

For example, instead of ACCELERATOR_MANUAL_SEED and accelerator_manual_seed, use the name BACKEND_MANUAL_SEED and backend_manual_seed.

If we check, require_torch_accelerator excludes cpu, which means we don't consider cpu an accelerator. So it's somewhat strange that we can use accelerator_manual_seed with CPU.

And those methods are actually methods for torch backends.

WDYT?

@ydshieh (Collaborator) commented Oct 3, 2023

Other than this, it looks good to me. It would be great to see an example run on an npu backend/device to make sure this PR works with it.

Finally, we should update the documentation here

### Testing with a specific PyTorch backend or device

In this PR, we don't have to apply all the new things like accelerator_manual_seed or require_torch_accelerator everywhere. We can keep it simple, make it work as expected, and merge (after approval by a core maintainer). Then we can apply the changes in a follow-up PR.

@arsalanu (Contributor) commented Oct 3, 2023

Hi, I've updated the PR to rename the functions, I agree that backend_* makes more sense here. Also updated the docs to include an explanation of using the spec file.

@arsalanu (Contributor) commented
Thanks, @statelesshz - I've made your changes to this PR.

@ydshieh when you can, could you please change this PR from a 'draft' to ready-for-review if you feel it is ready to be approved? (I don't have access to do this.) Thank you!

@vvvm23 vvvm23 marked this pull request as ready for review October 19, 2023 11:29
@ydshieh (Collaborator) commented Oct 19, 2023

Sure. BTW, could you run make style and/or make quality to fix the code quality issue?
Or more simply,

black --check examples tests src utils

@arsalanu (Contributor) commented
Looks like they're all passing now 👍

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ydshieh (Collaborator) commented Oct 19, 2023

Hi @LysandreJik

This PR is ready for a review from core maintainers 🙏. It enables testing on different types of accelerators.

The only thing I see that might be a bit inconvenient is that the file testing_utils.py might get (even) larger if we ever need to add more device-specific methods; see below.

If this is the case, we can move the actual definitions to a new file and import them from there into testing_utils.py.

def _device_agnostic_dispatch(device: str, dispatch_table: Dict[str, Callable], *args, **kwargs):
    if device not in dispatch_table:
        return dispatch_table["default"](*args, **kwargs)

    fn = dispatch_table[device]

    # Some device agnostic functions return values. Need to guard against `None`
    # instead at user level.
    if fn is None:
        return None

    return fn(*args, **kwargs)


if is_torch_available():
    # Mappings from device names to callable functions to support device agnostic
    # testing.
    BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}


def backend_manual_seed(device: str, seed: int):
    return _device_agnostic_dispatch(device, BACKEND_MANUAL_SEED, seed)


def backend_empty_cache(device: str):
    return _device_agnostic_dispatch(device, BACKEND_EMPTY_CACHE)


def backend_device_count(device: str):
    return _device_agnostic_dispatch(device, BACKEND_DEVICE_COUNT)


if is_torch_available():
    # If `TRANSFORMERS_TEST_DEVICE_SPEC` is enabled we need to import extra entries
    # into device to function mappings.
    if "TRANSFORMERS_TEST_DEVICE_SPEC" in os.environ:
        device_spec_path = os.environ["TRANSFORMERS_TEST_DEVICE_SPEC"]
        if not Path(device_spec_path).is_file():
            raise ValueError(
                f"Specified path to device spec file is not a file or not found. Received '{device_spec_path}'"
            )

        # Try to strip extension for later import – also verifies we are importing a
        # python file.
        try:
            import_name = device_spec_path[: device_spec_path.index(".py")]
        except ValueError as e:
            raise ValueError(f"Provided device spec file was not a Python file! Received '{device_spec_path}'") from e

        device_spec_module = importlib.import_module(import_name)

        # Imported file must contain `DEVICE_NAME`. If it doesn't, terminate early.
        try:
            device_name = device_spec_module.DEVICE_NAME
        except AttributeError as e:
            raise AttributeError("Device spec file did not contain `DEVICE_NAME`") from e

        if "TRANSFORMERS_TEST_DEVICE" in os.environ and torch_device != device_name:
            msg = f"Mismatch between environment variable `TRANSFORMERS_TEST_DEVICE` '{torch_device}' and device found in spec '{device_name}'\n"
            msg += "Either unset `TRANSFORMERS_TEST_DEVICE` or ensure it matches device spec name."
            raise ValueError(msg)

        torch_device = device_name

        def update_mapping_from_spec(device_fn_dict: Dict[str, Callable], attribute_name: str):
            try:
                # Try to import the function directly
                spec_fn = getattr(device_spec_module, attribute_name)
                device_fn_dict[torch_device] = spec_fn
            except AttributeError as e:
                # If the function doesn't exist, and there is no default, throw an error
                if "default" not in device_fn_dict:
                    raise AttributeError(
                        f"`{attribute_name}` not found in '{device_spec_path}' and no default fallback function found."
                    ) from e

        # Add one entry here for each `BACKEND_*` dictionary.
        update_mapping_from_spec(BACKEND_MANUAL_SEED, "MANUAL_SEED_FN")
        update_mapping_from_spec(BACKEND_EMPTY_CACHE, "EMPTY_CACHE_FN")
        update_mapping_from_spec(BACKEND_DEVICE_COUNT, "DEVICE_COUNT_FN")
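
For illustration, a test could then call these helpers like this (a minimal sketch; the seed value and call sites are made up):

from transformers.testing_utils import (
    backend_device_count,
    backend_empty_cache,
    backend_manual_seed,
    torch_device,
)

backend_manual_seed(torch_device, 0)       # torch.cuda.manual_seed on "cuda", torch.manual_seed on "cpu"
print(backend_device_count(torch_device))  # 0 on "cpu", torch.cuda.device_count() on "cuda"
backend_empty_cache(torch_device)          # the "cpu" entry is None, so this is a no-op there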

@statelesshz (Contributor) commented Oct 23, 2023

looking forward to this PR being accepted asap 🤗

@LysandreJik (Member) left a comment

@fxmarty, given your work with other accelerators, do you have any feedback on this PR?

@fxmarty (Contributor) left a comment

LGTM

@@ -291,7 +292,7 @@ def test_generate_fp16(self):
         input_ids = input_dict["input_ids"]
         attention_mask = input_ids.ne(1).to(torch_device)
         model = OPTForCausalLM(config).eval().to(torch_device)
-        if torch_device == "cuda":
+        if is_torch_fp16_available_on_device(torch_device):
An inline review comment on this change:
This test should probably just be skipped if the device does not support fp16 (alternatively adding @require_torch_fp16).
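
A sketch of that alternative, assuming a require_torch_fp16 decorator built on the same availability check (the later commit "make opt test require fp16 to run" suggests this route was taken):

import unittest

from transformers.testing_utils import require_torch_fp16


class ExampleOPTTest(unittest.TestCase):  # hypothetical test class
    @require_torch_fp16
    def test_generate_fp16(self):
        # With the decorator, the whole test is skipped on devices without
        # fp16 support, so no in-body availability branch is needed.
        ...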

@LysandreJik (Member) left a comment

After fixing @fxmarty's feedback, LGTM & merge ahead

@arsalanu (Contributor) commented
Thank you @LysandreJik, @fxmarty for the look-over. I'm just waiting for the tests to pass and then will merge.

@arsalanu (Contributor) commented
@ydshieh (or @vvvm23!) I don't have access to merge this. It looks ready to go, could one of you click the button?

Reminder to include @statelesshz as a contributor :)

@vvvm23 (Contributor, Author) commented Oct 24, 2023

I don't have the ability sadly 😅

@ydshieh ydshieh merged commit 9da4517 into huggingface:main Oct 24, 2023
22 checks passed
@ydshieh ydshieh changed the title from 'Draft PR for Device agnostic testing.' to 'Device agnostic testing' Oct 24, 2023
@ydshieh (Collaborator) commented Oct 24, 2023

Thank you again @vvvm23 @arsalanu and @statelesshz

(and especially reminding me to add @statelesshz as coauthor!)

@vvvm23 (Contributor, Author) commented Oct 24, 2023

[screenshot of the merge commit's co-author metadata]
Email for @statelesshz seems right but the name is wrong in the commit 🤔

@ydshieh (Collaborator) commented Oct 24, 2023

My bad 😭 sorry. But I think GitHub counts by email, so @statelesshz still appears:

[screenshot of the commit's contributor list]

staghado pushed a commit to staghado/transformers that referenced this pull request Oct 24, 2023
* adds agnostic decorators and availability fns

* renaming decorators and fixing imports

* updating some representative example tests
bloom, opt, and reformer for now

* wip device agnostic functions

* lru cache to device checking functions

* adds `TRANSFORMERS_TEST_DEVICE_SPEC`
if present, imports the target file and updates device to function
mappings

* comments `TRANSFORMERS_TEST_DEVICE_SPEC` code

* extra checks on device name

* `make style; make quality`

* updates default functions for agnostic calls

* applies suggestions from review

* adds `is_torch_available` guard

* Add spec file to docs, rename function dispatch names to backend_*

* add backend import to docs example for spec file

* change instances of  to

* Move register backend to before device check as per @statelesshz changes

* make style

* make opt test require fp16 to run

---------

Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
i4never pushed a commit to i4never/transformers that referenced this pull request Oct 25, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023