Device agnostic testing #25870

Merged · 19 commits · Oct 24, 2023

Conversation

@vvvm23 (Contributor) commented Aug 30, 2023

What does this PR do?

Adds extra capabilities to testing_utils.py to support testing on devices besides cuda and cpu, without having to upstream device-specific changes.

This is done by introducing device agnostic functions that dispatch to backend-specific implementations. Users can register new backends, and the functions they dispatch to, by creating a device specification file and pointing the test suite at it via TRANSFORMERS_TEST_DEVICE_SPEC.

An example specification for a hypothetical CUDA device without support for torch.cuda.empty_cache could look like this:

import torch

# !! Specify additional imports here !!

# Specify the device name (e.g. 'cuda', 'cpu')
DEVICE_NAME = 'cuda2'

# Specify device-specific backends to dispatch to.
# If not specified, will fall back to 'default' in `testing_utils.py`
MANUAL_SEED_FN = torch.cuda.manual_seed
EMPTY_CACHE_FN = None
DEVICE_COUNT_FN = torch.cuda.device_count
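
Assuming the spec above is saved as spec.py (a hypothetical path), the test suite can then be pointed at it like so:

TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python -m pytest tests/models/opt/test_modeling_opt.py

Since the spec is loaded via importlib, the file must be importable from the working directory.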

By default, cpu and cuda backends are available, so default behaviour is unaffected.

We also introduce a new decorator @require_torch_accelerator which can be used to specify that a test needs an accelerator (but not necessarily a CUDA one).
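
As a rough sketch of the intended usage (the test class and body here are illustrative, not taken from the PR):

import unittest

import torch

from transformers.testing_utils import require_torch_accelerator, torch_device


class ExampleAcceleratorTest(unittest.TestCase):
    @require_torch_accelerator
    def test_runs_on_accelerator(self):
        # Skipped on CPU-only setups; `torch_device` may be "cuda" or any
        # custom device registered via `TRANSFORMERS_TEST_DEVICE_SPEC`.
        x = torch.ones(2, 2).to(torch_device)
        self.assertEqual(x.sum().item(), 4.0)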

Crucially, these changes should not change the behaviour of the upstream CI runners. They aim to be as non-intrusive as possible and remain compatible with existing tests.

In this PR, only a subset of tests is updated to use these new features at first. These are:

  • test_modeling_bloom – demonstrating usage of new @require_torch_accelerator
  • test_modeling_codegen – demonstrating usage of a device agnostic function (accelerator_manual_seed)
  • test_modeling_opt – demonstrating another device agnostic function, this time checking whether the current device supports torch.float16 (see the sketch after this list)
  • test_modeling_reformer – decorator version of the above.
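
For the opt case, the pattern is roughly the following (the checkpoint name is for illustration only, and we assume is_torch_fp16_available_on_device is exposed from transformers.utils, where this PR's import-utility changes live):

from transformers import OPTForCausalLM
from transformers.testing_utils import torch_device
from transformers.utils import is_torch_fp16_available_on_device

model = OPTForCausalLM.from_pretrained("facebook/opt-125m").to(torch_device)
if is_torch_fp16_available_on_device(torch_device):
    # Only cast to half precision when the device can actually run fp16
    # kernels, rather than hard-coding `torch_device == "cuda"`.
    model = model.half()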

Related #25654

TODO:

  • Write some documentation on TRANSFORMERS_TEST_DEVICE_SPEC (once we finalise the PR)
  • Additional checks and finding edge cases
  • Verify this PR does indeed have no effect on the Hugging Face CI runners.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ydshieh

@ydshieh ydshieh self-requested a review September 1, 2023 13:57
@ydshieh (Collaborator) commented Sep 1, 2023

Thanks, I will take a look @vvvm23

@ydshieh (Collaborator) commented Sep 1, 2023

@vvvm23 Could you pull the latest main to your local clone and rebase your PR branch on top of your local main? Thanks!
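
For reference, one way to do this from the command line, assuming the upstream remote points at huggingface/transformers and <pr-branch> stands in for the actual branch name:

git fetch upstream
git checkout main && git merge --ff-only upstream/main
git checkout <pr-branch>
git rebase main
git push --force-with-lease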

@vvvm23 (Contributor, Author) commented Sep 6, 2023

I rebased a few days ago, but realised I forgot to ping! Sorry @ydshieh!

@ydshieh (Collaborator) commented Sep 6, 2023

No problem, I will take a look this week 🙏

@ydshieh ydshieh self-assigned this Sep 7, 2023
@ydshieh (Collaborator) left a comment

Hi! Thank you for opening this PR. Overall it's good! I left a few nit comments.

I will have to check the failing tests though.

Have you tried to run the tests with a 3rd party device (with this PR of course)?

Review threads (resolved): src/transformers/utils/import_utils.py, src/transformers/testing_utils.py
@vvvm23 (Contributor, Author) commented Sep 14, 2023

Hey @ydshieh, just writing to let you know that @arsalanu will be picking up this PR on my behalf as I am changing jobs.

Please let me or him know if you have any further review comments 🤗

@ydshieh (Collaborator) commented Sep 19, 2023

@vvvm23 well noted. Thank you for the contribution, and best wishes for your next adventure!

(I was just back from a break, will take a look and ping @arsalanu for necessary changes if any)

@ydshieh ydshieh assigned ydshieh and unassigned ydshieh Sep 19, 2023
@arsalanu (Contributor) commented Oct 2, 2023

Hi, as Alex mentioned, I’ll be taking over this PR. Just wanted to check in on the status of the review, please let me know when you can if there are any comments/further changes you’d like made 🙂

@ydshieh (Collaborator) commented Oct 3, 2023

Hi @arsalanu

Toward the end of src/transformers/testing_utils.py: I think it would be less confusing if we use backend_ instead of accelerator_ (and similarly for ACCELERATOR_)

For example, instead of ACCELERATOR_MANUAL_SEED and accelerator_manual_seed, use the name BACKEND_MANUAL_SEED and backend_manual_seed.

If we check, require_torch_accelerator excludes cpu, which means we don't consider cpu an accelerator. So it's somewhat strange that we can use accelerator_manual_seed with CPU.

And those methods are actually methods for torch backends.

WDYT?

@ydshieh (Collaborator) commented Oct 3, 2023

Other than this, it looks good to me. It would be great to see an example run on an npu backend/device to make sure this PR works with it.

Finally, we should update the documentation here

### Testing with a specific PyTorch backend or device

In this PR, we don't have to apply all the new things like accelerator_manual_seed or require_torch_accelerator everywhere. We can keep it simple, make it work as expected, and merge (after approval by a core maintainer). Then we can apply the changes in a follow-up PR.

@arsalanu (Contributor) commented Oct 3, 2023

Hi, I've updated the PR to rename the functions, I agree that backend_* makes more sense here. Also updated the docs to include an explanation of using the spec file.

@arsalanu (Contributor) commented
Thanks, @statelesshz - I've made your changes to this PR.

@ydshieh when you can, could you please change this PR from a 'draft' to ready-for-review if you feel it is ready to be approved? (I don't have access to do this.) Thank you!

@vvvm23 vvvm23 marked this pull request as ready for review October 19, 2023 11:29
@ydshieh (Collaborator) commented Oct 19, 2023

Sure. BTW, could you run make style and/or make quality to fix the code quality issue?
Or more simply,

black --check examples tests src utils

@arsalanu (Contributor) commented
Looks like they're all passing now 👍

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ydshieh (Collaborator) commented Oct 19, 2023

Hi @LysandreJik

This PR is ready for a review from core maintainers 🙏. It enables testing on different types of accelerators.

The only thing I see that might be a bit inconvenient is that the file testing_utils.py might get (even) larger if we ever need to add more device-specific methods; see below.

If this is the case, we can move the actual definitions to a new file and import them from there into testing_utils.py.

def _device_agnostic_dispatch(device: str, dispatch_table: Dict[str, Callable], *args, **kwargs):
    if device not in dispatch_table:
        return dispatch_table["default"](*args, **kwargs)

    fn = dispatch_table[device]

    # Some device agnostic functions return values. Need to guard against `None`
    # instead at user level.
    if fn is None:
        return None

    return fn(*args, **kwargs)


if is_torch_available():
    # Mappings from device names to callable functions to support device agnostic
    # testing.
    BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}


def backend_manual_seed(device: str, seed: int):
    return _device_agnostic_dispatch(device, BACKEND_MANUAL_SEED, seed)


def backend_empty_cache(device: str):
    return _device_agnostic_dispatch(device, BACKEND_EMPTY_CACHE)


def backend_device_count(device: str):
    return _device_agnostic_dispatch(device, BACKEND_DEVICE_COUNT)


if is_torch_available():
    # If `TRANSFORMERS_TEST_DEVICE_SPEC` is enabled we need to import extra entries
    # into device to function mappings.
    if "TRANSFORMERS_TEST_DEVICE_SPEC" in os.environ:
        device_spec_path = os.environ["TRANSFORMERS_TEST_DEVICE_SPEC"]
        if not Path(device_spec_path).is_file():
            raise ValueError(
                f"Specified path to device spec file is not a file or not found. Received '{device_spec_path}'"
            )

        # Try to strip extension for later import – also verifies we are importing a
        # python file.
        try:
            import_name = device_spec_path[: device_spec_path.index(".py")]
        except ValueError as e:
            raise ValueError(f"Provided device spec file was not a Python file! Received '{device_spec_path}'") from e

        device_spec_module = importlib.import_module(import_name)

        # Imported file must contain `DEVICE_NAME`. If it doesn't, terminate early.
        try:
            device_name = device_spec_module.DEVICE_NAME
        except AttributeError as e:
            raise AttributeError("Device spec file did not contain `DEVICE_NAME`") from e

        if "TRANSFORMERS_TEST_DEVICE" in os.environ and torch_device != device_name:
            msg = f"Mismatch between environment variable `TRANSFORMERS_TEST_DEVICE` '{torch_device}' and device found in spec '{device_name}'\n"
            msg += "Either unset `TRANSFORMERS_TEST_DEVICE` or ensure it matches device spec name."
            raise ValueError(msg)

        torch_device = device_name

        def update_mapping_from_spec(device_fn_dict: Dict[str, Callable], attribute_name: str):
            try:
                # Try to import the function directly
                spec_fn = getattr(device_spec_module, attribute_name)
                device_fn_dict[torch_device] = spec_fn
            except AttributeError as e:
                # If the function doesn't exist, and there is no default, throw an error
                if "default" not in device_fn_dict:
                    raise AttributeError(
                        f"`{attribute_name}` not found in '{device_spec_path}' and no default fallback function found."
                    ) from e

        # Add one entry here for each `BACKEND_*` dictionary.
        update_mapping_from_spec(BACKEND_MANUAL_SEED, "MANUAL_SEED_FN")
        update_mapping_from_spec(BACKEND_EMPTY_CACHE, "EMPTY_CACHE_FN")
        update_mapping_from_spec(BACKEND_DEVICE_COUNT, "DEVICE_COUNT_FN")
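
For illustration, a test could then call these helpers like this (a minimal sketch; the seed value and call sites are made up):

from transformers.testing_utils import (
    backend_device_count,
    backend_empty_cache,
    backend_manual_seed,
    torch_device,
)

backend_manual_seed(torch_device, 0)       # torch.cuda.manual_seed on "cuda", torch.manual_seed on "cpu"
print(backend_device_count(torch_device))  # 0 on "cpu", torch.cuda.device_count() on "cuda"
backend_empty_cache(torch_device)          # the "cpu" entry is None, so this is a no-op there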

@statelesshz (Contributor) commented Oct 23, 2023

looking forward to this PR being accepted asap 🤗

@LysandreJik (Member) left a comment

@fxmarty, given your work with other accelerators, do you have any feedback on this PR?

@fxmarty (Contributor) left a comment

LGTM

@@ -291,7 +292,7 @@ def test_generate_fp16(self):
         input_ids = input_dict["input_ids"]
         attention_mask = input_ids.ne(1).to(torch_device)
         model = OPTForCausalLM(config).eval().to(torch_device)
-        if torch_device == "cuda":
+        if is_torch_fp16_available_on_device(torch_device):
An inline review comment on this change:
This test should probably just be skipped if the device does not support fp16 (alternatively adding @require_torch_fp16).
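
A sketch of that alternative, assuming a require_torch_fp16 decorator built on the same availability check (the later commit "make opt test require fp16 to run" suggests this route was taken):

import unittest

from transformers.testing_utils import require_torch_fp16


class ExampleOPTTest(unittest.TestCase):  # hypothetical test class
    @require_torch_fp16
    def test_generate_fp16(self):
        # With the decorator, the whole test is skipped on devices without
        # fp16 support, so no in-body availability branch is needed.
        ...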

@LysandreJik (Member) left a comment

After fixing @fxmarty's feedback, LGTM & merge ahead

@arsalanu (Contributor) commented
Thank you @LysandreJik, @fxmarty for the look-over. I'm just waiting for the tests to pass and then will merge.

@arsalanu (Contributor) commented
@ydshieh (or @vvvm23!) I don't have access to merge this. It looks ready to go, could one of you click the button?

Reminder to include @statelesshz as a contributor :)

@vvvm23 (Contributor, Author) commented Oct 24, 2023

I don't have the ability sadly 😅

@ydshieh ydshieh merged commit 9da4517 into huggingface:main Oct 24, 2023
22 checks passed
@ydshieh ydshieh changed the title from 'Draft PR for Device agnostic testing.' to 'Device agnostic testing' Oct 24, 2023
@ydshieh (Collaborator) commented Oct 24, 2023

Thank you again @vvvm23 @arsalanu and @statelesshz

(and especially reminding me to add @statelesshz as coauthor!)

@vvvm23 (Contributor, Author) commented Oct 24, 2023

[screenshot of the merge commit's co-author metadata]
Email for @statelesshz seems right but the name is wrong in the commit 🤔

@ydshieh (Collaborator) commented Oct 24, 2023

My bad 😭 sorry. But I think GitHub counts by email, so @statelesshz still appears:

[screenshot of the commit's contributor list]

staghado pushed a commit to staghado/transformers that referenced this pull request Oct 24, 2023
* adds agnostic decorators and availability fns

* renaming decorators and fixing imports

* updating some representative example tests
bloom, opt, and reformer for now

* wip device agnostic functions

* lru cache to device checking functions

* adds `TRANSFORMERS_TEST_DEVICE_SPEC`
if present, imports the target file and updates device to function
mappings

* comments `TRANSFORMERS_TEST_DEVICE_SPEC` code

* extra checks on device name

* `make style; make quality`

* updates default functions for agnostic calls

* applies suggestions from review

* adds `is_torch_available` guard

* Add spec file to docs, rename function dispatch names to backend_*

* add backend import to docs example for spec file

* change instances of  to

* Move register backend to before device check as per @statelesshz changes

* make style

* make opt test require fp16 to run

---------

Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
i4never pushed a commit to i4never/transformers that referenced this pull request Oct 25, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023