Make 🤗Transformers tests device agnostic #25654
Comments
See #25655, #25506, and huggingface/diffusers#4673 for further context and existing changes that help make testing device agnostic.
cc @ydshieh
Hi @vvvm23 Thank you for this proposal! This sounds like an impactful thing to do 🚀 ! For a draft PR, it would be very nice to keep the change as minimal as possible: in particular, change just a few tests rather than all of them, so we can see how things work and how the changes apply to those tests. 🙏
That sounds reasonable to start with. I'll pick a few tests that cover all the proposed features (certain models are simpler than others, and won't highlight all the changes required). I'll try to get a draft PR up by the end of this week.
Hi @ydshieh, can I get your thoughts on a couple of design choices? First, certain device agnostic checks are as simple as trying to use the feature in question. For example, to test whether a specific device can use fp16, we can execute an op in half precision and catch any exception (rough sketch at the end of this comment). Does this approach work for you? A similar thing is already done to check whether a device exists (attempt to create the device, catch the exception) and could work here too. For more complex device agnostic functions (such as clearing the cache or setting the PRNG seed), we will need a way to define new devices without having to upstream the device itself. I was thinking of the following approach: maintain a mapping from device name to the backend specific function for each such operation, and let users register new devices and their functions when the test session starts.
If a specific entry does not exist in the mapping, we fall back to a sensible default (or a no-op). How does that approach sound to you? Any suggestions that would help make the PR fit into the current HF testing structure? Thanks~
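A minimal sketch of the try-and-catch fp16 check mentioned above, assuming PyTorch; the function name is illustrative, not an existing transformers API:

```python
import torch

def accelerator_is_fp16_available(device: str) -> bool:
    """Return True if a representative half-precision op runs on `device`."""
    try:
        x = torch.zeros(2, 2, dtype=torch.float16, device=device)
        _ = x @ x  # any simple fp16 op will do
        return True
    except Exception:
        return False
```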
Hi @vvvm23 Before I take a deeper read, could you expand your comment with some links to the code base (for example, to the existing device checks and tests you mention), so I can understand it more easily?
Although I am the one on the team focused on testing, many of the tests were written before I joined, so a more detailed description would make it easier for me to give my thoughts 🙏 (I know it will take you some more time to write). Thank you in advance!
No worries, I wrote my previous comment in a bit of a rush, so in retrospect it wasn't too clear.
Please see these PRs, which make this change: #25506 and huggingface/diffusers#4673.
For example, many of the existing capability checks simply compare the value of `torch_device` (for instance, gating fp16 tests on `torch_device == 'cuda'`). Further examples of this pattern can be found at the links in the PRs above.
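To make the pattern concrete, here is the kind of hard-coded check being described (an illustration, not a quote from the test suite):

```python
import unittest

import torch

# How the test device is typically resolved today: CUDA or CPU, nothing else.
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

class ExampleTest(unittest.TestCase):
    # A custom accelerator never satisfies this literal check, so the test is skipped.
    @unittest.skipUnless(torch_device == "cuda", "test requires CUDA")
    def test_fp16_inference(self):
        ...
```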
Suppose we implement a device agnostic helper, say one that clears the accelerator cache: for CUDA it should dispatch to the CUDA-specific implementation. However, if we run on a custom device, the helper won't know which function to dispatch to. This isn't Huggingface's responsibility to solve (as there could be countless custom devices); rather, the user should register their device and the function to call when the helper is invoked (rough sketch below). I have begun work on this here, but it isn't very robust yet. Let me know if you need me to clear anything else up~
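A hedged sketch of the dispatch-table idea above (the names are illustrative, not a settled API):

```python
import torch

# Map device names to the backend-specific callable; None means "no-op".
BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None}

def backend_empty_cache(device: str) -> None:
    """Clear the accelerator cache for `device`, if its backend defines one."""
    fn = BACKEND_EMPTY_CACHE.get(device)
    if fn is not None:
        fn()

# A custom backend registers its own hook before the test session starts:
# BACKEND_EMPTY_CACHE["my_device"] = my_backend.empty_cache
```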
Hi, thanks for the write-up! One thing I'd note: checks like these may be called many times during a test session, so their results should probably be cached rather than re-running an op on every call. Regarding registering custom devices: maybe we can start the task simply and not use an external file; just put everything inside `testing_utils.py`.
This is a fair point. I noticed that some other functions that check for device availability are wrapped in a caching decorator so the (possibly expensive) check only runs once; we can do the same for these new checks.
To clarify, I only intend for one device to be used for a single set of tests, so there won't be countless devices in use in a single session.
For our purposes (and I am assuming others' too), the main point is to be able to specify a new backend or device without having to upstream anything into HF. So if we put everything inside `testing_utils.py`, anyone with a custom device would still have to modify the HF code base, which is exactly what we want to avoid.
Yeah, I agree. I would say: definitions in HF only for the devices we already officially support, with anything else supplied by the user.
Yep! That was my original plan; we don't want to put any burden on HF to maintain additional devices that are only known to a small set of people 👍 We will have the definitions for the standard backends in `testing_utils.py`, and anything else can be registered by the user (rough sketch below).
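A hedged sketch of how such user registration could be wired up; the environment variable name and the spec module's attributes are assumptions for illustration:

```python
import importlib.util
import os

# If the user points this variable at a Python file, load it and let it
# describe their custom device, so nothing needs to be upstreamed into HF.
spec_path = os.environ.get("TRANSFORMERS_TEST_DEVICE_SPEC")
if spec_path is not None:
    spec = importlib.util.spec_from_file_location("device_spec", spec_path)
    device_spec = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(device_spec)
    # The spec module would expose a device name plus backend hooks,
    # e.g. DEVICE_NAME = "my_device" and empty_cache/manual_seed functions.
    torch_device = device_spec.DEVICE_NAME
```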
@ydshieh please see the above draft PR 🙂 I am hoping the CI passes without issue, as these changes should have no effect on Huggingface's CI runners.
The proof-of-concept for device agnostic testing has been merged into the master branch 🎉 I'd like to use this issue as a centralized place to list and track the work of making the rest of the testing suite device agnostic.
Feature request
We would like to make the testing suite in this repository more device agnostic. There has already been some work towards this, but the majority of tests still only run on either GPU or CPU. The change would touch a large number of tests across the library; however, it would not alter the behaviour of Huggingface's CI runners.
A non-exhaustive list of changes would be:

- A new decorator `@require_torch_with_accelerator` that largely supersedes (but does not replace) `@require_torch_gpu`. This new decorator can be used for any device agnostic test that we would like to accelerate. We would keep `@require_torch_gpu` for tests that truly require CUDA features, such as ones that check device memory utilisation (such as in model parallelism or lower precision tests) or use custom CUDA kernels (such as Flash Attention). A sketch of this decorator follows the list.
- Device agnostic functions in `testing_utils.py` that compare the current device in use and dispatch to the appropriate backend specific function if available. For example, rather than checking `torch_device == 'cuda'` to see if we can run with fp16, we could call a function `testing_utils.accelerator_is_fp16_available(torch_device)` or similar. Similar functions already exist to check for tf32 or bf16 support.
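As referenced in the first item, a hedged sketch of how the proposed decorator could be implemented (the names follow this proposal; the eventual implementation may differ):

```python
import unittest

from transformers.testing_utils import torch_device

def require_torch_with_accelerator(test_case):
    """Skip `test_case` unless the configured device is some accelerator."""
    return unittest.skipUnless(
        torch_device is not None and torch_device != "cpu",
        "test requires an accelerator",
    )(test_case)
```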
Motivation
As Huggingface libraries and models make up a significant part of the current ML community, it makes sense, when developing custom PyTorch backends, to test against these model libraries, as they cover a large proportion of most users' use cases.
However, the current testing suite does not easily allow for custom devices: not without maintaining a private fork that must be continuously kept up to date with the upstream repository. For this reason, and because the number of changes required is not especially large, we are making this proposal.
Your contribution
We would write and submit a PR implementing these changes, following discussion with and approval from the 🤗Transformers maintainers.
I am collaborating with @joshlk and @arsalanu