-
Notifications
You must be signed in to change notification settings - Fork 27.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix skip of test_training_gradient_checkpointing #34723
Conversation
@amyeroberts : test failures seem unrelated. Please, advice when I should rebase/try again?
|
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors. Fixes: huggingface#34722 Fixes: 19d58d3 ("Add MLLama (huggingface#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@amyeroberts : rerun tests, no side fails now. Please, help review. |
Hi @dvrogozh, thank you for the PR. When I run things like
I see
and I also check the body after the Could you provide us in which cases it will enter that body while |
Could you set breakpoint(s) to check which Also, is |
Shortly: if @ydshieh : this seems interesting. I was wondering why HF ci does not see this issue. Your comment above also suggests that you don't see the issue on your side. It seems I've found the reason. I have 2 systems, one with XPU, another with CUDA. Initially I saw issue on both systems. Yesterday, I fully cleaned and reinstall environment for XPU and issue was gone. So, the issue which I observe is triggered by environment difference. In particular, it shows up if the following package is installed: |
Most likely I got |
Most likely no. I am just building pytorch from sources on my side since I on purposely look for XPU backend issues in the most recent code. I did not try to check other pytorch versions, but the issue I see does not seem to be related to pytorch. |
Hi thanks for the information. Indeed, our CI env. doesn't have I will check it and see how a fix would be. Currently, without |
This sounds like we need to add |
As discussed in [1], pytest-subtests changes behavior of .skipTest() having effect to really skip individual subtests or skip the entire test if module is not installed. Huggingface Accelerate has module in its dependencies. It makes sense to add it for Transformers as well to avoid divergent environment between users and ci. See[1]: huggingface#34723 (comment) Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
As discussed in [1], pytest-subtests changes behavior of .skipTest() having effect to really skip individual subtests or skip the entire test if module is not installed. Huggingface Accelerate has module in its dependencies. It makes sense to add it for Transformers as well to avoid divergent environment between users and ci. See[1]: huggingface#34723 (comment) Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Although I think installing
fails with
while change
I am not sure I am doing stupid things 😭 |
without
|
with
So, it loops thru all 6 cases. It correctly skips first 3, but after that I don't understand what's going on. It should have reported 4 failures, but it reported 1 failure, 1 pass and ate up 2 others. And in the end only printed assertion failure for the last iteration :). |
Yeah. Maybe it's a bug. I will open an issue. |
You mean without
output:
|
yes, sorry. I mean |
@ydshieh : I filed #34755 to specifically consider what to do with subtests story. See breakdown on how it works in different cases. Really confusing... As for the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this up to you @ydshieh 🤗 I think this one makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the weired issue from the pytest-subtests
with subTest + skipTest
, this PR itself makes sense.
Furthermore, it doesn't change the current behavior when pytest-subtests
is not installed (although that behavior is not desirable).
Therefore LGTM to merge.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors. Fixes: huggingface#34722 Fixes: 19d58d3 ("Add MLLama (huggingface#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors. Fixes: huggingface#34722 Fixes: 19d58d3 ("Add MLLama (huggingface#33703)") Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
19d58d3 has introduced a context manager to manage subtests of test_training_gradient_checkpointing. However, test body was not moved under the "with" statement. Thus, while tests are correctly marked as skipped, test bodies were still executed. In some cases, as with llama this caused attribute errors.
Fixes: #34722
Fixes: 19d58d3 ("Add MLLama (#33703)")
CC: @amyeroberts, @ArthurZucker