Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix #2889

Merged
merged 2 commits into from
Jul 15, 2024

Conversation

yhna940
Copy link
Contributor

@yhna940 yhna940 commented Jun 25, 2024

What does this PR do?

ci-workflow

This PR corrects the CI workflow to ensure that the minimum specified PyTorch version (1.10.0) is installed when the matrix.pytorch-version is set to minimum. Previously, the installation condition incorrectly referenced matrix.test-kind, leading to the installation of the latest PyTorch version instead.

Changes Made

  • Corrected the conditional check for installing PyTorch 1.10.0 by changing matrix.test-kind to matrix.pytorch-version.

Before

if [[ ${{ matrix.test-kind }} = minimum ]]; then pip install torch==1.10.0; fi

After

if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==1.10.0; fi

Motivation and Context

This change ensures that the CI workflow correctly installs PyTorch 1.10.0 when the matrix.pytorch-version is set to minimum. This avoids any inconsistencies and ensures that tests are run against the intended version of PyTorch.

Related Issues

N/A

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@yhna940 yhna940 changed the title HotFix PyTorch Version Installation in CI Workflow for Minimum Version Matrix Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix Jun 25, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Copy link
Member

I was afraid that something like this would happen: Since CI did not correctly install the old PyTorch version, there seems to be some code that indeed fails with the CI because it requires a more recent version. So for this fix to pass, we would also have to fix all the failing tests (which may or may not be easy).

However, I wonder if we really want to support PyTorch 1.10.0, which is from Oct 2021. When I talked to PyTorch devs, their opinion was that supporting the last 4 versions (not counting patch releases) is reasonable, so that would mean setting the min version to 2.0.1. Probably we want to be a bit more generous here but the last change to the min version was one year ago, so I think we can bump it. Depending on the chosen version, this means way may not have to fix any tests after all.

Let's wait for Zach's return to decide this.

@yhna940
Copy link
Contributor Author

yhna940 commented Jun 25, 2024

Thank you for the review!

I understand the concern regarding the PyTorch version support. Since Zach's input is pending, I'm happy to wait for his review.

However, I would like to emphasize that this PR specifically addresses the issue of the pytorch-version not being correctly referenced in the CI configuration. It ensures that the specified minimum version is actually installed.

Given that this PR is focused on fixing the reference issue, I suggest merging it with the installation set to torch==2.X (the minimum version that allows the CI to pass). This way, we can ensure the CI pipeline is correctly referencing the pytorch-version.

Subsequently, huggingface maintainers can decide on the appropriate minimum PyTorch version policy internally. They can also address any failing tests and gradually adjust the CI minimum version as needed. Modifying the test cases within this PR would make it significantly larger and more complex, so I propose handling those changes separately.

Looking forward to your thoughts and Zach's input.

if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==2.x; fi

@BenjaminBossan
Copy link
Member

Given that this PR is focused on fixing the reference issue, I suggest merging it with the installation set to torch==2.X (the minimum version that allows the CI to pass). This way, we can ensure the CI pipeline is correctly referencing the pytorch-version.

This could be a good compromise and would be a strict improvement on the current status. Do you know what that version would be? If not, we could just try 2.0. Regardless though, this PR would need to wait for Zach's approval to be merged, so a bit of waiting is inevitable.

Modifying the test cases within this PR would make it significantly larger and more complex, so I propose handling those changes separately.

I agree.

@yhna940
Copy link
Contributor Author

yhna940 commented Jul 1, 2024

Do you know what that version would be? If not, we could just try 2.0. Regardless though, this PR would need to wait for Zach's approval to be merged, so a bit of waiting is inevitable.

It seems that CI only works on the latest version of torch. I've tried v2.2.2 and v2.1.2 with the make test command and got the following error messages.

FAILED tests/test_grad_sync.py::SyncScheduler::test_gradient_sync_gpu_multi - RuntimeError: 'accelerate launch --num_processes=8 --monitor_interval=0.1 /workspaces/accelerate/src/accelerate/test_utils/scripts/test_sync.py' failed with returncode 1
FAILED tests/test_multigpu.py::MultiDeviceTester::test_multi_device_merge_fsdp_weights - RuntimeError: 'accelerate launch --num_processes=8 --monitor_interval=0.1 /workspaces/accelerate/src/accelerate/test_utils/scripts/test_merge_weights.py' failed with returncode 1
FAILED tests/test_multigpu.py::MultiDeviceTester::test_pippy - RuntimeError: 'accelerate launch --multi_gpu --num_processes=8 /workspaces/accelerate/src/accelerate/test_utils/scripts/external_deps/test_pippy.py' failed with returncode 1
FAILED tests/test_utils.py::UtilsTester::test_dynamo - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:

@muellerzr
Copy link
Collaborator

Thanks for your patience while I was out on holiday! Our general rule of thumb with Accelerate to keep things stable is the last 2 years of PyTorch releases, which for us right now is 1.12.0.

We should definitely fix whatever is broken, thank you for pointing out that minimum isn't doing it's job!

If you feel up to working on fixing all these issues, or tackling a few of them, that'd be most welcome. The other solution is to merge this, since we know this "fixes" the CI, and I can get to getting these passing ASAP.

Which are you more open to contribute to? 🤗

@yhna940
Copy link
Contributor Author

yhna940 commented Jul 1, 2024

Thank you for getting back to me, and I hope you had a wonderful holiday!

For this PR, I suggest merging it as it corrects the issue with the pytorch-version reference in the CI workflow. Once this is addressed, we can move forward with fixing any broken CI issues separately.

I'm open to contributing further by addressing specific failing tests or issues that come up.

Looking forward to your thoughts on this plan :)

@BenjaminBossan
Copy link
Member

For this PR, I suggest merging it as it corrects the issue with the pytorch-version reference in the CI workflow. Once this is addressed, we can move forward with fixing any broken CI issues separately.

We can't really merge the PR as is, since it would mean that CI is broken until all the issues with older PyTorch versions have been fixed. Instead, we should make this change, right?

- if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==1.10.0; fi
+ if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==2.3.1; fi

This way, all the tests pass and we're basically testing the same thing as we already do in accelerate. When PyTorch 2.4 is released, we also make sure that 2.3 keeps on working. Next, we can slowly try to move the version down to 1.12 while fixing the issues that come up.

@yhna940
Copy link
Contributor Author

yhna940 commented Jul 2, 2024

- if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==1.10.0; fi
+ if [[ ${{ matrix.pytorch-version }} = minimum ]]; then pip install torch==2.3.1; fi

Done 5709c05 🙏

Copy link
Collaborator

@muellerzr muellerzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@muellerzr muellerzr merged commit 709fd1e into huggingface:main Jul 15, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants