pin apex to a specific commit (for DeepSpeed CI docker image) #24351

Merged: 2 commits merged into main from fix_ds_docker_image on Jun 19, 2023

ydshieh (Collaborator) commented Jun 19, 2023

What does this PR do?

The Docker image build for the DeepSpeed CI job has been failing for about a week due to this apex issue.

Let's pin apex to the previous commit until the issue is resolved on the apex side.

Currently, the DeepSpeed job fails because the build failure prevents us from using newer images that include some fixes on the accelerate side.

@ydshieh ydshieh requested a review from amyeroberts June 19, 2023 09:51
HuggingFaceDocBuilderDev commented Jun 19, 2023

The documentation is not available anymore as the PR was closed or merged.

amyeroberts (Collaborator) left a comment

Thanks for fixing!

@ydshieh ydshieh merged commit 17e3e7d into main Jun 19, 2023
@ydshieh ydshieh deleted the fix_ds_docker_image branch June 19, 2023 10:48
akhilkedia added a commit to akhilkedia/ConvNeXt-V2 that referenced this pull request Jun 20, 2023
The latest version of apex currently does not install, as mentioned in facebookresearch#52.

This issue with apex has also been reported in NVIDIA/apex#1679.

huggingface/transformers#24351 suggests pinning apex to a specific commit, `cd apex && git checkout 82ee367f3da74b4cd62a1fb47aa9806f0f47b58b`, after which apex installs successfully.

However, that version of apex is incompatible with the version of torch used here, and I get the error described in NVIDIA/apex#1532.

That issue suggests using version `22.04-dev` of apex (`cd apex && git checkout 22.04-dev`). With this, apex compiles successfully and `python ./main_finetune.py` also runs training with amp successfully.

If the authors can tell us the exact apex commit that they used, we can use that version instead!
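
The pinning step described above can be sketched as a small shell helper. This is only an illustration, not the change made in the PR: `checkout_pinned` is a hypothetical name, and the helper assumes the repository has already been cloned. It checks out the requested ref and prints the commit that HEAD resolved to, so the pinned version can be logged or compared.

```shell
#!/usr/bin/env bash
# Sketch: check out an existing clone at a fixed ref and report the resulting commit.
# `checkout_pinned` is a hypothetical helper, not part of apex or transformers.
checkout_pinned() {
    local repo_dir="$1"   # path to an existing clone, e.g. ./apex
    local ref="$2"        # commit hash, tag, or branch name
    # -q suppresses the detached-HEAD advice; stop if the ref cannot be checked out.
    git -C "$repo_dir" checkout -q "$ref" || return 1
    # Print the full hash of the commit that is now checked out.
    git -C "$repo_dir" rev-parse HEAD
}
```

After `git clone https://github.com/NVIDIA/apex`, calling `checkout_pinned apex 82ee367f3da74b4cd62a1fb47aa9806f0f47b58b` or `checkout_pinned apex 22.04-dev` would correspond to the two options discussed above. The build and install step is left out because it depends on the local torch and CUDA setup.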