Docker Build Fails #184
Comments
Related issue: NVIDIA/Megatron-LM#650
If:
goes before:
I appreciate your suggestion, @JanuszL ! However, it seems that it is not working. I will wait for the bugfix from the developers. My diff for Dockerfile:
Docker build result:
It seems that the top of tree (ToT) of Megatron-LM builds 0.4.0rc0, while NeMo expects 0.4.0.
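To illustrate why a 0.4.0rc0 build would not satisfy a requirement on 0.4.0, here is a minimal sketch using the `packaging` library, which implements the same PEP 440 rules pip follows. The helper name `satisfies` is hypothetical, and the `==0.4.0` specifier is an assumption based on the version named in this thread; the exact pin used by NeMo is not shown here.

```python
# Sketch: check whether a built version satisfies a pinned requirement
# under PEP 440 rules. `satisfies` is a hypothetical helper name.
try:
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version
except ImportError:
    # Fallback: pip ships a vendored copy of the same library.
    from pip._vendor.packaging.specifiers import SpecifierSet
    from pip._vendor.packaging.version import Version


def satisfies(built: str, requirement: str) -> bool:
    """Return True if version `built` matches the specifier `requirement`."""
    # prereleases=True so the answer reflects the comparison itself,
    # not the default policy of skipping pre-releases.
    return SpecifierSet(requirement).contains(Version(built), prereleases=True)


pinned = "==0.4.0"  # assumed form of the requirement

print(satisfies("0.4.0rc0", pinned))            # False: rc0 is a different version
print(satisfies("0.4.0", pinned))               # True
print(Version("0.4.0rc0") < Version("0.4.0"))   # True: a release candidate sorts before the final release
```

Under these rules a release candidate is strictly older than the final release, so an exact pin on 0.4.0 rejects it.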
Thanks, @JanuszL. I tried your suggestion both before and after modifying the Dockerfile. Without the modifications, it still prints out the same error. However, when I change the Dockerfile, the pip installation stage takes an unusually long time.
@TaekyungHeo thank you for checking. I think I may lack the necessary understanding of the build logic used here. Let us wait for the project maintainers to share their thoughts.
This is not a real solution, but I faced the same issue while installing nemo_toolkit[all]. I worked around it by reverting to the previous release, nemo-toolkit==1.21.0 (released in 2023), instead of the current one, which was released in January 2024.
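When trying workarounds like the one above, it can help to confirm which versions actually ended up installed. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and it assumes the distributions are registered under the names used in this thread (`nemo_toolkit`, `megatron_core`).

```python
# Sketch: report the installed version of each package, or None if missing.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version (None if absent)."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are assumptions taken from the discussion above.
print(installed_versions(["nemo_toolkit", "megatron_core"]))
```

Running this inside the built image (or the failing environment) shows whether megatron_core resolved to a release candidate or a final release.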
Issue Description
When attempting to build a Docker image using the latest branch of the NeMo-Megatron-Launcher, the build fails.
Steps to Reproduce
Run the Docker build command:
Additional Context
… megatron_core==0.4.0 package, which is installed as part of the Docker build process.
… megatron_core team. Peter Dykas replied that we need to use python3.10.
… (--build-arg NEMO_COMMIT=c7948b26a00c91a7332d9eb04f4d66725e9d62e3) installs a previous megatron package (0.3.0) but leads to failure in the data preparation stage, possibly due to other issues resolved in the latest NeMo version.