
[BUG] Docker Build Fails at pip install megatron-core==0.4.0 #650

Closed
TaekyungHeo opened this issue Jan 4, 2024 · 8 comments
Labels
stale No activity in 60 days on issue or PR

Comments

@TaekyungHeo
Member

TaekyungHeo commented Jan 4, 2024

Describe the bug
There is an error in building the Docker image for a project dependent on Megatron-LM (https://github.com/NVIDIA/NeMo-Megatron-Launcher). The build fails during the package installation phase, specifically at `pip install megatron-core==0.4.0`.

To Reproduce
Steps to reproduce the behavior:

  1. Clone NeMo-Megatron-Launcher:
```
$ git clone --recurse-submodules https://github.com/NVIDIA/NeMo-Megatron-Launcher.git
```
  2. Build the Docker image and observe the failure during the `pip install megatron-core==0.4.0` step:
```
$ cd NeMo-Megatron-Launcher
$ docker build .
```

Expected behavior
The Docker build should proceed without errors and successfully install all required packages, including megatron-core==0.4.0.

Stack trace/logs

```
$ pip install megatron-core==0.4.0
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting megatron-core==0.4.0
  Downloading megatron_core-0.4.0.tar.gz (154 kB)
     |████████████████████████████████| 154 kB 15.1 MB/s 
  Installing build dependencies ... done
  WARNING: Missing build requirements in pyproject.toml for megatron-core==0.4.0 from https://files.pythonhosted.org/packages/fd/b9/e85da25f4de43dad70d6fd1c21b88db085f471d5348c51cce05dc9e4b0ef/megatron_core-0.4.0.tar.gz#sha256=bb2cd1f4c5746b31a8b4abd676820ddceec272f002873801a519dbbf1352d8ef.
  WARNING: The project does not specify a build backend, and pip cannot fall back to setuptools without 'wheel'.
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp1gpfom00
       cwd: /tmp/pip-install-_fo_70fz/megatron-core_b051d2fdd6d846beb6e755037509a79a
  Complete output (18 lines):
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 349, in <module>
      main()
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 331, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 117, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 320, in _get_build_requires
      self.run_setup()
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 483, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/usr/local/lib/python3.8/dist-packages/setuptools/build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 52, in <module>
    File "<string>", line 45, in req_file
  FileNotFoundError: [Errno 2] No such file or directory: 'megatron/core/requirements.txt'
  ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/fd/b9/e85da25f4de43dad70d6fd1c21b88db085f471d5348c51cce05dc9e4b0ef/megatron_core-0.4.0.tar.gz#sha256=bb2cd1f4c5746b31a8b4abd676820ddceec272f002873801a519dbbf1352d8ef (from https://pypi.org/simple/megatron-core/). Command errored out with exit status 1: /usr/bin/python /usr/local/lib/python3.8/dist-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp1gpfom00 Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement megatron-core==0.4.0 (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0)
ERROR: No matching distribution found for megatron-core==0.4.0
WARNING: You are using pip version 21.2.4; however, version 23.3.2 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
```
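The traceback shows the `req_file` helper in setup.py (line 45) opening `megatron/core/requirements.txt` relative to the working directory, which pip sets to the unpacked sdist (the `cwd:` line in the log). A minimal sketch of that failure mode follows; `req_file` here is a hypothetical approximation of the real helper, and `simulate_sdist_build` is a hypothetical harness that fakes an unpacked sdist with and without the file.

```python
import os
import tempfile


def req_file(filename, folder="megatron/core"):
    # Hypothetical approximation of the req_file helper in the traceback:
    # the path is resolved relative to the current working directory.
    with open(os.path.join(folder, filename), encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def simulate_sdist_build(include_requirements):
    # Hypothetical harness: build a fake "unpacked sdist" directory, chdir
    # into it (as pip does before running setup.py), then call req_file.
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        core_dir = os.path.join(root, "megatron", "core")
        os.makedirs(core_dir)
        if include_requirements:
            req_path = os.path.join(core_dir, "requirements.txt")
            with open(req_path, "w", encoding="utf-8") as f:
                f.write("torch\n")
        os.chdir(root)
        try:
            return req_file("requirements.txt")
        finally:
            # Leave the temp dir before it is deleted.
            os.chdir(original_cwd)


if __name__ == "__main__":
    print(simulate_sdist_build(True))  # file present: parsed requirements
    try:
        simulate_sdist_build(False)
    except FileNotFoundError as err:
        print("missing from sdist:", err)
```

When the file is absent, the sketch raises the same kind of `FileNotFoundError` seen in the log, which is consistent with the file simply not being shipped in the 0.4.0 tarball.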

Environment:

  • Megatron-LM commit ID: Unknown (depends on the Dockerfile configuration)
  • PyTorch version: Unknown (depends on the Dockerfile configuration)
  • CUDA version: Unknown (depends on the Dockerfile configuration, if applicable)
  • NCCL version: Unknown (depends on the Dockerfile configuration, if applicable)

Proposed fix
Currently, I do not have a proposed fix. I am hoping the maintainers can provide insight or a fix for this issue.

Additional context

  • The issue appears to be specific to megatron_core 0.4.0: with megatron_core 0.3.0, the build completes successfully, which suggests the problem was introduced in the 0.4.0 release.
@TaekyungHeo
Member Author

Related issue: NVIDIA/NeMo-Framework-Launcher#184

@JanuszL

JanuszL commented Jan 4, 2024

I believe it could be fixed by:

```diff
diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 00000000..b3356b76
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include megatron/core/requirements.txt
```

`requirements.txt` is not included in the source distribution (sdist) package.
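To confirm that a given release is missing the file, one can list the sdist's contents. A minimal sketch using the standard-library `tarfile` module; `sdist_contains` is a hypothetical helper, and it assumes the usual sdist layout with a single top-level directory such as `megatron_core-0.4.0/`.

```python
import tarfile


def sdist_contains(sdist_path, relpath):
    # Hypothetical helper: report whether a .tar.gz sdist ships `relpath`.
    # sdists nest everything under one top-level directory, e.g.
    # megatron_core-0.4.0/, so that first path component is dropped.
    with tarfile.open(sdist_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            _, _, inner = member.name.partition("/")
            if inner == relpath:
                return True
    return False
```

For example, `sdist_contains("megatron_core-0.4.0.tar.gz", "megatron/core/requirements.txt")` on a local copy of the tarball linked in the log would be expected to return `False` if this diagnosis is right.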

@vishakha-lall

This is not a proper fix, but I hit the same issue while installing nemo_toolkit[all]. As a workaround, I reverted to the previous release, nemo-toolkit==1.21.0 (released in 2023), instead of the current one released in January 2024.

@pzelasko

I encountered the same issue as the OP. It does seem this repo is missing a MANIFEST.in file, which tells setuptools (distutils) which extra non-code files to include in the source distribution published to PyPI.

https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
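As a rough illustration of what an `include` line selects, here is a simplified sketch. It assumes only `include` directives with glob patterns relative to the project root; `files_from_include_lines` is hypothetical, and setuptools' actual MANIFEST.in processing supports further directives (`recursive-include`, `graft`, `exclude` and others) that this ignores.

```python
from pathlib import Path


def files_from_include_lines(manifest_text, root):
    # Hypothetical, simplified reading of MANIFEST.in: only `include`
    # directives are honoured; every other directive and comment is skipped.
    root = Path(root)
    selected = set()
    for raw_line in manifest_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] != "include":
            continue
        for pattern in words[1:]:
            # Path.glob's "*" does not cross directory boundaries.
            for path in root.glob(pattern):
                if path.is_file():
                    selected.add(path.relative_to(root).as_posix())
    return sorted(selected)
```

With the one-line MANIFEST.in from JanuszL's diff above, this sketch would select `megatron/core/requirements.txt` from a checkout of the repo and nothing else.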

@JanuszL

JanuszL commented Jan 26, 2024

@pzelasko, I think it has been fixed in f6b0f4e. NeMo-Megatron-Launcher seems to have other problems as well.

@pzelasko

Thanks @JanuszL, you’re right; I missed that.


Marking as stale. No activity in 60 days.

@github-actions github-actions bot added the stale No activity in 60 days on issue or PR label Mar 26, 2024
@elliottnv
Collaborator

Closing bug. Will track the issue on the new ticket.

5 participants