fix: WAR DeepGemm JIT compilation errors #2937

GuanLuo · 2025-09-08T21:12:37Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Runtime container now includes CUDA command-line tools (e.g., cuobjdump/NVTools) for improved GPU diagnostics and profiling out of the box.
Bug Fixes
- Adjusted environment configuration so CUDA headers are reliably detected by nvcc inside the container, reducing missing-header build errors and ensuring consistent compilation of CUDA components.
Chores
- Streamlined CUDA tooling availability in the runtime image, minimizing the need for custom setups and improving reliability of GPU-enabled workflows.

Signed-off-by: GuanLuo <gluo@nvidia.com>

GuanLuo · 2025-09-08T21:13:20Z

/ok to test bf497ee

coderabbitai · 2025-09-08T21:16:21Z

Walkthrough

The Dockerfile for the vLLM runtime image adds cuda-command-line-tools-12-8 and introduces the CPATH environment variable pointing to /usr/local/cuda/include. CPATH is declared twice within the runtime stage. Comments clarify the need for cuobjdump/NVTools and nvcc header resolution.

Changes

Cohort / File(s)	Summary
Runtime image env/tooling updates `container/Dockerfile.vllm`	- Install `cuda-command-line-tools-12-8` alongside `ca-certificates` in runtime stage - Add `ENV CPATH=/usr/local/cuda/include:$CPATH` (declared twice in runtime stage) - Add comments noting cuobjdump absence in CUDA image and rationale for CPATH for nvcc header lookup

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

build: enable vllm runtime container as default container for ci pipelines #1451 — Also updates container/Dockerfile.vllm runtime configuration (env vars/tooling), indicating overlapping container runtime concerns.

Pre-merge checks (2 passed, 1 warning)

❌ Failed Checks (1 warning)

Check Name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The description includes the required template headings but contains only placeholder comments without any actual overview, change details, reviewer guidance, or issue context.	Populate each section with substantive content: a concise summary under Overview, specific alterations under Details, files to inspect under Where should the reviewer start?, and the actual GitHub issue number if known under Related Issues.

✅ Passed Checks (2 passed)

Check Name	Status	Explanation
Title Check	✅ Passed	The title succinctly indicates that the pull request addresses DeepGEMM JIT compilation errors, which aligns with the core change applied to the Dockerfile to resolve those errors.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Poem

In a crate of bytes I hop with glee,
Added tools to peek at GPU’s sea.
CPATH carrots lined in two neat rows,
So nvcc finds where header grows.
With cuobjdump nibs and NVTools light,
This runtime burrow builds just right. 🥕🐇

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  - Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.
  - Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

container/Dockerfile.vllm (2)
202-204: Install only cuobjdump to keep the runtime slim

DeepGEMM needs cuobjdump, but pulling the full cuda-command-line-tools bundle is heavier than necessary. Prefer the specific package(s) to reduce image size and attack surface.

Apply:
-        ca-certificates \
-        # DeepGemm uses 'cuobjdump' which does not come with CUDA image
-        cuda-command-line-tools-12-8 && \
+        ca-certificates \
+        # DeepGemm needs cuobjdump at runtime; install only the required tool
+        cuda-cuobjdump-12-8 && \
If your CUDA repo doesn’t expose cuda-cuobjdump-12-8 for Ubuntu 24.04 yet, keep the current meta package. To confirm availability before switching, run inside an NVIDIA CUDA 12.8 runtime base: apt-cache policy cuda-cuobjdump-12-8.

254-257: Set CPATH explicitly and fix the typo

Appending to a potentially undefined CPATH can be brittle. We’ve had issues before with Docker ENV validation when appending to absent vars; setting it explicitly is safer. Also fix “complilation” → “compilation”. (Pulled from prior learning on ENV appends in Dockerfiles.)

Apply:
-# is not properly set for complilation. Set CPATH to help nvcc find the headers.
-ENV CPATH=/usr/local/cuda/include:$CPATH
+# is not properly set for compilation. Set CPATH so nvcc finds headers.
+ENV CPATH=/usr/local/cuda/include
Optionally ensure CUDA binaries are on PATH (some runtime images already do this):
# Add near the PATH line where nvvm is added
ENV PATH=/usr/local/cuda/bin:${PATH}
After building the runtime image, validate:

which nvcc && nvcc --version

which cuobjdump

echo '#include <cuda.h>' > /tmp/t.cu && nvcc -E /tmp/t.cu -o /dev/null

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41dacce and bf497ee.

📒 Files selected for processing (1)

container/Dockerfile.vllm (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

GuanLuo · 2025-09-08T22:43:57Z

/ok to test bf497ee

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

Signed-off-by: GuanLuo <gluo@nvidia.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

fix: WAR DeepGemm JIT compilation errors

bf497ee

Signed-off-by: GuanLuo <gluo@nvidia.com>

GuanLuo requested review from a team, alec-flowers, ishandhanani, nnshah1, ptarasiewiczNV, richardhuo-nv, rmccorm4 and tanmayv25 as code owners September 8, 2025 21:12

pull-request-size bot added the size/XS label Sep 8, 2025

github-actions bot added the fix label Sep 8, 2025

coderabbitai bot reviewed Sep 8, 2025

View reviewed changes

Merge branch 'main' into gluo/fix-r1

8ec345c

copy-pr-bot bot temporarily deployed to GITLAB September 8, 2025 23:42 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 8, 2025 23:47 Inactive

saturley-hall approved these changes Sep 9, 2025

View reviewed changes

saturley-hall merged commit 04e8a9c into main Sep 9, 2025
11 of 12 checks passed

saturley-hall deleted the gluo/fix-r1 branch September 9, 2025 00:26

saturley-hall mentioned this pull request Sep 9, 2025

fix: WAR DeepGemm JIT compilation errors (#2937) #2945

Merged

alec-flowers pushed a commit that referenced this pull request Sep 9, 2025

fix: WAR DeepGemm JIT compilation errors (#2937)

5368d1b

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

indrajit96 pushed a commit that referenced this pull request Sep 9, 2025

fix: WAR DeepGemm JIT compilation errors (#2937)

0e4c980

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

indrajit96 pushed a commit that referenced this pull request Sep 9, 2025

fix: WAR DeepGemm JIT compilation errors (#2937)

78587b9

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

tedzhouhk pushed a commit that referenced this pull request Sep 10, 2025

fix: WAR DeepGemm JIT compilation errors (#2937)

1803db8

Signed-off-by: GuanLuo <gluo@nvidia.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai bot mentioned this pull request Sep 10, 2025

fix: fix broken links in doc #2994

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: WAR DeepGemm JIT compilation errors #2937

fix: WAR DeepGemm JIT compilation errors #2937

Uh oh!

GuanLuo commented Sep 8, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

GuanLuo commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

GuanLuo commented Sep 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: WAR DeepGemm JIT compilation errors #2937

fix: WAR DeepGemm JIT compilation errors #2937

Uh oh!

Conversation

GuanLuo commented Sep 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

GuanLuo commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Pre-merge checks (2 passed, 1 warning)

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

GuanLuo commented Sep 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

GuanLuo commented Sep 8, 2025 •

edited by coderabbitai bot

Loading