
Conversation

@Michaelgathara (Contributor) commented Aug 22, 2025

Overview:

This PR adds support for the HF_ENDPOINT environment variable in TRT-LLM deployments. It lets users specify a custom Hugging Face endpoint (such as a mirror or an enterprise HF server) when downloading models, which is especially useful in environments with restricted internet access or when a mirror offers better performance.

Details:

  • Added HF_ENDPOINT to the list of common environment variables in getCommonTRTLLMEnvVars()
  • The environment variable is then automatically included in the -x flags passed to mpirun (see the sketch after this list)
  • Updated all test cases to verify HF_ENDPOINT is properly forwarded in multinode mpirun commands
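
For intuition, here is a minimal, self-contained Go sketch of how an allowlisted variable such as HF_ENDPOINT becomes a sorted -x flag on the mpirun command line. The helper names and the trimmed-down env set are illustrative, not the operator's exact internals:

package main

import (
    "fmt"
    "sort"
    "strings"
)

// Mirrors the shape of getCommonTRTLLMEnvVars(): a set of env var names
// that should be forwarded to every MPI rank.
func commonEnvVars() map[string]bool {
    return map[string]bool{
        "CUDA_VISIBLE_DEVICES": true,
        "HF_TOKEN":             true,
        "HF_ENDPOINT":          true, // the entry added by this PR
    }
}

// Renders the set as deterministic, alphabetically sorted mpirun -x flags.
func formatEnvFlags(envVars map[string]bool) string {
    names := make([]string, 0, len(envVars))
    for name := range envVars {
        names = append(names, name)
    }
    sort.Strings(names)
    flags := make([]string, 0, len(names))
    for _, name := range names {
        flags = append(flags, "-x "+name)
    }
    return strings.Join(flags, " ")
}

func main() {
    // Prints: -x CUDA_VISIBLE_DEVICES -x HF_ENDPOINT -x HF_TOKEN
    fmt.Println(formatEnvFlags(commonEnvVars()))
}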

Where should the reviewer start?

  • deploy/cloud/operator/internal/dynamo/backend_trtllm.go - Main change in getCommonTRTLLMEnvVars() function
  • deploy/cloud/operator/internal/dynamo/backend_trtllm_test.go - Updated test expectations showing HF_ENDPOINT in mpirun commands

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Added support for configuring a custom Hugging Face Hub endpoint via HF_ENDPOINT.
    • Automatically maps HF_ENDPOINT to HUGGINGFACE_HUB_ENDPOINT when unset for seamless hub access.
    • Ensures HF_ENDPOINT is propagated to all distributed (MPI) processes for consistent behavior across nodes.
  • Tests

    • Updated test cases to validate HF_ENDPOINT forwarding in various multi-node and role scenarios.

@copy-pr-bot bot commented Aug 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions bot commented Aug 22, 2025

👋 Hi Michaelgathara! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: the NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions bot added the external-contribution label (Pull request is from an external contributor) Aug 22, 2025
@Michaelgathara changed the title from "[FEATURE]: HF_ENDPOINT addition" to "feat: HF_ENDPOINT addition" Aug 22, 2025
@github-actions bot added the feat label Aug 22, 2025
@coderabbitai bot (Contributor) commented Aug 22, 2025

Walkthrough

Adds HF_ENDPOINT to TRT-LLM environment propagation (operator and tests) and updates Rust hub logic to mirror HF_ENDPOINT into HUGGINGFACE_HUB_ENDPOINT when unset, leaving existing behavior otherwise unchanged.

Changes

Cohort / File(s) — Summary

  • TRT-LLM MPI env propagation
    deploy/cloud/operator/internal/dynamo/backend_trtllm.go, deploy/cloud/operator/internal/dynamo/backend_trtllm_test.go
    Include HF_ENDPOINT in the common TRT-LLM env vars so mpirun forwards it (adds -x HF_ENDPOINT). Tests updated to expect the new flag across relevant scenarios.

  • HF endpoint bridging in hub
    lib/llm/src/hub.rs
    In from_hf, if HF_ENDPOINT is set and HUGGINGFACE_HUB_ENDPOINT is not, set HUGGINGFACE_HUB_ENDPOINT to HF_ENDPOINT before building the API client. No public API changes.
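
The hub-side bridging is a small conditional copy. As a language-neutral illustration (the real change is the Rust in lib/llm/src/hub.rs, quoted in the review below), the same pattern in Go looks like this; the mirror URL is hypothetical:

package main

import (
    "fmt"
    "os"
)

// bridgeHFEndpoint copies HF_ENDPOINT into HUGGINGFACE_HUB_ENDPOINT only
// when the latter is unset, so an explicitly configured canonical endpoint
// always wins over the convenience alias.
func bridgeHFEndpoint() {
    // LookupEnv distinguishes "unset" from "set to an empty string".
    if endpoint, ok := os.LookupEnv("HF_ENDPOINT"); ok {
        if _, alreadySet := os.LookupEnv("HUGGINGFACE_HUB_ENDPOINT"); !alreadySet {
            os.Setenv("HUGGINGFACE_HUB_ENDPOINT", endpoint)
        }
    }
}

func main() {
    os.Setenv("HF_ENDPOINT", "https://hf-mirror.example.com") // hypothetical mirror
    bridgeHFEndpoint()
    fmt.Println(os.Getenv("HUGGINGFACE_HUB_ENDPOINT")) // https://hf-mirror.example.com
}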

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Env as Process Env
  participant Hub as from_hf()
  participant API as ApiBuilder

  Note over Hub,API: HF endpoint propagation (new)

  Hub->>Env: Read HF_TOKEN (existing)
  Hub->>Env: Read HF_ENDPOINT (new)
  alt HUGGINGFACE_HUB_ENDPOINT is unset AND HF_ENDPOINT is set
    Hub->>Env: Set HUGGINGFACE_HUB_ENDPOINT = HF_ENDPOINT (new)
  else
    Note over Hub: No change to HUGGINGFACE_HUB_ENDPOINT
  end
  Hub->>API: ApiBuilder::new() with env-derived config
  API-->>Hub: Client instance

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A whisk of wind, a hop, a send—
We carry HF_ENDPOINT end-to-end.
mpirun sings with flags anew,
The hub now knows just what to do.
I twitch my nose, reviews are tight—
Small hops, clear paths, all set right. 🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
deploy/cloud/operator/internal/dynamo/backend_trtllm.go (1)

191-195: Also forward HUGGINGFACE_HUB_ENDPOINT to cover canonical var users

Many clients (including Rust hf-hub) look for HUGGINGFACE_HUB_ENDPOINT. Forwarding it too avoids surprises when users set only that var on the Pod. Alphabetical sorting will place it before HUGGING_FACE_HUB_TOKEN in the -x list.

Apply this diff:

 return map[string]bool{
-    "CUDA_VISIBLE_DEVICES": true, "MODEL_PATH": true, "HF_TOKEN": true, "HUGGING_FACE_HUB_TOKEN": true, "HF_ENDPOINT": true,
+    "CUDA_VISIBLE_DEVICES": true, "MODEL_PATH": true, "HF_TOKEN": true, "HUGGING_FACE_HUB_TOKEN": true, "HF_ENDPOINT": true,
+    "HUGGINGFACE_HUB_ENDPOINT": true,
     "TOKENIZERS_PARALLELISM": true, "NCCL_DEBUG": true, "NCCL_IB_DISABLE": true, "NCCL_P2P_DISABLE": true,
     "TENSORRT_LLM_CACHE_DIR": true, "HF_HOME": true, "TRANSFORMERS_CACHE": true, "HF_DATASETS_CACHE": true,
     "PATH": true, "LD_LIBRARY_PATH": true, "PYTHONPATH": true, "HOME": true, "USER": true,
 }

If you adopt this, I can update the expected strings in backend_trtllm_test.go to include -x HUGGINGFACE_HUB_ENDPOINT in the right sorted position.

deploy/cloud/operator/internal/dynamo/backend_trtllm_test.go (1)

63-63: Reduce test brittleness around exact mpirun env flag ordering (optional)

String-equality on the entire mpirun command is fragile whenever we extend the env allowlist. Consider asserting presence of key segments (e.g., contains "-x HF_ENDPOINT") or generating the expected env flags via formatEnvVarFlags(collectAllEnvVars(container.Env)) to keep order in sync.

I can draft a small helper to assemble expected env flags in tests so future env additions don’t require editing long literals.

Also applies to: 119-119, 566-566, 576-576, 594-594, 612-612, 630-630
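
As an illustration of that offer, a containment-style helper could look like the following Go sketch (hypothetical; not part of the current test file, and it assumes the standard strings and testing packages are imported):

// assertEnvForwarded checks that each required "-x NAME" segment is present
// in the generated mpirun command, instead of comparing the whole string.
func assertEnvForwarded(t *testing.T, mpirunCmd string, vars ...string) {
    t.Helper()
    for _, v := range vars {
        if !strings.Contains(mpirunCmd, "-x "+v) {
            t.Errorf("mpirun command missing %q: %s", "-x "+v, mpirunCmd)
        }
    }
}

// Usage inside a test:
//   assertEnvForwarded(t, cmd, "HF_ENDPOINT", "HF_TOKEN")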

lib/llm/src/hub.rs (1)

48-55: Avoid process-wide environment mutations by using ApiBuilder::with_endpoint
Setting a global environment variable at runtime can lead to surprising side effects in multithreaded contexts, since it applies across the entire process. The Tokio ApiBuilder already supports a custom endpoint via its with_endpoint method, so it’s safer to configure the client directly instead of mutating std::env.

File: lib/llm/src/hub.rs (lines 48–55)
• Replace the current env-var propagation block with builder configuration.

Suggested refactor:

-    // If HF_ENDPOINT is provided, propagate it to the canonical env var used by some clients
-    // to select an alternate Hugging Face hub endpoint. We only set it if not already present.
-    if let Ok(endpoint) = env::var(HF_ENDPOINT_ENV_VAR) {
-        if env::var(HUGGINGFACE_HUB_ENDPOINT_ENV_VAR).is_err() {
-            env::set_var(HUGGINGFACE_HUB_ENDPOINT_ENV_VAR, &endpoint);
-        }
-    }
+    // Prefer passing a custom endpoint directly to the client instead of mutating process-wide env.
+    let mut builder = ApiBuilder::new()
+        .with_progress(true)
+        .with_token(token);
+
+    if let Ok(endpoint) = std::env::var(HUGGINGFACE_HUB_ENDPOINT_ENV_VAR)
+        .or_else(|_| std::env::var(HF_ENDPOINT_ENV_VAR))
+    {
+        builder = builder.with_endpoint(endpoint);
+    }
+
+    let api = builder.high().build()?;

(Default endpoint is https://huggingface.co.) (docs.rs)

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e2e909f and 08316e7.

📒 Files selected for processing (3)
  • deploy/cloud/operator/internal/dynamo/backend_trtllm.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_trtllm_test.go (7 hunks)
  • lib/llm/src/hub.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (9)
deploy/cloud/operator/internal/dynamo/backend_trtllm.go (1)

191-195: HF_ENDPOINT propagation via mpirun looks good

Adding HF_ENDPOINT to the common env set ensures it’s forwarded with -x and unblocks custom HF mirrors in multinode TRT-LLM runs.

deploy/cloud/operator/internal/dynamo/backend_trtllm_test.go (7)

63-63: Updated expectation includes -x HF_ENDPOINT — matches production sorting

The insertion point after -x HF_DATASETS_CACHE looks correct with the sorted env var flags.


119-119: HF_ENDPOINT forwarding asserted for LWS path — LGTM

Covers the alternate multinode deployer path.


566-566: Leader (args-first) case checks -x HF_ENDPOINT — LGTM


576-576: Leader (command-first, no GPUs) case asserts -x HF_ENDPOINT — LGTM


594-594: Leader (args take precedence) case includes -x HF_ENDPOINT — LGTM


612-612: Comprehensive env forwarding case updated to include -x HF_ENDPOINT — LGTM


630-630: Deduplication test continues to pass with -x HF_ENDPOINT in the set — LGTM

Still verifies that explicitly provided envs are merged and sorted once.

lib/llm/src/hub.rs (1)

29-31: Clear constant names for env vars — good addition

Names align with what operators and users expect.

@julienmancuso merged commit 45e38d3 into ai-dynamo:main Aug 26, 2025
8 checks passed
ayushag-nv pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: nnshah1 <neelays@nvidia.com>

Labels

external-contribution (Pull request is from an external contributor), feat, size/S
