
Conversation

@rmccorm4 (Contributor) commented Aug 4, 2025

Overview:

With the update to TRTLLM 1.0.0rc4, various config field names and requirements have changed; this PR updates them across all example configs.

Agg configs were updated with most of these in #2217, but some of the fixes were missed in the disagg configs.

Fixes nvbugs/5430986:

  1. cache_transceiver_config is now a required field in the YAML config to enable disaggregation; omitting it fails with the error below (see the sketch after it):
AssertionError: kv_cache_transceiver is disabled, please set 'cache_transceiver_config: backend:' in config file for disaggregated serving
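As a minimal sketch, this is what the required field looks like in each prefill/decode worker YAML. The error message only shows the key path, so the backend value here is an assumption; replace it with whichever backend your deployment uses (e.g. UCX) if a specific one is required:

```yaml
# Hedged sketch: minimal cache transceiver config required for disaggregated
# serving with TRTLLM 1.0.0rc4. "DEFAULT" is an assumed placeholder value,
# not taken from this PR; substitute the backend your build supports.
cache_transceiver_config:
  backend: DEFAULT
```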

Fixes nvbugs/5431299:

  1. kv_cache_dtype is now kv_cache_config.dtype
  2. autotuner_enabled is now enable_autotuner
  3. pytorch_weights_path is now speculative_model_dir (see the before/after sketch below), which fixes:
ERROR ... Path to EAGLE3 weights must be specified.
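A before/after sketch of the three renames, with illustrative values: the dtype, flag value, and weights path are placeholders, and the nesting of speculative_model_dir under speculative_config is an assumption based on typical EAGLE3 example configs, not something stated in this PR:

```yaml
# Old field names (pre-1.0.0rc4), shown as comments; values are illustrative:
#   kv_cache_dtype: fp8
#   autotuner_enabled: true
#   pytorch_weights_path: /workspace/eagle3-weights

# New field names in TRTLLM 1.0.0rc4:
kv_cache_config:
  dtype: fp8                                         # was top-level kv_cache_dtype
enable_autotuner: true                               # was autotuner_enabled
speculative_config:                                  # nesting assumed from EAGLE3 examples
  speculative_model_dir: /workspace/eagle3-weights   # was pytorch_weights_path
```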

NOTE: I will bring these fixes back to the main branch shortly afterwards; the focus is on the release branch for sanity testing first.

@rmccorm4 rmccorm4 changed the title fix: Add default cache transceiver config to all prefill/decode (disagg) config files (CP #2276) fix: Add default cache transceiver config to all prefill/decode (disagg) config files (release/0.4.0) Aug 4, 2025
@KrishnanPrash KrishnanPrash self-requested a review August 4, 2025 21:53
@KrishnanPrash (Contributor) left a comment

LGTM.

@rmccorm4 (Contributor, Author) commented Aug 4, 2025

FYI: test-end-to-end-trtllm-multi-GPU appears to be flaky; it sometimes passes and sometimes fails, both on this branch and on other unrelated branches. It should be investigated separately.

@rmccorm4 rmccorm4 changed the title fix: Add default cache transceiver config to all prefill/decode (disagg) config files (release/0.4.0) fix: Update disagg configs for trtllm 1.0.0rc4 changes (release/0.4.0) Aug 4, 2025