Merged
83 commits
700f8d7
fix links of h100_prefill_performance.png and h100_decode_performance…
jasonqinzhou Aug 22, 2025
dc49722
feat: Remove Duplicate Multimodel Nixl Connect Example (#2622)
whoisj Aug 22, 2025
37c4680
feat: add BaseLogitsProcessor core interface (#2613)
bhuvan002 Aug 22, 2025
5c991dc
fix: Tests now pass with RUST_BACKTRACE set (#2647)
grahamking Aug 22, 2025
0767cc1
docs: Update supported model in readme for multimodal (#2651)
krishung5 Aug 22, 2025
3ef8e84
feat: enable dynamo metrics on KVBM (#2626)
ziqifan617 Aug 22, 2025
216f608
feat: [vLLM] implement cli args for tool and reasoning parsers (#2619)
ayushag-nv Aug 22, 2025
b25fbad
fix: [TRTLLM+ LLAMA4 + Eagle 3] Remove the ‘two-models config’ and se…
richardhuo-nv Aug 22, 2025
f668a0b
fix: handle missing span_name in logging test assertions (#2665)
WaelBKZ Aug 22, 2025
5f72551
fix: missing tokenizer args in sla_planner.py (#2667)
tedzhouhk Aug 22, 2025
c3f6ec0
chore: Rust to 1.89 and edition 2024 (#2659)
grahamking Aug 22, 2025
c9163dc
fix: hello world yaml and messages (#2634)
biswapanda Aug 22, 2025
2a57fd6
fix: move metrics registration to service creation (#2664)
keivenchang Aug 22, 2025
f18aee4
fix: Skip checksum tests in release mode since they're not computed (…
rmccorm4 Aug 23, 2025
6c66795
fix: fix manual helm chart (#2648)
julienmancuso Aug 24, 2025
cb59d0e
fix: do not fail if backendFramework cannot be detected (#2655)
julienmancuso Aug 24, 2025
024d0f4
fix: fix env vars override (#2640)
julienmancuso Aug 24, 2025
dbaaf3b
fix: increase shm default size and make it configurable (#2616)
julienmancuso Aug 24, 2025
2a56c5a
fix: pytest robustness and parsing error (#2676)
alec-flowers Aug 24, 2025
d18b01c
fix: correct planner test example after tokenizer fix (#2674)
tedzhouhk Aug 25, 2025
74bf2a5
feat: add initial batch of KVBM metrics on match, offload and onboard…
ziqifan617 Aug 25, 2025
878afde
feat: add gpt oss reasoning parser through harmony (#2656)
nachiketb-nvidia Aug 25, 2025
26aac03
chore: vllm 0.10.1.1 (#2641)
dmitry-tokarev-nv Aug 25, 2025
326d291
feat: add prometheus to the runtime image for sglang (#2689)
hhzhang16 Aug 25, 2025
77775bb
feat: support HF_HOME/_ENDPOINT env for Hugging Face models (#2642)
hhk7734 Aug 25, 2025
183088d
refactor: Switch ModelManager locks from `std::sync::Mutex` to `parki…
paulhendricks Aug 25, 2025
bdcbc56
feat: python bindings for the entire KvPushRouter + per-request route…
PeaBrane Aug 25, 2025
80f6b0b
refactor: move uptime tracking from system_status_server(HTTP) to DRT…
keivenchang Aug 25, 2025
1f86999
feat: enable --dyn-reasoning-parser flag to set reasoning parser for …
nachiketb-nvidia Aug 25, 2025
5fad214
docs: Simplify sphinx build and table of contents on webpage (#2519)
rmccorm4 Aug 25, 2025
1948113
feat: parse normal text along with tool calls (#2709)
ayushag-nv Aug 26, 2025
955ad8e
feat: HF_ENDPOINT addition (#2637)
Michaelgathara Aug 26, 2025
116c07c
docs: Update how containers should be built for SGLang examples (#2707)
Elnifio Aug 26, 2025
6bdaebe
feat: align OpenAI response IDs with distributed trace IDs (#2496)
qimcis Aug 26, 2025
8b16a5f
fix: fix sglang multinode example (#2716)
julienmancuso Aug 26, 2025
49815ad
fix: fix metrics docs; add dcgm-exporter (#2712)
mohammedabdulwahhab Aug 26, 2025
bfd1fc7
fix: sglang -- queue requests until model registration completes (#2701)
hhzhang16 Aug 26, 2025
b9029f7
feat: Add vllm multimodal qwen aggregated support (#2694)
krishung5 Aug 26, 2025
04e0601
feat: add and enable reasoning and tool parser flags for trtllm and s…
nachiketb-nvidia Aug 26, 2025
e9fa569
fix: fix hello world (#2727)
mohammedabdulwahhab Aug 26, 2025
e6f4121
feat: Deployment for Dynamo EPP - aware gateway (#2633)
atchernych Aug 26, 2025
db69d05
fix: add label to persist DGD name on downstream pods (#2729)
mohammedabdulwahhab Aug 27, 2025
4931da1
fix: container/Dockerfile.trtllm - use pytorch 2.8.0a0+5228986c39.nv2…
dmitry-tokarev-nv Aug 27, 2025
ab4f2fa
feat: allow specifying consumer name for NATS queue + manually purge …
PeaBrane Aug 27, 2025
ea075a5
feat: Trtllm metric_labels. (#2666)
tzulingk Aug 27, 2025
d92f7bb
feat: KServe gRPC support (#2638)
GuanLuo Aug 27, 2025
7aa26b5
feat: Sglang metrics labels. (#2679)
tzulingk Aug 27, 2025
8b694f1
chore: Add atchernych to deploy codeowners (#2751)
atchernych Aug 27, 2025
7da5a38
fix: Reflect actual status of Grove PGS in Dynamo DGD (#2710)
julienmancuso Aug 27, 2025
03fb426
fix: revisit grove and LWS selection (#2564)
julienmancuso Aug 27, 2025
66ae167
feat: add reference setup for dynamo logging in k8s with loki (#2699)
mohammedabdulwahhab Aug 28, 2025
7a9a393
feat: Auto-inject kai-scheduler annotations and label (#2748)
julienmancuso Aug 28, 2025
8771641
chore: Update support_matrix.md (#2735)
dmitry-tokarev-nv Aug 28, 2025
8eb06b1
feat: Add vLLM multimodal video support (#2738)
krishung5 Aug 28, 2025
9e6f472
chore: deprecate duplicate params in nvext (#2754)
ryan-lempka Aug 28, 2025
83b4e06
ci: add support for vllm sanity testing on Github (#2526)
nv-anants Aug 28, 2025
fe7373c
refactor: centralize Prometheus metrics naming and sanitization DIS-5…
keivenchang Aug 28, 2025
a7ad862
docs: add mermaid graph to .devcontainer/README.md, remove mount (rev…
keivenchang Aug 28, 2025
ec2ea39
fix: Added description to async-openai/Cargo.toml (#2761)
dmitry-tokarev-nv Aug 28, 2025
a81ffe4
feat: add SGLANG devcontainer documentation (#2741)
keivenchang Aug 28, 2025
1d77cb3
feat: Integrate Model Express Client into Dynamo Model Downloads (#2574)
KavinKrishnan Aug 28, 2025
74d1fc2
feat: Prevent double-tokenization when EPP picks worker (#2559)
atchernych Aug 28, 2025
0b3b16b
docs: Use GFM Admonition style (#2771)
nealvaidya Aug 28, 2025
f76b0e6
docs: Fix dynamo cloud quickstart links (#2765)
rmccorm4 Aug 28, 2025
1551bd4
chore: deprecate nvext.top_k and nvext.repetition_penalty and make av…
ryan-lempka Aug 29, 2025
25ea1ca
fix: [trtllm] add wait_for_instance before register_llm (#2683)
alec-flowers Aug 29, 2025
ec0dc84
feat: Add Grove and Kai scheduler as part of dynamo cloud helm chart …
julienmancuso Aug 29, 2025
ad8ca54
feat: update planner to use DYN_PARENT_DGD_K8S_NAME (#2774)
julienmancuso Aug 29, 2025
54cf15d
feat: add Prometheus metrics integration for KvStats (#2704)
keivenchang Aug 29, 2025
587b2f3
fix: fix Lychee (#2779)
julienmancuso Aug 29, 2025
ebe9faa
feat: Add Encode Worker and NIXL support to trtllm multimodal flow (#…
indrajit96 Aug 29, 2025
948892b
test: add tests for replica calculation and planner scaling (#2525)
hhzhang16 Aug 29, 2025
0ed563b
fix: update concurrency to not cancel main runs (#2780)
nv-anants Aug 29, 2025
f34e68c
chore: added include_stop_str_in_output (#2782)
ayushag-nv Aug 29, 2025
6af7d73
fix: Remove duplicate import and update the test comment. (#2784)
tzulingk Aug 29, 2025
9712602
ci: Skip gitlab job on docs-only changes, only run rust jobs on rust …
rmccorm4 Aug 29, 2025
4d162c1
chore: add more reasoning and tool call parsers explicitly, remove un…
nachiketb-nvidia Aug 29, 2025
b9c3714
chore: rm not needed kvbm Dockerfile (#2789)
ziqifan617 Aug 29, 2025
ccb8288
fix: add dev tools to devcontainer (#2777)
alec-flowers Aug 29, 2025
bd7c943
feat: add benchmarking guide (#2620)
hhzhang16 Aug 30, 2025
fb9f092
feat: add logits processor support for trtllm backend (#2702)
bhuvan002 Aug 30, 2025
d897b3c
feat: DIS-373 dynamo KVBM connector API integration with TRTLLM (#2544)
richardhuo-nv Aug 30, 2025
9b3f701
Merge branch 'main' into jasonzho/png
jasonqinzhou Aug 30, 2025
4 changes: 2 additions & 2 deletions docs/benchmarks/pre_deployment_profiling.md
@@ -23,8 +23,8 @@ The script will first detect the number of available GPUs on the current nodes (

After the profiling finishes, two plots will be generated in the `output-dir`. For example, here are the profiling results for `examples/llm/configs/disagg.yaml`:

-![Prefill Performance](../images/h100_prefill_performance.png)
-![Decode Performance](../images/h100_decode_performance.png)
+![Prefill Performance](../../docs/images/h100_prefill_performance.png)
+![Decode Performance](../../docs/images/h100_decode_performance.png)

For prefill performance, the script plots the TTFT for different TP sizes and selects the TP size that meets the target TTFT SLA and delivers the best throughput per GPU. Based on how close the TTFT of the selected TP size is to the SLA, the script also recommends upper and lower bounds for the prefill queue size to be used in the planner.

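The prefill selection rule described above (keep only the TP sizes whose TTFT meets the SLA, then pick the one with the highest throughput per GPU) can be sketched as follows. This is a minimal illustration, not the profiling script's actual code: the `PrefillProfile` type, its field names, and `select_prefill_tp` are hypothetical, and the prefill queue-size bounds are not modeled because the exact rule for deriving them is not given here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PrefillProfile:
    """One profiled prefill configuration (hypothetical field names)."""
    tp_size: int
    ttft_ms: float             # measured time to first token, in milliseconds
    throughput_per_gpu: float  # e.g. prefill tokens/s divided by the number of GPUs


def select_prefill_tp(
    profiles: Iterable[PrefillProfile], ttft_sla_ms: float
) -> Optional[PrefillProfile]:
    """Return the SLA-compliant configuration with the best per-GPU throughput.

    Configurations whose TTFT exceeds the SLA are discarded first; among the
    rest, the highest throughput per GPU wins (ties keep the earliest entry).
    Returns None when no TP size meets the SLA.
    """
    feasible = [p for p in profiles if p.ttft_ms <= ttft_sla_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.throughput_per_gpu)
```

Filtering on the SLA before maximizing matters: the TP size with the best raw throughput per GPU is often a smaller one with a higher TTFT, and it should only win if it still meets the target.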
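The diff above only changes relative image paths, so one way to catch broken image links is to resolve each relative target against the Markdown file's own directory and check that the file exists. This is a minimal sketch under that assumption (it mirrors how GitHub resolves relative links in rendered Markdown); `broken_image_links` is a hypothetical helper, not part of the repository, and its regex only handles plain `![alt](target)` images.

```python
import re
from pathlib import Path
from typing import List

# Captures the target of Markdown images written as ![alt](target).
# Optional titles and angle-bracket targets are not handled in this sketch.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def broken_image_links(md_file: Path) -> List[str]:
    """Return relative image targets in md_file that do not point to an existing file.

    Targets are resolved relative to the Markdown file's own directory.
    Absolute URLs and root-relative paths are skipped.
    """
    broken = []
    text = md_file.read_text(encoding="utf-8")
    for target in IMAGE_LINK.findall(text):
        if "://" in target or target.startswith("/"):
            continue  # not a file-relative link
        resolved = (md_file.parent / target).resolve()
        if not resolved.is_file():
            broken.append(target)
    return broken
```

For example, `broken_image_links(Path("docs/benchmarks/pre_deployment_profiling.md"))` returns an empty list when every relative image target exists. Resolved this way, both the old `../images/` form and the new `../../docs/images/` form point to `docs/images/`, so this checker accepts either.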