
Conversation

@PeaBrane
Contributor

@PeaBrane PeaBrane commented May 15, 2025

Overview:

The scope of this PR is well described in the committed README

Introduces two extra Python dependencies: networkx and pandas

Open discussion:

  1. What is the best way to package the benchmarks directory with dynamo?
  2. How should the directory structure be configured?

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive benchmarking toolkit for performance evaluation, including tools for analyzing and synthesizing prefix-structured data.
    • Added a command-line interface for benchmarking utilities.
  • Documentation

    • Added detailed README files for benchmarking tools, data generator, and usage instructions.
    • Updated performance tuning guide with new recommendations and future plans for automatic tuning.
  • Bug Fixes

    • Improved the testing workflow to ensure benchmark packages are properly installed before running tests.
  • Tests

    • Added unit tests for data hashing, empirical sampling, and synthetic data generation modules.
  • Chores

    • Added configuration files for package management and testing.

@PeaBrane PeaBrane enabled auto-merge (squash) June 5, 2025 00:44
@coderabbitai
Contributor

coderabbitai bot commented Jun 5, 2025

Walkthrough

This update introduces a comprehensive benchmarking and data synthesis toolkit under the benchmarks directory, including a modular Python package with CLI tools for analyzing and generating synthetic prefix-structured data. It adds detailed documentation, configuration files, and a suite of unit tests. The GitHub Actions workflow is updated to install the benchmarks package before running tests.

Changes

File(s) | Change Summary
--- | ---
.github/workflows/pre-merge-python.yml | Modified test workflow to install benchmarks package in editable mode before running pytest inside Docker.
benchmarks/README.md, benchmarks/data_generator/README.md | Added documentation files describing benchmarking tools, usage, and data formats.
benchmarks/pyproject.toml | Added project configuration, dependencies, entry points, and test/type-checking settings for the benchmarks package.
benchmarks/data_generator/__init__.py, benchmarks/data_generator/cli.py | Introduced CLI entry points for the data generator package.
benchmarks/data_generator/graph_utils.py, .../hasher.py, .../logging.py, .../protocols.py, .../sampler.py | Added utility modules for graph operations, hashing, logging, protocol constants, and empirical data sampling.
benchmarks/data_generator/prefix_analyzer.py | Added a class and CLI for analyzing prefix statistics and cache hit rates from trace data.
benchmarks/data_generator/synthesizer.py | Implemented a synthesizer class and CLI for generating synthetic datasets based on prefix/radix tree statistics.
benchmarks/data_generator/tests/test_hasher.py, .../test_sampler.py, .../test_synthesizer.py | Added unit tests for hashing, sampling, and synthesizer graph structure.
docs/guides/kv_router_perf_tuning.md | Updated documentation to mention new analysis tools and future auto-tuning plans.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI (datagen)
    participant Analyzer
    participant Synthesizer
    participant FileSystem

    User->>CLI (datagen): Run "datagen analyze <input>"
    CLI (datagen)->>Analyzer: Pass arguments
    Analyzer->>FileSystem: Load dataset
    Analyzer->>Analyzer: Analyze prefixes, cache hits
    Analyzer->>User: Print statistics

    User->>CLI (datagen): Run "datagen synthesize <input> [options]"
    CLI (datagen)->>Synthesizer: Pass arguments
    Synthesizer->>FileSystem: Load dataset
    Synthesizer->>Synthesizer: Build radix tree, sample paths
    Synthesizer->>FileSystem: Write synthetic dataset
    Synthesizer->>User: Print summary
```
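
As a rough illustration of that flow, the sketch below just mirrors the two invocations shown in the diagram; it assumes the positional `datagen analyze <input>` / `datagen synthesize <input>` form, and the trace path is a placeholder, so the real CLI flags may differ.

```python
import subprocess

trace = "trace.jsonl"  # placeholder path to a prefix-structured trace in jsonl format

# First inspect prefix statistics and cache-hit behaviour...
subprocess.run(["datagen", "analyze", trace], check=True)
# ...then synthesize a new dataset with matching prefix statistics.
subprocess.run(["datagen", "synthesize", trace], check=True)
```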

Poem

In the warren, code does bloom,
Benchmarks sprout and data loom.
Prefix trees and hashes spin,
Synthesizers leap right in!
Rabbits test and analyze,
With graphs and stats, they optimize—
Hopping forward, swift and bright,
New tools make benchmarks light! 🐇✨



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 15

🔭 Outside diff range comments (1)
benchmarks/data_generator/logging.py (1)

1-56: 🛠️ Refactor suggestion

Module name conflicts with Python's built-in logging module.

The filename logging.py conflicts with Python's standard library logging module, which could cause import confusion. Consider renaming to something more specific like statistics.py, metrics_utils.py, or stats_logger.py.

The function implementation itself is well-designed with proper type hints, documentation, and statistical calculations.
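
For a concrete sense of the risk (illustrative only, the directory layout is an assumption): if benchmarks/data_generator/ itself ends up first on sys.path, for example because a script is run from inside that directory, a bare import resolves to the local file rather than the standard library.

```python
# Illustrative only. With benchmarks/data_generator/ first on sys.path:
import logging

# ...this binds the local logging.py, not the standard library, and stdlib
# attributes such as getLogger are suddenly missing.
try:
    logging.getLogger(__name__)
except AttributeError:
    print("shadowed by the local logging.py")
```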

🧹 Nitpick comments (12)
benchmarks/README.md (1)

1-14: Consider using standard markdown format for license header.

The HTML comment wrapper for the license header is unusual in markdown files. Consider either removing the comment wrapper entirely or using standard markdown syntax.

The documentation content is clear and concise.

.github/workflows/pre-merge-python.yml (1)

80-80: Add missing newline at end of file.

Static analysis detected a missing newline character at the end of the file.

-          path: ${{ github.event_path }}
+          path: ${{ github.event_path }}
+
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 80-80: no new line character at the end of file

(new-line-at-end-of-file)

benchmarks/data_generator/protocols.py (1)

20-22: LGTM: Clean protocol constants with good documentation.

The use of negative integers for special nodes is a solid design choice that avoids conflicts with real node IDs. Consider adding type annotations for better code clarity:

-SUPER_ROOT = -1  # Dummy node preceding all real nodes; not an actual data root
-CACHE_END = -2  # Special node indicating end of a path
-END_NODE = -3  # Special node indicating to skip leaf sampling
+SUPER_ROOT: int = -1  # Dummy node preceding all real nodes; not an actual data root
+CACHE_END: int = -2  # Special node indicating end of a path
+END_NODE: int = -3  # Special node indicating to skip leaf sampling
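
A minimal sketch of why the negative sentinels are safe (the tree shape below is an assumption for illustration, not the synthesizer's actual graph): real hash IDs are non-negative, so the special nodes can never collide with data.

```python
import networkx as nx

# Sentinel node IDs mirroring the constants above.
SUPER_ROOT: int = -1
CACHE_END: int = -2

# Hypothetical radix-style tree: real node IDs are non-negative.
tree = nx.DiGraph()
tree.add_edge(SUPER_ROOT, 0)   # SUPER_ROOT precedes every real node
tree.add_edge(0, 1)
tree.add_edge(1, CACHE_END)    # mark the end of a cached path

real_nodes = [n for n in tree.nodes if n >= 0]
assert real_nodes == [0, 1]
```
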
benchmarks/pyproject.toml (1)

42-49: Consider moving pytest-mypy to development dependencies.

The pytest-mypy package is typically used during development/testing and might be better placed in an optional dev dependencies group rather than required dependencies.

Consider restructuring like this:

 dependencies = [
     "networkx",
     "pandas",
     "tabulate",
     "types-tabulate",
     "transformers",
-    "pytest-mypy",
 ]
+
+[project.optional-dependencies]
+dev = [
+    "pytest-mypy",
+]
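
If the project were restructured this way, contributors could still pull in the type-checking tooling with `pip install -e ".[dev]"`, while plain runtime installs stay free of test-only packages.
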
benchmarks/data_generator/sampler.py (2)

38-47: Remove unnecessary else branch
np.random.rand() is returned only when rng is None, so the else is redundant and flagged by Pylint R1705.

-    if rng is not None:
-        return data[np.searchsorted(cdf, rng.random())]
-    else:
-        return data[np.searchsorted(cdf, np.random.rand())]
+    rnd = rng.random() if rng is not None else np.random.rand()
+    return data[np.searchsorted(cdf, rnd)]
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 44-47: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


58-64: Make random seed configurable
Hard-coding default_rng(0) makes every EmpiricalSampler deterministic and highly correlated. Accept an optional seed or Generator instead.
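
A minimal sketch of the suggested pattern; the PR's actual EmpiricalSampler interface is not reproduced here, so the class below is illustrative only.

```python
from typing import Optional, Union

import numpy as np

class EmpiricalSamplerSketch:
    """Illustrative only: accepts an optional seed or Generator instead of
    hard-coding default_rng(0), so independent samplers are not perfectly
    correlated."""

    def __init__(self, data, rng: Optional[Union[int, np.random.Generator]] = None):
        self.data = np.asarray(data)
        # Pass a Generator through unchanged, seed a new one from an int,
        # or fall back to fresh OS entropy when nothing is supplied.
        if isinstance(rng, np.random.Generator):
            self.rng = rng
        else:
            self.rng = np.random.default_rng(rng)

    def sample(self):
        # Draw uniformly from the observed data points.
        return self.rng.choice(self.data)

# Reproducible when you want it, independent when you don't:
fixed = EmpiricalSamplerSketch([1, 2, 3], rng=0)
fresh = EmpiricalSamplerSketch([1, 2, 3])
```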

benchmarks/data_generator/README.md (2)

20-25: Specify code-block language to satisfy markdownlint and enable syntax highlight

-```
+```json

(Apply similarly to all fenced blocks.)

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

20-20: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


114-116: Minor grammar fix – singular agreement

“each node need to store” → “each node needs to store”

🧰 Tools
🪛 LanguageTool

[grammar] ~114-~114: “Node” is a singular noun. It appears that the verb form is incorrect.
Context: ...the parent. As a consequence, each node need to store an attribute length to indic...

(PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT)


[uncategorized] ~116-~116: “the” seems less likely than “they”.
Context: ...o sample a path in the core radix tree, the append the path with new hash ids corre...

(AI_HYDRA_LEO_CP_THE_THEY)

benchmarks/data_generator/prefix_analyzer.py (2)

60-67: Docstring & return type mismatch

Docstring says “Tuple”, type annotation is dict[str, list]. Update one of them for consistency.


44-50: Prefer logging over print for large datasets
print statements spam stdout when analyzing big traces and can’t be filtered. Switch to the logging module.
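
A small hedged sketch of the switch; the analyzer's real print statements are not reproduced here, and the helper name is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

def report_prefix_stats(stats: dict) -> None:
    # Hypothetical reporting helper: emits at INFO level so callers can
    # silence or redirect it through standard logging configuration.
    for name, value in stats.items():
        logger.info("%s: %s", name, value)

# Callers opt in to output instead of receiving unconditional prints:
logging.basicConfig(level=logging.INFO)
report_prefix_stats({"num_requests": 1000, "cache_hit_rate": 0.42})
```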

benchmarks/data_generator/synthesizer.py (2)

81-94: Improve assertion error messages for better debugging.

The assertion error messages could be more descriptive to help users understand what went wrong.

-assert (
-    isinstance(self.num_copies, int) and self.num_copies >= 1
-), "num_copies must be an integer greater than or equal to 1"
+assert (
+    isinstance(self.num_copies, int) and self.num_copies >= 1
+), f"num_copies must be an integer >= 1, got {self.num_copies} (type: {type(self.num_copies)})"

-assert (
-    isinstance(self.speedup_ratio, float) and self.speedup_ratio > 0
-), "speedup_ratio must be a positive float"
+assert (
+    isinstance(self.speedup_ratio, float) and self.speedup_ratio > 0
+), f"speedup_ratio must be a positive float, got {self.speedup_ratio} (type: {type(self.speedup_ratio)})"

-assert (
-    isinstance(self.prefix_len_multiplier, float)
-    and self.prefix_len_multiplier > 0
-), "context_len_multiplier must be a positive float"
+assert (
+    isinstance(self.prefix_len_multiplier, float)
+    and self.prefix_len_multiplier > 0
+), f"prefix_len_multiplier must be a positive float, got {self.prefix_len_multiplier} (type: {type(self.prefix_len_multiplier)})"

-assert (
-    isinstance(self.prompt_len_multiplier, float)
-    and self.prompt_len_multiplier > 0
-), "prompt_len_multiplier must be a positive float"
+assert (
+    isinstance(self.prompt_len_multiplier, float)
+    and self.prompt_len_multiplier > 0
+), f"prompt_len_multiplier must be a positive float, got {self.prompt_len_multiplier} (type: {type(self.prompt_len_multiplier)})"

184-184: Add better error message for input length validation.

The assertion should provide more context about what values caused the failure.

-assert np.all(0 < input_lens_mod) and np.all(input_lens_mod <= self.block_size)
+invalid_low = input_lens_mod[input_lens_mod <= 0]
+invalid_high = input_lens_mod[input_lens_mod > self.block_size]
+assert len(invalid_low) == 0 and len(invalid_high) == 0, (
+    f"Invalid input lengths found: {len(invalid_low)} values <= 0, "
+    f"{len(invalid_high)} values > block_size ({self.block_size})"
+)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a20445d and 44a27bf.

📒 Files selected for processing (17)
  • .github/workflows/pre-merge-python.yml (2 hunks)
  • benchmarks/README.md (1 hunks)
  • benchmarks/data_generator/README.md (1 hunks)
  • benchmarks/data_generator/__init__.py (1 hunks)
  • benchmarks/data_generator/cli.py (1 hunks)
  • benchmarks/data_generator/graph_utils.py (1 hunks)
  • benchmarks/data_generator/hasher.py (1 hunks)
  • benchmarks/data_generator/logging.py (1 hunks)
  • benchmarks/data_generator/prefix_analyzer.py (1 hunks)
  • benchmarks/data_generator/protocols.py (1 hunks)
  • benchmarks/data_generator/sampler.py (1 hunks)
  • benchmarks/data_generator/synthesizer.py (1 hunks)
  • benchmarks/data_generator/tests/test_hasher.py (1 hunks)
  • benchmarks/data_generator/tests/test_sampler.py (1 hunks)
  • benchmarks/data_generator/tests/test_synthesizer.py (1 hunks)
  • benchmarks/pyproject.toml (1 hunks)
  • docs/guides/kv_router_perf_tuning.md (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
benchmarks/data_generator/__init__.py (3)
benchmarks/data_generator/synthesizer.py (1)
  • main (350-455)
benchmarks/data_generator/cli.py (1)
  • main (20-48)
benchmarks/data_generator/prefix_analyzer.py (1)
  • main (156-183)
benchmarks/data_generator/tests/test_sampler.py (1)
benchmarks/data_generator/sampler.py (2)
  • EmpiricalSampler (50-69)
  • sample (66-69)
benchmarks/data_generator/cli.py (2)
benchmarks/data_generator/synthesizer.py (1)
  • main (350-455)
benchmarks/data_generator/prefix_analyzer.py (1)
  • main (156-183)
benchmarks/data_generator/tests/test_hasher.py (1)
benchmarks/data_generator/hasher.py (1)
  • texts_to_hashes (21-74)
benchmarks/data_generator/graph_utils.py (1)
benchmarks/data_generator/sampler.py (1)
  • get_cdf (26-28)
🪛 YAMLlint (1.37.1)
.github/workflows/pre-merge-python.yml

[error] 80-80: no new line character at the end of file

(new-line-at-end-of-file)

🪛 LanguageTool
benchmarks/data_generator/README.md

[uncategorized] ~42-~42: Loose punctuation mark.
Context: ...-size <block_size> ``` - --input-file: Path to your trace file in jsonl format...

(UNLIKELY_OPENING_PUNCTUATION)


[typographical] ~97-~97: The conjunction “so that” does not have a comma in front.
Context: ... of being incremented by a large integer, so that they will be effectively separated into...

(SO_THAT_UNNECESSARY_COMMA)


[style] ~97-~97: Consider using a different adverb to strengthen your wording.
Context: ...tistics of the original one, but having completely different roots. For example, if rows ...

(COMPLETELY_ENTIRELY)


[grammar] ~114-~114: “Node” is a singular noun. It appears that the verb form is incorrect.
Context: ...the parent. As a consequence, each node need to store an attribute length to indic...

(PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT)


[uncategorized] ~116-~116: “the” seems less likely than “they”.
Context: ...o sample a path in the core radix tree, the append the path with new hash ids corre...

(AI_HYDRA_LEO_CP_THE_THEY)


[style] ~131-~131: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...-to-end test. It is important to sample a large number of requests (e.g., hundreds of thousands) ...

(LARGE_NUMBER_OF)


[misspelling] ~131-~131: This word is normally spelled with a hyphen.
Context: ...statistics (such as mean ISL) should be well preserved in the synthetic data. However, the sta...

(EN_COMPOUNDS_WELL_PRESERVED)

🪛 markdownlint-cli2 (0.17.2)
benchmarks/data_generator/README.md

20-20: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


76-76: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


87-87: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


101-101: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


121-121: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


127-127: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 Pylint (3.3.7)
benchmarks/data_generator/prefix_analyzer.py

[refactor] 22-22: Too few public methods (1/2)

(R0903)

benchmarks/data_generator/sampler.py

[refactor] 44-47: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 50-50: Too few public methods (1/2)

(R0903)

benchmarks/data_generator/tests/test_synthesizer.py

[refactor] 40-40: Too many arguments (6/5)

(R0913)


[refactor] 40-40: Too many positional arguments (6/5)

(R0917)

benchmarks/data_generator/synthesizer.py

[refactor] 32-32: Too many instance attributes (12/7)

(R0902)


[refactor] 33-33: Too many arguments (7/5)

(R0913)


[refactor] 33-33: Too many positional arguments (7/5)

(R0917)


[refactor] 33-33: Too many local variables (29/15)

(R0914)


[refactor] 33-33: Too many branches (15/12)

(R0912)


[refactor] 33-33: Too many statements (67/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (9)
benchmarks/data_generator/__init__.py (1)

16-21: Clean package entry point implementation.

The implementation correctly provides a package-level entry point that delegates to the CLI main function. This follows standard Python packaging patterns.
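
For readers who have not opened the file, such an entry point typically looks roughly like the sketch below; the exact contents, import path, and console-script name are assumptions, not the PR's code.

```python
# Illustrative sketch of a package-level entry point that defers to the CLI.
# The import path and the "datagen" script name are assumptions here.
from data_generator.cli import main

__all__ = ["main"]

# With a pyproject.toml entry point such as:
#   [project.scripts]
#   datagen = "data_generator:main"
# installing the package exposes a `datagen` command that calls cli.main().
```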

.github/workflows/pre-merge-python.yml (1)

57-57: Installation step is necessary for testing the new package.

The addition of pip install -e /workspace/benchmarks before running pytest is correct and necessary to ensure the new benchmarks package is available during testing.

docs/guides/kv_router_perf_tuning.md (1)

68-69: LGTM: Excellent documentation enhancement.

The added paragraph effectively introduces users to the available analysis tools while setting expectations for future automatic tuning capabilities. This provides valuable context for the current manual tuning requirements.

benchmarks/data_generator/tests/test_sampler.py (1)

22-46: LGTM: Well-designed statistical test.

The test effectively validates the EmpiricalSampler's distribution behavior:

  • Uses appropriate sample size (1000) for statistical significance
  • Reasonable tolerance range (300-400) for ~333 expected occurrences
  • Properly validates absence of unexpected values
  • Clear test structure and assertions

The statistical bounds are appropriate for catching significant distribution deviations while allowing for normal sampling variance.
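
A hedged sketch of a test with this shape (the real test uses 1000 draws and a 300-400 tolerance; this stand-in scales the counts up and widens the bounds so a fixed seed cannot plausibly fail by chance, and it substitutes a plain uniform draw for the actual EmpiricalSampler).

```python
from collections import Counter

import numpy as np

def test_uniform_empirical_distribution_sketch():
    # Stand-in for EmpiricalSampler: three equally likely observed values.
    data = [1, 2, 3]
    rng = np.random.default_rng(42)

    draws = rng.choice(data, size=10_000)
    counts = Counter(int(x) for x in draws)

    # Only the observed values should ever be produced...
    assert set(counts) <= {1, 2, 3}
    # ...and each one should land near the ~3333 expectation
    # (wide bounds so the check only catches real distribution bugs).
    for value in data:
        assert 3000 <= counts[value] <= 3700
```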

benchmarks/data_generator/tests/test_hasher.py (2)

22-42: Well-designed test fixture.

The dummy tokenizer fixture is properly constructed with appropriate normalization, pre-tokenization, and decoding components. The simple vocabulary makes the test predictable and easy to debug.


45-54: Comprehensive test case for block hashing.

The test effectively validates the rolling hash behavior with different text combinations, ensuring that the same prefix ("a b c d") produces consistent hash IDs while different continuations produce different chains.

benchmarks/data_generator/hasher.py (1)

21-74: Solid rolling hash implementation.

The function correctly implements rolling hash computation with efficient batch tokenization. The hash-to-integer mapping ensures deterministic and consistent hash IDs across different runs.
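
To make the idea concrete, here is a self-contained sketch of rolling block hashing; it is not the repo's texts_to_hashes (whose signature is not shown in this thread), and Python's built-in hash stands in for whatever hash function the implementation uses.

```python
from typing import Dict, List, Tuple

def rolling_block_hashes(
    token_ids: List[int],
    block_size: int,
    hash_to_id: Dict[int, int],
) -> List[int]:
    """Illustrative rolling hash over fixed-size token blocks.

    Each block's hash folds in the previous block's hash, so two sequences
    receive identical IDs exactly for their shared prefix of blocks.
    hash_to_id maps raw hash values to small consecutive integers shared
    across calls.
    """
    ids: List[int] = []
    parent = 0
    for start in range(0, len(token_ids), block_size):
        block: Tuple[int, ...] = tuple(token_ids[start:start + block_size])
        h = hash((parent, block))  # rolling: previous hash + current block
        ids.append(hash_to_id.setdefault(h, len(hash_to_id)))
        parent = h
    return ids

registry: Dict[int, int] = {}
a = rolling_block_hashes([1, 2, 3, 4, 5, 6], 4, registry)  # blocks: (1,2,3,4), (5,6)
b = rolling_block_hashes([1, 2, 3, 4, 7, 8], 4, registry)  # same first block, new tail
assert a[0] == b[0] and a[1] != b[1]
```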

benchmarks/pyproject.toml (1)

16-82: Well-structured project configuration.

The pyproject.toml file is comprehensive with appropriate metadata, dependencies, and tool configurations. The CLI entry point and package setup will enable proper installation and usage of the benchmarking tools.

benchmarks/data_generator/tests/test_synthesizer.py (1)

75-96: Comprehensive graph structure validation.

The test effectively validates the synthesizer's graph construction by checking node relationships and attributes. The test data covers various scenarios with different hash sequences.
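
The Synthesizer's actual graph is not reproduced in this thread, so the following is only a generic sketch of the kind of parent/child and attribute checks such a test performs, using the SUPER_ROOT sentinel (-1) and the per-node length attribute described in the data generator README; the helper and the tiny example tree are assumptions.

```python
import networkx as nx

def check_tree_structure(tree: nx.DiGraph, root: int = -1) -> None:
    # Every non-root node should have exactly one parent (a tree, not a DAG)
    # and carry the per-node attribute the synthesizer relies on.
    for node in tree.nodes:
        if node == root:
            continue
        parents = list(tree.predecessors(node))
        assert len(parents) == 1, f"node {node} has parents {parents}"
        assert "length" in tree.nodes[node], f"node {node} missing 'length'"

# Tiny hypothetical radix-style tree rooted at the SUPER_ROOT sentinel (-1):
g = nx.DiGraph()
g.add_node(-1)
g.add_node(0, length=4)
g.add_node(1, length=2)
g.add_edges_from([(-1, 0), (0, 1)])
check_tree_structure(g)
```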

@PeaBrane PeaBrane merged commit 9cdba76 into main Jun 5, 2025
9 checks passed
@PeaBrane PeaBrane deleted the rupei/benchmark-tree branch June 5, 2025 01:58
PYTEST_MARKS: "pre_merge or mypy"
run: |
- docker run -w /workspace --name ${{ env.CONTAINER_ID }}_pytest ${{ steps.define_image_tag.outputs.image_tag }} pytest --basetemp=/tmp --junitxml=${{ env.PYTEST_XML_FILE }} -m "${{ env.PYTEST_MARKS }}"
+ docker run -w /workspace --name ${{ env.CONTAINER_ID }}_pytest ${{ steps.define_image_tag.outputs.image_tag }} bash -c "pip install -e /workspace/benchmarks && pytest --basetemp=/tmp --junitxml=${{ env.PYTEST_XML_FILE }} -m \"${{ env.PYTEST_MARKS }}\""
Contributor

@PeaBrane it looks like you don't have signed commits enabled, so the GitLab PR didn't get triggered. These changes appear to be failing similar tests on the GitLab side because the benchmarks package doesn't get installed, so mypy doesn't know about the import.

The tests are duplicated on the GitLab side to access a wider pool of GPU runners for GPU testing.

ex: https://gitlab-master.nvidia.com/dl/ai-dynamo/dynamo/-/jobs/175820836

Contributor

Btw - was the correct fix to pip install in the test step here? Or would it make more sense to install in the Dockerfile itself so it's available to all? CC @nnshah1

Contributor

  1. Can the benchmarks directory be moved under tests?
  2. Should the benchmark dependencies be added to the Dockerfile? If yes, which container image/stage should they be included in? Or can they be added to requirements.test.txt?

Contributor

I would recommend adding it to the Dockerfile, as part of the dev or ci stage.

