
Conversation

@jthomson04
Contributor

@jthomson04 jthomson04 commented Aug 14, 2025

As pointed out by @ryanolson, we're computing checksums for all our messages. We should only be doing this on debug builds. From some preliminary benchmarking, it seems this could reduce our message send/recv overhead by ~25%.

Summary by CodeRabbit

  • Refactor
    • Optimized message encoding/decoding to reduce overhead in production builds by bypassing checksum computation while preserving the existing wire format and message sizing.
    • Development builds retain checksum validation for safety. No changes to public interfaces. Users should see improved throughput and lower latency in production without any change in behavior or compatibility.

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@coderabbitai
Contributor

coderabbitai bot commented Aug 14, 2025

Walkthrough

Adds cfg(debug_assertions)-gated checksum behavior in two_part codec: debug builds compute/verify xxh3_64 over header+payload (only if provided checksum != 0); release builds write 0 and skip hashing/verification, preserving layout and sizes.

Changes

  • Cohort: Network codec checksum gating
    File(s): lib/runtime/src/pipeline/network/codec/two_part.rs
    Summary: Conditional checksum: debug builds compute/verify xxh3_64; release builds emit 0 and skip hashing. Decode only validates when checksum != 0. Data layout and sizes unchanged; no public API signature changes.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Encoder
  participant Stream

  Caller->>Encoder: encode(header, data)
  alt debug_assertions
    Encoder->>Encoder: compute xxh3_64(header+data)
    Encoder->>Stream: write header, data, checksum
  else release
    Encoder->>Stream: write header, data, checksum=0
  end
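The encode path in the diagram above can be sketched in std-only Rust. Here `toy_checksum` (FNV-1a) is a self-contained stand-in for `xxhash_rust::xxh3::xxh3_64`, and the assumed frame layout (a 24-byte preamble of header_len, body_len, and checksum as big-endian u64s) is for illustration only, not necessarily the exact wire format of `two_part.rs`:

```rust
// Stand-in hash (FNV-1a); the real codec uses xxhash_rust::xxh3::xxh3_64.
fn toy_checksum(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf29ce484222325u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}

// Encode a frame: [header_len | body_len | checksum | header | body].
// Assumed layout for illustration; lengths and checksum as big-endian u64.
fn encode(header: &[u8], body: &[u8]) -> Vec<u8> {
    let mut dst = Vec::with_capacity(24 + header.len() + body.len());
    dst.extend_from_slice(&(header.len() as u64).to_be_bytes());
    dst.extend_from_slice(&(body.len() as u64).to_be_bytes());
    // Only compute the checksum in debug builds; release writes a dummy 0,
    // so the wire format and frame size are identical in both modes.
    #[cfg(debug_assertions)]
    {
        let mut to_hash = Vec::with_capacity(header.len() + body.len());
        to_hash.extend_from_slice(header);
        to_hash.extend_from_slice(body);
        dst.extend_from_slice(&toy_checksum(&to_hash).to_be_bytes());
    }
    #[cfg(not(debug_assertions))]
    {
        dst.extend_from_slice(&0u64.to_be_bytes());
    }
    dst.extend_from_slice(header);
    dst.extend_from_slice(body);
    dst
}
```

Either branch emits exactly 8 bytes into the checksum slot, which is what keeps sizes and layout unchanged between debug and release.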
sequenceDiagram
  participant Stream
  participant Decoder
  participant Caller

  Stream->>Decoder: read header, data, checksum
  alt debug_assertions and checksum != 0
    Decoder->>Decoder: compute xxh3_64(header+data)
    alt match
      Decoder-->>Caller: deliver message
    else mismatch
      Decoder-->>Caller: error(ChecksumMismatch)
    end
  else (release or checksum==0)
    Decoder-->>Caller: deliver message (no verification)
  end
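The decode path in the diagram above can be sketched the same way (toy stand-in hash, assumed 24-byte preamble). Note that verification is skipped whenever the received checksum is 0, which is what lets a debug-build reader accept frames written by a release-build peer:

```rust
use std::convert::TryInto;

// Stand-in hash (FNV-1a); the real codec uses xxhash_rust::xxh3::xxh3_64.
fn toy_checksum(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf29ce484222325u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}

// Decode a frame of assumed layout [header_len | body_len | checksum | header | body].
// The real codec also validates lengths before slicing; omitted here for brevity.
fn decode(frame: &[u8]) -> Result<(Vec<u8>, Vec<u8>), String> {
    let header_len = u64::from_be_bytes(frame[0..8].try_into().unwrap()) as usize;
    let body_len = u64::from_be_bytes(frame[8..16].try_into().unwrap()) as usize;
    let checksum = u64::from_be_bytes(frame[16..24].try_into().unwrap());
    let header = frame[24..24 + header_len].to_vec();
    let body = frame[24 + header_len..24 + header_len + body_len].to_vec();
    // Verify only in debug builds, and only when the sender actually
    // wrote a checksum (release senders write the dummy 0).
    #[cfg(debug_assertions)]
    {
        if checksum != 0 {
            let mut to_hash = header.clone();
            to_hash.extend_from_slice(&body);
            if toy_checksum(&to_hash) != checksum {
                return Err("checksum mismatch".to_string());
            }
        }
    }
    let _ = checksum; // silence the unused-variable warning in release builds
    Ok((header, body))
}
```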

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

I nibble bytes with whiskered care,
In debug, hashes scent the air—
In release, I hop past checks,
Still the frames keep their specs.
Carrots compiled, packets neat,
Two-part rhythm, crisp and sweet. 🥕


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (3)
lib/runtime/src/pipeline/network/codec/two_part.rs (3)

70-76: Consider guarding total_len computation against overflow

While pre-existing, total_len = 24 + header_len + body_len can overflow on malformed input. Using checked_add here allows returning a clean error rather than risking panics later.

Rust snippet (applies outside the changed block):

let total_len = 24usize
    .checked_add(header_len)
    .and_then(|v| v.checked_add(body_len))
    .ok_or_else(|| TwoPartCodecError::InvalidMessage("total_len overflow".to_string()))?;

18-18: Gate xxhash import to avoid unused_imports in release builds — fix required

xxh3_64 is only used inside #[cfg(debug_assertions)] blocks but the import is unconditional; in release builds the import will be unused and can trigger warnings (CI runs cargo clippy with -D warnings).

Files to update:

  • lib/runtime/src/pipeline/network/codec/two_part.rs
    • line 18: unconditional import use xxhash_rust::xxh3::xxh3_64;
    • lines ~94 and ~138: uses of xxh3_64 are inside #[cfg(debug_assertions)]
  • CI: .github/workflows/build-and-test.yml and .github/workflows/pre-merge-rust.yml run cargo clippy ... -D warnings

Suggested change:

-use xxhash_rust::xxh3::xxh3_64;
+#[cfg(debug_assertions)]
+use xxhash_rust::xxh3::xxh3_64;

452-480: Gate checksum-mismatch tests to debug mode to avoid failures under cargo test --release

Short: encode/decode compute/verify checksum only under #[cfg(debug_assertions)], so tests that assert ChecksumMismatch will fail in release builds. I inspected lib/runtime/src/pipeline/network/codec/two_part.rs and verified the guards; CI workflows (.github/workflows/build-and-test.yml, .github/workflows/pre-merge-rust.yml) run cargo test without --release, but this change is recommended to avoid local/other CI --release runs.

Files needing attention:

  • lib/runtime/src/pipeline/network/codec/two_part.rs
    • fn test_checksum_mismatch() (around lines 452-480)
    • async fn test_streaming_corrupted_data() (around lines ~643-672)

Apply these corrections (note: keep the test attribute; add the cfg above it):

-#[test]
-fn test_checksum_mismatch() {
+#[cfg(debug_assertions)]
+#[test]
+fn test_checksum_mismatch() {
     ...
 }
-#[tokio::test]
-async fn test_streaming_corrupted_data() {
+#[cfg(debug_assertions)]
+#[tokio::test]
+async fn test_streaming_corrupted_data() {
     ...
 }

Please apply the change (or confirm if you intentionally want these tests to run under release builds).

🧹 Nitpick comments (2)
lib/runtime/src/pipeline/network/codec/two_part.rs (2)

129-145: Avoid extra copy during checksum computation in debug mode

The current approach allocates a new buffer and copies header+data just to hash. Compute the checksum incrementally to eliminate the copy.

         // Only compute the checksum in debug mode.
         // If we're in release mode, put a dummy value.
         #[cfg(debug_assertions)]
         {
-            // Compute checksum of the data
-            let mut data_to_hash = BytesMut::with_capacity(header_len + body_len);
-            data_to_hash.extend_from_slice(&item.header);
-            data_to_hash.extend_from_slice(&item.data);
-            let checksum = xxh3_64(&data_to_hash);
-
-            dst.put_u64(checksum);
+            // Compute checksum over header and data without extra allocation
+            let mut hasher = xxhash_rust::xxh3::Xxh3::new();
+            hasher.update(&item.header);
+            hasher.update(&item.data);
+            dst.put_u64(hasher.digest());
         }
         #[cfg(not(debug_assertions))]
         {
             dst.put_u64(0);
         }

66-69: Prevent unused variable warning in release by conditionally binding checksum

In release builds, checksum is currently read but unused. Bind it conditionally to avoid an unused variable warning.

Rust snippet (applies outside the changed block):

// Replace:
let checksum = cursor.get_u64();

// With:
#[cfg(debug_assertions)]
let checksum = cursor.get_u64();
#[cfg(not(debug_assertions))]
let _ = cursor.get_u64(); // still advance the cursor by 8 bytes
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between d0a6363 and 7063670.

📒 Files selected for processing (1)
  • lib/runtime/src/pipeline/network/codec/two_part.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)


@rmccorm4 rmccorm4 left a comment


From some preliminary benchmarking, seems like this could reduce our message send/recv overhead by ~25%

Can you share some commands or snippets related to how you observed this? Would be a good perf/debug tip for others.


@rmccorm4 rmccorm4 left a comment


LGTM - but looks like a good suggestion from coderabbit: https://github.com/ai-dynamo/dynamo/pull/2446/files#r2277430507

@jthomson04
Contributor Author

jthomson04 commented Aug 14, 2025

From some preliminary benchmarking, seems like this could reduce our message send/recv overhead by ~25%

Can you share some commands or snippets related to how you observed this? Would be a good perf/debug tip for others.

The extent of my benchmarking so far has been with nvtx. I added a couple of nvtx regions, ran nsys profile --trace nvtx,python-gil python3 -m dynamo.frontend, and sent a bunch of requests. Example output from that:
[screenshot: nsys profile output showing the nvtx ranges]


@ryanolson ryanolson left a comment


curious if you detect a noticeable change in cpu usage? -- missed the comment ^^

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 merged commit 9ddb3ef into main Aug 14, 2025
11 of 12 checks passed
@jthomson04 jthomson04 deleted the jthomson04/skip-checksum-release-build branch August 14, 2025 21:39
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
