@jain-ria jain-ria commented Jul 24, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Introduced configuration files for drafter and verifier components, enabling customizable engine and runtime parameters.
    • Added a launch script for orchestrating frontend, verifier, and multiple drafter workers for distributed model serving.
    • Implemented drafter and verifier worker services for distributed runtime, supporting speculative decoding and event publishing.
    • Added utilities for handling draft token generation via internal endpoints and encoding/decoding distributed parameters.
  • Chores

    • Added license headers to new and existing files for compliance.

copy-pr-bot bot commented Jul 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jain-ria jain-ria force-pushed the rjain/trtllm-spec-dec branch from 0d66e21 to 7868600 August 7, 2025 19:07
@jain-ria jain-ria requested a review from tanmayv25 August 8, 2025 18:35
@jain-ria jain-ria force-pushed the rjain/trtllm-spec-dec branch 4 times, most recently from 6deadce to 9e929ab August 8, 2025 20:13
@jain-ria jain-ria force-pushed the rjain/trtllm-spec-dec branch from 9e929ab to e9b5a41 August 8, 2025 21:55
draft_tokens.extend(chunk_data.get("token_ids", []))
if len(draft_tokens) >= self.max_draft_len:
    break
print(f"DRAFTER: {draft_tokens}\n")

Use logging instead of print.
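
A minimal sketch of what that change could look like, assuming the loop above iterates over streamed chunk dictionaries. The function name `collect_draft_tokens` and the `chunks` argument are hypothetical and do not come from the PR; only the loop body and the `max_draft_len` limit mirror the snippet under review.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def collect_draft_tokens(chunks, max_draft_len):
    """Accumulate token IDs from streamed chunks until max_draft_len is reached.

    `chunks` is a hypothetical iterable of dicts that may carry a "token_ids" list.
    """
    draft_tokens = []
    for chunk_data in chunks:
        draft_tokens.extend(chunk_data.get("token_ids", []))
        if len(draft_tokens) >= max_draft_len:
            break
    # %-style arguments defer string formatting until DEBUG is actually enabled.
    logger.debug("DRAFTER: %s", draft_tokens)
    return draft_tokens
```

Using `logger.debug` keeps per-request token dumps out of normal output while leaving them available when the log level is lowered.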

)
namespace, component, endpoint = parts

# create minimal runtime for client access only

Why is this necessary? Why not create an endpoint client like this?

export NUM_DRAFTERS=2
export DRAFTER_CUDA_VISIBLE_DEVICES="1,2"
./launch/spec_dec.sh
```

Can you describe the request flow in this setup as well?

Maybe a mermaid diagram?
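
Until the flow is documented, here is a toy sketch of one plausible request flow for a verifier with several parallel drafters. Everything in it is an assumption for illustration: the function names, the "next token is previous + 1" stand-in for the target model, and the policy of keeping the longest accepted draft are not taken from the PR and may not match its implementation.

```python
import asyncio


def target_next(prev_token):
    """Toy stand-in for the verifier model: the 'correct' next token is prev + 1."""
    return prev_token + 1


async def draft(worker_id, prompt, max_draft_len):
    """Hypothetical drafter: correct for the first worker_id + 1 tokens, then wrong."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    tokens, prev = [], prompt[-1]
    for i in range(max_draft_len):
        prev = target_next(prev) if i <= worker_id else -1
        tokens.append(prev)
    return tokens


def verify(prompt, draft_tokens):
    """Accept the longest prefix of the draft that the target model agrees with."""
    accepted, prev = [], prompt[-1]
    for tok in draft_tokens:
        if tok != target_next(prev):
            break
        accepted.append(tok)
        prev = tok
    return accepted


async def handle_request(prompt, num_drafters=2, max_draft_len=4):
    """Fan out to all drafters concurrently, verify each draft, keep the best one."""
    drafts = await asyncio.gather(
        *(draft(w, prompt, max_draft_len) for w in range(num_drafters))
    )
    best = max((verify(prompt, d) for d in drafts), key=len)
    return prompt + best
```

In this sketch the request enters through `handle_request`, which plays the verifier's role: it queries every drafter in parallel, checks each proposal against the target model, and extends the sequence with the longest accepted prefix.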

@jain-ria jain-ria changed the title from "feat: async spec dec + trtllm example" to "feat: async + parallel spec dec trtllm example" Aug 28, 2025
@jain-ria jain-ria force-pushed the rjain/trtllm-spec-dec branch from a2a6f28 to 16a418e August 28, 2025 23:23
@jain-ria jain-ria force-pushed the rjain/trtllm-spec-dec branch from 16a418e to cc7b21a August 28, 2025 23:41
@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 28, 2025
@github-actions

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

@github-actions github-actions bot closed this Oct 18, 2025
@github-actions github-actions bot deleted the rjain/trtllm-spec-dec branch October 18, 2025 09:32