
Conversation

Collaborator

@sroy745 sroy745 commented Mar 23, 2025

No description provided.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 23, 2025
sroy745 added 3 commits March 23, 2025 01:14
Signed-off-by: Sourashis Roy <sroy@roblox.com>
sroy745 added 11 commits March 24, 2025 06:47
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Comment on lines +26 to +32
self, *, target_model_input_ids: Tensor,
target_model_positions: Tensor, target_model_hidden_states: Tensor,
target_model_seq_lens: list[int],
sampled_token_ids: list[list[int]],
next_prompt_token_ids: list[list[int]], is_prefill: list[bool],
num_draft_tokens_to_propose: int,
attention_metadata: FlashAttentionMetadata) -> list[SamplerOutput]:
Collaborator

nit: please append a trailing `,` after the last input parameter and re-run the formatter, so that each input parameter takes its own line. I really recommend this because otherwise adding/removing an input parameter can change the formatting again.

Suggested change
self, *, target_model_input_ids: Tensor,
target_model_positions: Tensor, target_model_hidden_states: Tensor,
target_model_seq_lens: list[int],
sampled_token_ids: list[list[int]],
next_prompt_token_ids: list[list[int]], is_prefill: list[bool],
num_draft_tokens_to_propose: int,
attention_metadata: FlashAttentionMetadata) -> list[SamplerOutput]:
self,
*,
target_model_input_ids: Tensor,
target_model_positions: Tensor,
target_model_hidden_states: Tensor,
target_model_seq_lens: list[int],
sampled_token_ids: list[list[int]],
next_prompt_token_ids: list[list[int]],
is_prefill: list[bool],
num_draft_tokens_to_propose: int,
attention_metadata: FlashAttentionMetadata,
) -> list[SamplerOutput]:

Generates speculative draft token IDs using the Eagle model.

This function aligns the Eagle model's KV cache with the target
model’s output before generating speculative tokens. It first
Collaborator

Suggested change
models output before generating speculative tokens. It first
model's output before generating speculative tokens. It first

Collaborator

@LiuXiaoxuanPKU LiuXiaoxuanPKU left a comment

Left some early partial comments, will finish one pass tomorrow.

"""
self._vllm_config = vllm_config
self._model = model
self._sampling_metadata = sampling_metadata
Collaborator

why is this a field of eagle proposer? should it be passed in every proposing step?

Tokens: [T12, T13, T14, T15, T22, T23, T24, T32]
Positions: [0, 1, 2, 3, 9, 10, 11, 44]
Previous Hidden States: [H11, H12, H13, H14, H21, H22, H23, H31]
Sampled Tokens: [[T16], [T25], [T33']]
Collaborator

Minor: why is this example different from the example of input? Maybe just use the same example?

Collaborator

It is the same example IIUC. The above one shows all the information including inputs and outputs, while this part only shows the inputs.

Note that for S1, we drop T11 (position 0). For S2 and S3,
T21 and T31 are skipped since they were processed earlier.
Eagle positions are always one less than the target model
due to dropping the first token.
Collaborator

Suggested change
due to dropping the first token.
due to dropping the first token. For example, T12 has position 1 when running the target model, while its position is 0 when running the Eagle head.
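The one-position shift described here can be sketched as follows (the helper name and per-chunk list layout are illustrative, not from the PR): each sequence drops its first token, appends the next known token, and keeps the same numeric position list, so every retained token sits one position earlier than it did in the target model.

```python
def build_eagle_chunk(tokens: list[int], positions: list[int],
                      next_token: int) -> tuple[list[int], list[int]]:
    """Shift one target-model chunk for the Eagle head (illustrative only).

    Eagle predicts token i+1 from the target's hidden state at i, so its
    input stream is the target stream shifted left by one: drop the first
    token and append the next known token (a sampled token, or the next
    prompt token during a partial prefill).
    """
    eagle_tokens = tokens[1:] + [next_token]
    # Every retained token moves down one position, which leaves the
    # numeric position list for the chunk unchanged.
    return eagle_tokens, positions
```

For S1 in the example, ([T11, T12, T13, T14], positions [0, 1, 2, 3]) with sampled token T15 becomes ([T12, T13, T14, T15], positions [0, 1, 2, 3]), matching the Eagle input shown.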

model.
target_model_hidden_states: Hidden states from the target model.
target_model_seq_lens: Sequence lengths in the target model.
sampled_token_ids: Previously sampled/accepted tokens from the
Collaborator

Suggested change
sampled_token_ids: Previously sampled/accepted tokens from the
sampled_token_ids: Generated tokens from the previous generation step.

next_prompt_token_ids: The next prompt token for a sequence if it
is a partial prefill sequence and empty otherwise.
is_prefill: Boolean flags indicating prefill sequences.
num_draft_tokens_to_propose: Number of speculative tokens to
Collaborator

A shorter name? num_spec_tokens?

# Determine expected sequence lengths in the Eagle model:
# - For prefill sequences, lengths remain unchanged.
# - For decoding sequences, lengths match the number of
# accepted tokens.
Collaborator

What is expected sequence length? A bit more context?
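One plausible reading of the excerpt above, sketched with illustrative names (not the PR's actual API): the Eagle head's per-sequence length equals the target chunk length during prefill, but only the count of accepted tokens during decode.

```python
def expected_eagle_seq_lens(target_seq_lens: list[int],
                            sampled_token_ids: list[list[int]],
                            is_prefill: list[bool]) -> list[int]:
    # For prefill chunks the Eagle head processes as many tokens as the
    # target model did; for decode steps it only processes the tokens the
    # target model actually accepted in the last verification step.
    return [
        seq_len if prefill else len(sampled)
        for seq_len, sampled, prefill in zip(
            target_seq_lens, sampled_token_ids, is_prefill)
    ]
```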

Comment on lines +49 to +52
Tokens: [T11, T12, T13, T14, T21, T22, T23, T31, T32, T33]
Positions: [0, 1, 2, 3, 9, 10, 11, 44, 45, 46]
Hidden States: [H11, H12, H13, H14, H21, H22, H23, H31, H32, H33]
Sampled Tokens: [[T15], [], [T32]]
Collaborator

I actually don't get this. Where is T33 from?
Do you mean this?

Input Tokens: [[T11, T12, T13, T14], [T21, T22, T23], [T31, T32 (draft), T33 (draft)]]
Sampled Tokens: [[T15], [], [T32' (recovered token)]]
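To make the "recovered token" terminology concrete, here is a hedged greedy-verification sketch (illustrative only; real speculative-decoding verification also handles probabilistic acceptance and appends a bonus token when every draft is accepted):

```python
def verify_drafts(draft_tokens: list[int],
                  target_tokens: list[int]) -> list[int]:
    # Walk the draft tokens and keep them while they match what the target
    # model produced at the same step; on the first mismatch, emit the
    # target's token instead (the "recovered" token) and stop.
    accepted = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft == target:
            accepted.append(draft)
        else:
            accepted.append(target)  # recovered token replaces the bad draft
            break
    return accepted
```

With drafts [T32, T33] and a target disagreeing at the first step, the result is the single recovered token [T32'], matching the example above.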

target_model_positions: Tensor, target_model_hidden_states: Tensor,
target_model_seq_lens: list[int],
sampled_token_ids: list[list[int]],
next_prompt_token_ids: list[list[int]], is_prefill: list[bool],
Collaborator

What does is_prefill mean?

Collaborator

As you know, there's no "prefill" in V1.


v-lmn commented Apr 1, 2025

How do I load the model with tensor parallelism (TP) for Eagle?


4 participants