[Bug] generated result changed when using multiple prompts #1570
Please correct me if I missed any hyperparameter settings.
I guess the problem may be introduced by the batch dimension and the `attn_bias`. When we have multiple prompts, we need to do padding in the worker, and the padding format is pad-on-right:

```python
def set_attn_bias(
    self,
    input_metadata: InputMetadata,
    dtype: torch.dtype,
) -> None:
    del dtype  # Unused.
    if input_metadata.attn_bias is not None:
        # Already set by a previous layer.
        return
    # Every prompt is padded to max_prompt_len, so the mask is built from
    # the padded length rather than each prompt's true length.
    prompt_lens = [input_metadata.max_prompt_len] * input_metadata.num_prompts
    attn_bias = BlockDiagonalCausalMask.from_seqlens(prompt_lens)
    if self.sliding_window is not None:
        attn_bias = attn_bias.make_local_attention(self.sliding_window)
    input_metadata.attn_bias = attn_bias
```
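For illustration, here is a minimal sketch (not vLLM code, and the lengths are made up) of what this means, assuming xformers' `BlockDiagonalCausalMask` API: building the mask from `[max_prompt_len] * num_prompts` produces blocks that cover the pad positions of shorter prompts, whereas building it from the true per-prompt lengths would not.

```python
# Minimal sketch: compare the attn_bias built from padded lengths with one
# built from the true per-prompt lengths. Hypothetical lengths for illustration.
from xformers.ops.fmha.attn_bias import BlockDiagonalCausalMask

true_lens = [3, 5]        # hypothetical true prompt lengths
max_len = max(true_lens)  # pad-on-right pads every prompt to this length

# What the snippet above builds: every block has the padded length, so the
# pad positions of the 3-token prompt fall inside its causal block.
padded_mask = BlockDiagonalCausalMask.from_seqlens([max_len] * len(true_lens))

# A mask built from the true lengths, with no pad positions at all.
exact_mask = BlockDiagonalCausalMask.from_seqlens(true_lens)

n_padded = max_len * len(true_lens)  # 10 positions, 2 of them padding
n_exact = sum(true_lens)             # 8 positions, all real tokens
print(padded_mask.materialize((n_padded, n_padded)).shape)  # torch.Size([10, 10])
print(exact_mask.materialize((n_exact, n_exact)).shape)     # torch.Size([8, 8])
```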
Does #1546 fix your issue?
LGTM, it works.
branch: main
commit: 8516999
test GPU: NVIDIA A10
test code:
result:
But if we uncomment the last two prompts:
result:
Since I didn't set appropriate sampling parameters, I wasn't expecting to generate very good results.
But I think that with greedy search, the first two generated results should not change just because more prompts were added to the batch.
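Since the original test code isn't shown above, here is a hedged reconstruction of the kind of reproduction being described; the model name and prompts are placeholders, not the originals. Run greedy decoding on two prompts, then uncomment the other two and compare the first two outputs.

```python
# Sketch of the reported reproduction (placeholder model/prompts, not the
# original test code): with temperature=0.0 (greedy decoding), the first two
# outputs should be identical whether or not the extra prompts are batched.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
    # "The future of AI is",         # per the report, uncommenting these
    # "The president of the US is",  # changed the first two results
]

llm = LLM(model="facebook/opt-125m")  # placeholder model
greedy = SamplingParams(temperature=0.0, max_tokens=32)

for out in llm.generate(prompts, greedy):
    print(repr(out.outputs[0].text))
```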