
top_k_top_p_sampling_from_probs occasionally outputs invalid token_ids #384

Closed

hnyls2002 opened this issue Jul 20, 2024 · 15 comments

@hnyls2002
Contributor

With some dummy input_ids, there are occasionally unexpected sampled results:

tensor([     29919,        903,      30140, 1061800448,        903]
@hnyls2002
Contributor Author

Even in the dummy input case, the error is hard to reproduce...

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

What's the expected output when it succeeds?

@ispobock
Contributor

ispobock commented Jul 20, 2024

@yzh119 @hnyls2002 I got a reproducible example:

import torch
from flashinfer.sampling import top_k_top_p_sampling_from_probs

# all probability mass is on token id 1, so the expected sample is always token 1
probs = torch.zeros([1, 32000]).cuda()
probs[0][1] = 1
max_top_k_round, batch_size = 32, probs.shape[0]
top_ks = torch.Tensor([1073741824]).to(torch.int32).cuda()  # top_k far larger than the vocab size (32000)
top_ps = torch.Tensor([1.]).cuda()
for i in range(10):
    uniform_samples = torch.rand((max_top_k_round, batch_size), device=probs.device)
    batch_next_token_ids, _ = top_k_top_p_sampling_from_probs(probs, uniform_samples, top_ks, top_ps)
    print(batch_next_token_ids)

output:

tensor([0], device='cuda:0', dtype=torch.int32)
tensor([1062357376], device='cuda:0', dtype=torch.int32)
tensor([0], device='cuda:0', dtype=torch.int32)
tensor([1056581825], device='cuda:0', dtype=torch.int32)
tensor([1058078609], device='cuda:0', dtype=torch.int32)
tensor([1052083673], device='cuda:0', dtype=torch.int32)
tensor([1065198239], device='cuda:0', dtype=torch.int32)
tensor([1032416633], device='cuda:0', dtype=torch.int32)
tensor([1062065655], device='cuda:0', dtype=torch.int32)
tensor([1058903188], device='cuda:0', dtype=torch.int32)

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

Thank you, I'll fix it soon.

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

@ispobock Unfortunately I cannot reproduce your error locally; I always got 1 as the sampling output.

Could you find a fixed random seed (I tried a lot of them) that can reproduce this error?

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

Btw, your example is kind of weird: we expect `top_ks` and `top_ps` to have shape `(batch_size,)`, but in your example they have shape `(1,)`, which mismatches `batch_size` (32).

@ispobock
Contributor

ispobock commented Jul 20, 2024

@yzh119 The batch_size is assigned from probs.shape[0], and probs.shape[0] = 1 in this example.
It may be environment-dependent; I can reproduce it with torch==2.3.0+cu118 and flashinfer==0.1.0+cu118torch2.3 on CUDA 11.8.

@ispobock
Contributor

The issue cannot be reproduced when I try in a CUDA 12.2 environment with torch==2.3.0+cu121 and flashinfer==0.1.0+cu121torch2.3.

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

Okay, that might be the reason: for cu118 we use `FlagHeads` rather than `SubtractLeft` because of the cub version: #265

Their semantics might be different; let me double-check.
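
For intuition, here is a rough sequential Python analogy of the two primitives as used here (my own sketch; the boundary handling is an assumption, and this is not the actual CUB block-wide code). Both are meant to mark positions where the boolean flag flips relative to the previous element, but they may treat the first element differently:

def subtract_left(items, diff_op):
    # SubtractLeft-style: out[i] = diff_op(items[i], items[i - 1]); the first
    # element has no left neighbor and is passed through unchanged here.
    return [items[0]] + [diff_op(items[i], items[i - 1]) for i in range(1, len(items))]

def flag_heads(items, flag_op):
    # FlagHeads-style: the first element is always flagged as a segment head;
    # later elements are flagged when flag_op(prev, cur) reports a new segment.
    return [True] + [flag_op(items[i - 1], items[i]) for i in range(1, len(items))]

greater_than_u = [False, False, True, True]
print(subtract_left(greater_than_u, lambda cur, prev: cur != prev))  # [False, False, True, False]
print(flag_heads(greater_than_u, lambda prev, cur: cur != prev))     # [True, False, True, False]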

@yzh119
Collaborator

yzh119 commented Jul 21, 2024

I think I have figured out the reason: unlike `SubtractLeft`, the input and output arguments of `FlagHeads` should be different:

    BlockAdjacentDifference<bool, BLOCK_THREADS>(temp_storage->block_prim.adj_diff)
        .FlagHeads<VEC_SIZE>(greater_than_u, greater_than_u, BoolDiffOp());

would result in undefined behavior; we should use different variables:

    BlockAdjacentDifference<bool, BLOCK_THREADS>(temp_storage->block_prim.adj_diff)
        .FlagHeads<VEC_SIZE>(greater_than_u_out, greater_than_u, BoolDiffOp());

After this change, I don't observe weird output anymore.
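
To see concretely why aliasing the input and output is dangerous, here is a toy sequential Python illustration (my own sketch, not how the CUB collective actually executes on a block): once the output buffer aliases the input, position i can read a neighbor that an earlier step already overwrote with a flag.

def flag_heads_separate(items, flag_op):
    # Correct: flags go into a fresh list, so every read sees the original input.
    return [True] + [flag_op(items[i - 1], items[i]) for i in range(1, len(items))]

def flag_heads_aliased(items, flag_op):
    # Wrong: items[i - 1] may already hold a flag written by a previous step.
    items[0] = True
    for i in range(1, len(items)):
        items[i] = flag_op(items[i - 1], items[i])
    return items

vals = [False, False, True, True]
op = lambda prev, cur: cur != prev
print(flag_heads_separate(list(vals), op))  # [True, False, True, False]
print(flag_heads_aliased(list(vals), op))   # [True, True, False, True] -- corrupted by aliasing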

@hnyls2002
Contributor Author

I got this error on CUDA 12.4

yzh119 added a commit that referenced this issue Jul 21, 2024
As observed in #384, we should use different variables for input and output for the `FlagHeads` API in cu118.
@yzh119
Collaborator

yzh119 commented Jul 21, 2024

@ispobock The cu118 issue should have been fixed in the main branch.

@hnyls2002 can you find a reproducible script?

@ispobock
Contributor

@ispobock The cu118 issue should have been fixed in the main branch.

@yzh119 It works for me now! Thanks!

@yzh119
Collaborator

yzh119 commented Aug 2, 2024

I got this error on CUDA 12.4

@hnyls2002 we switched to a deterministic implementation (#417); maybe it can address the issue.

@yzh119
Collaborator

yzh119 commented Aug 11, 2024

Lots of potential bugfix PRs have been merged recently. I'll close this for now; feel free to open it again if you still observe such issues.

yzh119 closed this as completed Aug 11, 2024