[Bug]: segfault when using google/gemma-2-27b-it on vLLM #6252
Facing a similar issue running quantised gemma2-9b-it on an A10G with 24 GiB of VRAM; it seems to be happening in the flashinfer wrapper:
I constantly run into a segfault when running Gemma 2 9B IT on a single H100.
It would be highly appreciated if any of you could provide a minimal reproducible script. Happy to take a look!
I'm also seeing this problem. Here is how I reproduce it:
The script generates a prompt consisting of 'a a a a a a a ...' (~2000 tokens) and sends 10 requests at once. The first request completes successfully, and it segfaults after that. I wasn't able to reproduce it at a ~1000-token context size. Here is the output of the first request:
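For reference, a minimal sketch of such a reproduction (hypothetical — the original script was not posted in this thread, so the model name, token count, request count, and sampling settings below are assumptions based on the description above):

```python
import os


def build_prompt(n_tokens: int = 2000) -> str:
    # Degenerate long prompt: "a a a ..." (~2000 tokens), as described above.
    return " ".join(["a"] * n_tokens)


if __name__ == "__main__" and os.environ.get("RUN_REPRO"):
    # Guarded behind RUN_REPRO=1 because this needs a GPU with vLLM
    # (and the flashinfer backend) installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model="google/gemma-2-27b-it")  # model from the issue title
    prompts = [build_prompt()] * 10           # 10 requests at once
    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    for out in outputs:
        print(out.outputs[0].text[:80])
```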
This is the segmentation fault log:
Hello folks, we can reproduce the segfault bug on our side. The bug should be fixed by this. If you need immediate support, you can build flashinfer main from source (make sure you clean all builds before rebuilding). We will work closely with the flashinfer team to integrate flashinfer's new release.
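Until the fixed release is integrated, a quick sanity check that an installed flashinfer is new enough is to compare versions (hypothetical helper — `needs_upgrade` is not part of any library; it assumes plain `X.Y.Z` version strings and takes v0.0.9, mentioned later in this thread, as the first fixed release):

```python
def needs_upgrade(installed: str, fixed: str = "0.0.9") -> bool:
    # Compare dotted version strings as integer tuples, so that
    # e.g. "0.0.10" correctly sorts after "0.0.9".
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) < parse(fixed)


print(needs_upgrade("0.0.8"))  # True: predates the fix
print(needs_upgrade("0.0.9"))  # False: fixed release or newer
```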
Please let me know if the issue is not resolved.
Yep, flashinfer v0.0.9 fixes my segfault 👍
Thanks! Fixed everything for me.
Your current environment
🐛 Describe the bug
Running a simple program that does 1609 inferences on an A100-80G.
Here's the output at the time of the segmentation fault, always exactly after 1579 prompts:
The segfault reports: