Llama3 Branch Still Suffers Segmentation Fault When Generating Datastore Using Qwen2.5 #26
Comments
After I changed to the main branch and tried the solution in #19, the problem was successfully fixed. So there may be some problem in the llama3 branch (even though I applied the same modification to it, the code in the llama3 branch still fails).
And I found that even though there is no core dump or any other logged error, the Reader still encounters
I solved it by modifying Line 114. I'm not sure if the other modifications in #23 are necessary. (As I don't understand Rust :( sorry)
Hi @csAugust - thanks so much for your work. Would you be willing to try the latest commit? Is it possible that the Qwen tokenizer has additional added tokens? There seems to have been a second vocab size issue that could impact datastore writes, which the most recent commit may have fixed.
Thank you for your help! I retried the newest llama3 branch and it works fine. It seems that the additional tokens added by the Qwen tokenizer caused these problems.
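For readers hitting the same thing, a quick way to see the mismatch discussed above is to compare the tokenizer's base vocabulary against its full size including added tokens. The following is a minimal sketch using the Hugging Face transformers API; the specific Qwen2.5 checkpoint name and the 16-bit bound it checks against are assumptions made for illustration, not something taken from this repository.

```python
# Minimal sketch (assumptions: the Qwen2.5 checkpoint name and the 16-bit bound
# are illustrative; they are not taken from this repository's code).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

base = tok.vocab_size          # base vocabulary, excluding added tokens
full = len(tok)                # full vocabulary, including added tokens
added = tok.get_added_vocab()  # mapping of added token string -> id

print(f"vocab_size (base): {base}")
print(f"len(tokenizer):    {full}")
print(f"added tokens:      {len(added)}")
if added:
    print(f"max added id:      {max(added.values())}")

# If token IDs exceed the unsigned 16-bit range, a u16-backed index overflows.
if full - 1 > 2**16 - 1:
    print("token IDs exceed 65535: a 16-bit datastore index cannot hold them")
# If added tokens exist, sizing the datastore by vocab_size alone is too small.
if full > base:
    print(f"{full - base} added tokens would fall outside a vocab_size-sized table")
```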
I'm trying to build a datastore for Qwen2.5 series models using the DraftRetriever, but I encountered a Segmentation fault (core dumped) error when calling writer.finalize() in the script get_datastore_chat.py at Line 54. The dataset I used is ShareGPT_Vicuna_unfiltered, the same as the default option. I'm using the "llama3" branch (as it fixed the vocabulary size limit) with Python 3.9 and the prebuilt wheel. I'm not familiar with Rust, so I would sincerely appreciate it if someone could help me out.
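For context, the build in get_datastore_chat.py roughly follows a tokenize-then-index loop that ends in writer.finalize(), which is where the crash occurs. Below is a simplified sketch of that flow under stated assumptions: only writer.finalize() comes from this report; the draftretriever Writer arguments, the add_entry call, the dataset filename, and the ShareGPT field layout are all assumptions, so treat it as an outline rather than the script's actual code.

```python
# Simplified sketch of the datastore build loop, NOT the repository's actual code.
# Assumed: the draftretriever.Writer arguments, add_entry(), the dataset filename,
# and the ShareGPT JSON layout. Only writer.finalize() is taken from this report.
import json

import draftretriever
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Size the index by the full tokenizer length rather than vocab_size, so that
# added tokens (see the discussion above) cannot index past the table's end.
writer = draftretriever.Writer(
    index_file_path="datastore_chat_qwen2.5.idx",
    vocab_size=len(tokenizer),
)

with open("ShareGPT_V4.3_unfiltered_cleaned_split.json") as f:  # assumed filename
    data = json.load(f)

for sample in data:
    for turn in sample.get("conversations", []):
        token_ids = tokenizer.encode(turn["value"])
        # Guard: every ID must fit in the range declared to the writer.
        assert max(token_ids, default=0) < len(tokenizer)
        writer.add_entry(token_ids)

writer.finalize()  # the call that crashed with "Segmentation fault (core dumped)"
```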
To reproduce: just modify get_datastore_chat.py at Line 13 and Line 45, and run it with no arguments.
Output:
Thanks a lot.