Steps forward - Tokenizer #37
Thank you very much for all the work and time you are putting in to bring a well-functioning Falcon to the masses. Thank you. |
Thank you. I hope there are no big bugs; I tested it a lot and need some sleep. |
Also, thanks again for your amazing work! In fact I have been disappointed by Falcon's abilities in German. I'm eager to see the improvements later on my PC. With the new changes, will @TheBloke's current Falcon models no longer work until he builds new ones? |
Yes, I'll ask Bloke to make new ones; it was just 8:00 in the morning "yesterday", so I could not fully test everything. I ran German tests on OpenAssistant 7B and 40B in medium-quality quantization and I believe it is a lot better than before.
40B dialect: [example output omitted]
40B: [example output omitted]
7B test: [example output omitted]
|
Wow, that's a really big leap. |
Awesome! I am standing by to hit my big red quantisation button, just say the word |
Yes, it's a good time to requantize. The tokenizer still has minor flaws: I ran it on a 200 kB file and 30 tokens were mismatching. But in all "normal" tests I ran, it was identical to the transformers implementation. The difference might also come from some invalid UTF-8 quirk that is encoded on the Python side but handled raw on the C++ side. |
|
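For anyone who wants to reproduce that kind of mismatch count, here is a minimal sketch (not the project's actual test harness) that gets reference token IDs from the Hugging Face tokenizer and diffs them against IDs dumped by the C++ tokenizer; the file names and the one-integer-per-line dump format are assumptions for illustration.

```python
# Minimal parity check: compare reference token IDs from the HF tokenizer
# against IDs dumped by the C++ tokenizer (assumed format: one integer per line).
# File names and the dump format are illustrative assumptions, not project tooling.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

with open("sample.txt", encoding="utf-8") as f:
    text = f.read()
ref_ids = tok.encode(text)

with open("cpp_tokens.txt", encoding="utf-8") as f:
    cpp_ids = [int(line) for line in f if line.strip()]

mismatches = sum(a != b for a, b in zip(ref_ids, cpp_ids))
mismatches += abs(len(ref_ids) - len(cpp_ids))
print(f"{mismatches} mismatching tokens out of {max(len(ref_ids), len(cpp_ids))}")
```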
Interrupting this thread to point out the latest warnings from that pull..
Operating System: Ubuntu 20.04.6 LTS (for testing using this smaller machine)
Feel free to drop this from the thread if it gets in the way. Might need a 'special' discussion just for warnings reports.
Other unused variables.. (less noise)
int vram_reserved=128MB;
Same issue in a couple of other examples.. |
Might also want to update the README to mention that you have to move (or symbolically link) the original tokenizer.json to the new location of the ggml format files before quantizing (or have the convert script copy it to the destination). Are we still doing q4_0 instead of q4_k, or are they both good now?
falcon_model_quantize_internal: model size = 27531.39 MB
main: quantize time = 299602.55 ms |
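A minimal sketch of that copy/symlink step, assuming a purely illustrative directory layout; neither path reflects what the convert script actually produces.

```python
# Illustrative only: place tokenizer.json next to the converted ggml file
# before quantizing. Both paths are hypothetical placeholders.
import os
import shutil

src = "falcon-7b-hf/tokenizer.json"   # original HF checkout (hypothetical path)
dst_dir = "models/falcon-7b-ggml"     # where the converted ggml files live (hypothetical path)
dst = os.path.join(dst_dir, "tokenizer.json")

os.makedirs(dst_dir, exist_ok=True)
if not os.path.exists(dst):
    try:
        os.symlink(os.path.abspath(src), dst)  # symbolic link, as suggested above
    except OSError:
        shutil.copy2(src, dst)                 # fall back to a plain copy
```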
We currently do not have K quantizers that can handle 7B (which needs a superblock of 64); we'd need k quantizers that can do both 256 and 64 based on the quantization. For now I prefer to keep 7B with traditional quantization and 40B with K quants. |
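To illustrate the superblock constraint, a quick check assuming the commonly cited Falcon shapes (hidden size 4544 for 7B, 8192 for 40B): 7B's row length only divides evenly into 64-element superblocks, while 40B works with 256.

```python
# Why 7B needs a 64-wide superblock: k-quant blocks require the row length
# to be a multiple of the superblock size. The hidden sizes below are the
# commonly cited Falcon configs (an assumption, not read from the models).
FALCON_ROW_LENGTHS = {"falcon-7b": 4544, "falcon-40b": 8192}

for name, n_cols in FALCON_ROW_LENGTHS.items():
    for superblock in (256, 64):
        rem = n_cols % superblock
        status = "fits" if rem == 0 else f"does not fit (remainder {rem})"
        print(f"{name}: row length {n_cols} with superblock {superblock}: {status}")
```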
Fine by me, just want to make sure I am keeping up ;) @cmp-nct did we break something in the latest batch of commits? make clean && make failed to make the falconize binaries this time..
|
Please git pull again. |
All good.. |
I'm currently working on the tokenizer; we need a new one.
The llama tokenizer is not suitable: it has problems forming larger tokens and favors smaller ones, and it does not adhere to the merge priority of BPE, using sentencepiece scores instead.
That's why progress on the roadmap has stalled a bit; without good tokenization, Falcon cannot provide good-quality results.
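As a reference point for what "merge priority" means here, a minimal sketch of GPT-2-style BPE that always applies the adjacent pair with the lowest merge rank (its position in merges.txt), instead of picking pieces by sentencepiece-style scores; merge_ranks is a stand-in for the real Falcon merge table.

```python
# Minimal BPE sketch: repeatedly merge the adjacent pair with the lowest
# merge rank (earliest entry in merges.txt). merge_ranks maps
# (left, right) -> rank and stands in for the real Falcon merge table.
def bpe_encode(word, merge_ranks):
    parts = list(word)
    while len(parts) > 1:
        best_rank, best_i = None, None
        for i in range(len(parts) - 1):
            rank = merge_ranks.get((parts[i], parts[i + 1]))
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_i is None:
            break  # no applicable merges left
        parts[best_i:best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return parts

# Toy merge table; byte-level pre-tokenization is ignored for brevity.
print(bpe_encode("lower", {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}))
# -> ['low', 'er']
```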
A couple of problems to be solved:
2.1) The same problem applies to newlines: these are actual tokens and can be combined and interleaved with spaces, forming pure-whitespace tokens (most likely a lot in code); see the whitespace-inspection sketch below.
For good-quality Falcon flapping we need the tokenizer to be identical, or almost identical, to the training tokenization.
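A quick way to inspect how the reference tokenizer segments whitespace runs (the behaviour the C++ side has to match); this sketch just prints whatever transformers produces and does not assume any particular segmentation.

```python
# Inspect how the reference Falcon tokenizer segments runs of spaces and
# newlines (code-like whitespace). The output is whatever transformers
# returns; no particular segmentation is assumed here.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

samples = [
    "    def foo():\n        return 1\n",  # indented code
    "\n\n",                                # blank lines
    "a  b   c",                            # multiple spaces between words
]
for s in samples:
    print(repr(s), "->", tok.tokenize(s))
```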