Consider uploading some quantized checkpoints to Hugging Face #35
Correct me if I'm wrong, but quantizing would require loading the models in their unquantized form (via the torch.load call in https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py, line 126). Not to mention how much heavier the unquantized models are on bandwidth.
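To make the concern concrete, here is a minimal sketch of that load path, assuming a generic PyTorch checkpoint ("model.pth" is a placeholder, not a file from this repository):

```python
import torch

# "model.pth" is a placeholder path. torch.load() materializes the
# entire checkpoint in RAM before any per-tensor work can begin.
state_dict = torch.load("model.pth", map_location="cpu")

# Conversion can then walk the dict one tensor at a time, but the peak
# memory cost has already been paid by the load above.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)

# Newer PyTorch releases also accept torch.load(..., mmap=True), which
# maps the file instead of copying it into RAM (zip-format checkpoints only).
```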
Only the PyTorch -> rwkv.cpp conversion requires loading the whole model into RAM; quantization is done tensor-by-tensor. You are right about the bandwidth, though. I'll consider it, thanks for the suggestion!
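For illustration, a minimal sketch of what tensor-by-tensor quantization looks like, using a simple symmetric 8-bit scheme. This is not rwkv.cpp's actual format (ggml uses block-wise Q4/Q5/Q8 schemes), and the tensor names below are made up; the point is that peak memory is bounded by the largest single tensor, not the full model:

```python
import numpy as np

def quantize_q8(tensor: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric 8-bit quantization of a single tensor (illustrative only).
    scale = float(np.abs(tensor).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for all-zero tensors
    q = np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)
    return q, scale

# Stand-in for streaming tensors from a converted file one at a time;
# names and shapes are hypothetical, just for the demo.
checkpoint = {
    "emb.weight": np.random.randn(1024, 768).astype(np.float32),
    "blocks.0.att.key.weight": np.random.randn(768, 768).astype(np.float32),
}

for name, tensor in checkpoint.items():
    # Only this one tensor and its quantized copy are resident at once,
    # which is why quantization does not need the whole model in RAM.
    q, scale = quantize_q8(tensor)
    print(f"{name}: {tensor.nbytes} bytes -> {q.nbytes} bytes (scale={scale:.6f})")
```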
I have uploaded some quantized RWKV-4-Raven models to Hugging Face. At the time of writing, the available models are listed there.

Feel free to create a discussion if you have a request.
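If it helps, checkpoints hosted on Hugging Face can be fetched with the huggingface_hub library; the repo_id and filename below are placeholders to be replaced with the actual repository and file names:

```python
from huggingface_hub import hf_hub_download

# Both values are hypothetical; substitute the real repository and
# checkpoint name from the comment above.
path = hf_hub_download(
    repo_id="someuser/rwkv-4-raven-ggml",  # placeholder repo
    filename="RWKV-4-Raven-7B-Q4_0.bin",   # placeholder file
)
print(path)  # local cache path of the downloaded checkpoint
```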