[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin #98
Conversation
Confirmed my suspicion that the padding code affects the shape of the tensors saved to disk. `pack` has been fixed so that the saved size is correct (the original size), but we are now hitting load issues: the expanded/padded buffers are larger than the tensors on disk, so accelerate throws shape-mismatch errors during quant model load. We will explore two methods to deal with this tomorrow.
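For context, here is a minimal sketch of the failure mode. The helper name and sizes below are hypothetical, not the project's actual code; the point is that kernels like exllama/marlin need in/out features rounded up to an alignment multiple, and registering buffers at the padded size in `__init__` makes them disagree with the checkpoint shapes:

```python
import math

# Hypothetical helper: round a feature count up to the kernel's
# required alignment multiple (exact multiples vary per kernel).
def pad_to_multiple(n: int, multiple: int = 32) -> int:
    return multiple * math.ceil(n / multiple)

original_infeatures = 4010                                 # size saved in the checkpoint
padded_infeatures = pad_to_multiple(original_infeatures)   # -> 4032

# If __init__ registers buffers at the padded size, accelerate compares
# the checkpoint tensor (4010 wide) against the module buffer (4032 wide)
# and raises a shape-mismatch error during model load.
assert padded_infeatures != original_infeatures
```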
Update: We are going with Plan B.
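A minimal sketch of what Plan B looks like, assuming a 4-bit packed `qweight` (8 weights per int32, in_features divisible by 8) and hypothetical names; the real layer carries more buffers (`qzeros`, `scales`, `g_idx`):

```python
import math
import torch
import torch.nn.functional as F

class QuantLinearSketch(torch.nn.Module):
    """Plan B sketch (hypothetical names): __init__ registers buffers at
    the original checkpoint shapes so accelerate can load the saved
    tensors, then post_init pads them to the kernel-required size."""

    def __init__(self, infeatures: int, outfeatures: int, multiple: int = 32):
        super().__init__()
        self.original_infeatures = infeatures
        self.original_outfeatures = outfeatures
        self.infeatures = multiple * math.ceil(infeatures / multiple)
        self.outfeatures = multiple * math.ceil(outfeatures / multiple)
        # Registered at the *original* shape: matches the tensors on disk.
        self.register_buffer(
            "qweight",
            torch.zeros(infeatures // 8, outfeatures, dtype=torch.int32),
        )

    def post_init(self):
        # Runs after the checkpoint is loaded: grow qweight to the
        # padded shape, zero-filling the new rows/columns.
        pad_rows = self.infeatures // 8 - self.qweight.shape[0]
        pad_cols = self.outfeatures - self.qweight.shape[1]
        if pad_rows or pad_cols:
            self.qweight = F.pad(self.qweight, (0, pad_cols, 0, pad_rows))
```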
[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin (ModelCloud#98)

* fix padding
* fix padding
* store original in/out features
* fix bad var reference
* shorter var name
* limit bitblas convert to use 1 thread
* ruff
* fix qlinear_exllama pack
* revert qlinear_marlin change
* cleanup code
* plan b: init with original shape, then model load, then do padding/resize in post_init
* fix g_idx post_init
* const var reformat to all caps
* fix ( -> [
* padding the x that passes in forward
* comments/todo
* comments

Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
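The "padding the x that passes in forward" commit corresponds to zero-padding the activation's feature dimension at runtime so it matches the padded weight shape. A hedged sketch (helper name is hypothetical; zero columns contribute nothing to the dot product, so the result matches the unpadded computation):

```python
import torch
import torch.nn.functional as F

def pad_input(x: torch.Tensor, padded_infeatures: int) -> torch.Tensor:
    """Hypothetical helper: zero-pad x's last (feature) dimension so it
    matches the kernel's padded in_features before the quantized matmul."""
    pad = padded_infeatures - x.shape[-1]
    if pad > 0:
        x = F.pad(x, (0, pad))  # pad the right side of the last dim
    return x
```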
Resolves #100