
[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin #98

Merged
merged 19 commits into main from padding on Jun 29, 2024

Conversation

@Qubitium (Contributor) commented Jun 28, 2024

Resolves #100

@Qubitium (Contributor, Author) commented Jun 28, 2024

Confirmed my suspicion that the padding code impacts the shape of tensors saved to disk. Pack has been fixed so the saved size is correct (original), but we are now running into load issues: the expanded/padded buffers are larger than the tensors on disk, so accelerate throws shape-mismatch errors during quantized model load. We will explore two methods to deal with this tomorrow.

  1. Plan A: monkeypatch accelerate so loading uses tensor indexing/slicing; as long as dst.size > src.size, we can use dst[:src.size] = src to copy the smaller tensor over (see the sketch after this list).
  2. Plan B: refactor qlinear init to allocate buffers with the original shapes only, load from disk, then expand/pad the loaded tensors in post_init.
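A minimal sketch of the Plan A idea, independent of accelerate's actual internals (`copy_into_padded` is a hypothetical helper, not an accelerate API):

```python
import torch

def copy_into_padded(dst: torch.Tensor, src: torch.Tensor) -> None:
    # Copy the smaller on-disk tensor into the leading region of the
    # larger padded buffer, leaving the padding untouched (zeros).
    assert dst.dim() == src.dim(), "tensor ranks must match"
    assert all(d >= s for d, s in zip(dst.shape, src.shape)), \
        "dst must be at least as large as src in every dimension"
    dst[tuple(slice(0, s) for s in src.shape)].copy_(src)

# Illustrative usage: 1000 out_features padded up to 1024 in the
# in-memory buffer, while the checkpoint tensor keeps its original size.
src = torch.randn(4096, 1000)
dst = torch.zeros(4096, 1024)
copy_into_padded(dst, src)
```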

@Qubitium (Contributor, Author) commented:
Update: We are going with Plan B.
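A minimal sketch of the Plan B flow, assuming a hypothetical simplified module (the real qlinear layers register packed qweight/qzeros/scales/g_idx buffers; `PaddedLinearSketch` and `GROUP` are illustrative assumptions):

```python
import torch

def pad_to(n: int, multiple: int) -> int:
    # Round n up to the nearest multiple required by the kernel.
    return (n + multiple - 1) // multiple * multiple

class PaddedLinearSketch(torch.nn.Module):
    GROUP = 32  # assumed kernel alignment requirement

    def __init__(self, infeatures: int, outfeatures: int):
        super().__init__()
        # Register buffers with the ORIGINAL (on-disk) shapes so that
        # checkpoint loading sees matching tensor sizes.
        self.infeatures, self.outfeatures = infeatures, outfeatures
        self.register_buffer("weight", torch.zeros(outfeatures, infeatures))

    def post_init(self):
        # After the checkpoint is loaded, resize to the padded shapes,
        # copying the loaded data into the leading region.
        in_p = pad_to(self.infeatures, self.GROUP)
        out_p = pad_to(self.outfeatures, self.GROUP)
        if (in_p, out_p) != (self.infeatures, self.outfeatures):
            padded = torch.zeros(out_p, in_p, dtype=self.weight.dtype,
                                 device=self.weight.device)
            padded[: self.outfeatures, : self.infeatures] = self.weight
            self.weight = padded
```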

@Qubitium Qubitium merged commit e526cce into main Jun 29, 2024
1 of 2 checks passed
@Qubitium Qubitium deleted the padding branch June 29, 2024 14:28
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request on Jul 19, 2024:

[FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin (ModelCloud#98)

* fix padding

* fix padding

* store original in/out features

* fix bad var reference

* shorter var name

* limit bitblas convert to use 1 thread

* ruff

* fix qlinear_exllama pack

* revert qlinear_marlin change

* cleanup code

* plan b: init with original shape, then model load, then do padding/resize in post_init

* fix g_idx post_init

* const var reformat to all caps

* fix ( -> [

* padding the x that passes in forward

* comments/todo

* comments

---------

Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
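The "padding the x that passes in forward" commit covers the remaining half of Plan B: once the buffers are padded in post_init, the activation entering forward must be padded to match. A minimal sketch of that idea (`pad_input_features` is a hypothetical name, not the actual GPTQModel function):

```python
import torch
import torch.nn.functional as F

def pad_input_features(x: torch.Tensor, in_padded: int) -> torch.Tensor:
    # Zero-pad the trailing (feature) dimension of the activation so it
    # matches the padded in_features of the kernel's buffers; the extra
    # zero inputs contribute nothing to the matmul, and the output is
    # later sliced back down to the original out_features.
    shortfall = in_padded - x.shape[-1]
    return F.pad(x, (0, shortfall)) if shortfall > 0 else x
```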
Successfully merging this pull request may close these issues.

[BUG] Padding of infeatures/outfeatures and packing