
[FIX] opt fc1/fc2 layer modules should not be quantized #118

Merged 1 commit into main on Jun 29, 2024
Conversation

@Qubitium (Contributor) commented Jun 29, 2024

Resolves #117

For the OPT model, the fc1/fc2 layer modules are incompatible with the current quantization calibration, resulting in massive losses for every single layer. The solution is to disable/skip these two modules.
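For illustration, a minimal sketch of what skipping these modules can look like; the names `SKIP_MODULES` and `collect_quantizable_modules` are hypothetical and not GPTQModel's actual API:

```python
# Hypothetical sketch (not GPTQModel's actual API): exclude OPT's fc1/fc2
# projections when gathering the Linear submodules of a decoder layer for
# GPTQ calibration, leaving them in full precision.
import torch.nn as nn

SKIP_MODULES = ("fc1", "fc2")  # OPT MLP projections with astronomical calibration loss

def collect_quantizable_modules(layer: nn.Module) -> dict:
    """Return the Linear submodules of one decoder layer that should be quantized."""
    modules = {}
    for name, module in layer.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if any(name.endswith(skip) for skip in SKIP_MODULES):
            continue  # leave fc1/fc2 unquantized
        modules[name] = module
    return modules
```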

@Qubitium merged commit c9a0688 into main on Jun 29, 2024 (1 of 2 checks passed)
@Qubitium deleted the fix-opt branch on Jun 29, 2024
@LeiWang1999 (Contributor) commented:

Looks like this change causes a crash in vLLM when loading an OPT GPTQ model:

[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/model_loader/loader.py", line 270, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/models/opt.py", line 355, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'model.decoder.layers.0.fc1.weight'
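Presumably the crash happens because vLLM instantiates the GPTQ OPT model with quantized parameters (qweight/qzeros/scales) for every linear layer, while this checkpoint keeps fc1/fc2 in full precision, so the plain `fc1.weight` key has no matching parameter. A hypothetical diagnostic (not part of vLLM, and ignoring vLLM's own name remapping) to spot such mixed-precision checkpoints:

```python
# Hypothetical diagnostic (not part of vLLM): list checkpoint tensor names that
# have no matching parameter in the instantiated model, e.g. a full-precision
# fc1.weight inside an otherwise GPTQ-quantized OPT checkpoint.
from safetensors import safe_open

def unmatched_checkpoint_keys(model, checkpoint_path: str) -> list:
    params_dict = dict(model.named_parameters())
    unmatched = []
    with safe_open(checkpoint_path, framework="pt") as f:
        for name in f.keys():
            if name not in params_dict:
                unmatched.append(name)  # e.g. 'model.decoder.layers.0.fc1.weight'
    return unmatched
```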

@Qubitium (Contributor, Author) commented Jul 1, 2024

@LeiWang1999 Yes, there are some interface regressions (bitblas unit tests) failing after this PR merge. In current main, the bitblas backend with OPT fc1/fc2 modules skipped is outputting nonsense, but backend=exllamav2 works fine.

Do you have any idea why?

Btw, this PR is currently required: without bypassing fc1/fc2, OPT fails our pre/post-quantization PPL regression test. From our tests, fc1/fc2 are not quantizable; avg_loss is astronomical at all N layers.
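For context, a minimal sketch of the kind of pre/post-quantization PPL check described here, assuming a Hugging Face causal LM and pre-tokenized calibration text (the function name and windowing are illustrative, not the project's actual test code):

```python
# Illustrative sketch only; not the project's actual PPL regression test.
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, window: int = 2048) -> float:
    """Average perplexity of a HF causal LM over non-overlapping windows."""
    nll_sum, token_count = 0.0, 0
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start:start + window].to(model.device)
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)   # HF shifts labels internally
        n = chunk.size(1) - 1              # tokens actually scored
        nll_sum += out.loss.item() * n
        token_count += n
    return math.exp(nll_sum / token_count)

# Usage: compare perplexity(fp16_model, ids) against perplexity(quantized_model, ids)
# and fail the regression test if the post-quantization value blows up.
```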

@Qubitium (Contributor, Author) commented Jul 1, 2024

Just to add, we tested AutoGPTQ main as well and observed the same avg_loss there, so this is not unique to our refactors. The OPT model quants likely never had optimal quantization from the very beginning; because PPL tests were not performed pre/post quantization, it was not discovered until now.

@Qubitium (Contributor, Author) commented Jul 1, 2024

UPDATE: @ZX-ModelCloud is still checking the inference regression related to this PR. Either we need to revert it or our PPL tests are broken and not reflective of real-world generate results. We will adjust accordingly.

Qubitium added a commit that referenced this pull request Jul 2, 2024
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
Successfully merging this pull request may close these issues.

[BUG] OPT model and fc1/fc2 modules