[FIX] opt fc1/fc2 layer modules should not be quantized #118
Conversation
Looks like this option causes a crash in vLLM when loading an opt-gptq model:
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/model_loader/loader.py", line 270, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/models/opt.py", line 355, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'model.decoder.layers.0.fc1.weight'
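The failing key is a plain (unquantized) fc1.weight tensor, which suggests the checkpoint and the loader disagree about which modules carry GPTQ tensors. A quick way to check which modules a GPTQ checkpoint actually quantized is to list its tensor names. This is a minimal sketch, not project code; the checkpoint path is illustrative:

```python
# Minimal sketch: inspect a GPTQ checkpoint to see which OPT modules carry
# quantized tensors (qweight/qzeros/scales) and which were left as plain
# fp16 weights. The checkpoint path is a hypothetical local file.
from safetensors import safe_open

ckpt = "opt-125m-gptq/model.safetensors"  # hypothetical path

with safe_open(ckpt, framework="pt") as f:
    keys = list(f.keys())

quantized = {k.rsplit(".", 1)[0] for k in keys if k.endswith(".qweight")}
plain = {k.rsplit(".", 1)[0] for k in keys if k.endswith(".weight")}

# Modules skipped during quantization show up with a plain `.weight` tensor
# and no matching `.qweight`, e.g. the fc1/fc2 layers in this report.
for name in sorted(plain - quantized):
    if "fc1" in name or "fc2" in name:
        print("not quantized:", name)
```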
@LeiWang1999 Yes. There are some interface regressions (bitblas unittests) failing after this PR was merged. In current main, the bitblas backend with the OPT fc1/fc2 modules skipped is outputting nonsense. Do you have any idea why? Btw, this PR is currently required: without bypassing fc1/fc2, OPT fails our pre/post-quantization PPL regression test. From our tests, fc1/fc2 are not quantizable; avg_loss is astronomical at every layer.
Just to add, we tested AutoGPTQ main as well and observed the same avg_loss there, so this is not unique to our refactors. The OPT model quants likely never had optimal quantization from the very beginning, and because PPL tests were not run pre/post quantization, it was not discovered until now.
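For reference, a pre/post-quantization perplexity comparison along these lines is what such a regression test boils down to. This is a minimal sketch, not the project's regression harness; the model ids and dataset choice are illustrative:

```python
# Minimal sketch of a pre/post-quantization PPL comparison (illustrative only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, max_length: int = 2048, stride: int = 512) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)

    nlls, n_tokens = [], 0
    for begin in range(0, ids.size(1), stride):
        end = min(begin + max_length, ids.size(1))
        # Only score tokens not already scored by the previous window.
        target_len = end - begin if begin == 0 else min(stride, end - begin)
        input_ids = ids[:, begin:end]
        labels = input_ids.clone()
        labels[:, :-target_len] = -100
        with torch.no_grad():
            loss = model(input_ids, labels=labels).loss
        nlls.append(loss * target_len)
        n_tokens += target_len
        if end == ids.size(1):
            break
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

# A broken quant shows up as a large gap between the two numbers.
print("fp16 :", perplexity("facebook/opt-125m"))
print("gptq :", perplexity("path/to/opt-125m-gptq"))  # hypothetical quantized model
```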
UPDATE: @ZX-ModelCloud is still checking the inference regression related to this PR. Either we need to revert it, or our PPL tests are broken and not reflective of real-world generate results. We will adjust accordingly.
…loud#118)" (ModelCloud#149) This reverts commit 5bf289a.
Resolves #117
For the OPT model, the fc1/fc2 layer modules are incompatible with the current quantization calibration, resulting in massive losses at every single layer. The solution is to skip these two modules during quantization.
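Conceptually, the change amounts to removing fc1/fc2 from the per-layer module groups that get quantized for OPT. The sketch below illustrates the idea only; the `layer_modules` name and its layout are assumptions about how GPTQModel-style model definitions list quantizable modules, not the exact diff:

```python
# Illustrative sketch, not the actual PR diff. The attribute name and group
# layout are assumptions about how OPT's quantizable modules are declared.
layer_modules = [
    ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
    ["self_attn.out_proj"],
    ["fc1"],  # removed by this PR: calibration loss explodes on this module
    ["fc2"],  # removed by this PR
]

# Keep only the attention projections; fc1/fc2 stay in fp16.
layer_modules = [
    group for group in layer_modules
    if not any(name in ("fc1", "fc2") for name in group)
]
```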