[FIX] opt fc1/fc2 layer modules should not be quantized #118
Conversation
Looks like this option causes a crash in vLLM when loading an opt-gptq model:
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/model_loader/loader.py", line 270, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/root/vllm-bitblas/vllm/model_executor/models/opt.py", line 355, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'model.decoder.layers.0.fc1.weight'
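The failing key is a plain (unquantized) fc1.weight tensor, which suggests the checkpoint and the loader disagree about which modules carry GPTQ tensors. A quick way to check which modules a GPTQ checkpoint actually quantized is to list its tensor names. This is a minimal sketch, not project code; the checkpoint path is illustrative:

```python
# Minimal sketch: inspect a GPTQ checkpoint to see which OPT modules carry
# quantized tensors (qweight/qzeros/scales) and which were left as plain
# fp16 weights. The checkpoint path is a hypothetical local file.
from safetensors import safe_open

ckpt = "opt-125m-gptq/model.safetensors"  # hypothetical path

with safe_open(ckpt, framework="pt") as f:
    keys = list(f.keys())

quantized = {k.rsplit(".", 1)[0] for k in keys if k.endswith(".qweight")}
plain = {k.rsplit(".", 1)[0] for k in keys if k.endswith(".weight")}

# Modules skipped during quantization show up with a plain `.weight` tensor
# and no matching `.qweight`, e.g. the fc1/fc2 layers in this report.
for name in sorted(plain - quantized):
    if "fc1" in name or "fc2" in name:
        print("not quantized:", name)
```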
@LeiWang1999 Yes. There are some interface regressions (bitblas unittests) failing after this PR was merged. In current main, the bitblas backend with the OPT fc1/fc2 modules skipped is outputting nonsense. Do you have any idea why? Btw, this PR is currently required: without bypassing fc1/fc2, OPT fails our pre/post-quantization PPL regression test. From our tests, fc1/fc2 are not quantizable; avg_loss is astronomical at every layer.
Just to add, we tested AutoGPTQ main as well and observed the same avg_loss there, so this is not unique to our refactors. The OPT model quants likely never had optimal quantization from the very beginning, and because PPL tests were not run pre/post quantization, it was not discovered until now.
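For reference, a pre/post-quantization perplexity comparison along these lines is what such a regression test boils down to. This is a minimal sketch, not the project's regression harness; the model ids and dataset choice are illustrative:

```python
# Minimal sketch of a pre/post-quantization PPL comparison (illustrative only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, max_length: int = 2048, stride: int = 512) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)

    nlls, n_tokens = [], 0
    for begin in range(0, ids.size(1), stride):
        end = min(begin + max_length, ids.size(1))
        # Only score tokens not already scored by the previous window.
        target_len = end - begin if begin == 0 else min(stride, end - begin)
        input_ids = ids[:, begin:end]
        labels = input_ids.clone()
        labels[:, :-target_len] = -100
        with torch.no_grad():
            loss = model(input_ids, labels=labels).loss
        nlls.append(loss * target_len)
        n_tokens += target_len
        if end == ids.size(1):
            break
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

# A broken quant shows up as a large gap between the two numbers.
print("fp16 :", perplexity("facebook/opt-125m"))
print("gptq :", perplexity("path/to/opt-125m-gptq"))  # hypothetical quantized model
```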
UPDATE: @ZX-ModelCloud is still checking the inference regression related to this PR. Either we need to revert it, or our PPL tests are broken and not reflective of real-world generate results. We will adjust accordingly.
…loud#118)" (ModelCloud#149) This reverts commit 5bf289a.
Resolves #117
For the OPT model, the fc1/fc2 layer modules are incompatible with the current quantization calibration, resulting in massive losses at every single layer. The solution is to skip these two modules during quantization.
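Conceptually, the change amounts to removing fc1/fc2 from the per-layer module groups that get quantized for OPT. The sketch below illustrates the idea only; the `layer_modules` name and its layout are assumptions about how GPTQModel-style model definitions list quantizable modules, not the exact diff:

```python
# Illustrative sketch, not the actual PR diff. The attribute name and group
# layout are assumptions about how OPT's quantizable modules are declared.
layer_modules = [
    ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
    ["self_attn.out_proj"],
    ["fc1"],  # removed by this PR: calibration loss explodes on this module
    ["fc2"],  # removed by this PR
]

# Keep only the attention projections; fc1/fc2 stay in fp16.
layer_modules = [
    group for group in layer_modules
    if not any(name in ("fc1", "fc2") for name in group)
]
```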