Added cache_block_outputs option to enable GPTQ for non-regular models #27032
Conversation
Thanks @AlexKoff88, let's wait for the optimum PR to be merged. We might not need this argument.
LGTM! A few nits to fix.
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
@AlexKoff88 please run
Fixed
Thanks for adding this!
Just a small nit on the docstring.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Added cache_block_outputs option to enable GPTQ for non-regular models (huggingface#27032)

* Added cache_block_outputs option to enable GPTQ for non-regular models
* Update src/transformers/utils/quantization_config.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fixed style
* Update src/transformers/utils/quantization_config.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
cache_block_outputs enables caching of each block's outputs to speed up the GPTQ process. However, this does not work for some models, such as ChatGLM, where a LayerNorm is the first layer of the block.
Just compare the two module structures (a small sketch for printing such listings follows them):
OPT structure:
model.decoder.layers.0.self_attn
model.decoder.layers.0.self_attn.k_proj
model.decoder.layers.0.self_attn.v_proj
model.decoder.layers.0.self_attn.q_proj
model.decoder.layers.0.self_attn.out_proj
model.decoder.layers.0.activation_fn
model.decoder.layers.0.self_attn_layer_norm
model.decoder.layers.0.fc1
model.decoder.layers.0.fc2
model.decoder.layers.0.final_layer_norm
ChatGLM structure:
transformer.encoder.layers.0
transformer.encoder.layers.0.input_layernorm
transformer.encoder.layers.0.self_attention
transformer.encoder.layers.0.self_attention.query_key_value
transformer.encoder.layers.0.self_attention.core_attention
transformer.encoder.layers.0.self_attention.core_attention.attention_dropout
transformer.encoder.layers.0.self_attention.dense
transformer.encoder.layers.0.post_attention_layernorm
transformer.encoder.layers.0.mlp
transformer.encoder.layers.0.mlp.dense_h_to_4h
transformer.encoder.layers.0.mlp.dense_4h_to_h
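For reference, listings like the ones above can be produced by iterating over a model's submodules. A minimal sketch, using `facebook/opt-125m` purely as an example checkpoint:

```python
from transformers import AutoModelForCausalLM

# Load any causal LM; "facebook/opt-125m" is used here only as an example.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Print the qualified name of every submodule belonging to the first decoder
# block; this is a crude substring filter, but it reproduces listings like
# the OPT/ChatGLM ones above.
for name, _ in model.named_modules():
    if ".layers.0" in name:
        print(name)
```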
The solution is to disable the caching of self-attention block outputs and instead collect the inputs of the block being quantized by running the model from the beginning. This slows down the optimization a bit but is more stable.
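In rough terms, the two ways of collecting calibration inputs differ as sketched below. This is a simplified toy illustration, not the actual GPTQ implementation; `quantize_block`, `embed`, `blocks`, and `calibration_batches` are all hypothetical stand-ins.

```python
import torch
from torch import nn

def quantize_block(block, inputs):
    """Placeholder for per-block GPTQ weight quantization."""
    pass

# Toy stand-ins; real transformer blocks would be used in practice.
embed = nn.Linear(8, 16)
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
calibration_batches = [torch.randn(2, 8) for _ in range(4)]

# cache_block_outputs=True: quantize block i, then reuse its cached outputs as
# the inputs of block i+1 (fast, but assumes each block consumes exactly the
# previous block's outputs).
hidden = [embed(batch) for batch in calibration_batches]
for block in blocks:
    quantize_block(block, hidden)
    hidden = [block(h) for h in hidden]

# cache_block_outputs=False: before quantizing block i, re-run the model from
# the embeddings up to block i to obtain its inputs (slower, but robust to
# blocks whose first layer is a LayerNorm, as in ChatGLM).
for i, block in enumerate(blocks):
    inputs = []
    for batch in calibration_batches:
        h = embed(batch)
        for prev in blocks[:i]:
            h = prev(h)
        inputs.append(h)
    quantize_block(block, inputs)
```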
Related PR to Optimum: huggingface/optimum#1479
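With the new option, this behavior can be requested through `GPTQConfig` when quantizing such a model. A hedged usage sketch, assuming the option lands as shown in this PR; the model checkpoint, calibration dataset, and device settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Example of a model whose blocks start with a LayerNorm.
model_id = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Disable block-output caching so calibration inputs are collected by running
# the model from the beginning, as described above.
quantization_config = GPTQConfig(
    bits=4,
    dataset="c4",
    tokenizer=tokenizer,
    cache_block_outputs=False,  # the option added in this PR
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)
```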