Move Weight Update Out Of Loop #40

Merged on Jul 25, 2024 (1 commit)

@Satrat (Contributor) commented Jul 25, 2024

SUMMARY:
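A recent change for activation reordering updated the scale/zero point (zp) on every block of the GPTQ loop, but they only need to be updated once per layer. Moving this line out of the loop sped up compression by roughly 20x for most modules (about 60x for mlp.down_proj), with unchanged error.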
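The pattern behind the fix is hoisting a loop-invariant computation out of the per-block loop. The sketch below is not the llm-compressor code. It is a simplified, pure-Python illustration assuming a min/max observer, 4-bit asymmetric group quantization, and groups assigned in activation order, and it omits GPTQ's Hessian-based error compensation. All names (`compute_group_qparams`, `quantize_layer`, `hoist`) are hypothetical. Recomputing the qparams inside the loop costs one extra pass over the whole weight per block, while computing them once gives the same output here.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
QParams = List[List[Tuple[float, int]]]  # [row][group] -> (scale, zero_point)

BITS = 4
QMAX = 2 ** BITS - 1


def compute_group_qparams(weight: Matrix, g_idx: List[int], num_groups: int) -> QParams:
    """Hypothetical observer: asymmetric min/max scale and zero point per row and group."""
    params: QParams = []
    for row in weight:
        row_params = []
        for g in range(num_groups):
            vals = [w for w, gi in zip(row, g_idx) if gi == g]
            lo, hi = min(min(vals), 0.0), max(max(vals), 0.0)
            scale = (hi - lo) / QMAX or 1.0  # guard against an all-zero group
            row_params.append((scale, round(-lo / scale)))
        params.append(row_params)
    return params


def fake_quantize(w: float, scale: float, zero_point: int) -> float:
    """Quantize to [0, QMAX] and dequantize back to float."""
    q = min(max(round(w / scale) + zero_point, 0), QMAX)
    return (q - zero_point) * scale


def quantize_layer(weight: Matrix, perm: List[int], group_size: int,
                   block_size: int, hoist: bool = True) -> Tuple[Matrix, int]:
    """Quantize columns in activation order (perm), block by block.

    Returns the fake-quantized weight and how many times qparams were computed.
    """
    # Assumed activation ordering: the k-th processed column is in group k // group_size.
    g_idx = [0] * len(perm)
    for k, col in enumerate(perm):
        g_idx[col] = k // group_size
    num_groups = max(g_idx) + 1

    out = [row[:] for row in weight]
    observer_calls = 0
    params: Optional[QParams] = None
    if hoist:  # fixed behavior: compute once per layer
        params = compute_group_qparams(weight, g_idx, num_groups)
        observer_calls += 1
    for start in range(0, len(perm), block_size):
        if not hoist:  # slow behavior: recompute for every block
            params = compute_group_qparams(weight, g_idx, num_groups)
            observer_calls += 1
        assert params is not None  # set by one of the two branches above
        for col in perm[start:start + block_size]:
            for r, row in enumerate(weight):
                scale, zp = params[r][g_idx[col]]
                out[r][col] = fake_quantize(row[col], scale, zp)
    return out, observer_calls


W = [[0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.0, -0.4],
     [1.5, 0.2, -2.2, 0.9, 0.6, -1.0, 0.8, 0.1]]
perm = [3, 0, 7, 1, 5, 2, 6, 4]
fast, fast_calls = quantize_layer(W, perm, group_size=4, block_size=2, hoist=True)
slow, slow_calls = quantize_layer(W, perm, group_size=4, block_size=2, hoist=False)
print(fast == slow, fast_calls, slow_calls)  # True 1 4
```

In this toy setup the redundant work grows with the number of blocks, i.e. with the number of input columns, which is consistent with (though does not prove) the wider mlp.down_proj showing the largest slowdown.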

TEST PLAN:
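Manual testing: compressed a single decoder layer and compared the per-module compression time and error logged before and after the change.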

Before:

===== Compressing layer 1/1  =====
2024-07-25T14:21:48.809433+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.q_proj...
2024-07-25T14:22:11.727413+0000 | compress | INFO - time 22.87
2024-07-25T14:22:11.727872+0000 | compress | INFO - error 1503.34
2024-07-25T14:22:11.728176+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.k_proj...
2024-07-25T14:22:34.115003+0000 | compress | INFO - time 22.39
2024-07-25T14:22:34.115453+0000 | compress | INFO - error 798.58
2024-07-25T14:22:34.115750+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.v_proj...
2024-07-25T14:22:56.367681+0000 | compress | INFO - time 22.25
2024-07-25T14:22:56.367947+0000 | compress | INFO - error 19.61
2024-07-25T14:22:56.368194+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.o_proj...
2024-07-25T14:23:19.128502+0000 | compress | INFO - time 22.76
2024-07-25T14:23:19.128925+0000 | compress | INFO - error 0.29
2024-07-25T14:23:19.129270+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.gate_proj...
2024-07-25T14:23:41.729422+0000 | compress | INFO - time 22.60
2024-07-25T14:23:41.729671+0000 | compress | INFO - error 507.43
2024-07-25T14:23:41.729919+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.up_proj...
2024-07-25T14:24:04.349380+0000 | compress | INFO - time 22.62
2024-07-25T14:24:04.349802+0000 | compress | INFO - error 400.27
2024-07-25T14:24:04.350086+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.down_proj...
2024-07-25T14:28:23.691902+0000 | compress | INFO - time 259.34
2024-07-25T14:28:23.692347+0000 | compress | INFO - error 1.29

After:

===== Compressing layer 1/1  =====
2024-07-25T14:42:37.554267+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.q_proj...
2024-07-25T14:42:38.702161+0000 | compress | INFO - time 1.10
2024-07-25T14:42:38.702430+0000 | compress | INFO - error 1503.34
2024-07-25T14:42:38.702674+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.k_proj...
2024-07-25T14:42:39.747094+0000 | compress | INFO - time 1.04
2024-07-25T14:42:39.747477+0000 | compress | INFO - error 798.58
2024-07-25T14:42:39.747755+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.v_proj...
2024-07-25T14:42:40.791669+0000 | compress | INFO - time 1.04
2024-07-25T14:42:40.791917+0000 | compress | INFO - error 19.61
2024-07-25T14:42:40.792149+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.self_attn.o_proj...
2024-07-25T14:42:41.844949+0000 | compress | INFO - time 1.05
2024-07-25T14:42:41.845229+0000 | compress | INFO - error 0.29
2024-07-25T14:42:41.845500+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.gate_proj...
2024-07-25T14:42:42.947179+0000 | compress | INFO - time 1.10
2024-07-25T14:42:42.947426+0000 | compress | INFO - error 507.43
2024-07-25T14:42:42.947678+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.up_proj...
2024-07-25T14:42:44.051472+0000 | compress | INFO - time 1.10
2024-07-25T14:42:44.051754+0000 | compress | INFO - error 400.27
2024-07-25T14:42:44.052022+0000 | compress_module | INFO - Compressing model.layers.0.model.layers.0.mlp.down_proj...
2024-07-25T14:42:48.255730+0000 | compress | INFO - time 4.20
2024-07-25T14:42:48.255968+0000 | compress | INFO - error 1.29
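As a quick check on the speedup figure, the per-module times from the two logs can be divided directly. This is plain arithmetic on the logged numbers; module names are shortened for readability.

```python
# Per-module compression times in seconds, copied from the "Before" and "After" logs.
before = {"q_proj": 22.87, "k_proj": 22.39, "v_proj": 22.25, "o_proj": 22.76,
          "gate_proj": 22.60, "up_proj": 22.62, "down_proj": 259.34}
after = {"q_proj": 1.10, "k_proj": 1.04, "v_proj": 1.04, "o_proj": 1.05,
         "gate_proj": 1.10, "up_proj": 1.10, "down_proj": 4.20}

for name in before:
    print(f"{name:>9}: {before[name] / after[name]:5.1f}x")

total_before = sum(before.values())
total_after = sum(after.values())
print(f"total: {total_before:.2f}s -> {total_after:.2f}s ({total_before / total_after:.1f}x)")
```

With these numbers the six attention and MLP projections other than down_proj each speed up by roughly 20.5x to 21.7x, mlp.down_proj by about 62x, and the whole layer goes from 394.83 s to 10.63 s (about 37x), while the logged errors are identical.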

@bfineran merged commit 3f09ca3 into main Jul 25, 2024
7 of 12 checks passed
@bfineran deleted the fix_group_slowdown branch July 25, 2024 14:54
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024