Question about R1 rotation matrix. #6

Open
fingerk28 opened this issue Oct 28, 2024 · 1 comment

Comments

@fingerk28

Hello, thank you for your interest in the work, it’s very inspiring.

I have a question. In QuaRot, because of the residual connection, every R1 must be the same rotation matrix. In DuQuant, however, the rotation matrix is obtained through greedy search, so each R1 is different, right? If so, how do you apply different rotations and still keep the inference results the same as before the rotation?

figure

Looking forward to your reply.

@FelixMessi
Collaborator

Hello, thank you for your interest in our work.

In DuQuant, each $R_1$ is indeed obtained through a greedy search, making each rotation matrix different. In the latest version of our paper, we provide additional results on inference speedup.
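For readers following the question, here is a minimal NumPy sketch of the underlying identity. It is not taken from the DuQuant code base. It assumes the rotation is applied to the activation right before a linear layer and that its transpose is folded into that layer's weight. The helper `random_orthogonal` and all shapes are hypothetical, and random orthogonal matrices stand in for the greedily searched ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    # Hypothetical helper: the Q factor of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

tokens, d_in, d_out = 3, 8, 4
X = rng.standard_normal((tokens, d_in))   # activation entering a linear layer
W = rng.standard_normal((d_in, d_out))    # that layer's weight
Y_ref = X @ W                             # unrotated output

# Two different rotations, standing in for two layers' own R1.
R_a = random_orthogonal(d_in, rng)
R_b = random_orthogonal(d_in, rng)

# Rotating the activation and folding the transpose into the weight
# leaves the output unchanged, whichever orthogonal matrix is used.
Y_a = (X @ R_a) @ (R_a.T @ W)
Y_b = (X @ R_b) @ (R_b.T @ W)
per_layer_ok = np.allclose(Y_ref, Y_a) and np.allclose(Y_ref, Y_b)

# If the rotated activation is instead carried along the residual stream
# (rotated once by R_a) and a later weight is folded with a different R_b,
# the pair no longer cancels.
Y_mixed = (X @ R_a) @ (R_b.T @ W)
mixed_matches = np.allclose(Y_ref, Y_mixed)

print(per_layer_ok, mixed_matches)
```

Under these assumptions, the shared-R1 constraint only arises when the rotated activation is kept in the residual stream; when each layer rotates its own input and absorbs the inverse into its own weight, the rotations can differ per layer.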

We compare the pre-filling and decoding stages with QuaRot on the LLaMA2-7B model. For pre-filling, we measure the time taken to process one sentence of 2048 tokens; for peak memory usage, we decode 128 steps. As shown below, DuQuant maintains a pre-filling speedup comparable to QuaRot and achieves better results on the WiKi and QA benchmarks.

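| INT4, BS=1 | Time (ms) | Time Saving Factor | Memory (GB) | Memory Saving Factor | WiKi | QA |
|---|---|---|---|---|---|---|
| FP16 | 568 | - | 13.638 | - | 5.47 | 63.72 |
| SmoothQuant | 248 | 2.290x | 3.890 | 3.506x | 83.12 | 44.52 |
| QLLM | 435 | 1.306x | 3.894 | 3.502x | 9.09 | 51.60 |
| QuaRot | 284 | 2.000x | 3.891 | 3.505x | 6.39 | 61.25 |
| DuQuant | 288 | 1.972x | 3.893 | 3.503x | 6.28 | 61.76 |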
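The saving factors can be reproduced with a short calculation, assuming each factor is the FP16 value divided by the method's value (an inference from the numbers, not something stated in the thread). The variable names are illustrative.

```python
# FP16 baseline: pre-filling time (ms) and peak memory (GB), from the table.
fp16_time_ms, fp16_mem_gb = 568, 13.638

# (time in ms, memory in GB) for each INT4 method, from the table.
rows = {
    "SmoothQuant": (248, 3.890),
    "QLLM": (435, 3.894),
    "QuaRot": (284, 3.891),
    "DuQuant": (288, 3.893),
}

# Assumed definition: saving factor = FP16 value / method value.
factors = {
    name: (round(fp16_time_ms / t, 3), round(fp16_mem_gb / m, 3))
    for name, (t, m) in rows.items()
}

for name, (time_factor, mem_factor) in factors.items():
    print(f"{name}: time {time_factor:.3f}x, memory {mem_factor:.3f}x")
```

With this definition the computed values agree with the reported factors to three decimal places.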

For more in-depth information, please refer to our paper.
