Convert OPT MatMuls with quantized inputs to MatMulInteger #1585
Conversation
Approving tentatively; will test it shortly.
Also, please update the PR description.
Added the description.
Looks good to me after the fix
This change enables converting torch.bmm operations with two quantized inputs and a non-quantized output to MatMulInteger. This procedure is mutually exclusive with the existing one for QATMatMul-based quantized matmuls.
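For reference, a minimal numeric sketch of the equivalence this conversion relies on (the scales, zero points, and shapes below are illustrative, not taken from the actual OPT graph): MatMulInteger accumulates the integer product in int32, and rescaling that accumulator by the two input scales recovers the float output that a float bmm on the dequantized inputs would produce.

```python
import numpy as np

# Hypothetical per-tensor quantization parameters (illustrative only)
a_scale, b_scale = 0.02, 0.05
a_zp, b_zp = np.int8(0), np.int8(0)

# Batched int8 inputs, as torch.bmm would see them after quantization
a_q = np.random.randint(-128, 127, size=(2, 4, 8), dtype=np.int8)
b_q = np.random.randint(-128, 127, size=(2, 8, 4), dtype=np.int8)

# What MatMulInteger computes: (A - a_zero_point) @ (B - b_zero_point) in int32
acc = (a_q.astype(np.int32) - a_zp) @ (b_q.astype(np.int32) - b_zp)

# Non-quantized float output: rescale the int32 accumulator
out = a_scale * b_scale * acc.astype(np.float32)

# Reference path: dequantize first, then do the matmul in float
ref = (a_scale * (a_q.astype(np.float32) - a_zp)) @ \
      (b_scale * (b_q.astype(np.float32) - b_zp))

assert np.allclose(out, ref)
```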
The attached graph shows two MatMulInteger nodes that result from this conversion on OPT-125m.
Note that quantizing these MatMuls on OPT requires this PR.