
Convert OPT MatMuls with quantized inputs to MatMulInteger #1585

Merged (10 commits) Jun 8, 2023

Conversation

@natuan (Contributor) commented May 26, 2023

This change enables converting a torch.bmm with two quantized inputs and a non-quantized output to MatMulInteger. This procedure is mutually exclusive with the existing one for QATMatMul-based quantized matmuls.

The attached graph (image: "after") shows the two MatMulInteger nodes that result from this conversion on OPT-125m.

Note that the quantization of these MatMuls on OPT requires this PR.
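For context, a minimal sketch (not the repository's code) of what MatMulInteger computes: it takes two quantized integer inputs plus their zero points and produces an int32 accumulator, which the surrounding graph then rescales back to float. The emulation below with NumPy, and the scales/zero points in it, are illustrative assumptions, not values from this PR.

```python
import numpy as np

def matmul_integer(a_q, b_q, a_zero_point=0, b_zero_point=0):
    """Emulate ONNX MatMulInteger: (A - a_zp) @ (B - b_zp) accumulated in int32."""
    a = a_q.astype(np.int32) - np.int32(a_zero_point)
    b = b_q.astype(np.int32) - np.int32(b_zero_point)
    return a @ b

# Quantize two float matrices to uint8, multiply as integers, then dequantize.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3)).astype(np.float32)
y = rng.standard_normal((3, 4)).astype(np.float32)

# Hypothetical quantization parameters chosen for this example only.
x_scale, x_zp = 0.05, 128
y_scale, y_zp = 0.05, 128
x_q = np.clip(np.round(x / x_scale) + x_zp, 0, 255).astype(np.uint8)
y_q = np.clip(np.round(y / y_scale) + y_zp, 0, 255).astype(np.uint8)

acc = matmul_integer(x_q, y_q, x_zp, y_zp)          # int32 accumulator
out = acc.astype(np.float32) * (x_scale * y_scale)  # dequantized float output
```

The dequantized `out` approximates the float product `x @ y` up to quantization error, which is why the exported graph can keep the matmul output non-quantized while the inputs stay integer.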

@natuan natuan requested review from bfineran, anmarques, dbogunowicz and a team May 26, 2023 16:33
@dbogunowicz (Contributor) previously approved these changes May 29, 2023 and left a comment:


Approving tentatively; will test it shortly.

@dbogunowicz (Contributor):

Also, please update the PR description.

@natuan (Contributor, Author) commented May 30, 2023

> Also, please update the PR description.

Added the description.

@anmarques (Member) left a comment:


Looks good to me after the fix.

@natuan natuan merged commit 1575944 into main Jun 8, 2023
@natuan natuan deleted the ONNX_export_OPT_matmuls branch June 8, 2023 14:54

4 participants