
Add MatMulNBits to ONNX specification #1

Closed
wants to merge 34 commits from the matmulnbits_function branch

Conversation

@georgen117 (Owner) commented Sep 13, 2024

Description

This adds MatMulNBits to the ONNX standard.

Motivation and Context

Standardize Operators that are showing up in key LLM models.

MatMulNBits has already been defined in ONNX Runtime.

The MatMulNBits node is already generated by the "ONNX Runtime GenAI Model Builder" and "Intel Neural Compressor".

It can be found in the int4-quantized Phi-3 model hosted on Hugging Face.

Minor update to README.md: added parameterized and pillow, since pytest
would not run without them.

Updated the schema number from 23 to 24, since that is the version this
new function targets.

The text in Changelog.md is currently placeholder text until the text
for Operators.md is finalized.

defs.cc has been updated to support MatMulNBits; the
`TypeAndShapeInferenceFunction` is still a work in progress.

This is already enough to produce a single-node MatMulNBits model
using the ONNX Python API; a rough sketch of what that looks like follows.
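
For illustration only (my sketch, not code from this PR): building such a single-node model with the onnx Python helpers might look like the following. The attribute names (K, N, bits, block_size) and the packed-B/scales layout are assumptions taken from the com.microsoft definition.

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

K, N, bits, block_size = 32, 4, 4, 16
n_blocks_per_col = (K + block_size - 1) // block_size   # ceil(K / block_size)
blob_size = (block_size * bits + 7) // 8                # bytes per packed block

node = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B", "scales"],
    outputs=["Y"],
    K=K, N=N, bits=bits, block_size=block_size,
)
graph = helper.make_graph(
    [node],
    "matmulnbits_single_node",
    inputs=[helper.make_tensor_value_info("A", TensorProto.FLOAT, ["M", K])],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, ["M", N])],
    initializer=[
        # Packed weights and per-block scales, zero-filled just to show shapes.
        numpy_helper.from_array(
            np.zeros((N, n_blocks_per_col * blob_size), dtype=np.uint8), name="B"),
        numpy_helper.from_array(
            np.ones(N * n_blocks_per_col, dtype=np.float32), name="scales"),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 24)])
onnx.save(model, "matmulnbits_single_node.onnx")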

Signed-off-by: George Nash <george.nash@intel.com>
This commit contains several updates:
1. The g_idx input was removed based on feedback that it does not need to be part of the standard.
2. Adds the first draft of the ONNX Function code to the schema. This currently assumes that the B
   input is of int4 data type and will still need to be updated to handle N-bit data (see the
   sketch after this list).
3. Added an initial stub to the automatic_upgrade_test code that makes some assumptions that may need
   to be updated.
4. Added a matmulnbits.py stub that will hold a reference Python implementation for test generation.
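
For orientation, here is my own sketch (not the draft FunctionProto itself) of what the function decomposition amounts to once B has been dequantized:

import numpy as np

def matmulnbits_reference(a, b_dequantized):
    # Once the packed B has been dequantized back to floats with shape
    # [N, K], MatMulNBits reduces to an ordinary matmul against B^T.
    return a @ b_dequantized.T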

Signed-off-by: George Nash <george.nash@intel.com>
The code still needs additional testing; so far I have only tested it
for 4-bit and 3-bit inputs.

To implement MatMulNBits I had to add a function, DequantizeLinearNBits.
This is not in the standard and will need to be added if we want MatMulNBits
to work as a function rather than an operator.
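
To make the idea concrete, here is a rough NumPy sketch of what such an N-bit dequantize does. This is my own illustration under assumed conventions (LSB-to-MSB packing, one scale per block, midpoint default zero point), not the PR's reference code:

import numpy as np

def dequantize_nbits_column(packed, scales, bits, block_size, K, zero_point=None):
    """Dequantize one packed column of B back to K float32 weights."""
    # Explode the bytes into a bit stream, LSB first, then regroup the
    # stream into n-bit unsigned integers.
    bit_stream = np.unpackbits(packed, bitorder="little")
    n_vals = bit_stream.size // bits
    chunks = bit_stream[: n_vals * bits].reshape(n_vals, bits)
    q = (chunks << np.arange(bits)).sum(axis=1)[:K].astype(np.float32)

    if zero_point is None:
        zero_point = 2 ** (bits - 1)        # assumed midpoint default
    block_ids = np.arange(K) // block_size  # one scale per block of weights
    return (q - zero_point) * scales[block_ids]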

Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 force-pushed the matmulnbits_function branch from 8b55395 to 4c0acdb on October 2, 2024 18:01
…renceFunction

Testing against ONNX Runtime showed that the shape used for B in the MatMulNBits
reference implementation didn't match. The reference implementation has been
updated to match ONNX Runtime's com.microsoft implementation.

The text of Operators.md has also been updated to show that the actual shape
of B is [N][n_blocks_per_col * blob_size], not [N][n_blocks_per_col][blob_size].
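
For example (my numbers, assuming blob_size = ceil(block_size * bits / 8) bytes per block as in the com.microsoft definition): with K = 33, block_size = 16, and bits = 4, n_blocks_per_col = ceil(33 / 16) = 3 and blob_size = (16 * 4) / 8 = 8, so B is stored as [N][3 * 8] = [N][24].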

The TypeAndShapeInferenceFunction in math/defs.cc has been updated to
check all the inputs and to propagate the output shape.

Signed-off-by: George Nash <george.nash@intel.com>
The failures were in the way the test itself was written. This
is now fixed.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 force-pushed the matmulnbits_function branch from ac8f868 to 070b735 on October 4, 2024 19:56
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Other minor updates based on GitHub lint tools

Signed-off-by: George Nash <george.nash@intel.com>
This updates the zero_points shape specification to match the
shape used by the ONNX Runtime implementation.

Updated the text to show the 4-bit packing when the bits are uneven.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
This updates the text regarding the packing to make it clear
that the bits are packed LSB to MSB, including a link to the
int4 documentation. (A small illustration of that packing follows.)
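
As an illustration of LSB-to-MSB packing in the 4-bit case (my example, not from the PR), the first value of each pair lands in the low nibble:

import numpy as np

vals = np.array([1, 2, 3, 4], dtype=np.uint8)        # 4-bit values
packed = (vals[0::2] | (vals[1::2] << 4)).astype(np.uint8)
print([hex(b) for b in packed])                      # ['0x21', '0x43']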

This also adds a test case with all inputs.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
This includes an implementation of accuracy_level 4 that does a
quantize/dequantize pass, resulting in the loss of accuracy
that would be expected from doing int8 computation.

Small changes to the documentation to improve the usability of the
operator without breaking compatibility with the com.microsoft MatMulNBits
operator found in the Phi-3 model.
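
My reading of that quantize/dequantize pass, as a sketch only (a symmetric per-tensor int8 round trip; the actual PR code may differ):

import numpy as np

def qdq_int8(a):
    # Symmetric per-tensor scale; guard against an all-zero tensor.
    scale = max(float(np.abs(a).max()) / 127.0, np.finfo(np.float32).tiny)
    q = np.clip(np.round(a / scale), -128, 127).astype(np.int8)
    # The round trip reproduces the precision loss of int8 arithmetic.
    return q.astype(np.float32) * scale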

Signed-off-by: George Nash <george.nash@intel.com>
The K and N attributes are no longer listed as required; they can be inferred
from the input shapes.

The int32 type was removed from the T2 and T3 tensors, since I have not been able
to find any use case or example of how that data type would be used.

The TypeAndShapeInferenceFunction has been updated to:
 - handle the fact that the B input can be of rank 2 or 3;
 - handle the case where the K and N attributes are not provided (see the note after this list).
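
As I understand the change, N can be read from B's first dimension, while K must come from A's last dimension: the packed dimensions of B only bound K up to block padding, so B alone does not determine it exactly.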

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Other minor cleanup to the code. Updated the example implementation to
handle rank-3 input for the B input.

Signed-off-by: George Nash <george.nash@intel.com>
-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0, -10.0, -11.0, -12.0, -13.0, -14.0, -15.0,
-16.0, -17.0, -18.0, -19.0, -20.0, -21.0, -22.0, -23.0, -24.0, -25.0, -26.0, -27.0, -28.0, -29.0,
-30.0, -31.0, -32.0, -33.0,], dtype=np.float32).reshape((2,33))
# 4 8 12 16 20 24 28 32 36 40 44 48

Signed-off-by: George Nash <george.nash@intel.com>
Minor updates to test code

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 changed the title [WIP] Add MatMulNBits to ONNX specification Add MatMulNBits to ONNX specification Oct 21, 2024
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Add type information to the inputs for MatMulNBits

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 closed this Oct 28, 2024