Add MatMulNBits to ONNX specification #1
Conversation
Minor update to README.md: added parameterized and pillow, since pytest would not run without them. Updated the schema number from 23 to 24, since that is the version this new function is targeting. The text in Changelog.md is currently placeholder text until the text for Operators.md is finalized. defs.cc has been updated to support MatMulNBits; the `TypeAndShapeInferenceFunction` is still a work in progress. This is already enough to produce a single-node MatMulNBits model using the onnx Python code. Signed-off-by: George Nash <george.nash@intel.com>
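As a rough illustration of that last point, a single-node model can be assembled with the onnx Python helpers along these lines. This is a minimal sketch assuming the opset-24 target mentioned above; the attribute names (K, N, bits, block_size) and input names here are illustrative assumptions, not the final schema:

```python
import onnx
from onnx import TensorProto, helper

# One MatMulNBits node; attribute and input names are assumptions.
node = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B", "scales"],
    outputs=["Y"],
    K=32,           # reduction dimension (columns of A)
    N=2,            # output columns (rows of packed B)
    bits=4,         # quantization bit width
    block_size=16,  # elements per quantization block
)

graph = helper.make_graph(
    [node],
    "matmulnbits_single_node",
    inputs=[
        helper.make_tensor_value_info("A", TensorProto.FLOAT, [1, 32]),
        # 2 blocks per column x 8 bytes per 4-bit block of 16 values
        helper.make_tensor_value_info("B", TensorProto.UINT8, [2, 16]),
        helper.make_tensor_value_info("scales", TensorProto.FLOAT, [4]),
    ],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 2])],
)

model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 24)])
onnx.save(model, "matmulnbits.onnx")
```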
This commit contains several updates:
1. The g_idx input was removed, based on feedback that it does not need to be part of the standard.
2. Adds the first draft of the ONNX Function code to the schema. This currently assumes that the B input is the int4 data type and will still need to be updated to handle N-bit data.
3. Added an initial stub to the automatic_upgrade_test code that makes some assumptions that may need to be updated.
4. Added a matmulnbits.py stub that will hold a reference Python implementation for test generation.
Signed-off-by: George Nash <george.nash@intel.com>
The code still needs additional testing; so far I have only tested it with 4-bit and 3-bit inputs. To implement MatMulNBits I had to add a DequantizeLinearNBits function; this is not in the standard and will need to be added if we want MatMulNBits to work as a function rather than an operator. Signed-off-by: George Nash <george.nash@intel.com>
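For reference, the dequantization step could look roughly like the following numpy sketch. It assumes a contiguous LSB-first bitstream and a single scale/zero-point; the draft's per-block handling and its treatment of uneven bit widths may differ, and the function name mirrors this commit's helper, not a standardized op:

```python
import numpy as np

def dequantize_nbits(packed: np.ndarray, scale: float, zero_point: int,
                     bits: int, count: int) -> np.ndarray:
    """Unpack `count` values of width `bits` from a uint8 blob, then dequantize.

    Assumes a contiguous little-endian bitstream (LSB first within each byte).
    """
    bitstream = np.unpackbits(packed, bitorder="little")
    weights = 1 << np.arange(bits)                 # [1, 2, 4, ...]
    q = np.empty(count, dtype=np.int64)
    for i in range(count):
        # Reassemble the i-th n-bit value, least significant bit first.
        q[i] = int(np.dot(bitstream[i * bits:(i + 1) * bits], weights))
    return (q.astype(np.float32) - zero_point) * scale
```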
…renceFunction
Testing against ONNX Runtime showed that the shape used for B in the MatMulNBits reference implementation didn't match. The reference implementation has been updated to match ONNX Runtime's com.microsoft implementation. The text of Operators.md has also been updated to show that the actual shape of B is [N][n_blocks_per_col * blob_size], not [N][n_blocks_per_col][blob_size]. The TypeAndShapeInferenceFunction in math/defs.cc has been updated to check all the inputs and to propagate the output shape. Signed-off-by: George Nash <george.nash@intel.com>
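The flattened layout can be summed up with a small shape helper; a sketch, where the ceil-division for a partial tail block is my reading of the com.microsoft layout rather than quoted spec text:

```python
import math

def packed_b_shape(N: int, K: int, bits: int, block_size: int) -> tuple:
    # Each column of K values is split into blocks; each block packs into
    # block_size * bits / 8 bytes; blocks are flattened into one dimension.
    n_blocks_per_col = math.ceil(K / block_size)
    blob_size = (block_size * bits) // 8
    return (N, n_blocks_per_col * blob_size)

# e.g. N=2, K=33, 4-bit, block_size=16 -> (2, 24), not (2, 3, 8)
print(packed_b_shape(2, 33, 4, 16))
```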
The failures were in the way the test itself was written. This is now fixed. Signed-off-by: George Nash <george.nash@intel.com>
Other minor updates based on GitHub lint tools. Signed-off-by: George Nash <george.nash@intel.com>
This updates the zero_points shape specification to match the shape used by the ONNX Runtime implementation. Updated the text to show 4-bit packing when the bit width is uneven. Signed-off-by: George Nash <george.nash@intel.com>
This updates the text regarding the packing to make it clear that the bits are packed LSB to MSB, and includes a link to the int4 documentation. This also adds a test case with all inputs. Signed-off-by: George Nash <george.nash@intel.com>
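To illustrate the LSB-to-MSB convention for the even 4-bit case, here is a hypothetical packing helper; the padding of odd counts with a zero nibble is an assumption:

```python
import numpy as np

def pack_4bit(q: np.ndarray) -> np.ndarray:
    """Pack unsigned 4-bit values two per byte, low nibble first."""
    q = q.astype(np.uint8)
    if q.size % 2:                       # pad odd counts with a zero nibble
        q = np.append(q, np.uint8(0))
    return (q[0::2] & 0x0F) | (q[1::2] << 4)

packed = pack_4bit(np.array([1, 2, 3, 4]))
assert packed[0] == 0x21 and packed[1] == 0x43  # later value in high nibble
```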
This includes an implementation of accuracy_level 4 that does a quantize/dequantize pass, producing the loss of accuracy that would be expected from int8 computation. Small changes to the documentation improve the usability of the operator without breaking compatibility with the com.microsoft MatMulNBits operator found in the Phi-3 model. Signed-off-by: George Nash <george.nash@intel.com>
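The quantize/dequantize round trip could be approximated like this; the symmetric per-row scaling here is an illustrative choice, not necessarily what the PR's reference code does:

```python
import numpy as np

def fake_quantize_int8(a: np.ndarray) -> np.ndarray:
    # Quantize to int8 and immediately dequantize, so downstream math sees
    # the precision loss an int8 kernel would introduce.
    scale = np.max(np.abs(a), axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)       # guard all-zero rows
    q = np.clip(np.round(a / scale), -128, 127)
    return (q * scale).astype(a.dtype)
```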
The K and N attributes are no longer listed as required; they can be inferred from the input shapes (see the sketch after this list). The int32 type was removed from the T2 and T3 tensors, since I have not been able to find any use case or example of how that data type would be used. The TypeAndShapeInferenceFunction has been updated to:
- handle the fact that the B input can be of rank 2 or 3;
- handle the case where the K and N attributes are not provided.
Signed-off-by: George Nash <george.nash@intel.com>
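The inference rule itself is simple; a sketch, assuming A carries its reduction dimension last and packed B keeps N as its leading dimension:

```python
def infer_k_n(a_shape: tuple, b_shape: tuple) -> tuple:
    # K comes from A ([..., K]); N from B ([N, ...]), whether B is rank 2
    # ([N, n_blocks_per_col * blob_size]) or rank 3
    # ([N, n_blocks_per_col, blob_size]).
    return a_shape[-1], b_shape[0]

assert infer_k_n((1, 32), (2, 16)) == (32, 2)
assert infer_k_n((1, 32), (2, 2, 8)) == (32, 2)
```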
Other minor cleanup of the code. Updated the example implementation to handle a rank-3 B input. Signed-off-by: George Nash <george.nash@intel.com>
-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0, -10.0, -11.0, -12.0, -13.0, -14.0, -15.0,
-16.0, -17.0, -18.0, -19.0, -20.0, -21.0, -22.0, -23.0, -24.0, -25.0, -26.0, -27.0, -28.0, -29.0,
-30.0, -31.0, -32.0, -33.0,], dtype=np.float32).reshape((2, 33))
# 4 8 12 16 20 24 28 32 36 40 44 48
Minor updates to test code. Signed-off-by: George Nash <george.nash@intel.com>
Add type information to the inputs for MatMulNBits. Signed-off-by: George Nash <george.nash@intel.com>
Description
This adds MatMulNBits to the ONNX standard.
Motivation and Context
Standardize operators that are showing up in key LLM models.
MatMulNBits is already defined in ONNX Runtime.
The MatMulNBits node is already generated by the "ONNX Runtime GenAI Model Builder" and the "Intel Neural Compressor".
It can be found in the int4-quantized Phi-3 model hosted on Hugging Face.