
Add MatMulNBits to ONNX specification #1

Closed
wants to merge 34 commits from the matmulnbits_function branch

Conversation

@georgen117 (Owner) commented Sep 13, 2024

Description

This adds MatMulNBits to the ONNX standard.

Motivation and Context

Standardize Operators that are showing up in key LLM models.

MatMulNBits has already been defined in ONNX Runtime.

The MatMulNBits node is already generated by the "ONNX Runtime GenAI Model Builder" and "Intel Neural Compressor".

It can be found in the int4-quantized Phi-3 model hosted on Hugging Face.

Minor update to README.md: added parameterized and pillow, since pytest
would not run without them.

Updated the schema number from 23 to 24, since that is the version this
new function targets.

The text in Changelog.md is currently placeholder text until the text
for Operators.md is finalized.

defs.cc has been updated to support MatMulNBits; the
`TypeAndShapeInferenceFunction` is still a work in progress.

This is already enough to produce a single-node MatMulNBits model
using the ONNX Python API; a rough sketch of what that looks like follows.
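
For illustration only (my sketch, not code from this PR): building such a single-node model with the onnx Python helpers might look like the following. The attribute names (K, N, bits, block_size) and the packed-B/scales layout are assumptions taken from the com.microsoft definition.

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

K, N, bits, block_size = 32, 4, 4, 16
n_blocks_per_col = (K + block_size - 1) // block_size   # ceil(K / block_size)
blob_size = (block_size * bits + 7) // 8                # bytes per packed block

node = helper.make_node(
    "MatMulNBits",
    inputs=["A", "B", "scales"],
    outputs=["Y"],
    K=K, N=N, bits=bits, block_size=block_size,
)
graph = helper.make_graph(
    [node],
    "matmulnbits_single_node",
    inputs=[helper.make_tensor_value_info("A", TensorProto.FLOAT, ["M", K])],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, ["M", N])],
    initializer=[
        # Packed weights and per-block scales, zero-filled just to show shapes.
        numpy_helper.from_array(
            np.zeros((N, n_blocks_per_col * blob_size), dtype=np.uint8), name="B"),
        numpy_helper.from_array(
            np.ones(N * n_blocks_per_col, dtype=np.float32), name="scales"),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 24)])
onnx.save(model, "matmulnbits_single_node.onnx")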

Signed-off-by: George Nash <george.nash@intel.com>
This commit contains several updates:
1. The g_idx input was removed based on feedback that it does not need to be part of the standard.
2. Adds the first draft of the ONNX Function code to the schema. This currently assumes that the B
   input is of int4 data type and will still need to be updated to handle N-bit data (see the
   sketch after this list).
3. Added an initial stub to the automatic_upgrade_test code that makes some assumptions that may need
   to be updated.
4. Added a matmulnbits.py stub that will hold a reference Python implementation for test generation.
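
For orientation, here is my own sketch (not the draft FunctionProto itself) of what the function decomposition amounts to once B has been dequantized:

import numpy as np

def matmulnbits_reference(a, b_dequantized):
    # Once the packed B has been dequantized back to floats with shape
    # [N, K], MatMulNBits reduces to an ordinary matmul against B^T.
    return a @ b_dequantized.T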

Signed-off-by: George Nash <george.nash@intel.com>
The code still needs additional testing; so far I have only tested it
for 4-bit and 3-bit inputs.

To implement MatMulNBits I had to add a function, DequantizeLinearNBits.
This is not in the standard and will need to be added if we want MatMulNBits
to work as a function rather than an operator.
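
To make the idea concrete, here is a rough NumPy sketch of what such an N-bit dequantize does. This is my own illustration under assumed conventions (LSB-to-MSB packing, one scale per block, midpoint default zero point), not the PR's reference code:

import numpy as np

def dequantize_nbits_column(packed, scales, bits, block_size, K, zero_point=None):
    """Dequantize one packed column of B back to K float32 weights."""
    # Explode the bytes into a bit stream, LSB first, then regroup the
    # stream into n-bit unsigned integers.
    bit_stream = np.unpackbits(packed, bitorder="little")
    n_vals = bit_stream.size // bits
    chunks = bit_stream[: n_vals * bits].reshape(n_vals, bits)
    q = (chunks << np.arange(bits)).sum(axis=1)[:K].astype(np.float32)

    if zero_point is None:
        zero_point = 2 ** (bits - 1)        # assumed midpoint default
    block_ids = np.arange(K) // block_size  # one scale per block of weights
    return (q - zero_point) * scales[block_ids]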

Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 force-pushed the matmulnbits_function branch from 8b55395 to 4c0acdb on October 2, 2024 18:01
…renceFunction

Testing against ONNX Runtime showed that the shape used for B in the MatMulNBits
reference implementation didn't match. The reference implementation has been
updated to match ONNX Runtime's com.microsoft implementation.

The text of Operators.md has also been updated to show that the actual shape
of B is [N][n_blocks_per_col * blob_size], not [N][n_blocks_per_col][blob_size].
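
For example (my numbers, assuming blob_size = ceil(block_size * bits / 8) bytes per block as in the com.microsoft definition): with K = 33, block_size = 16, and bits = 4, n_blocks_per_col = ceil(33 / 16) = 3 and blob_size = (16 * 4) / 8 = 8, so B is stored as [N][3 * 8] = [N][24].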

The TypeAndShapeInferenceFunction in math/defs.cc has been updated to
check all the inputs and to propagate the output shape.

Signed-off-by: George Nash <george.nash@intel.com>
The failures were in the way the test itself was written. This
is now fixed.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 force-pushed the matmulnbits_function branch from ac8f868 to 070b735 on October 4, 2024 19:56
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Other minor updates based on GitHub lint tools

Signed-off-by: George Nash <george.nash@intel.com>
This updates the zero_points shape specification to match the
shape used by the ONNX Runtime implementation.

Updated the text to show the 4-bit packing when the bits are uneven.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
This updates the text regarding the packing to make it clear
that the bits are packed LSB to MSB, including a link to the
int4 documentation. (A small illustration of that packing follows.)
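
As an illustration of LSB-to-MSB packing in the 4-bit case (my example, not from the PR), the first value of each pair lands in the low nibble:

import numpy as np

vals = np.array([1, 2, 3, 4], dtype=np.uint8)        # 4-bit values
packed = (vals[0::2] | (vals[1::2] << 4)).astype(np.uint8)
print([hex(b) for b in packed])                      # ['0x21', '0x43']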

This also adds a test case with all inputs.

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
This includes an implementation of accuracy_level 4 that does a
quantize/dequantize pass, resulting in the loss of accuracy
that would be expected from doing int8 computation.

Small changes to the documentation to improve the usability of the
operator without breaking compatibility with the com.microsoft MatMulNBits
operator found in the Phi-3 model.
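
My reading of that quantize/dequantize pass, as a sketch only (a symmetric per-tensor int8 round trip; the actual PR code may differ):

import numpy as np

def qdq_int8(a):
    # Symmetric per-tensor scale; guard against an all-zero tensor.
    scale = max(float(np.abs(a).max()) / 127.0, np.finfo(np.float32).tiny)
    q = np.clip(np.round(a / scale), -128, 127).astype(np.int8)
    # The round trip reproduces the precision loss of int8 arithmetic.
    return q.astype(np.float32) * scale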

Signed-off-by: George Nash <george.nash@intel.com>
The K and N attributes are no longer listed as required; they can be inferred
from the input shapes.

The int32 type was removed from the T2 and T3 tensors, since I have not been able
to find any use case or example of how that data type would be used.

The TypeAndShapeInferenceFunction has been updated to:
 - handle the fact that the B input can be of rank 2 or 3;
 - handle the case where the K and N attributes are not provided (see the note after this list).
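
As I understand the change, N can be read from B's first dimension, while K must come from A's last dimension: the packed dimensions of B only bound K up to block padding, so B alone does not determine it exactly.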

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Other minor cleanup to the code. Updated the example implementation to
handle rank-3 input for the B input.

Signed-off-by: George Nash <george.nash@intel.com>
-1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0, -10.0, -11.0, -12.0, -13.0, -14.0, -15.0,
-16.0, -17.0, -18.0, -19.0, -20.0, -21.0, -22.0, -23.0, -24.0, -25.0, -26.0, -27.0, -28.0, -29.0,
-30.0, -31.0, -32.0, -33.0,], dtype=np.float32).reshape((2,33))
# 4 8 12 16 20 24 28 32 36 40 44 48

Signed-off-by: George Nash <george.nash@intel.com>
Minor updates to test code

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 changed the title [WIP] Add MatMulNBits to ONNX specification Add MatMulNBits to ONNX specification Oct 21, 2024
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Add type information to the inputs for MatMulNBits

Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
Signed-off-by: George Nash <george.nash@intel.com>
@georgen117 georgen117 closed this Oct 28, 2024