[op compatibility] matMul #27
Comments
Can you please refine this into a precise definition?
CoreML & MPS expose the concept of accumulator precision. That is useful to strike a balance between precision and performance. By default, the accumulator precision could be the same as the input type (e.g. a float16 accumulator for a float16 matmul). A high-precision accumulator makes a best effort to increase precision (e.g. a float32 accumulator for float16 inputs). What do you think about having an optional 3rd argument to provide the precision?
e.g.:
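A minimal numpy-based sketch of how such an optional accumulator-precision argument might behave; the function name and keyword argument are hypothetical, not part of any existing API, and the higher-precision accumulation is emulated by casting:

```python
import numpy as np

def matmul_with_accumulator(a, b, accumulator_dtype=None):
    # Hypothetical sketch: emulate the accumulator precision by computing
    # the product in that dtype and casting the result back to the input dtype.
    acc = a.dtype if accumulator_dtype is None else accumulator_dtype
    return np.matmul(a.astype(acc), b.astype(acc)).astype(a.dtype)

a = np.random.rand(4, 5).astype(np.float16)
b = np.random.rand(5, 4).astype(np.float16)

c_default = matmul_with_accumulator(a, b)              # accumulate in the input precision
c_precise = matmul_with_accumulator(a, b, np.float32)  # request a float32 accumulator
```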
Updated the description a bit but this is the definition from numpy -- I added an example that hopefully makes this clearer.
Great point! I think to resolve this we should try to understand how other accelerators do this -- if some accelerators do not have this option then we can only resolve this by defaulting to one or the other.
Thank you for the update. It's very clear with the example.
@BenjaminPoulain the accumulator precision is a great point, especially when one considers supporting lower-precision computation, e.g. float16 or bfloat. But this is not a property specific to this one operator. For example, there are implementations of conv2d that result in matrix multiplication; should a consistent accumulator precision be specified for them as well? I suppose this should be a property at a higher scope, perhaps at the inference-session scope, in order to ensure consistency throughout the graph.
Note that DirectML supports float16 with an option for the caller to specify a preferred accumulation precision as a hint. It is a hint because the underlying GPU may not be able to satisfy the requirement fully. However, by making this option a device setting, it ensures consistency of results throughout the entire graph.
I believe it should. I raised this issue for MatMul because it was making more progress than Conv2d at the time, but I agree the precision should be defined whenever the size of the accumulator and/or the order of operations matters.
There is a legitimate use case for specifying the precision per operation. I have nothing against your proposal to make it global at first. Fine-grained optimizations are less portable and may not be suitable across browsers.
References:
Opens:
@huningxin thank you! Feel free to submit a PR to add the matMul compatibility table to https://github.com/webmachinelearning/webnn/tree/master/op_compatibility so it can be collaboratively edited.
@huningxin Please go ahead and submit your PR. I'll fix it up for DML and ONNX. For example, DML_GEMM_OPERATOR_DESC already supports ND; the current documentation is not very up to date. ONNX's Gemm operator only supports 2D while MatMul supports ND. The fact that both are defined is a bit redundant and confusing.
@wchao1115, the table has been merged: https://github.com/webmachinelearning/webnn/blob/master/op_compatibility/matmul.md. Feel free to create a PR to fix it up. Thanks.
@BenjaminPoulain, any suggestions? Thanks!
CoreML has
I suppose it is compatible with the current proposal. With that, I propose that we start to craft the PR for the matmul op definition. In the meantime, the mapping details of MPS/BNNS in matmul.md can be filled in with a separate PR. @anssiko @wchao1115 @nsthorat @BenjaminPoulain what do you think?
Sounds good to me. PR review to be requested from folks tagged.
@huningxin The 'dot' operation in XLA-HLO does the same, except that it only supports N <= 2, supposedly because broadcasting is factored out as a separate 'broadcast' operation. This is a good example of how XLA-HLO is built from sub-operator constructs.
Thanks @wchao1115.
Correct. The point is they cut things up into smaller pieces.
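To make that decomposition concrete, here is a minimal numpy sketch (reusing the shapes from the matMul example in this issue) that performs an explicit broadcast step followed by stacked 2-D multiplications; the einsum spelling is only an illustration, not how XLA-HLO actually lowers the op:

```python
import numpy as np

a = np.ones((2, 3, 4, 5))   # stack of 2 * 3 = 6 matrices of shape [4, 5]
b = np.ones((5, 4))         # single [5, 4] matrix

# Step 1: make the broadcast explicit, as a separate operation.
b_stacked = np.broadcast_to(b, (2, 3, 5, 4))

# Step 2: apply plain 2-D matrix multiplication over the stacked dimensions.
c = np.einsum('...ij,...jk->...ik', a, b_stacked)
print(c.shape)              # (2, 3, 4, 4)
```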
Done. #49 is open for review.
This was solved by #49.
This issue will track op compatibility resolution for matMul.
Signature:
matmul(a, b)
Arguments:
a: n-dim tensor
b: n-dim tensor
Docstring:
If both a and b are 2-D, they are multiplied like conventional matrices. If one of a or b is 1-D, the operation is treated as a matrix-times-vector dot product.
If either argument is N-dimensional, N > 2, it is treated as a (rank-3) stack of matrices with dimensions corresponding to the inner two indices, and the matrix multiplication is broadcast accordingly.
Example:
If a has shape [2, 3, 4, 5] and b has shape [5, 4], the resulting tensor will have shape [2, 3, 4, 4]: a is treated as a stack of 2 * 3 = 6 matrices of shape [4, 5]. These are broadcast-multiplied against the [5, 4] matrix, producing 6 [4, 4] matrices. The result keeps the original outer dimensions of a, giving a shape of [2, 3, 4, 4].
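For reference, the shapes above (and the 1-D case) can be checked against numpy's matmul, which this definition follows; a minimal sketch with placeholder data:

```python
import numpy as np

a = np.ones((2, 3, 4, 5))        # treated as a stack of 2 * 3 = 6 [4, 5] matrices
b = np.ones((5, 4))              # broadcast over the stack
print(np.matmul(a, b).shape)     # (2, 3, 4, 4)

# 1-D case: b is treated as a vector, so the result is a matrix-vector product.
print(np.matmul(np.ones((4, 5)), np.ones(5)).shape)  # (4,)
```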
Notes:
To be discussed: