RFC for aligning StableHLO and TOSA arithmetic #1149
Initial version of an RFC to discuss aligning StableHLO and TOSA arithmetic operation. Signed-off-by: Eric Kunze <eric.kunze@arm.com>
Hi folks! Apologies for the late reply. There have been out-of-band conversations about this RFC, and I'd like to summarize them here on GitHub.
As the RFC mentions, StableHLO dialect currently has support for quantization via:
- Supporting `quant.uniform` element types.
- Having dedicated ops like `uniform_quantize`/`uniform_dequantize`.
- Allowing regular ops like `add`/`convolution` to take quantized tensors.
This support was inherited from MHLO when StableHLO was bootstrapped, and MHLO's support was motivated by mobile use cases and inherited from TFLite. TFLite quantization has a specification, but StableHLO quantization does not.
One key aspect of the TFLite quantization spec is that it uses a floating-point scale and an integer zero point as quantization parameters. In comparison, the quantization parameters proposed in this RFC involve an integer multiplier and shift. Harmonizing the two, or coming up with some kind of compromise solution, looks like the main open question at the moment.
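To make the difference concrete, here is a minimal Python sketch of the two parameterizations. It is only an illustration, not the semantics of either specification: the function names, the 31-bit multiplier width, and the rounding behavior are all assumptions chosen for the example.

```python
import math
from typing import Tuple


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Affine quantization with a floating-point scale and integer zero point.

    Uses Python's built-in round() (ties to even); real specs may round differently.
    """
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping: real value represented by the quantized integer q."""
    return scale * (q - zero_point)


def scale_to_multiplier_shift(scale: float, bits: int = 31) -> Tuple[int, int]:
    """Approximate a positive scale as multiplier * 2**(-shift).

    Hypothetical helper: the multiplier width `bits` is an assumption.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # scale == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(scale)
    multiplier = round(mantissa * (1 << bits))
    if multiplier == (1 << bits):  # rounding pushed the mantissa up to 1.0
        multiplier >>= 1
        exponent += 1
    shift = bits - exponent
    if shift < 1:
        raise ValueError("scale too large for this multiplier width")
    return multiplier, shift


def apply_multiplier_shift(value: int, multiplier: int, shift: int) -> int:
    """Integer-only approximation of value * scale, rounding half up."""
    product = value * multiplier
    return (product + (1 << (shift - 1))) >> shift


# Example: a scale of 0.5 expressed both ways.
multiplier, shift = scale_to_multiplier_shift(0.5)
rescaled = apply_multiplier_shift(100, multiplier, shift)
```

The float form is compact and matches how TFLite describes quantization parameters, while the multiplier/shift form lets a rescale run with integer arithmetic only. The two agree only up to the precision of the multiplier, which is part of what a harmonized spec would need to pin down.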
Towards that end, I would like to propose that we collaborate on pull requests to the StableHLO specification. @sdasgup3 has created an initial PR #1352 that drafts a specification for QuantizedType and for the semantics of quantized `add`, modelled after TFLite quantization semantics. Let's get together as a community and discuss the details, with the plan to progress to more involved ops like `convolution` in future pull requests.
StableHLO dialect currently supports quantization via:
1) Supporting `quant.uniform` element types.
2) Having dedicated ops like `uniform_quantize` / `uniform_dequantize`.
3) Allowing regular ops like `add` / `convolution` to take quantized tensors.

This support was inherited from MHLO when StableHLO was bootstrapped, and MHLO support was motivated by mobile use cases and inherited from TFLite.

As pointed out in #1149, StableHLO specification doesn't support quantization at the moment, and this is an important gap that we would like to fix before StableHLO v1.0 (see #588).

To continue the discussion started in #1149 and to make progress towards v1.0, this pull request:
A) Adds QuantizedType to the StableHLO specification, modelled after [TFLite quantization spec](https://www.tensorflow.org/lite/performance/quantization_spec).
B) To start a conversation about the applications of QuantizedType and the semantics of quantized ops, proposes semantics for quantized `add`.

TFLite quantization spec doesn't cover everything. It specs constraints on types (which we captured accordingly in this pull request), but it doesn't go into describing semantics of quantized ops. As a result, the proposed semantics for quantized `add` is intentionally naive, as compared with the much more involved implementations in the TensorFlow repository, e.g.:
* [tfl.add](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/add.cc).
* [tf.UniformQuantizedAdd](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/uniform_quant_ops/uniform_quantized_add_op.cc).

upd: After community discussion, we removed the spec for quantized `add` leaving that for future work, since further alignment is required.

---------

Co-authored-by: Eugene Burmako <burmako@google.com>