
Releases: XiaoMi/mace

v1.1.1

13 Jan 09:55

Feature:

  1. Support ION buffer on APU v4 and support float input
  2. Automatically sign libhexagon_nn_skel.so internally
  3. Remove the op module when neither CPU nor GPU is used
  4. Support boost and preference hints for APU
  5. Support building the APU mace_run without a connected device
  6. Add DSP SoC ID 450
  7. Support fake warmup for OpenCL to speed up GPU warmup
  8. Add the QNN backend and update the QNN library
  9. Add special models to CI and add a MACE-Micro runtime_load_model example
  10. Support OpenCL 3.0
  11. Support MTK ION mode
  12. Support dma_buf_heap
  13. Remove fallbacks caused by Reshape
  14. Add run validation for MACE-Micro
  15. Add MACE-Micro runtime load model interface
  16. Update MTK APU lib

Operator:

  1. Support uint8 mode for Sigmoid
  2. Support DepthToSpace, SpaceToDepth, ReduceSum and DetectionOutput operators
  3. Support depthwise_deconv2d host configuration
  4. Add more ops supported by the Keras converter
  5. Support the InstanceNorm operator and fold InstanceNorm from TensorFlow
  6. Support CRD mode for depth_to_space
  7. Support DSP ops: leaky relu, reshape
  8. Support HTP ops: depthwise_deconv, leaky_relu
  9. Support Keras ops: subtract, multiply
  10. Support the HardSigmoid op
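The CRD mode of depth_to_space differs from the default DCR mode only in how input channels are regrouped into each bs x bs spatial block. As a rough illustration, the sketch below follows the index mapping in the ONNX DepthToSpace specification for a single NCHW image; the enum and function names are hypothetical, and this is not MACE's kernel (MACE's backends may use other layouts such as NHWC).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum for the two ONNX DepthToSpace modes.
enum class D2SMode { kDCR, kCRD };

// Input channel that feeds output channel c at block offset (i, j).
// DCR: depth-column-row ordering; CRD: column-row-depth ordering.
std::size_t SourceChannel(D2SMode mode, std::size_t c, std::size_t i,
                          std::size_t j, std::size_t bs,
                          std::size_t out_channels) {
  return mode == D2SMode::kDCR ? (i * bs + j) * out_channels + c
                               : c * bs * bs + i * bs + j;
}

// Single-image NCHW DepthToSpace.
// Input shape: [channels, height, width];
// output shape: [channels / (bs*bs), height * bs, width * bs].
std::vector<float> DepthToSpace(const std::vector<float>& in,
                                std::size_t channels, std::size_t height,
                                std::size_t width, std::size_t bs,
                                D2SMode mode) {
  const std::size_t out_c = channels / (bs * bs);
  const std::size_t out_h = height * bs;
  const std::size_t out_w = width * bs;
  std::vector<float> out(out_c * out_h * out_w);
  for (std::size_t c = 0; c < out_c; ++c)
    for (std::size_t h = 0; h < height; ++h)
      for (std::size_t w = 0; w < width; ++w)
        for (std::size_t i = 0; i < bs; ++i)
          for (std::size_t j = 0; j < bs; ++j) {
            const std::size_t src_c = SourceChannel(mode, c, i, j, bs, out_c);
            out[(c * out_h + h * bs + i) * out_w + w * bs + j] =
                in[(src_c * height + h) * width + w];
          }
  return out;
}
```

With 8 input channels of size 1x1 and bs = 2, CRD keeps channels 0..3 together in the first output channel, while DCR interleaves them across both output channels.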

Performance:

  1. Optimize the performance of the CPU Pooling and Softmax ops
  2. Optimize Softmax on GPU and support GPU Reduce on the channel dimension

Other:

  1. Fix some compatibility and stability bugs
  2. Fix some documentation errors
  3. Fix some converter bugs

v1.0.4

18 Mar 10:07

  1. Fix a computing error on MTK GPU.
  2. Optimize warmup performance on MTK GPU.

v1.0.3

03 Mar 01:09

  1. Support i/o data types such as fp16, bf16.
  2. Fix a build error in the APU runtime.
  3. Support hexagon memory usage statistics.
  4. Fix a CMake build error for the static library.

v1.0.2

12 Jan 02:48

  1. Support multiple .so versions in the APU runtime.

v1.0.1

23 Dec 12:23

  1. Fix a build error with the ION buffer.
  2. Fix a bug in the OpenCL buffer transformer.

Attachment

libmace-v1.0.1.tar.gz: Prebuilt MACE library using NDK-19c, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

v1.0.0

04 Nov 12:43

Release Note

The following are the highlights in this release:

Support Quantization For MACE Micro

At the beginning of this year, we released MACE Micro to fully support ultra-low-power inference scenarios on mobile phones and IoT devices. In this version, we support quantization for MACE Micro and integrate CMSIS5 to better support Cortex-M chips.

Support More Model Formats

More and more R&D engineers are using the PyTorch framework to train their models. In previous versions, MACE converted PyTorch models by using the ONNX format as a bridge. To serve PyTorch developers better, this version supports direct conversion of PyTorch models, which improves model inference performance.
At the same time, we cooperated with MEGVII to support its MegEngine model format. If you train your models with the MegEngine framework, you can now use MACE to deploy them on mobile phones or IoT devices.

Support More Data Precision

Armv8.2 provides half-precision floating-point data processing instructions. In this version, we support fp16 computation using the Armv8.2 fp16 instructions, which increases inference speed by roughly 40% for models such as mobilenet-v1.
bfloat16 (Brain Floating Point) is a floating-point format that occupies 16 bits in memory. We also support bfloat16 precision in this version, which increases inference speed by roughly 40% for models such as mobilenet-v1/2 on some low-end chips.
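bfloat16 keeps the sign bit, the full 8-bit exponent, and the top 7 mantissa bits of a float32, so a float32 value can be converted by keeping its upper 16 bits after rounding. The following is a minimal C++ sketch of that conversion with round-to-nearest-even; the function names are hypothetical and the code is not taken from MACE's source.

```cpp
#include <cstdint>
#include <cstring>

// Convert float32 to bfloat16 bits using round-to-nearest-even.
uint16_t FloatToBf16(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // reinterpret without UB
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and force a quiet-NaN bit so it stays a NaN.
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
  const uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Convert bfloat16 bits back to float32 by zero-filling the low 16 bits.
float Bf16ToFloat(uint16_t value) {
  const uint32_t bits = static_cast<uint32_t>(value) << 16;
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

Because the exponent width matches float32, bfloat16 keeps float32's dynamic range and gives up only mantissa precision; for example, 3.14159265f becomes 3.140625 after a round trip.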

Others

In this version, we also add the following features:

  1. Support more operators, such as GroupNorm, ExtractImagePatches, Elu, etc.
  2. Optimize the performance of the framework and operators, such as the Reduce operator.
  3. Support dynamic filters for conv2d/deconv2d.
  4. Integrate MediaTek APU support on mt6873, mt6885, and mt6853.

Acknowledgement

Thanks to the following people who contributed code that makes MACE better.

@ZhangZhijing1, who contributed the bf16 code which was then committed by someone else.
@yungchienhsu, @Yi-Kai-Chen, @Eric-YK-Chen, @yzchen, @gasgallo, @lq, @huahang, @elswork, @LovelyBuggies, @freewym.

Attachment

libmace-v1.0.0.tar.gz: Prebuilt MACE library using NDK-19c, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

v0.13.0

03 Apr 12:29

Release Note

The following are the highlights in this release:

Support for Mace Micro

Compared with mobile devices such as mobile phones, microcontrollers are small, low-energy computing devices that are often embedded in hardware needing only basic computing, such as household appliances and IoT devices. Billions of microcontrollers are produced every year. MACE adds microcontroller support to fully support ultra-low-power inference scenarios on mobile phones and IoT devices. MACE's microcontroller engine does not rely on any OS, heap memory allocation, C++ library, or third-party library other than the math library.

Further Support For Quantization

MACE supports two kinds of quantization mechanisms: quantization-aware training and post-training quantization. In this version, we support using the two together. Furthermore, we support the Armv8.2 dot product instructions for CPU quantization.
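Both mechanisms end up mapping float values to 8-bit integers through a scale and a zero point. The sketch below shows a common asymmetric uint8 scheme that derives those parameters from an observed [min, max] range, as post-training calibration would. It is a generic illustration under that assumption: the struct and function names are hypothetical, and MACE's exact rounding and range handling may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical parameters for asymmetric uint8 quantization:
// real_value ~= scale * (quantized_value - zero_point)
struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Derive parameters from an observed float range (e.g. from calibration data).
QuantParams ChooseQuantParams(float min_val, float max_val) {
  // Widen the range to include 0 so that 0.0f maps exactly to an integer.
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  float scale = (max_val - min_val) / 255.0f;
  if (scale == 0.0f) scale = 1.0f;  // all-zero range: any positive scale works
  float zp = std::round(-min_val / scale);
  zp = std::min(std::max(zp, 0.0f), 255.0f);
  return {scale, static_cast<int32_t>(zp)};
}

// Round to the nearest step, shift by the zero point, saturate to [0, 255].
uint8_t Quantize(float x, const QuantParams& p) {
  float q = std::round(x / p.scale) + static_cast<float>(p.zero_point);
  q = std::min(std::max(q, 0.0f), 255.0f);
  return static_cast<uint8_t>(q);
}

// Map a quantized value back to an approximate float.
float Dequantize(uint8_t q, const QuantParams& p) {
  return p.scale * static_cast<float>(static_cast<int32_t>(q) - p.zero_point);
}
```

Clamping in the float domain before the integer cast avoids overflow for out-of-range inputs; values outside the calibrated range simply saturate to 0 or 255.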

Performance Optimization

MACE continues to optimize performance. This time, we add ION buffer support for Qualcomm SoCs, which greatly improves the inference performance of models that need to switch between GPU and CPU. Moreover, we optimize the performance of operators such as ResizeNearestNeighbor and Deconv.

Others

In this version, we support many new operators, including BatchMatMulV2 and Select for TensorFlow, and Deconv2d, Strided-Slice, and Sigmoid for Hexagon DSP. We also fix some bugs in validation and tuning.

Acknowledgement

Thanks to the following people who contributed code that makes MACE better.
gasgallo

Attachment

libmace-v0.13.0.tar.gz: Prebuilt MACE library using NDK-19c, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

v0.12.0

17 Nov 07:45

Release Note

The following are the highlights in this release:

Performance Optimization

We found that missing OP implementations on devices (GPU, Hexagon DSP, etc.) led to inefficient model execution, because memory synchronization between the device and the CPU consumed much time. We therefore added and enhanced some operators on the GPU (reshape, lpnorm, mvnorm, etc.) and Hexagon DSP (s2d, d2s, sub, etc.) to improve model execution efficiency.

Further Support For Speech Recognition

In the last version, we supported the Kaldi framework. At Xiaomi, we did a lot of work to support speech recognition models, including support for Flatten, Upsample, and other ONNX operators, as well as some bug fixes.

CMake Support

MACE continues to improve its build tooling. This version adds CMake support. Because it uses ccache for acceleration, building with CMake is much faster than with the original Bazel setup.
Related Docs: https://mace.readthedocs.io/en/latest/user_guide/basic_usage_cmake.html

Others

In this version, we support performance regression detection with dana, and add a "gpu_queue_window" parameter to the yml file to solve the UI lag caused by GPU task execution.
Related Docs: https://mace.readthedocs.io/en/latest/faq.html

Acknowledgement

Thanks to the following people who contributed code that makes MACE better.

yungchienhsu, gasgallo, albu, yunikkk

v0.11.0-rc1

30 May 08:18

  • Remove unimplemented gpu matmul.

  • Fix the length of abbreviated commit id in MACE version.

  • Fix some bugs.

v0.11.0-rc0

15 May 06:35

Improvements

  1. Support the Kaldi framework.
  2. Support iOS and OS X.
  3. Support the HTA device from Qualcomm.
  4. Support the APU device from MTK.
  5. Add a new thread pool to replace OpenMP.
  6. Add a new strategy for mixed CPU and GPU usage.
  7. Support many new ops and fix bugs.

Incompatible Changes

None

New APIs

  1. Add a new CreateEngineFromProto API.
  2. MaceTensor supports data types (float and int32).

Acknowledgement

Thanks to the following people who contributed code that makes MACE better.

yungchienhsu, gigadeplex, hanton, idstein, herbakamil.

Attachment

libmace.zip: Prebuilt MACE library using NDK-17b, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.