
TVM v0.3 Release Note #854

Closed
tqchen opened this issue Jan 31, 2018 · 9 comments
tqchen commented Jan 31, 2018

v0.3 is now tagged, next cycle roadmap issue is available at #1170

Release Note

This release features numerous improvements in TOPI and the backends. We take the first step toward object detection support in TOPI, featuring the operators necessary for YOLO and SSD. TOPI now supports a numpy-style API and operator overloading. RPC is significantly improved to support resource allocation and using a pool of devices. We are adding two new backends: WebGL, for running on GPUs in the browser, and Vulkan, for the next-generation graphics API.
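The numpy-style API and operator overloading mentioned above follow a familiar pattern: arithmetic operators on tensor objects build the corresponding elementwise operations. Below is a minimal pure-Python sketch of that pattern — an illustrative stand-in, not TVM's actual classes:

```python
import numpy as np

# Illustrative stand-in (NOT TVM's API): a tiny tensor wrapper showing the
# numpy-style operator-overloading pattern the release note describes.
class Tensor:
    def __init__(self, data):
        self.data = np.asarray(data, dtype="float32")

    def __add__(self, other):
        return Tensor(self.data + _unwrap(other))

    def __mul__(self, other):
        return Tensor(self.data * _unwrap(other))

def _unwrap(x):
    """Accept either a Tensor or a plain scalar/array on the right-hand side."""
    return x.data if isinstance(x, Tensor) else x

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
c = a + b * 2.0  # reads like numpy: elementwise multiply, then add
print(c.data)
```

In TVM itself the same expressions build symbolic compute stages rather than eager arrays; the sketch only shows the overloading pattern, not the lazy-evaluation machinery.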

Change List

  • TOPI Vision operators
    • SSD support
    • YOLO support
    • NMS operator support in vision
  • TOPI general numpy-style operators
    • numpy-style operator overloading in TOPI
    • more operators: flip, take
    • dilation support in conv2d and depthwise conv2d
  • 8-bit support
    • ARM 8-bit GEMM
    • ARM 8-bit conv
  • Low bit operator support
    • popcount intrinsics
    • 1-bit fully connected
  • Contrib: MPSDNN fully-connected and conv2d support
  • Better RPC support
    • RPC Tracker support to allow centralized resource management
    • RPC protocol upgrade (this is a non-backward compatible change) to support timeout in the proxy
      • This is a breaking change; the latest version of the TVM runtime must be used with the RPC
    • Fault tolerance to early server termination, with the exception propagated correctly
    • RPC support enabled for ROCm AMDGPUs
  • Tutorials and docs
    • How to deploy to Android devices
  • Optimizations for hardware backends
    • Intel CPU (AVX and AVX-512)
  • Schedule Primitives
    • rfactor now supports factor_axis to specify the factored dimension in the result
    • cache_write now supports multiple output operators
    • warp memory enabled, which generates shuffle instructions
  • Framework bridge
    • MXNet bridge supported
  • C++ compiler API support
    • build migration
    • TOPI migration to C++
    • Target system in C++
  • WebGL backend
    • runtime and codegen
    • topi integration
    • end to end pipeline on the browser
  • Vulkan backend
    • Vulkan runtime
    • SPIR-V code generator
  • Security
    • Intel SGX runtime support
    • multi-threaded SGX runtime
  • LLVM 7.0 support
  • Robustness
    • VerifyMemory to catch incorrect GPU schedules that write into GPU memory from the CPU
    • Verify compute formulas
  • Better CPU parallel runtime
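The popcount intrinsics and 1-bit fully connected items above rest on a classic trick: when +1/−1 values are packed into bit words, a dot product reduces to XOR followed by popcount. A hedged pure-Python sketch of the arithmetic (not TVM's generated code):

```python
# Sketch of the popcount trick behind 1-bit (binary) layers.
# Convention assumed here: bit i = 1 encodes the value -1, bit i = 0 encodes +1.

def pack_bits(values):
    """Pack a list of +1/-1 values into an integer bit word."""
    word = 0
    for i, v in enumerate(values):
        if v < 0:
            word |= 1 << i
    return word

def binary_dot(wa, wb, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Each matching sign contributes +1, each mismatch -1, so the result is
    n - 2 * (number of mismatching bits).
    """
    mismatches = bin(wa ^ wb).count("1")  # popcount of differing signs
    return n - 2 * mismatches

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
print(binary_dot(pack_bits(a), pack_bits(b), 4))
```

On real hardware the XOR and popcount each map to a single wide instruction, which is where the speedup over float multiply-accumulate comes from.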

Main Contributors

See the complete list here. Thanks to all the contributors who contributed to this release.

Code Reviewers

TOPI:

Compiler:

@yangjunpro

Tianqi,

1-bit looks like a cool feature. However, in our internal experiments, 1-bit weights (with 4-bit activations) still show non-negligible accuracy degradation (the best result still has around a 3-percentage-point gap). So may I ask why we want to invest resources in this low-bit feature? Internally we are also leveraging TVM to boost our productivity and have already submitted small patches to the community. We would like to align our engineering efforts with the community's. Model compression is right now on our radar, and we would like to leverage TVM to fully exploit the potential of model compression.

Thanks.


tqchen commented Feb 1, 2018

We want to keep the door open for different kinds of optimizations, and low-bit computation (not necessarily 1-bit) is one of them. We are also thinking about more general low-bit operations (e.g. 2-, 3-, or 4-bit); they allow interesting tradeoffs to be made. This direction can also be of particular interest to folks who work on hardware optimizations.
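To make the accuracy/efficiency tradeoff in this discussion concrete: uniform k-bit quantization halves the number of representable levels with each bit removed, so rounding error grows as the bit width shrinks. A minimal numpy sketch (my own illustration of the general idea, not TVM code):

```python
import numpy as np

def quantize(x, bits):
    """Uniform affine quantization of x to 2**bits levels."""
    levels = 2 ** bits - 1
    scale = (x.max() - x.min()) / levels
    q = np.round((x - x.min()) / scale).astype(np.int32)
    return q, scale, x.min()

def dequantize(q, scale, zero):
    """Map integer codes back to approximate real values."""
    return q * scale + zero

x = np.linspace(-1.0, 1.0, 101).astype("float32")
for bits in (8, 4, 2, 1):
    q, s, z = quantize(x, bits)
    err = float(np.abs(dequantize(q, s, z) - x).max())
    print(bits, "bits -> max abs error", round(err, 4))
```

The maximum error is bounded by half the step size, so each bit removed roughly doubles it — which is why 8-bit is usually a mild loss while 1-bit needs retraining tricks to stay accurate.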

@starimpact

8-bit inference is the most important thing 😄

@danilopau

Support for Tensor Comprehensions would be great: optimized C code generation with OpenMP pragmas, and support for ONNX interoperability.
Support for FP32 down to 16-bit and 8-bit fixed point (16FX, 8FX), 2-bit (ternary), and 1-bit (binary) weights and activations, to enable vector instructions.


tqchen commented Apr 23, 2018

@danilopau Just so you know, many of the things you mentioned are already supported, including high-performance optimized CPU code generation, ONNX interoperability, FP32 to FP16, and part of the binary computation.
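The FP32-to-FP16 path mentioned here trades range and precision for half the memory: float16 keeps roughly three decimal digits and tops out at a maximum finite value of 65504. A quick numpy illustration of the downcast (not TVM itself):

```python
import numpy as np

# Downcast FP32 values to FP16 and observe the tradeoffs:
# values within range survive with reduced precision,
# while values above float16's maximum (65504) overflow to infinity.
x32 = np.array([0.1, 1.0, 65504.0], dtype=np.float32)
x16 = x32.astype(np.float16)
print(x16.dtype, x16)
print(np.float16(1e5))  # overflows half precision
```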

@danilopau

A custom hardware backend example would be great — please see https://blog.st.com/orlando-neural-network-iot/.


tqchen commented May 14, 2018

Thanks to everyone who has pushed on the last release cycle over the past three months. I would like to propose releasing v0.3 on May 21st. As usual, the current checklist will go into the release note, and we will move the unfinished items into the roadmap for the next release cycle.

Main contributors to the past release (see the complete list here):
Thanks to all the contributors who made this awesome progress.

Code Reviewers

TOPI:

Compiler:


tqchen commented May 14, 2018

Please reply if there is something you would like to merge in before we tag v0.3.

@tqchen tqchen changed the title TVM v0.3 RoadMap TVM v0.3 Release Note May 21, 2018
@apache apache locked as resolved and limited conversation to collaborators May 21, 2018

tqchen commented May 21, 2018

v0.3 is now tagged, next cycle roadmap issue is available at #1170

@tqchen tqchen closed this as completed May 21, 2018