v3.1
Performance Optimizations
- Intel Architecture Processors:
  - Improved performance for 4th generation Intel Xeon Scalable processor (formerly Sapphire Rapids).
  - Introduced initial optimizations for future Intel Xeon Scalable processor (code name Sierra Forest). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Intel Graphics Products:
  - Improved performance for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
  - Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
  - Improved concat primitive performance with per-argument scales on Intel GPUs.
- AArch64-based Processors:
  - Improved layer normalization primitive performance with Compute Library for the Arm Architecture (ACL).
- AMD GPUs:
  - Introduced optimized matmul implementation.
- RISC-V-based Processors:
  - Improved pooling primitive performance for processors with RISC-V vector extension (RVV) support.
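
The Sierra Forest optimizations listed above are off by default and are gated by CPU dispatcher control. One way to opt in, sketched here under stated assumptions, is to set the `ONEDNN_MAX_CPU_ISA` environment variable for the process that runs the oneDNN-based application. The ISA value `AVX2_VNNI_2` is an assumption about which level corresponds to these code paths, and the helper name and application path are hypothetical; check the CPU dispatcher control section of the Developer Guide for the accepted values.

```python
import os

# Assumption: AVX2_VNNI_2 is the ISA level that unlocks the Sierra Forest
# code paths. Verify against the Developer Guide before relying on it.
SIERRA_FOREST_ISA = "AVX2_VNNI_2"


def env_with_max_cpu_isa(isa, base_env=None):
    """Return a copy of the environment with ONEDNN_MAX_CPU_ISA set to `isa`.

    The variable is read when the library is first used, so it is set on the
    environment handed to a child process rather than changed mid-run.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ONEDNN_MAX_CPU_ISA"] = isa
    return env


if __name__ == "__main__":
    env = env_with_max_cpu_isa(SIERRA_FOREST_ISA)
    print("ONEDNN_MAX_CPU_ISA =", env["ONEDNN_MAX_CPU_ISA"])
    # Launch the application with this environment, for example:
    # subprocess.run(["./my_onednn_app"], env=env, check=True)  # hypothetical binary
```

Setting the variable on a child process keeps the choice explicit and avoids changing the dispatcher for unrelated work in the parent.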
Functionality
- Enabled Graph API as a production feature. Graph API is intended to simplify oneDNN integration into frameworks.
- Added an option to zero out the weight gradient in the RNN primitive. See details in the corresponding RFC.
- [experimental] Added support for sparse memory and dense-by-sparse matrix-matrix multiplication in the matmul primitive. The functionality is supported on processors with Intel AVX2 and Intel AVX-512 instruction support.
- Introduced support for out-of-order queues with the OpenCL runtime. See the OpenCL Interoperability section in the Developer Guide for more details.
- Added support for the non-zero alpha parameter in the batch normalization ReLU post-op on Intel GPUs.
- Enabled the layer normalization primitive with f64 datatype support on Intel GPUs.
- Added support for per-argument scales in matmul, convolution, inner product, and reorder primitives on NVIDIA GPUs.
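
To make the experimental dense-by-sparse matmul item above concrete, here is a plain-Python reference of the arithmetic: a dense M x K matrix multiplied by a K x N sparse matrix. This is not the oneDNN API. The CSR layout (values, column indices, row pointers) and the function name are assumptions chosen for illustration only.

```python
def dense_by_csr_matmul(dense, values, col_idx, row_ptr, n_cols):
    """Multiply a dense M x K matrix by a K x N matrix stored in CSR form.

    values  -- non-zero entries of the sparse matrix, row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row_ptr[p]:row_ptr[p + 1] is the slice of `values` for row p
    n_cols  -- N, the number of columns of the sparse matrix
    """
    k = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in dense]
    for i, row in enumerate(dense):
        if len(row) != k:
            raise ValueError("inner dimensions do not match")
        for p, a in enumerate(row):
            if a == 0:
                continue  # nothing to accumulate for a zero dense entry
            # Only the stored non-zeros of sparse row p contribute.
            for idx in range(row_ptr[p], row_ptr[p + 1]):
                out[i][col_idx[idx]] += a * values[idx]
    return out


if __name__ == "__main__":
    # Sparse B (3 x 2) = [[0, 2], [1, 0], [0, 3]] in CSR form.
    values, col_idx, row_ptr = [2.0, 1.0, 3.0], [1, 0, 1], [0, 1, 2, 3]
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(dense_by_csr_matmul(a, values, col_idx, row_ptr, n_cols=2))
    # prints [[2.0, 11.0], [5.0, 26.0]]
```

The inner loop visits only stored non-zeros, which is where a sparse weight matrix saves work compared with the dense case.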
Validation
- Extended benchdnn with functional and performance validation for Graph API.
Breaking Changes
- Builds with OpenCL runtime will fail unless Graph API is disabled with ONEDNN_BUILD_GRAPH=OFF.
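
As a sketch of the workaround for the breaking change above, the helper below assembles a CMake configure command that selects the OpenCL GPU runtime and turns Graph API off. `ONEDNN_BUILD_GRAPH=OFF` comes from the note itself; `ONEDNN_GPU_RUNTIME=OCL` is the usual way to select the OpenCL runtime. The function name and directory paths are placeholders.

```python
import shlex


def configure_command(source_dir, build_dir):
    """Build a CMake configure command for an OpenCL-runtime build of oneDNN."""
    return [
        "cmake",
        "-S", source_dir,            # placeholder path to the oneDNN sources
        "-B", build_dir,             # placeholder path to the build tree
        "-DONEDNN_GPU_RUNTIME=OCL",  # select the OpenCL GPU runtime
        "-DONEDNN_BUILD_GRAPH=OFF",  # required in v3.1 with the OpenCL runtime
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed or pasted into a shell.
    print(shlex.join(configure_command("oneDNN", "oneDNN/build")))
```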
Known Issues and Limitations
- Graph API constant cache feature is disabled with SYCL CPU runtime due to an issue with the oneAPI DPC++ Compiler runtime. This will result in lower performance for some scenarios.
Thanks to the Contributors
This release contains contributions from the project core team as well as Amy Wignall @AmyWignall-arm, Annop Wongwathanarat @annop-w, @arlesniak, @bdmoore1, Crefeda Rodrigues @cfRod, David Svantesson @davsva01, Fadi Arafeh @fadara01, Jonathan Deakin @jondea, Kentaro Kawakami @kawakami-k, Pavel Zamelin @pazamelin, Pawel Piotrowicz @pawelpiotrowicz, Peter Caday @petercad, @ranzhejiang, and Sanchit Grover @sanchit-grover-intel. We would also like to thank everyone who asked questions and reported issues.