This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Improve batch gemm performance using MKL #342

Merged: 6 commits merged into dmlc:master on Jun 23, 2018

Conversation

xinyu-intel
Member

This PR improves the performance of small-matrix batch GEMM by around 5-10x by using MKL. The optimization will be useful for the attention layers in Sockeye.

Performance comparison:

1000 loops:

| size | mshadow | MKL |
| --- | --- | --- |
| [1120, 10, 256] * [1120, 256, 10] | 1.4739921093 | 0.180208921432 |
| [1120, 40, 512] * [1120, 512, 1] | 3.45011711121 | 0.670109033585 |
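For context, a batch GEMM performs one independent matrix multiply per entry along the leading batch dimension. A minimal NumPy sketch of the first benchmark row's shapes (illustration of the semantics only, not the mshadow/MKL code path):

```python
import numpy as np

# Shapes follow the first benchmark row: [1120, 10, 256] * [1120, 256, 10].
batch, m, k, n = 1120, 10, 256, 10
a = np.random.rand(batch, m, k).astype(np.float32)
b = np.random.rand(batch, k, n).astype(np.float32)

# np.matmul broadcasts over the batch dimension, i.e.
# c[i] = a[i] @ b[i] for every i in range(batch).
c = np.matmul(a, b)
print(c.shape)  # (1120, 10, 10)
```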

@pengzhao-intel

@pengzhao-intel

FYI, @fhieber @tdomhan @mjpost

@pengzhao-intel

@piiswrong @sxjscience please help review :) thanks in advance.

@@ -291,11 +292,48 @@ struct BLASEngine<cpu, float> {
const float *A, int lda, const float *B, int ldb,
float beta, float *C, int ldc, int batch_count,
float **workspace) {
#if MSHADOW_USE_MKL
Member


Are cblas_sgemm_batch and cblas_dgemm_batch generally supported in MKL? Do we need to check the version?

Member Author


According to this page, Intel MKL 11.3 Beta (part of Intel® Parallel Studio XE 2016 Beta) includes a new flavor of GEMM feature called "Batch GEMM".

@piiswrong piiswrong merged commit 757a91c into dmlc:master Jun 23, 2018