
Enhance recurrent layer performance #6512

Closed
perchbird opened this issue Dec 12, 2017 · 3 comments · Fixed by #6719

Comments

@perchbird

PaddlePaddle's RecurrentLayer computes each time step by calling cblas_sgemm from the MKL library. Internally, cblas_sgemm packs its inputs into a format suited to the MKL engine before running the actual computation. In the RNN case, every time step within a single forward pass shares the same weight, so there is no need to re-pack the weight before each compute call. Intel's optimization therefore calls the functions underlying cblas_sgemm directly: the weight is packed once before the computation begins, and each time step then reuses the same packed weight, avoiding the relatively expensive repeated pack operation.
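For reference, the MKL library exposes this split directly through cblas_sgemm_alloc, cblas_sgemm_pack, cblas_sgemm_compute, and cblas_sgemm_free. The C sketch below is only an illustration of the scheme, not Paddle code; the function name, shapes, and the simplified recurrence h_t += h_{t-1} * W are assumptions, and the activation and input projection are omitted:

```c
#include <mkl.h>

/* Minimal sketch of the proposed optimization: pack the shared
 * recurrent weight once, then reuse the packed buffer at every
 * time step instead of letting cblas_sgemm re-pack it each call. */
void recurrent_forward(const float *w, /* hidden x hidden shared weight      */
                       float *h,       /* steps x batch x hidden state; h[0]
                                          and input projections pre-filled   */
                       int steps, int batch, int hidden) {
  /* Pack the weight (the B matrix of the GEMM) exactly once. */
  float *packed_w = cblas_sgemm_alloc(CblasBMatrix, batch, hidden, hidden);
  cblas_sgemm_pack(CblasRowMajor, CblasBMatrix, CblasNoTrans,
                   batch, hidden, hidden, 1.0f, w, hidden, packed_w);

  for (int t = 1; t < steps; ++t) {
    const float *h_prev = h + (t - 1) * batch * hidden;
    float *h_t = h + t * batch * hidden;
    /* h_t += h_{t-1} * W, reusing the packed weight: no per-step pack. */
    cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasPacked,
                        batch, hidden, hidden,
                        h_prev, hidden, packed_w, hidden,
                        1.0f, h_t, hidden);
  }
  cblas_sgemm_free(packed_w);
}
```

Note that alpha is applied at pack time, which is why cblas_sgemm_compute takes only beta; beta = 1 here accumulates the recurrent term onto the pre-filled input projection.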

For merging, we can either modify the existing RecurrentLayer directly or create a new layer as an MKLDNNLayer. However, since this optimization does not actually use the mkldnn library, that name would be somewhat misleading. Which approach is preferable?

@luotao1
Contributor

luotao1 commented Dec 12, 2017

@hedaoyuan suggests: if the optimized code has little in common with the existing code, it can be implemented as a separate Layer, just as convolution has both ExpandConvLayer and CudnnConvLayer.

> However, since this optimization does not actually use the mkldnn library, that name would be somewhat misleading.

How about naming it MKLRecurrentLayer?

@yao-matrix

@luotao1 From our perspective, both options are fine; it mainly depends on which one better matches PaddlePaddle's design philosophy.
Also, caffe2's approach may serve as a reference: https://github.com/caffe2/caffe2/blob/master/caffe2/mkl/operators/packed_fc_op.cc
Their idea is essentially the same as Daoyuan's.

@luotao1
Contributor

luotao1 commented Dec 12, 2017

The design document for the MKLDNN integration is here; could the RNN optimization plan also be written up as a design document?
