Slow CPU inference in Gluon GRU module #13634
Comments
@ciyongch could you help take a look at the GRU inference?
I think Gluon GRU is calling unfused RNN cells, which consist of stacked FullyConnected and Activation operators, while ndarray.RNN calls a fused implementation. So to me the performance is as expected.
Next step: @marekjg, if you can build with USE_BLAS=mkl, performance will improve a lot.
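For reference, a from-source build along the lines suggested here might look like the following. The flag names are assumptions based on the make-based MXNet build of that era; check them against the official install docs for the version being built:

```shell
# Build MXNet from source with Intel MKL as the BLAS backend.
# USE_MKLDNN additionally enables the MKL-DNN (now oneDNN) operator library.
make -j$(nproc) USE_BLAS=mkl USE_MKLDNN=1
```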
@szha is it possible to use the fused RNN in Gluon?
Thanks for the quick response. @TaoLv yes, they're the same, but I've removed the comparison and the parameter-loading step for the sake of brevity. @pengzhao-intel I've installed mxnet-cu92mkl and there was already a boost in performance compared to mxnet-cu92, which I had installed by mistake earlier. Not sure if it helps, but I've checked this script on 1.3, 1.4 (when it was at master) and 1.5 now.
@pengzhao-intel @TaoLv @marekjg Current MXNet already supports the fused RNN in Gluon: gluon.rnn.GRU will call the fused GRU, while gluon.rnn.GRUCell will call the FullyConnected + Activation implementation. Will take a look at this.
@ciyongch Thank you for correcting me. Yes,
Yes, pip packages are built with OpenBLAS.
gluon.rnn.GRU supports unrolling samples of different lengths in the same batch, which is not yet supported by the fused kernel interface. cuDNN supports this, so for the GPU implementation we'd need that integration. The CPU version is yet to be implemented.
@mxnet-label-bot add [Gluon, performance, question]
CPU kernels were added: #9977 |
Description
gluon.rnn.GRU is slow on the CPU compared to the ndarray.RNN GRU for the same input.
Environment info
Deep Learning AMI 19, Tesla V100
Minimum reproducible example
Steps to reproduce
Run the above script with Python.
Output
gluon.rnn.GRU is significantly slower than ndarray.RNN:
device,method,time:
cpu(0) ndarray 0.07194805145263672
cpu(0) gluon 4.735473394393921
gpu(0) ndarray 0.013593673706054688
gpu(0) gluon 0.04437994956970215