[Frontend] Unified LSTM cell #8599

vvchernov · 2021-07-30T11:43:19Z

The LSTM cell has been unified and moved to a common location shared by all frontends. It is currently used by both the ONNX and PyTorch frontends of TVM. The cell was analyzed and modified to remove excess memory operations and other manipulations that the compiler may be unable to eliminate on its own. Performance tests were run for different LSTM variants before and after the change. The results are collected in the tables below:

Table 1. Average time per run (microseconds) over 10000 runs. Parameters (small input size): with biases = True, batch first = True, feature size = 5, hidden size = 10, number of stacked layers = 2, sequence length = 3, batch size = 1, trials number = 100.

| Frontend name / LSTM type | uni | b | s | sb |
|---|---|---|---|---|
| Onnx | 26.8 | 55.3 | 50.7 | 112.7 |
| Onnx dev | 20.1 | 40.5 | 37.7 | 81.7 |
| Onnx tuned | 5.1 | 5.8 | 7.1 | 11.1 |
| Onnx dev tuned | 4.7 | 6.0 | 6.2 | 10.2 |
| Pytorch | 12.1 | 19.9 | 20.5 | 37.2 |
| Pytorch dev | 8.9 | 14.1 | 14.9 | 27.5 |
| Pytorch tuned | 4.8 | 6.0 | 6.4 | 9.9 |
| Pytorch dev tuned | 4.7 | 6.1 | 6.4 | 9.8 |
| Onnxruntime | 16.0 | 21.1 | 24.8 | 36.7 |

There are several LSTM types: uni is unidirectional, b is bidirectional, s is stacked (2 layers in the Table 1 tests), and sb is stacked bidirectional. The suffix "dev" denotes the implementation in this patch. Without tuning, there was a large performance gap between the ONNX and PyTorch implementations: in Table 1 the ONNX one is roughly 2 to 3 times slower. With tuning, the ONNX implementation was still slightly slower than the PyTorch one in most cases. This patch closes the gap between the tuned implementations and improves the untuned results for both ONNX and PyTorch (times in Table 1 drop by roughly 25 to 30%).
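For readers less familiar with the terminology, the following is a minimal pure-Python sketch of one LSTM cell step and of how the unidirectional and bidirectional variants reuse it. It is an illustration only, not the code from this patch: the function names (`lstm_cell`, `run_lstm`, `run_bidirectional`), the stacked weight layout, and the i, f, g, o gate order are assumptions made for the example.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_cell(x, h, c, w_ih, w_hh, bias):
    """One LSTM step (hypothetical helper, not the TVM implementation).

    w_ih has shape (4 * hidden, features), w_hh has shape (4 * hidden, hidden).
    The four gates are stacked in the assumed order i, f, g, o, so a single
    pass over the stacked weights yields all gate pre-activations at once.
    """
    hidden = len(h)
    gates = [a + b + k for a, b, k in zip(matvec(w_ih, x), matvec(w_hh, h), bias)]
    i = [sigmoid(v) for v in gates[0:hidden]]
    f = [sigmoid(v) for v in gates[hidden:2 * hidden]]
    g = [math.tanh(v) for v in gates[2 * hidden:3 * hidden]]
    o = [sigmoid(v) for v in gates[3 * hidden:4 * hidden]]
    c_next = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_next = [ov * math.tanh(cv) for ov, cv in zip(o, c_next)]
    return h_next, c_next


def run_lstm(sequence, w_ih, w_hh, bias, hidden, reverse=False):
    """Unidirectional layer: apply the cell to each time step in order."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    steps = list(reversed(sequence)) if reverse else list(sequence)
    outputs = []
    for x in steps:
        h, c = lstm_cell(x, h, c, w_ih, w_hh, bias)
        outputs.append(h)
    if reverse:
        outputs.reverse()  # realign outputs with the original time order
    return outputs, h, c


def run_bidirectional(sequence, fwd_params, bwd_params, hidden):
    """Bidirectional layer: concatenate forward and backward outputs per step."""
    fwd, _, _ = run_lstm(sequence, *fwd_params, hidden)
    bwd, _, _ = run_lstm(sequence, *bwd_params, hidden, reverse=True)
    return [f_out + b_out for f_out, b_out in zip(fwd, bwd)]
```

A stacked variant would feed the per-step outputs of one layer into the next layer as its input sequence.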

Table 2. Average time per run (ms) over 1000 runs. Parameters (big input size): with biases = True, batch first = True, feature size = 40, hidden size = 256, number of stacked layers = 3, sequence length = 160, batch size = 1, trials number = 100.

| Frontend name / LSTM type | uni | s |
|---|---|---|
| Onnx | 47.3 | 205 |
| Onnx dev | 57.7 | |
| Onnx tuned | 8.74 | 31.8 |
| Onnx dev tuned | 7.63 | |
| Pytorch | 7.77 | 27.2 |
| Pytorch dev | 7.65 | |
| Pytorch tuned | 7.71 | 27.3 |
| Pytorch dev tuned | 7.60 | |
| Onnxruntime | 1.50 | 4.69 |
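The PR does not show the harness used to obtain these numbers. As a rough sketch of what "average time per run over N runs" can mean, assuming a single warm-up call followed by a timed loop (the helper name `average_time_per_run` and its parameters are hypothetical):

```python
import time


def average_time_per_run(fn, runs, unit_scale=1e6):
    """Return the mean wall-clock time of fn() over `runs` calls.

    unit_scale=1e6 reports microseconds, unit_scale=1e3 reports milliseconds.
    One untimed warm-up call is made first (an assumption of this sketch).
    """
    fn()  # warm-up, excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / runs * unit_scale
```

For example, `average_time_per_run(run_model, runs=10000)` would report microseconds per run, where `run_model` stands for any zero-argument callable that executes the compiled module.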

@masahi @jwfromm please review

junrushao · 2021-08-02T23:10:57Z

CC @masahi if you are interested

python/tvm/relay/frontend/onnx.py

python/tvm/relay/op/layers.py

python/tvm/relay/frontend/onnx.py

python/tvm/relay/frontend/pytorch.py

masahi · 2021-08-03T10:49:05Z

cc @mbrookhart

python/tvm/relay/frontend/common.py

python/tvm/relay/frontend/pytorch.py

python/tvm/relay/frontend/common.py

masahi · 2021-08-04T00:44:36Z

Thanks @vvchernov I only have minor comments left.

masahi · 2021-08-06T08:22:46Z

thanks @vvchernov

* fuse dence sum
* remove excess copying
* dev LSTM in ONNX
* alternative implementation of LSTM in onnx frontend. It is quicker than current one without tuning
* LSTM_dev2 was implemented in onnx frontend
* LSTM dev in pytorch frontend
* LSTM cell implementation was transferred to common place. Unneccessary code was removed
* lint fixes
* Weights permutation for LSTM layer in onnx frontend
* LSTM cell description was added
* arguments and values were renamed. descriptions of some methods were added
* LSTM output shape and actvations input format were fixed in onnx frontend
* empty. tvm-ci test
* unbind method was transferred from onnx frontend to common.py
* unbind method was transferred from pytorch frontend to common.py
* lstm cell was transferred from op/layers.py to frontend/common.py
* clean up weight dictionary initialization
* fix pytorch frontend wrapper over unbind method
* minor fix of comments
* empty. tvm-ci test restart
* empty. tvm-ci test restart

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>

vvchernov requested review from anijain2305, comaniac, Huyuwei, jroesch, junrushao, jwfromm, kazum, MarisaKirisame, mbrookhart, siju-samuel, slyubomirsky, srkreddy1238, tqchen, vinx13, wweic, yzhliu, zhiics and ZihengJiang as code owners July 30, 2021 11:43

vvchernov force-pushed the vc/lstm_perf branch 2 times, most recently from cb9105c to 8b7c4bc Compare July 30, 2021 13:59

vvchernov changed the title ~~Vc/lstm perf~~ [Frontend] Unified LSTM cell Jul 30, 2021

vvchernov changed the title ~~[Frontend] Unified LSTM cell~~ WIP: [Frontend] Unified LSTM cell Jul 30, 2021

vvchernov changed the title ~~WIP: [Frontend] Unified LSTM cell~~ [Frontend] Unified LSTM cell Jul 30, 2021

vvchernov added 7 commits August 2, 2021 16:47

fuse dence sum

df1b994

remove excess copying

2a016d9

dev LSTM in ONNX

d3bf383

alternative implementation of LSTM in onnx frontend. It is quicker th…

f737155

…an current one without tuning

LSTM_dev2 was implemented in onnx frontend

79bef55

LSTM dev in pytorch frontend

e113728

LSTM cell implementation was transferred to common place. Unneccessar…

f2860ce

…y code was removed

vvchernov force-pushed the vc/lstm_perf branch from d61dee0 to 6877742 Compare August 2, 2021 13:48

empty. tvm-ci test

845ebcf

masahi reviewed Aug 3, 2021

python/tvm/relay/frontend/onnx.py (outdated)

masahi reviewed Aug 3, 2021

python/tvm/relay/op/layers.py (outdated)

masahi reviewed Aug 3, 2021

python/tvm/relay/frontend/onnx.py (outdated)

masahi reviewed Aug 3, 2021

python/tvm/relay/frontend/pytorch.py

vvchernov added 5 commits August 3, 2021 15:19

unbind method was transferred from onnx frontend to common.py

d3afbc6

unbind method was transferred from pytorch frontend to common.py

f819333

lstm cell was transferred from op/layers.py to frontend/common.py

4fc6505

clean up weight dictionary initialization

e103b20

fix pytorch frontend wrapper over unbind method

d3df533