[Frontend] Unified LSTM cell #8599

Merged
merged 21 commits into from
Aug 6, 2021

@vvchernov vvchernov commented Jul 30, 2021

The LSTM cell has been unified and moved to a common location shared by all frontends. It is now used by both the ONNX and PyTorch frontends of TVM. The cell was analyzed and modified to remove excess memory operations and other manipulations that the compiler may not be able to optimize away on its own. Performance tests were run for several LSTM variants before and after the change. The results are collected in the tables below:

Table 1. Average time per run (microsec) for 10000 runs. The following parameters are used (small input size): with biases = True, batch first = True, feature size = 5, hidden size = 10, number of stacked layers = 2, sequence length = 3, batch size = 1, trials number = 100

| Frontend name/LSTM type | uni | b | s | sb |
|---|---|---|---|---|
| Onnx | 26.8 | 55.3 | 50.7 | 112.7 |
| Onnx dev | 20.1 | 40.5 | 37.7 | 81.7 |
| Onnx tuned | 5.1 | 5.8 | 7.1 | 11.1 |
| Onnx dev tuned | 4.7 | 6.0 | 6.2 | 10.2 |
| Pytorch | 12.1 | 19.9 | 20.5 | 37.2 |
| Pytorch dev | 8.9 | 14.1 | 14.9 | 27.5 |
| Pytorch tuned | 4.8 | 6.0 | 6.4 | 9.9 |
| Pytorch dev tuned | 4.7 | 6.1 | 6.4 | 9.8 |
| Onnxruntime | 16.0 | 21.1 | 24.8 | 36.7 |

There are several LSTM types: uni – unidirectional, b – bidirectional, s – stacked (2 layers are used in the tests), sb – stacked bidirectional. The suffix "dev" denotes the implementation in this patch. Without tuning, there was a large performance gap between the ONNX and PyTorch implementations (the ONNX one was slower). With tuning, the ONNX implementation was slightly slower than the PyTorch one. This patch closes the performance gap for tuned LSTMs and improves the untuned results for both ONNX and PyTorch.
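For readers unfamiliar with the four variants, here is a minimal pure-Python sketch of how one shared cell step can be reused to build uni, b, s and sb. It is an illustration only, not the code from this patch: the helper names (`lstm_cell_step`, `run_direction`, `run_lstm`, `init_weights`) are hypothetical, the gate order i, f, g, o follows the PyTorch convention, biases are omitted, and the single dense over the concatenated `[x, h]` is one plausible reading of the fused-dense idea, shown here as an assumption.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weight):
    """x: vector of length n; weight: list of rows, each of length n."""
    return [sum(xv * wv for xv, wv in zip(x, row)) for row in weight]


def lstm_cell_step(x, h, c, w_ih, w_hh):
    """One LSTM time step (hypothetical helper, gate order i, f, g, o)."""
    hidden = len(h)
    # One dense over concatenated input and weights instead of
    # dense(x, w_ih) + dense(h, w_hh); the two forms are equivalent.
    fused_w = [row_i + row_h for row_i, row_h in zip(w_ih, w_hh)]
    gates = dense(x + h, fused_w)
    i = [sigmoid(v) for v in gates[0 * hidden:1 * hidden]]
    f = [sigmoid(v) for v in gates[1 * hidden:2 * hidden]]
    g = [math.tanh(v) for v in gates[2 * hidden:3 * hidden]]
    o = [sigmoid(v) for v in gates[3 * hidden:4 * hidden]]
    c_next = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_next = [ov * math.tanh(cv) for ov, cv in zip(o, c_next)]
    return h_next, c_next


def init_weights(rng, in_size, hidden):
    """Random weights for illustration only."""
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(in_size)] for _ in range(4 * hidden)]
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(4 * hidden)]
    return w_ih, w_hh


def run_direction(xs, weights, hidden, reverse=False):
    """Run the cell over the sequence; outputs come back in original time order."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outs = []
    for x in (reversed(xs) if reverse else xs):
        h, c = lstm_cell_step(x, h, c, *weights)
        outs.append(h)
    return outs[::-1] if reverse else outs


def run_lstm(xs, feature, hidden, num_layers=1, bidirectional=False, seed=0):
    rng = random.Random(seed)
    in_size = feature
    for _ in range(num_layers):
        outs = run_direction(xs, init_weights(rng, in_size, hidden), hidden)
        if bidirectional:
            bwd = run_direction(xs, init_weights(rng, in_size, hidden), hidden, reverse=True)
            # concatenate forward and backward features at each time step
            outs = [fw + bw for fw, bw in zip(outs, bwd)]
        xs = outs  # a stacked layer consumes the previous layer's outputs
        in_size = len(xs[0])
    return xs


# Same sizes as Table 1: sequence length 3, feature size 5, hidden size 10.
seq = [[0.1 * (t + k) for k in range(5)] for t in range(3)]
uni = run_lstm(seq, 5, 10)
b = run_lstm(seq, 5, 10, bidirectional=True)
s = run_lstm(seq, 5, 10, num_layers=2)
sb = run_lstm(seq, 5, 10, num_layers=2, bidirectional=True)
```

In this sketch the bidirectional variants produce `2 * hidden` features per time step, and every variant goes through the same `lstm_cell_step`, which is the point of sharing one cell between frontends.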

Table 2. Average time per run (ms) for 1000 runs. The following parameters are used (big input size): with biases = True, batch first = True, feature size = 40, hidden size = 256, number of stacked layers = 3, sequence length = 160, batch size = 1, trials number = 100

Frontend name/LSTM type uni b s sb
Onnx 47.3 205
Onnx dev 57.7
Onnx tuned 8.74 31.8
Onnx dev tuned 7.63
Pytorch 7.77 27.2
Pytorch dev 7.65
Pytorch tuned 7.71 27.3
Pytorch dev tuned 7.60
Onnxruntime 1.50 4.69
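
As a rough illustration of what "average time per run" means in both tables, the sketch below divides total wall-clock time by the number of runs. It uses only the Python standard library and is not the harness used to produce the numbers above; `average_time_us` and `workload` are hypothetical names, and `workload` merely stands in for one inference call.

```python
import time


def average_time_us(fn, runs):
    """Return the mean wall-clock time per call of `fn`, in microseconds."""
    fn()  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1e6


def workload():
    # Hypothetical stand-in for a single model inference.
    return sum(k * k for k in range(100))


avg = average_time_us(workload, runs=1000)
print(f"{avg:.1f} us per run")
```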

@masahi @jwfromm please review

@vvchernov vvchernov force-pushed the vc/lstm_perf branch 2 times, most recently from cb9105c to 8b7c4bc Compare July 30, 2021 13:59
@vvchernov vvchernov changed the title Vc/lstm perf [Frontend] Unified LSTM cell Jul 30, 2021
@vvchernov vvchernov changed the title [Frontend] Unified LSTM cell WIP: [Frontend] Unified LSTM cell Jul 30, 2021
@vvchernov vvchernov changed the title WIP: [Frontend] Unified LSTM cell [Frontend] Unified LSTM cell Jul 30, 2021
@junrushao

CC @masahi if you are interested

@masahi

masahi commented Aug 3, 2021

cc @mbrookhart

@masahi

masahi commented Aug 4, 2021

Thanks @vvchernov I only have minor comments left.

@masahi masahi merged commit 2c124c9 into apache:main Aug 6, 2021
@masahi

masahi commented Aug 6, 2021

thanks @vvchernov

mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Aug 11, 2021
* fuse dense sum

* remove excess copying

* dev LSTM in ONNX

* alternative implementation of LSTM in onnx frontend. It is quicker than current one without tuning

* LSTM_dev2 was implemented in onnx frontend

* LSTM dev in pytorch frontend

* LSTM cell implementation was transferred to common place. Unnecessary code was removed

* lint fixes

* Weights permutation for LSTM layer in onnx frontend

* LSTM cell description was added

* arguments and values were renamed. descriptions of some methods were added

* LSTM output shape and activations input format were fixed in onnx frontend

* empty. tvm-ci test

* unbind method was transferred from onnx frontend to common.py

* unbind method was transferred from pytorch frontend to common.py

* lstm cell was transferred from op/layers.py to frontend/common.py

* clean up weight dictionary initialization

* fix pytorch frontend wrapper over unbind method

* minor fix of comments

* empty. tvm-ci test restart

* empty. tvm-ci test restart

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
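
Two items in the commit list above, the unbind method and the weights permutation for the ONNX frontend, can be illustrated with a short pure-Python sketch. The ONNX LSTM operator stores the gate blocks of its weight tensor in the order i, o, f, c, while PyTorch stores them as i, f, g, o (where g is the cell gate). A cell that expects one layout therefore needs the other layout's blocks reordered. Whether the patch performs exactly this reordering is an assumption; `unbind` and `permute_gate_blocks` below are hypothetical stand-ins working on nested lists, not the TVM functions.

```python
def unbind(tensor, axis=0):
    """Split a 2-D nested list along `axis`, dropping that axis from each slice."""
    if axis == 0:
        return [list(row) for row in tensor]
    if axis == 1:
        return [list(col) for col in zip(*tensor)]
    raise ValueError("this sketch only handles 2-D inputs")


def permute_gate_blocks(weight_rows, hidden, order=(0, 2, 3, 1)):
    """Reorder four gate blocks of `hidden` rows each.

    The default order maps the ONNX layout (i, o, f, c) to the
    PyTorch layout (i, f, g, o), with c and g naming the same gate.
    """
    blocks = [weight_rows[k * hidden:(k + 1) * hidden] for k in range(4)]
    return [row for k in order for row in blocks[k]]


# A sequence stored as [seq_len][feature] becomes one vector per time step.
steps = unbind([[1, 2, 3], [4, 5, 6]], axis=0)

# Rows are labelled by gate to make the permutation visible.
onnx_rows = ["i0", "i1", "o0", "o1", "f0", "f1", "c0", "c1"]
torch_rows = permute_gate_blocks(onnx_rows, hidden=2)
```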
@vvchernov vvchernov deleted the vc/lstm_perf branch August 27, 2021 07:57
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022