Improve cached_op performance for static mode #14785
Conversation
cc @zhreshold
Please fix the CI failures and the minor issue. This is awesome!
Can the performance measurement scripts be shared?
@larroy I think the test is already part of GluonCV.
@pengzhao-intel @larroy Yes, we mainly use the ImageNet classification script verify_pretrained.py and eval ssd for the GluonCV evaluation. BTW, once GluonCV #755 is merged along with this PR, performance will improve.
@zhreshold Please help review this again :)
LGTM
lgtm
This reverts commit 369b66d.
* Fix cached_op * try to fix ci * Fix CI * Fix ci
…apache#14868) This reverts commit 369b66d.
Description
@pengzhao-intel @TaoLv @xinyu-intel @junrushao1994
When a Gluon model is hybridized with static_shape=True, static_alloc=True, cached_op runs in static mode. In this situation, we should cache operator state for better performance. This PR enables that caching to speed up Gluon inference, especially for small batch sizes. The performance data was collected on an SKX-8180 with 28 cores; SKX GLUON INT8 OPT shows the performance with this PR, and the baseline is SKX GLUON INT8.
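To illustrate the idea of caching operator state in static mode, here is a minimal toy sketch in plain Python. It is not MXNet's CachedOp implementation: the names ToyOp, ToyCachedOp, create_state, and state_builds are hypothetical and exist only for this example. It assumes that building operator state is the expensive step and that, when shapes are static, the state built for a given input shape can be reused on later calls.

```python
# Toy illustration only; this is NOT MXNet's CachedOp implementation.
# All class and method names here are hypothetical.


class ToyOp:
    """Stand-in for an operator whose state is costly to build."""

    def __init__(self, name):
        self.name = name
        self.state_builds = 0  # counts how often state was created

    def create_state(self, shape):
        # Pretend this is expensive (e.g. setting up primitives or buffers).
        self.state_builds += 1
        return {"op": self.name, "shape": shape}

    def forward(self, state, x):
        # Trivial computation: add 1 to every element.
        return [v + 1 for v in x]


class ToyCachedOp:
    """Runs a chain of ops; in static mode, reuses state per (op, shape)."""

    def __init__(self, ops, static=False):
        self.ops = ops
        self.static = static
        self._states = {}  # (op index, input shape) -> cached state

    def __call__(self, x):
        shape = (len(x),)
        for i, op in enumerate(self.ops):
            key = (i, shape)
            if self.static and key in self._states:
                state = self._states[key]  # reuse the cached state
            else:
                state = op.create_state(shape)  # rebuild the state
                if self.static:
                    self._states[key] = state
            x = op.forward(state, x)
        return x
```

With static=True, repeated calls with the same input shape build each operator's state once; with static=False, the state is rebuilt on every call. In this toy model, that per-call overhead is what matters most when the batch is small and the actual computation is cheap.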
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments