Huge performance decrease by quantization #13720
Comments
From your link:
But in my case, it doubles the GPU memory usage. I don't think it can be considered "smaller".
FYI again, #13145 (comment)
I couldn't find any information about why GPU memory usage increased.
@ThomasDelteil do you have any data showing the memory changes of the INT8 flow? @reminisce, could you comment on this question?
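For what it's worth, one way to collect such memory numbers would be something like the sketch below. It shells out to nvidia-smi, and the checkpoint prefix and input shape are placeholder assumptions, not values taken from this thread:

```python
import subprocess
import sys
import mxnet as mx

def gpu_mem_used_mib(device_id=0):
    """Used GPU memory in MiB for one device, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ['nvidia-smi', '-i', str(device_id),
         '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
    return int(out.decode().strip())

def measure(prefix, shape=(1, 3, 224, 224)):
    """Bind a checkpoint on GPU 0, run one forward pass, print memory used."""
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, 0)
    mod = mx.mod.Module(symbol=sym, context=mx.gpu(0), label_names=None)
    mod.bind(data_shapes=[('data', shape)], for_training=False)
    mod.set_params(arg_params, aux_params)
    mod.forward(mx.io.DataBatch([mx.nd.ones(shape, ctx=mx.gpu(0))]))
    mx.nd.waitall()
    print(prefix, gpu_mem_used_mib(), 'MiB used after one forward pass')

if __name__ == '__main__':
    # Run the script once per checkpoint (FP32 vs INT8) so memory held by
    # MXNet's pooled allocator for the first model does not inflate the
    # reading for the second.
    measure(sys.argv[1])
```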
@mxnet-label-bot add [Operator, Performance]
@mxnet-label-bot add [Quantization]
Since there is no plan for this so far, I am closing this issue.
Original issue description
I used the code from PR #13715 and got a huge performance decrease after quantizing my model. I tested on Windows 10 with CUDA 10 and cuDNN 7 on a Titan X (Pascal), using the pre-release pip build of mxnet-cu100.
Although issue #10897 claims that INT8 quantization can save GPU memory, I saw almost 2x more VRAM usage after quantization.
Is it expected that INT8 quantization is this slow and uses more GPU memory?
I also assume that UINT8 quantization is not yet supported, since the UINT8-quantized parameters come out as signed integers.
So, is there any plan to improve INT8 quantization in the near future?
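For reference, here is a minimal sketch of roughly how the quantization was invoked, assuming the standard `mxnet.contrib.quantization.quantize_model` flow; the checkpoint prefix, calibration mode, and other settings are placeholders rather than the exact ones used for this report:

```python
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# Load the FP32 symbol and parameters (the 'model' prefix is a placeholder).
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)

# Convert to an INT8 symbol. calib_mode='none' skips calibration; 'naive' or
# 'entropy' together with calib_data is also possible. quantized_dtype='uint8'
# is the unsigned variant asked about above.
qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params,
    ctx=mx.gpu(0),
    excluded_sym_names=None,   # layers to keep in FP32, if any
    calib_mode='none',
    quantized_dtype='int8',
    logger=logging)

# Save the quantized model for benchmarking against the FP32 baseline.
mx.model.save_checkpoint('model-quantized', 0, qsym, qarg_params, qaux_params)
```

Benchmarking the resulting 'model-quantized' checkpoint against the FP32 baseline (for example with the memory-measurement sketch earlier in the thread) would make the roughly 2x VRAM observation easier to reproduce.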