Reza's write-up in #2217 answers some of your questions, such as the model-size reduction. Regarding the kernels, we are working on a plan to release them soon so that you can try them out.
Thanks,
Hi,
The ZeroQuant inference engine has not been released yet. The code example in DeepSpeed-Example is only meant to help verify ZeroQuant's accuracy.
The kernel/engine release is on our calendar, and we are actively working to make it compatible with various models. Please stay tuned.
We will also release LKD soon.
For your last question: the code used for training and accuracy testing is different from the final inference engine. In that code, quantization is entirely simulated, which lets us do quantization-aware training and other experiments.
Originally posted by @yaozhewei in #2207 (comment)
Hi, when will ZeroQuant inference (for GPT models) be released?
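The quoted reply says the training and accuracy-testing code "simulates" quantization. As a rough illustration of what simulated (often called "fake") quantization generally means, here is a minimal sketch that quantizes values to INT8 codes and immediately dequantizes them back to floats, so the model sees the rounding error while everything stays in floating point. This is not ZeroQuant's or DeepSpeed's code: the function name `fake_quantize` is hypothetical, and it assumes simple symmetric, per-tensor scaling.

```python
# Hypothetical sketch of simulated ("fake") quantization, NOT ZeroQuant's code.
# Assumption: symmetric, per-tensor scaling with a single scale factor.

def fake_quantize(values, num_bits=8):
    """Quantize a list of floats to signed integer codes, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [float(v) for v in values]  # nothing to scale
    scale = max_abs / qmax
    result = []
    for v in values:
        q = round(v / scale)          # integer code a real kernel would store
        q = max(-qmax, min(qmax, q))  # clamp to the signed range
        result.append(q * scale)      # back to float: the "simulated" part
    return result


weights = [0.5, -1.27, 0.003, 1.0]
print(fake_quantize(weights))
```

Because the output is still a float tensor, the quantization error can be exercised during training and evaluation without any integer kernels; a real inference engine would instead keep the integer codes and run dedicated low-precision kernels, which is why the two code paths differ.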