Question
We are very interested in two post-training quantization papers from the Han Lab!
SmoothQuant uses W8A8 quantization for efficient GPU computation.
AWQ uses W4A16 / W3A16 quantization for lower memory requirements and higher effective memory throughput.
But which one is faster in actual production?
If you have any data about this, could you share it with us?
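For reference, this is roughly the kind of measurement we have in mind: a simple decode-throughput probe, not a rigorous benchmark. It is only a sketch; the checkpoint path is a placeholder (not a real released model), and it assumes a single CUDA GPU with Hugging Face `transformers` installed.

```python
# Rough decode-throughput probe (sketch only, not a rigorous benchmark).
# Assumes a CUDA GPU and an already-quantized checkpoint on disk; MODEL_PATH
# below is a placeholder to be swapped for a W8A8 or W4A16 checkpoint.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/quantized-model"  # hypothetical placeholder
PROMPT = "The quick brown fox"
MAX_NEW_TOKENS = 256

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="cuda")
inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")

# Warm-up pass so CUDA kernels and caches are initialized before timing.
model.generate(**inputs, max_new_tokens=8, do_sample=False)
torch.cuda.synchronize()

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```

Running the same script against a SmoothQuant (W8A8) and an AWQ (W4A16) checkpoint of the same base model, at the same batch size, would give us the head-to-head number we are asking about.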