SmoothQuant vs AWQ: which one is faster? #56

@codertimo

Question

We are very interested in two post-training quantization papers from Han Lab!

SmoothQuant uses W8A8 (8-bit weights, 8-bit activations) for efficient GPU computation.
AWQ uses W4A16 or W3A16 (4-bit or 3-bit weights, 16-bit activations) for lower memory requirements and higher memory throughput.

But which one is faster in actual production?
If you have any data about this, could you share it with us?
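
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of the weight memory each scheme implies. The 7B parameter count is a hypothetical example, and the estimate ignores quantization scales, zero points, activations, and the KV cache, so it says nothing about actual latency:

```python
# Rough weight-memory estimate per quantization scheme.
# Assumption: a hypothetical 7B-parameter model; per-group scales and
# zero points, activations, and KV cache are all ignored.

N_PARAMS = 7_000_000_000  # hypothetical model size

# Weight bit-widths taken from the scheme names (W8A8, W4A16, W3A16).
SCHEMES = {
    "FP16 baseline": 16,
    "SmoothQuant W8A8": 8,
    "AWQ W4A16": 4,
    "AWQ W3A16": 3,
}


def weight_bytes(n_params: int, weight_bits: int) -> float:
    """Bytes needed to store n_params weights at weight_bits bits each."""
    return n_params * weight_bits / 8


def summarize(n_params: int = N_PARAMS) -> dict:
    """Map each scheme name to its weight footprint in GB (1e9 bytes)."""
    return {name: weight_bytes(n_params, bits) / 1e9 for name, bits in SCHEMES.items()}


if __name__ == "__main__":
    for name, gb in summarize().items():
        print(f"{name}: {gb:.3f} GB of weights")
```

Smaller weights mean fewer bytes to read per generated token, while 8-bit activations allow integer matrix multiplies, which is why we are unsure which effect dominates in production.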
