Benchmark LLM #1054

Open
Giuseppe5 opened this issue Oct 14, 2024 · 0 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Giuseppe5 commented Oct 14, 2024

Is your feature request related to a problem? Please describe.
We have grown to support quite a few PTQ techniques within our LLM entrypoint, with even more possible combinations of them. Although some minor benchmarking has been performed, it would be good to run systematic benchmarks to understand which techniques combine well, which to avoid, and so on.

Describe the solution you'd like
An exhaustive search is not feasible; a few suggestions (see the sweep sketch after the model list below):

  • Weight Only 4b/8b, W8A8, W4A8, W4A4
  • MXFp8/6/4 for Weights/Activations
  • Combination of HQO for zero point + MSE for scale (might require writing custom quantizers)
  • GPxQ (with/without HQO, also with/without MSE), weight only/weight + activations
  • GPxQ (as above) with/without activation quantization

A few suggestions on the model side:

  • Llama 3.1/3.2
  • Mistral
  • Phi3
  • MoE (currently untested)
  • ...
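
A rough driver for such a sweep could iterate over the Cartesian product of models and quantization configs, invoking the LLM entrypoint once per combination and logging the result. Below is a minimal sketch, assuming the entrypoint can be run as `brevitas_examples.llm.main` and accepts flags such as `--model`, `--weight-bit-width`, `--input-bit-width`, and `--gptq`; these flag names are assumptions and should be checked against the actual entrypoint arguments.

```python
# Sweep sketch (not an official Brevitas script): run a small grid of
# quantization configs x models through the LLM entrypoint as subprocesses.
# The module path and CLI flags below are assumptions; adapt them to the
# real brevitas_examples/llm interface.
import itertools
import subprocess

MODELS = [
    "meta-llama/Llama-3.1-8B",
    "mistralai/Mistral-7B-v0.3",
    "microsoft/Phi-3-mini-4k-instruct",
]

# (config name, extra CLI flags) -- flag names are hypothetical placeholders.
CONFIGS = [
    ("w4-weight-only", ["--weight-bit-width", "4"]),
    ("w8a8", ["--weight-bit-width", "8", "--input-bit-width", "8"]),
    ("w4a8-gptq", ["--weight-bit-width", "4", "--input-bit-width", "8", "--gptq"]),
]

for model, (name, flags) in itertools.product(MODELS, CONFIGS):
    cmd = ["python", "-m", "brevitas_examples.llm.main", "--model", model, *flags]
    print("Running:", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Keep one log per (config, model) pair so failing combinations are easy
    # to spot and re-run later.
    log_name = f"{name}_{model.split('/')[-1]}.log"
    with open(log_name, "w") as f:
        f.write(result.stdout)
        f.write(result.stderr)
```

Each run's perplexity/accuracy would then be parsed out of the logs (or written directly by the entrypoint) and collected into a single comparison table.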

Additional context
Reach out for further clarifications.
