[Tracker] WIP features for torchao 0.4 #493

Closed
8 of 13 tasks
jerryzh168 opened this issue Jul 9, 2024 · 1 comment

jerryzh168 commented Jul 9, 2024

Release date: Aug 8 2024
Branch cut: Aug 2 2024

Developer Facing API

  • static quantization flow example @jerryzh168
  • QAT refactor to generalize to other dtypes/techniques @andrewor14
  • Int4 weight-only QAT

Developer Facing API use cases

  • hqq, hqq-mix subclasses (defer to next release) @HDCharles
  • AffineQuantizedTensor layout cleanup @jerryzh168
  • [postponed to 0.5] int4 weight-only quantization: support changing device (e.g. cpu -> cuda) @jerryzh168
  • sparse + quantization composability support @jcaip
  • quantize kv_cache to int8 for gpt-fast/torchAO llama @HDCharles

Modeling user API

  • autoquant to use AffineQuantizedTensor @HDCharles
  • [handed off to executorch team] torchchat/executorch compatibility @jerryzh168
  • add sam-fast to torchao @jcaip

Documentation

  • add quantization overview to torchao doc @supriyar
  • Huggingface neural-magic SparseLlama 2:4 notebook

supriyar pinned this issue Jul 11, 2024

supriyar commented

tracker for 0.5 is here #667

jerryzh168 unpinned this issue Aug 13, 2024