Adding int4 quantized tensor subclass #15
Merged
This was referenced Nov 28, 2023
HDCharles added a commit that referenced this pull request on Nov 28, 2023
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request on Jul 31, 2024
jerryzh168 pushed a commit that referenced this pull request on Sep 4, 2024
Labels
CLA Signed
Stack from ghstack (oldest at bottom):
Summary: Adds int4 quantized tensor subclass support and refactors the tensor
subclass code so that it is easier to use with multiple subclasses. The new
subclass uses the tinygemm int4 mixed-dtype GEMM that was added to PyTorch
as _weight_int4pack_mm, with _convert_weight_to_int4pack used to pack the
weights. This PR also adds .to support for tensor subclasses so that
saving/loading of meta tensors works for int4.
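The sketch below uses plain Python to show what an int4 weight-only mixed-dtype GEMM computes. Each weight row is split into groups. Each group is quantized to 4-bit codes with its own (scale, zero) pair. The codes are stored two per byte and dequantized group by group inside the matmul. It is an illustration under stated assumptions, not the PR's implementation. It does not call _weight_int4pack_mm or _convert_weight_to_int4pack, which run on CUDA and use a kernel-specific tiled packed layout. The min/max qparams with a float zero point at code 8, the plain low-nibble-first packing, the group size, and every function name here (quantize_row, pack_nibbles, unpack_nibbles, int4_linear) are hypothetical.

```python
# Hypothetical sketch: group-wise int4 weight quantization + reference
# mixed-dtype matmul (float activations x int4 weights).
# Assumptions (not from the PR): min/max qparams, float zero point at code 8,
# plain two-codes-per-byte packing, group_size divides the row length.
from typing import List, Tuple

N_BIT = 4
MAX_INT = 2**N_BIT - 1   # largest unsigned 4-bit code (15)
MID = 2 ** (N_BIT - 1)   # code whose dequantized value is the zero point (8)

QParams = List[Tuple[float, float]]  # one (scale, zero) per group


def quantize_row(row: List[float], group_size: int) -> Tuple[List[int], QParams]:
    """Quantize one weight row to 4-bit codes with per-group (scale, zero)."""
    assert len(row) % group_size == 0, "group_size must divide the row length"
    codes: List[int] = []
    qparams: QParams = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = max(hi - lo, 1e-6) / MAX_INT
        zero = lo + scale * MID  # float value represented by code MID
        qparams.append((scale, zero))
        for w in group:
            q = round((w - lo) / scale)
            codes.append(min(max(q, 0), MAX_INT))  # clamp to [0, 15]
    return codes, qparams


def dequantize(code: int, scale: float, zero: float) -> float:
    """Map a 4-bit code back to float: (q - 8) * scale + zero."""
    return (code - MID) * scale + zero


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_nibbles; returns the first n codes."""
    out: List[int] = []
    for b in packed:
        out.extend((b & 0xF, b >> 4))
    return out[:n]


def int4_linear(x: List[float], packed_rows: List[bytes],
                qparams_rows: List[QParams], group_size: int) -> List[float]:
    """y[j] = sum_k x[k] * W[j][k], dequantizing W on the fly per group."""
    y: List[float] = []
    for packed, qparams in zip(packed_rows, qparams_rows):
        codes = unpack_nibbles(packed, len(x))
        acc = 0.0
        for k, (xk, q) in enumerate(zip(x, codes)):
            scale, zero = qparams[k // group_size]
            acc += xk * dequantize(q, scale, zero)
        y.append(acc)
    return y


# Usage: quantize a small 2x8 weight with group_size=4 and compare the
# int4 result against the full-precision dot products.
W = [[0.1, -0.2, 0.3, 0.05, -0.4, 0.25, 0.0, 0.15],
     [0.5, 0.45, -0.1, -0.3, 0.2, 0.1, -0.05, 0.35]]
x = [1.0, 0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0]
GROUP = 4
packed_rows, qparams_rows = [], []
for row in W:
    codes, qp = quantize_row(row, GROUP)
    packed_rows.append(pack_nibbles(codes))
    qparams_rows.append(qp)
y_q = int4_linear(x, packed_rows, qparams_rows, GROUP)
y_ref = [sum(a * b for a, b in zip(x, row)) for row in W]
```

Each dequantized weight is within half a quantization step (scale / 2) of the original value. So the int4 output differs from the float result by at most sum_k |x[k]| * scale_k / 2. Smaller groups give tighter scales but require storing more (scale, zero) pairs.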
Test Plan: python test/test.py