Adding int4 quantized tensor subclass #15


Merged: 3 commits into gh/HDCharles/4/base on Nov 28, 2023

Conversation

@HDCharles (Contributor) commented Nov 28, 2023

Stack from ghstack (oldest at bottom):

Summary: Adds int4 quantized tensor subclass support and refactors the tensor
subclass code to make it easier to use with multiple subclasses. The new
subclass uses the tinygemm int4 mixed-dtype GEMM that was added to
PyTorch as _weight_int4pack_mm and _convert_weight_to_int4pack. Also
adds support for .to on tensor subclasses so that saving and loading of
meta tensors works for int4.

Test Plan: python test/test.py

Reviewers:

Subscribers:

Tasks:

Tags:
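The summary above describes the two halves of the scheme: group-wise affine quantization of weights to 4 bits, and packing pairs of 4-bit values into bytes (the "int4pack" layout consumed by the mixed-dtype GEMM). A minimal pure-Python sketch of the general idea follows; it is not the tinygemm kernel or the actual `_convert_weight_to_int4pack` layout, and the group size, rounding, and low-nibble-first packing order are illustrative assumptions.

```python
def quantize_int4_groupwise(w, group_size=4):
    """Affine-quantize a list of floats to 4-bit codes in [0, 15],
    with one (scale, zero) pair per group of group_size values."""
    qw, scales, zeros = [], [], []
    for i in range(0, len(w), group_size):
        group = w[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0  # avoid a zero scale for constant groups
        qw.extend(min(15, max(0, round((x - lo) / scale))) for x in group)
        scales.append(scale)
        zeros.append(lo)
    return qw, scales, zeros

def pack_int4(qw):
    """Pack pairs of 4-bit codes into single bytes, low nibble first."""
    return bytes((qw[i] & 0xF) | ((qw[i + 1] & 0xF) << 4)
                 for i in range(0, len(qw), 2))

def dequantize_int4_groupwise(qw, scales, zeros, group_size=4):
    """Recover approximate floats: x ≈ q * scale + zero, per group."""
    return [q * scales[i // group_size] + zeros[i // group_size]
            for i, q in enumerate(qw)]
```

A mixed-dtype GEMM such as `_weight_int4pack_mm` avoids materializing the dequantized weights: it unpacks and rescales groups on the fly inside the matmul while the activations stay in a floating-point dtype.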

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: 591d99c
Pull Request resolved: #15
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: 591d99c
Pull Request resolved: #15
@HDCharles changed the title from "Adding int4 tensor subclass" to "Adding int4 quantized tensor subclass" Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: e1fdcb9
Pull Request resolved: #15
@HDCharles HDCharles merged commit 701b120 into gh/HDCharles/4/base Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: e1fdcb9
Pull Request resolved: #15
@HDCharles HDCharles deleted the gh/HDCharles/4/head branch November 28, 2023 23:41
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
ghstack-source-id: e1fdcb9
Pull Request resolved: pytorch#15
jerryzh168 pushed a commit that referenced this pull request Sep 4, 2024
* initial flow for autoround
* update flow
* use int4 kernel
* remove debug code
* update the forward
* clean code
* e2e example
* refine code
* add requirements for test
* update test
* update the readme
* add readme
* update the filenames
* update the np version
* add demo
* format
* add more docs
* format
* add doc
* use `AffineQuantizedTensor`
* impl ar using multensors
* clean code
* use hook + multensors
* separate mul_tensors into a new file
* fix typos
* rename mul_tensor to multi_tensor
* enable amp
* eval model
* add gen examples
* add warmup to benchmark
* add benchmark
* clean code
* format code
* use tiny kernel
* add more note
* format
* correct typos
* remove hard code
* use intx
* enable offload for multitensor
* update the default config
* refine note
* update the version check
* format
* update
* add ut
* format
* add scripts
* format code
* format
* update
* fix typo
* refine bench code
* Enable `use_optimized_layer_output` and AO' llama (#12)
* Refine the Doc (#14)
* add more docstring
* add paper link
* correct some note
* add cmd
* update the scripts
* revert some change
* Add a lightweight configuration for quick benchmarking (#15)
* update quant method name
* Wrap model's buffers and params to `MultiTensor` & update the results (#16)

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
jerryzh168 pushed a commit to jerryzh168/ao that referenced this pull request Sep 4, 2024
(Same squashed commit message as the previous entry, with cross-repo references pytorch#12, pytorch#14, pytorch#15, and pytorch#16.)
Labels: CLA Signed
Projects: none yet
2 participants