Adding int4 quantized tensor subclass #15


Merged: 3 commits into gh/HDCharles/4/base on Nov 28, 2023

Conversation

@HDCharles (Contributor) commented Nov 28, 2023

Stack from ghstack (oldest at bottom):

Summary: Adds int4 quantized tensor subclass support and refactors the tensor
subclass code to make it easier to use with multiple subclasses. The new
subclass uses the tinygemm int4 mixed-dtype GEMM that was added to
PyTorch as _weight_int4pack_mm and _convert_weight_to_int4pack. Also
adds support for .to on tensor subclasses so that saving and loading of
meta tensors works for int4.

Test Plan: python test/test.py

Reviewers:

Subscribers:

Tasks:

Tags:
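The summary above describes the two halves of the scheme: group-wise affine quantization of weights to 4 bits, and packing pairs of 4-bit values into bytes (the "int4pack" layout consumed by the mixed-dtype GEMM). A minimal pure-Python sketch of the general idea follows; it is not the tinygemm kernel or the actual `_convert_weight_to_int4pack` layout, and the group size, rounding, and low-nibble-first packing order are illustrative assumptions.

```python
def quantize_int4_groupwise(w, group_size=4):
    """Affine-quantize a list of floats to 4-bit codes in [0, 15],
    with one (scale, zero) pair per group of group_size values."""
    qw, scales, zeros = [], [], []
    for i in range(0, len(w), group_size):
        group = w[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0  # avoid a zero scale for constant groups
        qw.extend(min(15, max(0, round((x - lo) / scale))) for x in group)
        scales.append(scale)
        zeros.append(lo)
    return qw, scales, zeros

def pack_int4(qw):
    """Pack pairs of 4-bit codes into single bytes, low nibble first."""
    return bytes((qw[i] & 0xF) | ((qw[i + 1] & 0xF) << 4)
                 for i in range(0, len(qw), 2))

def dequantize_int4_groupwise(qw, scales, zeros, group_size=4):
    """Recover approximate floats: x ≈ q * scale + zero, per group."""
    return [q * scales[i // group_size] + zeros[i // group_size]
            for i, q in enumerate(qw)]
```

A mixed-dtype GEMM such as `_weight_int4pack_mm` avoids materializing the dequantized weights: it unpacks and rescales groups on the fly inside the matmul while the activations stay in a floating-point dtype.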

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: 591d99c
Pull Request resolved: #15
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: 591d99c
Pull Request resolved: #15
@HDCharles changed the title from "Adding int4 tensor subclass" to "Adding int4 quantized tensor subclass" Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: e1fdcb9
Pull Request resolved: #15
@HDCharles HDCharles merged commit 701b120 into gh/HDCharles/4/base Nov 28, 2023
HDCharles added a commit that referenced this pull request Nov 28, 2023
ghstack-source-id: e1fdcb9
Pull Request resolved: #15
@HDCharles HDCharles deleted the gh/HDCharles/4/head branch November 28, 2023 23:41
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
ghstack-source-id: e1fdcb9
Pull Request resolved: pytorch#15
jerryzh168 pushed a commit that referenced this pull request Sep 4, 2024
* initial flow for autoround
* update flow
* use int4 kernel
* remove debug code
* update the forward
* clean code
* e2e example
* refine code
* add requirements for test
* update test
* update the readme
* add readme
* update the filenames
* update the np version
* add demo
* format
* add more docs
* format
* add doc
* use `AffineQuantizedTensor`
* impl ar using multensors
* clean code
* use hook + multensors
* separate mul_tensors into a new file
* fix typos
* rename mul_tensor to multi_tensor
* enable amp
* eval model
* add gen examples
* add warmup to benchmark
* add benchmark
* clean code
* format code
* use tiny kernel
* add more note
* format
* correct typos
* remove hard code
* use intx
* enable offload for multitensor
* update the default config
* refine note
* update the version check
* format
* update
* add ut
* format
* add scripts
* format code
* format
* update
* fix typo
* refine bench code
* Enable `use_optimized_layer_output` and AO' llama (#12)
* Refine the Doc (#14)
* add more docstring
* add paper link
* correct some note
* add cmd
* update the scripts
* revert some change
* Add a lightweight configuration for quick benchmarking (#15)
* update quant method name
* Wrap model's buffers and params to `MultiTensor` & update the results (#16)

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
jerryzh168 pushed a commit to jerryzh168/ao that referenced this pull request Sep 4, 2024
(Same squashed commit message as the previous entry, with cross-repo references pytorch#12, pytorch#14, pytorch#15, and pytorch#16.)
Labels: CLA Signed
Projects: none yet
2 participants