Adding tests for save/load support #12

HDCharles · 2023-11-22T00:02:02Z

Stack from ghstack (oldest at bottom):

Summary: a model quantized with a tensor subclass can now be saved and
reloaded. We save the state dict, then later load the model as meta
tensors (i.e. load only tensor metadata, not the actual parameters),
apply the quantization API, and then load the quantized model state dict.

We change the dtype of the subclass to match the dtype of the
dequantized form, both to align with subclass design guidelines and to
make this flow work.

Test Plan: python test/test.py

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 84a69d8
Pull Request resolved: #12

* initial flow for autoround
* update flow
* use int4 kernel
* remove debug code
* update the forward
* clean code
* e2e example
* refine code
* add requirements for test
* update test
* update the readme
* add readme
* update the filenames
* update the np version
* add demo
* format
* add more docs
* format
* add doc
* use `AffineQuantizedTensor`
* impl ar using multensors
* clean code
* use hook + multensors
* separate mul_tensors into a new file
* fix typos
* rename mul_tensor to multi_tensor
* enable amp
* eval model
* add gen examples
* add warmup to benchmark
* add benchmark
* clean code
* format code
* use tiny kernel
* add more note
* format
* correct typos
* remove hard code
* use intx
* enable offload for multitensor
* update the default config
* refine note
* update the version check
* format
* update
* add ut
* format
* add scripts
* format code
* format
* update
* fix typo
* refine bench code
* Enable `use_optimized_layer_output` and AO' llama (#12)
* Refine the Doc (#14)
* add more docstring
* add paper link
* correct some note
* add cmd
* update the scripts
* revert some change
* Add a lightweight configuration for quick benchmarking (#15)
* update quant method name
* Wrap model's buffers and params to `MultiTensor` & update the results (#16)
  * wrap model's buffers and params to `MultiTensor` and update the results

Signed-off-by: yiliu30 <yi4.liu@intel.com>


And logic in that file:
- Enable it on all pushes, not just ones modifying the file
- Use `pushd`/`popd`
- Fix typo in `output-path`
- Fix model_path definition
- Download tinystories/tokenizer from the right location
- Setup Python-3.8
- Add numpy to requirements
- Skip export (unsupported in torch-2.2)
- Break job into several smaller steps

HDCharles mentioned this pull request Nov 22, 2023

Adding subclass and api for weight-only quant #11

Merged

facebook-github-bot added the CLA Signed label Nov 22, 2023

HDCharles mentioned this pull request Nov 28, 2023

Adding int4 quantized tensor subclass #15

Merged

HDCharles merged commit 8949fb2 into gh/HDCharles/3/base Nov 28, 2023

HDCharles deleted the gh/HDCharles/3/head branch November 28, 2023 05:36

HDCharles restored the gh/HDCharles/3/head branch November 28, 2023 05:36

facebook-github-bot deleted the gh/HDCharles/3/head branch December 28, 2023 15:17
