[Feature] Add Channel Wise Quantization Support #441

Merged
merged 1 commit into from
Feb 12, 2024

Conversation

rahul-tuli
Member

@rahul-tuli rahul-tuli commented Feb 12, 2024

This PR adds channel-wise quantization support to the deepsparse.analyze API and the ModelAnalysis class.

Before this PR:

deepsparse.analyze /network/alexandre/tyler/single_layer/deployment/model.onnx
  File "/home/ubuntu/venv/bin/deepsparse.analyze", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/cli.py", line 98, in wrap_common_options
    return command(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/cli.py", line 152, in wrap_with_performance_options
    return command(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/deepsparse/analyze.py", line 77, in main
    analysis = ModelAnalysis.create(model_path)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/analysis.py", line 1308, in create
    result = ModelAnalysis.from_onnx(
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/analysis.py", line 922, in from_onnx
    node_analyses = cls.analyze_nodes(model_graph)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/analysis.py", line 1371, in analyze_nodes
    node_analysis = NodeAnalysis.from_node(
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/analyze/analysis.py", line 365, in from_node
    sparse_node = is_sparse_layer(model_graph, node)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/utils/onnx/analysis.py", line 257, in is_sparse_layer
    return get_node_sparsity(model_graph, node) > 0
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/utils/onnx/analysis.py", line 318, in get_node_sparsity
    num_zeros, weight_size = get_node_num_zeros_and_size(model_graph, node)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/utils/onnx/analysis.py", line 148, in get_node_num_zeros_and_size
    zero_point = get_zero_point(model_graph, node)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/sparsezoo/utils/onnx/analysis.py", line 244, in get_zero_point
    raise NotImplementedError("Channel-wise zero points are not supported")
NotImplementedError: Channel-wise zero points are not supported
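
The traceback ends in get_zero_point, which is reached while counting zeros for the sparsity calculation. As a rough illustration of what channel-wise support involves, the sketch below counts a quantized weight as "zero" when it equals the zero point of its own channel. It is a minimal NumPy sketch, not the code merged in this PR: the function name count_zeros_per_channel, the axis parameter, and the sample arrays are all hypothetical.

```python
import numpy as np


def count_zeros_per_channel(weight, zero_point, axis=0):
    """Count quantized weights equal to their channel's zero point.

    Hypothetical helper for illustration only (not the sparsezoo API).
    weight:     integer array of quantized values
    zero_point: scalar (per-tensor) or 1-D array with one entry per
                channel along `axis` (channel-wise)
    """
    weight = np.asarray(weight)
    zero_point = np.asarray(zero_point)
    if zero_point.size == 1:
        # Per-tensor case: one zero point shared by every element
        zp = zero_point.reshape(())
    else:
        if zero_point.ndim != 1 or zero_point.shape[0] != weight.shape[axis]:
            raise ValueError("zero_point must have one entry per channel")
        # Reshape so the zero points broadcast along the channel axis
        shape = [1] * weight.ndim
        shape[axis] = -1
        zp = zero_point.reshape(shape)
    num_zeros = int(np.count_nonzero(weight == zp))
    return num_zeros, weight.size


# Two output channels (rows), each with its own zero point
weight = np.array([[128, 5, 128], [7, 120, 120]], dtype=np.uint8)
zero_points = np.array([128, 120], dtype=np.uint8)

num_zeros, size = count_zeros_per_channel(weight, zero_points, axis=0)
sparsity = num_zeros / size
print(f"{num_zeros}/{size} zeros, sparsity={sparsity:.2%}")
```

With a scalar zero point the same function falls back to the per-tensor comparison, so both quantization schemes go through one code path in this sketch.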

After this PR (the command runs successfully):

2024-02-12 09:00:04 deepsparse.analyze INFO     Starting Analysis ...
INFO:deepsparse.analyze:Starting Analysis ...
2024-02-12 09:00:40 deepsparse.analyze INFO     Analysis complete, collating results...
INFO:deepsparse.analyze:Analysis complete, collating results...
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx512, binary=avx512)
[7f3c2eff4740 >WARN<  operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
Node Timings for Benchmark # 1:
 NODE_NAME                                                   AVG_RUNTIME
 /model/layers.0/input_layernorm/ReduceMean                  2.10
 /model/layers.0/input_layernorm/Mul                         3.89
 /model/layers.0/self_attn/v_proj/module/MatMul_quant        183.68
 /model/layers.0/self_attn/k_proj/module/MatMul_quant        182.47
 /model/layers.0/self_attn/q_proj/module/MatMul_quant        181.16
 /model/layers.0/self_attn/attn_weights_matmul/MatMul_quant  125.34
 /model/Sub                                                  5.61
 /model/Add_1                                                3.01
 /model/layers.0/self_attn/attn_output_matmul/MatMul_quant   157.15
 /model/layers.0/self_attn/o_proj/module/MatMul_quant        186.67
 /model/layers.0/mlp/up_proj/module/MatMul_quant             565.44
 /model/layers.0/mlp/gate_proj/module/MatMul_quant           567.57
 /model/layers.0/mlp/Mul                                     8.28
 /model/layers.0/mlp/down_proj/module/MatMul_quant           506.74
 /lm_head/module/MatMul_quant                                25.92

Params:
 MODEL                                                                      SPARSITY  QUANTIZED  COUNT      SIZE
 /home/rahul/models/llama-single-layer-channel-quant/deployment/model.onnx  29.77     100.00     464519168  2609854493

Ops:
 MODEL                                                                      SPARSITY  QUANTIZED  COUNT      SIZE
 /home/rahul/models/llama-single-layer-channel-quant/deployment/model.onnx  29.77     100.00     464519374  2609859089

Overall:
 MODEL                                                                      LATENCY  THROUGHPUT  SUPPORTED_GRAPH  SPARSITY  QUANTIZED
 /home/rahul/models/llama-single-layer-channel-quant/deployment/model.onnx  2895.26  0.35        1.00             29.77     100.00
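
The LATENCY and THROUGHPUT columns in the Overall table appear to be related by a simple reciprocal. The output does not state units, so the sketch below assumes latency is in milliseconds and throughput is in items per second; the function name is hypothetical and only the two figures come from the table.

```python
def throughput_from_latency(latency_ms: float) -> float:
    """Items per second for a given per-item latency in milliseconds.

    Assumes a single stream with no batching or overlap.
    """
    return 1000.0 / latency_ms


latency_ms = 2895.26  # LATENCY value from the Overall table
throughput = round(throughput_from_latency(latency_ms), 2)
print(throughput)  # matches the THROUGHPUT column (0.35)
```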

This can also be tested with SparseZoo using the following snippet:

from sparsezoo.analyze import ModelAnalysis

model_path = "/network/alexandre/tyler/single_layer/deployment/model.onnx"
analysis = ModelAnalysis.create(model_path)
my_yaml = analysis.yaml()
print(my_yaml)

@rahul-tuli rahul-tuli force-pushed the analyze/add-channel-wise-quantization-support branch from ff51003 to 68b685a Compare February 12, 2024 14:32
@rahul-tuli rahul-tuli self-assigned this Feb 12, 2024
@rahul-tuli rahul-tuli marked this pull request as ready for review February 12, 2024 14:34
Contributor

@bfineran bfineran left a comment

lgtm - let's sync more on why we need to group four block for the zero points - I don't believe we need to

@bfineran bfineran merged commit 944128f into main Feb 12, 2024
4 checks passed
@bfineran bfineran deleted the analyze/add-channel-wise-quantization-support branch February 12, 2024 19:38
rahul-tuli added a commit that referenced this pull request Feb 12, 2024
bfineran added a commit that referenced this pull request Feb 13, 2024
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Satrat added a commit that referenced this pull request Feb 22, 2024
* `RegistryMixin` improved alias management (#404)

* initial commit

* add docstrings

* simplify

* hardening

* refactor

* format registry lookup strings to be lowercase

* standardise aliases

* Move evaluator registry (#411)

* More control over external data size (#412)

* When splitting external data, avoid renaming `model.data` to `model.data.1` if only one external data file gets eventually saved (#414)

* [model.download] fix function returning nothing (#420)

* [BugFix] Path not expanded (#418)

* [Fix] Allow for processing Path in the sparsezoo analysis (#417)

* Raise TypeError instead of ValueError (#426)

* Fix misleading docstring (#416)

Add test

* add support for benchmark.yaml (#415)

* add support for benchmark.yaml

recent zoo models use `benchmark.yaml` instead of `benchmarks.yaml`. adding this additional pathway so `benchmark.yaml` is downloaded in the bulk model download

* update files filter

* fix tests

---------

Co-authored-by: dbogunowicz <damian@neuralmagic.com>

* [BugFix] Add analyze to init (#421)

* Add analyze to init

* Move onnxruntime to deps

* Print model analysis (#423)

* [model.download] fix function returning nothing (#420)

* [BugFix] Path not expanded (#418)

* print model-analysis

* [Fix] Allow for processing Path in the sparsezoo analysis (#417)

* add print statement at the end of cli run

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>

* Omit scalar weight (#424)

* omit scalar weights

* remove unwanted files

* comment

* Update src/sparsezoo/utils/onnx/analysis.py

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

---------

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

---------

Co-authored-by: George <george@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

* update analyze help message for correctness (#432)

* initial commit (#430)

* [sparsezoo.analyze] Fix pathway such that it works for larger models (#437)

* fix analyze to work with larger models

* update for failing tests; add comments

* Update src/sparsezoo/utils/onnx/external_data.py

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.coom>
Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>

* Delete hehe.py (#439)

* Download deployment dir for llms (#435)

* Download deployment dir for llms

* Use path instead of download

* only set save_as_external_data to true if the model originally had external data (#442)

* Add Channel Wise Quantization Support (#441)

* Chunk download (#429)

* chunk download, break down into 10

* lint

* threads download

* draft

* chunk download draft

* job-based download and combining/deleting chunks

* delete old code

* lint

* fix num jobs if file_size is less than the chunk size

* doc string and return types

* test

* lint

* fix type hints (#445)

* fix bug if the value is a dict (#447)

* [deepsparse.analyze] Fix v1 functionality to  work with llms (#451)

* apply changes equivalent to those made to analyze_v2 so that the inference session works for llms; update warnings to be debug printouts

* typo

* overwrite file (#450)

Co-authored-by: 21 <a21@21s-MacBook-Pro.local>

* Adds a `numpy_array_representer` to yaml (#454)

on runtime, to avoid serialization issues

* Avoid division by zero (#457)

Avoid log of zero

* op analysis total counts had double sparse counts (#461)

* Rename legacy analyze to analyze_v1 (#459)

* Fixing Quant % Calcuation (#462)

* initial fix

* style

* Include Sparsity in Size Calculation (#463)

* initial fix

* style

* incorporate sparsity into size calculation

* quality

* op analysis total counts had double sparse counts (#461)

* Fixing Quant % Calcuation (#462)

* initial fix

* style

* Include Sparsity in Size Calculation (#463)

* initial fix

* style

* incorporate sparsity into size calculation

* quality

* Revert "Merge branch 'main' into analyze_cherry_picks"

This reverts commit 509fa1a, reversing
changes made to 08f94c4.

---------

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: dbogunowicz <damian@neuralmagic.com>
Co-authored-by: George <george@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.coom>
Co-authored-by: 21 <a21@21s-MacBook-Pro.local>