More models #2018

jainapurva · 2025-04-04T16:44:52Z

No description provided.

Add KleidiAI gemm kernels (#2000)

Summary: This PR pulls in two new KleidiAI kernels:

* kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod (GEMV)
* kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod (GEMM)

and adds them for automatic mr-based kernel selection when TORCHAO_ENABLE_ARM_NEON_DOT is set.

It also adds new tests for these kernels and refactors the KleidiAI testing code so that future KleidiAI kernels can be tested with a one-line addition:

```
TEST(
    test_linear_8bit_act_xbit_weight,
    matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod) {
  test_linear_8bit_act_xbit_weight_kleidiai<
      matmul_clamp_f32_qai8dxp1x8_qsi4c32p8x8_1x8x32_neon_dotprod>();
}
```

The existing testing code, which depended on code generation, is kept for additional coverage.

Reviewed By: Jack-Khuu

Differential Revision: D72179835
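The commit message does not spell out how mr-based selection works. As a rough sketch only, assuming the selector prefers the kernel with the largest row tile `mr` that does not exceed the number of activation rows `m`, the idea could look like the following. `KernelConfig`, `KERNELS`, and `select_kernel` are hypothetical names, and the `mr` values (1 and 4) are inferred from the `1x8` and `4x8` parts of the kernel names rather than taken from the torchao source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelConfig:
    """Hypothetical description of a registered micro-kernel."""
    name: str
    mr: int  # rows of the activation matrix processed per kernel call


# Assumed mr values: 1 for the GEMV kernel, 4 for the GEMM kernel.
KERNELS = [
    KernelConfig("kai_matmul_clamp_f32_qai8dxp1x4_qsi4c32p8x4_1x8_neon_dotprod", mr=1),
    KernelConfig("kai_matmul_clamp_f32_qai8dxp4x4_qsi4c32p8x4_4x8_neon_dotprod", mr=4),
]


def select_kernel(m, kernels=KERNELS):
    """Pick the kernel with the largest mr that still fits in m rows.

    Falls back to the smallest-mr kernel if none fits.
    """
    if m < 1:
        raise ValueError("m must be a positive number of rows")
    fitting = [k for k in kernels if k.mr <= m]
    if fitting:
        return max(fitting, key=lambda k: k.mr)
    return min(kernels, key=lambda k: k.mr)


if __name__ == "__main__":
    for m in (1, 2, 4, 64):
        print(m, select_kernel(m).name)
```

Under this assumption a single-row input (`m == 1`) is routed to the GEMV kernel, while inputs with four or more rows use the GEMM kernel.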

**Summary:** Previously, `Int8DynActInt4QATQuantizer` had slightly diverging numerics between the prepare and convert steps. The prepare step uses quantization primitives shared with AQT (specifically `quantize_affine` and `dequantize_affine`), while the convert step relies on old ops from the `torch.ops.quantized_decomposed` namespace. The divergence is negligible for small models, but the quantization errors compound in larger models with many linear layers.

More specifically, the divergence occurs in three places during activation quantization:

1. **Choose qparams.** The prepare step casts the qparams to `torch.float32`, whereas the convert step casts the scales to `torch.float64` and the zero points to `torch.int64`.

2. **Quantize.** The prepare step rounds before adding the zero points and uses torch functions, while the convert step adds before rounding and uses torch tensor methods.

```
# prepare
x = torch.clamp(
    torch.round(x * (1.0 / scale)) + zero_point, qmin, qmax,
)

# convert
x = (
    x.mul(1.0 / scale)
    .add(zero_point)
    .round()
    .clamp(qmin, qmax)
    .to(quantize_dtype)
)
```

3. **Dequantize.** The prepare step casts to `torch.int32` before subtracting the zero points, and casts back to the original dtype before multiplying by the scale. The convert step only casts at the very end.

```
# prepare
x = x.to(torch.int32) - zero_point.to(torch.int32)
x = x.to(orig_dtype)
x = x * scale

# convert
x = x - zero_point
x = x * scale
x = x.to(orig_dtype)
```

This commit makes the convert path use the same torchao quantization primitives as the prepare path, which resolves the three differences above. The prepare and convert steps now match exactly in terms of numerics over many trials.

**Test Plan:**
python test/quantization/test_qat.py -k test_fake_quantize_per_token_vs_convert
python test/quantization/test_qat.py -k test_qat_8da4w_prepare_vs_convert
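To see why the order of rounding and zero-point addition in the Quantize step can matter at all, here is a small pure-Python sketch. It is an illustration under stated assumptions, not the torchao code: it uses scalars instead of tensors, and it relies on Python's built-in `round`, which rounds halves to the nearest even integer just as `torch.round` does. The function names are hypothetical, and ties are only one plausible source of mismatch; the commit message does not say which inputs diverged in practice.

```python
def quantize_round_then_add(x, scale, zero_point, qmin, qmax):
    """Prepare-style order: round first, then add the zero point."""
    q = round(x * (1.0 / scale)) + zero_point
    return max(qmin, min(qmax, q))


def quantize_add_then_round(x, scale, zero_point, qmin, qmax):
    """Convert-style order: add the zero point first, then round."""
    q = round(x * (1.0 / scale) + zero_point)
    return max(qmin, min(qmax, q))


if __name__ == "__main__":
    scale, zero_point, qmin, qmax = 1.0, 1, -128, 127

    # A value exactly halfway between two integers exposes the difference:
    # round(0.5) is 0 (ties go to even), so round-then-add gives 0 + 1 = 1,
    # while add-then-round computes round(1.5), which is 2.
    tie = 0.5
    print(quantize_round_then_add(tie, scale, zero_point, qmin, qmax))
    print(quantize_add_then_round(tie, scale, zero_point, qmin, qmax))

    # Away from a tie, both orders agree.
    non_tie = 0.3
    print(quantize_round_then_add(non_tie, scale, zero_point, qmin, qmax))
    print(quantize_add_then_round(non_tie, scale, zero_point, qmin, qmax))
```

With an odd zero point, a tie flips parity when the zero point is added before rounding, so the two orders land on different integers. Differences in floating-point precision (for example `float32` versus `float64` scales) can produce similar off-by-one results near rounding boundaries.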


jainapurva · 2025-04-04T16:44:53Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2025-04-04T16:44:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2018

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit faf3c0f with merge base 6922733:

NEW FAILURES - The following jobs have failed:

Code Analysis with Ruff / build (3.9)
PR Label Check / Check PR Labels: Process completed with exit code 1.
Run TorchAO Experimental Tests / test-mps-ops (macos-m1-stable): test_accuracy_6

This comment was automatically generated by Dr. CI and updates every 15 minutes.

metascroy and others added 20 commits April 1, 2025 13:43

Reintroduce has_weight_zeros as a template param

5e4d50c

Differential Revision: D71503133 Pull Request resolved: #1991

Clean up op interface

c9b1490

Differential Revision: D72179480 Pull Request resolved: #1998

quantized matmul

8776dd3

Differential Revision: D71370592 Pull Request resolved: #1994

Allow builds on less than sm75 raise runtime failure (#1999)

e4eff3a

stack-info: PR: #1999, branch: drisspg/stack/45

Skip galore test if not cuda (#2003)

9a9ecde

Summary: fixing CI before branch cut. Test Plan: python test/quantization/test_galore_quant.py and CI.

Fix experimental CI (#2005)

e2369d3

* up * up

Add fp32xint8 matmul

b49f23c

Differential Revision: D71370597 Pull Request resolved: #2004

Add quantized q @ k test for intended use in quantized attention

8e8472c

Differential Revision: D71370604 Pull Request resolved: #2006

Update version.txt (#2009)

e52867a

Initial prototype of differentiable _scaled_grouped_mm function (#1969)

620356d

Add quantized attn_scores @ v test for intended use in quantized attention

6987576

Differential Revision: D71370603 Pull Request resolved: #2008

add fallback kernel and interface

97d6d74

Differential Revision: D71370598 Pull Request resolved: #2010

Add fallback kernel and interface for rhs only quantized matmul

83d58e3

Differential Revision: D71370602 Pull Request resolved: #2011

Update float8nocompile test code to use new float8 matmul function (#2013)

0231a68

Remove float8nocompile CI (#1976)

b375781

Remove float8nocompile CI since it's flaky on sm89

Update clean_release_notes.py (#2014)

66d6a64

Update

d9e267b

[ghstack-poisoned]

Update

faf3c0f

[ghstack-poisoned]

jainapurva mentioned this pull request Apr 4, 2025

Add GPU profiler #1997

Draft

facebook-github-bot added the CLA Signed label Apr 4, 2025
