
Conversation


@Ninja91 Ninja91 commented Aug 7, 2025

Summary:
This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch Arm backend, following the feedback from D79746479.

Key Changes

1. New Quantization Configuration Function

  • Add get_16a8w_quantization_config() in fbcode/executorch/backends/arm/quantizer/arm_quantizer.py (see the sketch after this list)
  • Provides 16-bit activations with HistogramObserver (better precision than 8A8W)
  • Maintains 8-bit weights with MinMaxObserver/PerChannelMinMaxObserver (memory efficient)
  • Technically supported by TOSA through EXT-INT16 extension/profile
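
For orientation, here is a minimal sketch of roughly what a 16A8W configuration specifies, using the torch.ao PT2E quantizer primitives the Arm quantizer builds on. The exact observer arguments, quantization ranges, and the config wrapper used in the actual implementation may differ.

```python
import torch
from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
from torch.ao.quantization.quantizer import QuantizationSpec

# 16-bit symmetric, per-tensor activation spec calibrated with HistogramObserver.
# The ranges and eps below are illustrative, not the values from this diff.
act_qspec = QuantizationSpec(
    dtype=torch.int16,
    quant_min=-32768,
    quant_max=32767,
    qscheme=torch.per_tensor_symmetric,
    is_dynamic=False,
    observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
)

# 8-bit symmetric, per-channel weight spec calibrated with PerChannelMinMaxObserver.
weight_qspec = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-127,
    quant_max=127,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    is_dynamic=False,
    observer_or_fake_quant_ctr=PerChannelMinMaxObserver.with_args(eps=2**-12),
)
```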

2. Test Implementation

  • Add test_linear_16a8w_tosa_INT() test in fbcode/executorch/backends/arm/test/ops/test_linear.py
  • Demonstrates usage of the new 16A8W quantization configuration (a simplified version of that flow is sketched below)
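
A minimal sketch of the shape such a test could take, assuming the standard PT2E quantize flow. TOSAQuantizer, its constructor argument, and set_global() are assumed names here and may not match the actual test harness.

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import path and quantizer class; get_16a8w_quantization_config() is the
# function added in this diff.
from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,
    get_16a8w_quantization_config,
)


class SimpleLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 8)

    def forward(self, x):
        return self.fc(x)


def quantize_linear_16a8w(tosa_spec):
    model = SimpleLinear().eval()
    example_inputs = (torch.randn(1, 16),)

    quantizer = TOSAQuantizer(tosa_spec)  # assumed constructor signature
    quantizer.set_global(get_16a8w_quantization_config())

    exported = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(exported, quantizer)
    prepared(*example_inputs)  # calibrate observers on sample data
    return convert_pt2e(prepared)
```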

Benefits

  • Better Precision: 16-bit activations provide higher precision than 8-bit, which is useful for preserving precision in recurrent neural networks (see the quick comparison below).
  • Configurable: Supports the same parameters as the existing quantization configurations
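
As a rough, back-of-the-envelope illustration of the precision gap (not taken from the PR), for a symmetric activation range of [-1, 1]:

```python
# Quantization step size over a symmetric [-1, 1] range.
int8_step = 2.0 / (127 - (-127))       # ~7.9e-3
int16_step = 2.0 / (32767 - (-32767))  # ~3.1e-5
print(int8_step / int16_step)          # ~258: the int16 grid is ~258x finer
```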

Testing

The implementation provides the utility function and test infrastructure. Note: the test reveals that the TOSA backend has limited INT16 support for some operations (view operations only support INT8/INT32/FP32/BOOL). This is expected and confirms that the configuration correctly produces INT16 tensors.

Differential Revision: D79763381

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@Ninja91 Ninja91 requested a review from digantdesai as a code owner August 7, 2025 06:08

pytorch-bot bot commented Aug 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13175

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit 247e13a with merge base b427bd7:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 7, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381


github-actions bot commented Aug 7, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 7, 2025
@digantdesai digantdesai added the ciflow/trunk and module: arm labels Aug 7, 2025
@mergennachin mergennachin requested review from gggekov, per and robell August 13, 2025 15:50
@mergennachin mergennachin added the partner: arm label Aug 13, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 15, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 20, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 20, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 20, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 20, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

facebook-github-bot pushed a commit that referenced this pull request Aug 21, 2025
facebook-github-bot pushed a commit that referenced this pull request Aug 21, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 21, 2025
@digantdesai

Seems like you have PRs which include the same changes from the older PR. Do you want to clean that up or drop this PR in favor of #13448?

@Ninja91

Ninja91 commented Aug 22, 2025

@digantdesai, @mergennachin requested review from @robell and @gggekov. Can any of you please review this PR?

@Ninja91

Ninja91 commented Aug 23, 2025

Closing this PR as this commit is covered in #13448

@Ninja91 Ninja91 closed this Aug 23, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 23, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79763381

Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025
Ninja91 pushed a commit to Ninja91/executorch that referenced this pull request Aug 25, 2025

Labels

CLA Signed, fb-exported, module: arm, partner: arm

5 participants