Use weight cache for quantized tensor scale data (#14448)
Summary:
Enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights triggers an assertion intended to ensure that all constant data resides in the weight cache. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.
The root cause appears to be that tensor scale data was bypassing the weight cache, likely an oversight in the initial implementation. This is not a correctness issue, but it causes the aforementioned assert to fail and uses marginally more memory than necessary.
This PR updates the XNNPACK compileModel call to use the weight cache for scale data (instead of putting it in the unpacked_buffers list). With this change, the linear op tests pass with weight cache enabled.
Test Plan:
```
buck test -c executorch.xnnpack_weights_cache=1 fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops -- linear
```
Reviewed By: digantdesai
Differential Revision: D82862629
Pulled By: GregoryComer