Commit 9af5ece

Update on "Refactor custom SDPA op to separate kv cache update from the custom sdpa op"
Differential Revision: [D62301837](https://our.internmc.facebook.com/intern/diff/D62301837/) [ghstack-poisoned]
2 parents 7fdfd95 + 6f48610 commit 9af5ece

49 files changed: +1278 -1266 lines changed

.ci/scripts/test_llama.sh

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ EXPORTED_MODEL_NAME="${EXPORTED_MODEL_NAME}.pte"
 echo "Exporting ${EXPORTED_MODEL_NAME}"
 EXPORT_ARGS="-c ${CHECKPOINT_FILE_NAME} -p ${PARAMS} -d ${DTYPE} -n ${EXPORTED_MODEL_NAME} -kv"
 if [[ "${XNNPACK}" == "ON" ]]; then
-  EXPORT_ARGS="${EXPORT_ARGS} -X -qmode 8da4w -G 128"
+  EXPORT_ARGS="${EXPORT_ARGS} -X --xnnpack-extended-ops -qmode 8da4w -G 128"
 fi
 if [[ "${CUSTOM}" == "ON" ]]; then
   EXPORT_ARGS="${EXPORT_ARGS} --use_sdpa_with_kv_cache"
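
The flag assembly touched by this hunk can be sketched outside the shell script. This is a minimal Python sketch and not part of the commit: `build_export_args`, its parameter names, and the sample file names are hypothetical, and only the flag strings are taken from the diff.

```python
from typing import List


def build_export_args(
    checkpoint: str,
    params: str,
    dtype: str,
    model_name: str,
    xnnpack: bool = False,
    custom: bool = False,
) -> List[str]:
    """Mirror the EXPORT_ARGS logic of test_llama.sh after this commit.

    Hypothetical helper: names are illustrative, flags come from the diff.
    """
    # Base arguments are always present, including -kv.
    args = ["-c", checkpoint, "-p", params, "-d", dtype, "-n", model_name, "-kv"]
    if xnnpack:
        # The commit adds --xnnpack-extended-ops right after -X.
        args += ["-X", "--xnnpack-extended-ops", "-qmode", "8da4w", "-G", "128"]
    if custom:
        args.append("--use_sdpa_with_kv_cache")
    return args


if __name__ == "__main__":
    # Sample file names below are placeholders, not values from the CI script.
    print(" ".join(build_export_args("ckpt.pt", "params.json", "fp32", "model.pte", xnnpack=True)))
```

Returning a list instead of one string keeps each flag a separate argument, which avoids quoting problems if the result is later passed to `subprocess.run`.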

backends/cadence/aot/compiler.py

Lines changed: 1 addition & 2 deletions
@@ -30,7 +30,6 @@
 )
 from executorch.backends.transforms.remove_clone_ops import RemoveCloneOpsTransform
 from executorch.exir import EdgeCompileConfig, EdgeProgramManager, to_edge
-from torch._export import capture_pre_autograd_graph
 from torch.ao.quantization.pt2e.export_utils import model_is_exported
 from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
 
@@ -58,7 +57,7 @@ def convert_pt2(
     """
 
     # Export with dynamo
-    model_gm = capture_pre_autograd_graph(model, inputs)
+    model_gm = torch.export.export_for_training(model, inputs).module()
 
     if model_gm_has_SDPA(model_gm):  # pyre-fixme[6]
         # Decompose SDPA
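
The replacement call in this hunk can be tried in isolation. This is a minimal sketch under stated assumptions, not the Cadence compiler code: `export_for_training_module` is a hypothetical wrapper, the `torch.nn.Linear` model is a stand-in, and the installed PyTorch is assumed to provide `torch.export.export_for_training`.

```python
from typing import Any, Tuple


def export_for_training_module(model: Any, example_inputs: Tuple[Any, ...]) -> Any:
    """Capture a graph module the way the updated line in the diff does.

    Hypothetical wrapper for illustration only.
    """
    # Imported lazily so this file can be loaded where PyTorch is absent.
    import torch

    if not hasattr(torch.export, "export_for_training"):
        raise RuntimeError("this PyTorch build lacks torch.export.export_for_training")
    # export_for_training returns an ExportedProgram; .module() unwraps it
    # into a callable module, which is what convert_pt2 goes on to use.
    exported = torch.export.export_for_training(model, example_inputs)
    return exported.module()


if __name__ == "__main__":
    import torch

    model = torch.nn.Linear(4, 2)      # stand-in model, not from the commit
    inputs = (torch.randn(3, 4),)      # example inputs must be a tuple
    graph_module = export_for_training_module(model, inputs)
    print(type(graph_module).__name__)
```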
File renamed without changes.

backends/mediatek/scripts/README.md

Lines changed: 34 additions & 15 deletions
@@ -10,41 +10,60 @@ Before you begin, ensure you have the following prerequisites installed and conf
 
 - **Download Buck2**: Obtain Buck2 from the official [releases page](https://github.com/facebook/buck2/releases/tag/2024-02-01).
 - **Add to PATH**: Extract the downloaded file and add the directory to your system's `$PATH` environment variable.
-```bash
-export PATH=<path_to_buck>:$PATH
-```
+```bash
+export PATH=<path_to_buck>:$PATH
+```
 
 ### 2. Android NDK
 
 - **Download Android NDK**: Acquire the Android NDK from the [Android developer site](https://developer.android.com/ndk/downloads).
 - **Set NDK Path**: Ensure that the `$ANDROID_NDK` environment variable is set to the path where the NDK is located.
-```bash
-export ANDROID_NDK=<path_to_android_ndk>
-```
+```bash
+export ANDROID_NDK=<path_to_android_ndk>
+```
 
 ### 3. MediaTek ExercuTorch Libraries
 
-Download the following libraries from MediaTek's NeuroPilot portal (link to be added):
+Download [NeuroPilot Express SDK](https://neuropilot.mediatek.com/resources/public/npexpress/en/docs/npexpress) from MediaTek's NeuroPilot portal:
 
 - `libneuronusdk_adapter.mtk.so`: This universal SDK contains the implementation required for executing target-dependent code on the MediaTek chip.
 - `libneuron_buffer_allocator.so`: This utility library is designed for allocating DMA buffers necessary for model inference.
-```bash
-export NEURON_BUFFER_ALLOCATOR_LIB=<path_to_buffer_allocator>
-```
+- `mtk_converter-8.8.0.dev20240723+public.d1467db9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`: This library preprocess the model into a MediaTek representation.
+- `mtk_neuron-8.2.2-py3-none-linux_x86_64.whl`: This library converts the model to binaries.
 
 ## Setup
 
-Follow the steps below to set up your build environment:
+Follow the steps below to setup your build environment:
+
+1. **Setup ExercuTorch Environment**: Refer to the [Setting up ExercuTorch](https://pytorch.org/executorch/stable/getting-started-setup) guide for detailed instructions on setting up the ExercuTorch environment.
+
+2. **Setup MediaTek Backend Environment**
+- Install the dependent libs. Ensure that you are inside backends/mediatek/ directory
+```bash
+pip3 install -r requirements.txt
+```
+- Install the two .whl downloaded from NeuroPilot Portal
+```bash
+pip3 install mtk_neuron-8.2.2-py3-none-linux_x86_64.whl
+pip3 install mtk_converter-8.8.0.dev20240723+public.d1467db9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+```
+- Set evironment variables for building backend
+```bash
+export NEURON_BUFFER_ALLOCATOR_LIB=<path_to_buffer_allocator>
+```
 
-1. **ExercuTorch Official Tutorial**: Refer to the [Setting up ExercuTorch](https://pytorch.org/executorch/stable/getting-started-setup) guide for detailed instructions on setting up the ExercuTorch environment.
+## Build
 
-2. **Build Script**: Once the prerequisites are in place, run the `mtk_build.sh` script to start the build process.
+1. **Build MediaTek Backend**: Once the prerequisites are in place, run the `mtk_build.sh` script to start the build process, MediaTek backend will be built under `cmake-android-out/backends/` as `libneuron_backend.so`
 
 ```bash
 ./mtk_build.sh
 ```
-3. **Push MediaTek universal SDK to the device**: push libneuronusdk_adapter.mtk.so to the phone and export it to the `$LD_LIBRARY_PATH` environment variable before executing ExercuTorch with MediaTek backend.
+
+## Run
+
+1. **Push MediaTek universal SDK and MediaTek backend to the device**: push `libneuronusdk_adapter.mtk.so` and `libneuron_backend.so` to the phone and export it to the `$LD_LIBRARY_PATH` environment variable before executing ExercuTorch with MediaTek backend.
 
 ```bash
-export LD_LIBRARY_PATH=<path_to_usdk>:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=<path_to_usdk>:<path_to_neuron_backend>:$LD_LIBRARY_PATH
 ```
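
The environment setup described in the updated README can be checked before invoking `mtk_build.sh`. This is a minimal sketch and not part of the commit: `missing_build_vars` and `device_library_path` are hypothetical helpers, the variable names are the ones the README exports, and treating both as required for the build is an assumption.

```python
import os
from typing import List, Mapping, Optional

# Variables the README asks you to export before building (assumed required).
REQUIRED_BUILD_VARS = ("ANDROID_NDK", "NEURON_BUFFER_ALLOCATOR_LIB")


def missing_build_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BUILD_VARS if not env.get(name)]


def device_library_path(usdk_dir: str, backend_dir: str, existing: str = "") -> str:
    """Compose LD_LIBRARY_PATH as <usdk>:<neuron_backend>[:<existing>]."""
    parts = [usdk_dir, backend_dir]
    if existing:
        # Keep whatever was already on the path at the end.
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    missing = missing_build_vars()
    if missing:
        print("Set these before running mtk_build.sh:", ", ".join(missing))
```

Skipping an empty existing value avoids a trailing `:` in `LD_LIBRARY_PATH`, which dynamic loaders commonly interpret as the current directory.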

backends/qualcomm/README.md

Lines changed: 64 additions & 0 deletions
@@ -73,3 +73,67 @@ examples/qualcomm
 Please see this [README.md](../../examples/qualcomm/README.md).
 
 Further, an example build script is provided as [build.sh](scripts/build.sh).
+
+## Issues
+If you want to address the problem encountered, it would be great to have reproduction information for indicating maintainers. Please also follow the [policy](../../CONTRIBUTING.md#issues) to emit issues.
+
+## Pull Requests
+PRs are always welcome to help improve the codebase in a comprehensive manner. Before submitting changes, please apply:
+
+- **Check the Coding Style**:<br/>
+Make sure your code follows the [style guides](../../CONTRIBUTING.md#coding-style) and passes the [lint checks](../../CONTRIBUTING.md#lintrunner).
+
+- **Add Unit Tests**:<br/>
+Following is an example of adding test case after [creating new operator builder](builders/README.md), please navigate to `backends/qualcomm/tests` folder and put minimum example module in `model.py`. e.g.:
+```python
+class IndexPut(torch.nn.Module):
+    ...
+
+# please insert implementation in alphabetical order
+class LayerNorm(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.layer_norm = torch.nn.LayerNorm([768], eps=1e-6)
+
+    def forward(self, x):
+        return self.layer_norm(x)
+
+
+class LeakyReLUDefault(torch.nn.Module):
+    ...
+```
+Also extend sections `TestQNNFloatingPointOperator`, `TestQNNQuantizedOperator` in `test_qnn_delegate.py`. e.g.:
+```python
+class TestQNNQuantizedOperator(TestQNN):
+    def test_qnn_backend_interpolate_nearest_2d(self):
+        ...
+
+    # please insert it implementation alphabetical order
+    def test_qnn_backend_layer_norm(self):
+        module = LayerNorm()  # noqa: F405
+        sample_input = (torch.randn(196, 768),)
+        module = self.get_qdq_module(module, sample_input)
+        self.lower_module_and_test_output(module, sample_input)
+
+    def test_qnn_backend_leaky_relu(self):
+        ...
+```
+
+- **Verify Unit Test Results**:<br/>
+```bash
+cd $PATH_TO_EXECUTORCH
+# example usage of performing unit test
+python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_layer_norm -s $DEVICE_SERIAL -m SM8650 -b build-android/ -a $PATH_TO_TEST_ARTIFACTS
+```
+The test graph is expected to have 1 delegated node with only placeholders / output nodes being left. Check the execution report for more information.
+
+- **Code Reviews**:<br/>
+Please ping authors in Qualcomm AI Engine Direct related PRs for reviewing, possible candidates are listed below:
+- [chiwwang](https://github.com/chiwwang)
+- [shewu-quic](https://github.com/shewu-quic)
+- [chunit-quic](https://github.com/chunit-quic)
+- [winskuo-quic](https://github.com/winskuo-quic)
+- [chuntl](https://github.com/chuntl)
+- [haowhsu-quic](https://github.com/haowhsu-quic)
+
+Thanks again for your contribution!
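
The added README asks contributors to insert new test modules in alphabetical order. One way to check that before sending a PR is sketched below. This is not a tool from the repository: `out_of_order_classes` is a hypothetical helper, and comparing names case-insensitively is an assumption about the intended ordering.

```python
import ast
from typing import List


def out_of_order_classes(source: str) -> List[str]:
    """Return top-level class names that sort before their predecessor."""
    tree = ast.parse(source)
    # Only top-level classes are considered, in file order.
    names = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
    return [
        current
        for previous, current in zip(names, names[1:])
        if current.lower() < previous.lower()  # assumed: case-insensitive order
    ]


if __name__ == "__main__":
    sample = (
        "class IndexPut:\n    ...\n\n"
        "class LayerNorm:\n    ...\n\n"
        "class LeakyReLUDefault:\n    ...\n"
    )
    print(out_of_order_classes(sample))
```

Because the helper only parses the source with `ast`, it can be pointed at the text of `model.py` without importing `torch`.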
