Commit c69dd4b

Update on "[Executorch] parallelize op_choose_qparams"
When doing prefill for a quantized KV cache with a large prefill length, parallelizing this op helps.

Differential Revision: [D84962234](https://our.internmc.facebook.com/intern/diff/D84962234/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D84962234/)!

[ghstack-poisoned]
2 parents: ac3c427 + 08ce624
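For intuition about what is being parallelized: per-token quantization parameters (a min/max reduction turned into a scale and zero_point) are independent across tokens, so the token dimension can be split across worker threads, which is why large prefills benefit. Below is a minimal standalone Python sketch of that idea, not the actual C++ kernel; the helper names, the chunking scheme, and the use of `ThreadPoolExecutor` are illustrative assumptions (the real op presumably dispatches through ExecuTorch's threadpool, which is why the test diff below shrinks it via `_unsafe_reset_threadpool`).

```python
# Sketch only: parallelize per-token asymmetric int8 qparams selection by
# splitting the token dimension into one contiguous chunk per worker thread.
import torch
from concurrent.futures import ThreadPoolExecutor


def _choose_qparams_chunk(x: torch.Tensor, qmin: int = -128, qmax: int = 127):
    # x: [chunk_tokens, dim]; returns per-token (scale, zero_point).
    mn = x.min(dim=-1).values.clamp(max=0.0)  # make sure 0 is representable
    mx = x.max(dim=-1).values.clamp(min=0.0)
    scale = (mx - mn) / (qmax - qmin)
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    zero_point = torch.clamp(torch.round(qmin - mn / scale), qmin, qmax).to(torch.int32)
    return scale, zero_point


def choose_qparams_parallel(x: torch.Tensor, num_threads: int = 3):
    # x: [num_tokens, dim]; each chunk of tokens is processed independently,
    # so chunks can run on separate threads (torch ops release the GIL).
    chunks = torch.chunk(x, num_threads, dim=0)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(_choose_qparams_chunk, chunks))
    return torch.cat([r[0] for r in results]), torch.cat([r[1] for r in results])
```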

2 files changed: +7 −0 lines
extension/llm/custom_ops/TARGETS

Lines changed: 1 addition & 0 deletions
@@ -60,5 +60,6 @@ runtime.python_test(
     ],
     deps = [
         "//caffe2:torch",
+        "//executorch/extension/pybindings:portable_lib",
     ],
 )

extension/llm/custom_ops/test_quantized_sdpa.py

Lines changed: 6 additions & 0 deletions
@@ -12,6 +12,7 @@
 import torch.nn.functional as F
 
 from executorch.extension.llm.custom_ops import custom_ops  # noqa
+from executorch.extension.pybindings.portable_lib import _unsafe_reset_threadpool
 
 
 def is_fbcode():
@@ -40,6 +41,11 @@ def setUp(self):
         self.q_shape = None
         self.kv_shape = None
         self.is_seq_at_dim_2 = True
+        # For some reason 4 threads doesn't work.
+        # This setting is needed to make this test not flaky due to the OMP
+        # error "OMP: Error #131: Thread identifier invalid".
+        # It is not clear why that happens, but a smaller threadpool resolves it.
+        _unsafe_reset_threadpool(3)
 
     def _scale_tensor(self, tensor, min_value, max_value, scale=True):
         normalized_tensor = (tensor - tensor.min()) / (tensor.max() - tensor.min())
