Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Input Shape #1763
Comments
@rsomani95 - when trying to run your code, the following line gives an error: `layer = MaskedMultiHeadAttentionANE(512, 8).eval()`. Please update your code.
@TobyRoseman apologies, I overlooked removing that line when I edited the issue to make it more concise.
@TobyRoseman are you able to run the code now?
Yes, I can now reproduce the issue (after locally cloning and using the ml-ane-transformers repository). Thanks for updating the code. The issue here is that the reference implementation asserts static input shapes, which is incompatible with flexible shapes.
@TobyRoseman understood, thank you.
It's unclear how much work will be necessary here. As a starting point, you could try just removing those assertions and seeing if it breaks anywhere else.
Commenting out the assert allows the conversion to finish, but while the fixed-size model uses the ANE, the flexible-shape model does not.
Instead of using a `RangeDim`, have you considered enumerated shapes?
Does it seem feasible to use the enumerated shapes path?
Enumerated shapes are limited to 128, so it won't be possible to use them if you need to support 50k shapes.
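For readers following along, here is a minimal sketch of what the enumerated-shapes path being discussed could look like in coremltools. The single-input wrapper, the specific sequence lengths, and the default shape are my own illustrative assumptions, and the thread does not confirm whether this avoids the same shape assertion that `RangeDim` hits:

```python
import torch
import torch.nn as nn
import coremltools as ct
from ane_transformers.reference.multihead_attention import MultiHeadAttention


class SelfAttention(nn.Module):
    """Single-input wrapper so only one tensor needs a flexible shape."""

    def __init__(self, embed_dim, n_head):
        super().__init__()
        self.mha = MultiHeadAttention(embed_dim, n_head=n_head, dropout=0.0)

    def forward(self, x):
        return self.mha(x, x, x)


x = torch.rand(1, 512, 1, 64)
layer = SelfAttention(512, 8).eval()
jit = torch.jit.trace(layer, x)

# Enumerated shapes: an explicit set of allowed shapes (at most 128 entries,
# per the comment above). The lengths below are illustrative, not from the thread.
enumerated = ct.EnumeratedShapes(
    shapes=[(1, 512, 1, n) for n in (32, 64, 128, 256, 448)],
    default=(1, 512, 1, 64),
)
mlmodel = ct.convert(jit, inputs=[ct.TensorType("x", enumerated)])
```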
I haven't checked output correctness yet, but commenting out the three assert statements does allow it to export. What's odd, though, is that the Xcode benchmark suggests that all the models run on the ANE, even the flexible-shape model. Could this be happening because the default input size is 1 and the benchmark only runs on the default size?
Right, the default-size model is expected to utilize the ANE, even with flexible shapes, but if you run the model with a different shape, it would likely only run on CPU/GPU.
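There is no direct Python API for checking which compute unit served a prediction; Xcode's performance report is the authoritative tool for that. As a rough, indirect probe (my own suggestion, not from the thread), you could time predictions at the default length versus other lengths; a large latency jump at non-default shapes would at least be consistent with the CPU/GPU fallback described above. `mlmod_flexible_shape` and the 512 embedding dimension are assumed from the repro code later in this thread:

```python
import time
import numpy as np

def time_predict(model, seq_len, d_model=512, n_runs=20):
    """Average prediction latency at a given sequence length (rough probe only)."""
    x = np.random.rand(1, d_model, 1, seq_len).astype(np.float32)
    inputs = {"q": x, "k": x, "v": x}
    model.predict(inputs)  # warm-up
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict(inputs)
    return (time.perf_counter() - start) / n_runs

# `mlmod_flexible_shape` is the flexible-shape model from the repro below.
for n in (1, 64, 256, 448):  # 1 is the default length used when tracing
    print(n, time_predict(mlmod_flexible_shape, n))
```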
Filed an internal Feedback Assistant report at FB12038137, FWIW.
Hello, author of ml-ane-transformers here 👋 Great to hear that you are attempting to extend the reference implementation's capabilities with flexible shapes. Some notes:
Hi! Thank you so much @atiorh - it's awesome to have your feedback and input. Over the last few days there has been a lot of interest from the larger WhisperCPP community, which is also tackling the same problem, and we are combining our efforts here: ggerganov/whisper.cpp#548. I hope folks don't mind, but I'm pinging @wangchou @RobertRiachi, each of whom has been tackling this independently. No attempt to dog-pile, mostly just getting super smart folks on the same page to make informed decisions. Thanks again everyone. Really exciting to be on the cusp of some really fast on-device inference.
One quick question for @atiorh and friends at Apple - since it's not super clear to me: in theory, the ANE supports enumerated shapes, so a model that already runs on the ANE with a static shape should also run on the ANE when converted with enumerated shapes?
Yes, that's right: the static-shaped model should already be ANE-resident, and if you update such a model with enumerated shapes, it should run on the ANE.
Thank you very much for the clarifications @aseemw!
My understanding is that (please correct me if I am mistaken):
FWIW I have the exact same issue (it fails on the same line) when trying to plug the translated DistilBERT model from ml-ane-transformers into HF exporters. I am guessing from the info above that it's because the HF exporter wraps the model in an extra module layer that has a bunch of conditional logic for models of different types, so the input shape becomes 'flexible'.
...after a bit more digging, the culprit in HF exporters is not the Wrapper module, it's this bit: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L541 - i.e. for some models (e.g. the one I was trying above, for sequence classification) it will use a flexible sequence length, which is because of this definition: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/config.py#L247, i.e. `sequence_length=(1, self.maxSequenceLength)`. So I am thinking the solution for me in this case is to hack the HF exporters code to use a fixed sequence length with padding, instead of a flexible one. Also... I assume the only reason that HF exporters isn't more generally affected by this issue is that most of the HF models don't use this einsum-based ANE-optimised attention.
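A sketch of the fixed-sequence-length-plus-padding idea mentioned above. The 128-token length, the tokenizer checkpoint, and the `input_ids`/`attention_mask` input names are assumptions for illustration, not values taken from HF exporters:

```python
import numpy as np
from transformers import AutoTokenizer

MAX_LEN = 128  # assumed fixed length baked into the converted model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def encode_fixed(text):
    # Pad (or truncate) every input to the same length so the Core ML model
    # can be exported with a static sequence length instead of a RangeDim.
    enc = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=MAX_LEN,
        return_tensors="np",
    )
    return {
        "input_ids": enc["input_ids"].astype(np.int32),
        "attention_mask": enc["attention_mask"].astype(np.int32),
    }

# Usage against an already-converted model (variable name assumed):
# outputs = coreml_model.predict(encode_fixed("a short example sentence"))
```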
I found a rewrite of `_attention_fn` that works with flexible shapes:

```python
def _attention_fn(self, q, k, v, qk_mask, k_mask, return_weights):
    ...
    attn_weights = [aw.softmax(dim=1) for aw in attn_weights
                    ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
    mh_w = [self.dropout(aw) for aw in attn_weights
            ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
    mh_w = [wi.reshape(wi.shape[1], wi.shape[3]) for wi in mh_w]
    mh_v = [vi.reshape(vi.shape[1], vi.shape[3]) for vi in mh_v]
    attn = [
        torch.einsum('kq,ck->cq', wi, vi)
        for wi, vi in zip(mh_w, mh_v)
    ]  # n_head * (batch_size, d_v/n_head, 1, tgt_seq_len)
    attn = [
        a.reshape(1, a.shape[0], 1, a.shape[1]) for a in attn
    ]
    attn = torch.cat(attn, dim=1)  # (batch_size, d_v, 1, tgt_seq_len)
    if return_weights:
        return attn, attn_weights
    return attn, None
```

Here is my test code.

```python
import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2
from ane_transformers.reference.multihead_attention import MultiHeadAttention

N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
    jit = torch.jit.trace(layer, (x, x, x))

# Flexible input shape
flexible_shape = ct.Shape(shape=(1, 512, 1, ct.RangeDim(1, 448)))
mlmod_flexible_shape = ct.convert(
    jit,
    inputs=[
        ct.TensorType("q", flexible_shape),
        ct.TensorType("k", flexible_shape),
        ct.TensorType("v", flexible_shape),
    ]
)

out = layer(x, x, x)
out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                         'k': x.detach().numpy().astype(np.float32),
                                         'v': x.detach().numpy().astype(np.float32)})
np.allclose(out[0], out_dict['var_451'], rtol=0.001, atol=0.001)  # OK
```

I think coremltools MIL einsum has something wrong with flexible shapes.
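One note on the rewrite above: the 2-D contraction `torch.einsum('kq,ck->cq', wi, vi)` is an ordinary matrix product, so if the MIL `einsum` translation is indeed the weak point, expressing it as `torch.matmul` might route the conversion through a different path. This is an untested suggestion on my part; a quick numerical check of the equivalence:

```python
import torch

src_seq_len, tgt_seq_len, d_head = 10, 7, 64
wi = torch.rand(src_seq_len, tgt_seq_len)   # (k, q) attention weights
vi = torch.rand(d_head, src_seq_len)        # (c, k) values

# einsum('kq,ck->cq', wi, vi) sums over k, i.e. an ordinary matmul vi @ wi.
out_einsum = torch.einsum('kq,ck->cq', wi, vi)
out_matmul = vi @ wi

print(torch.allclose(out_einsum, out_matmul))  # True
```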
I found simpler code to reproduce this issue.

```python
import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.d_qk = embed_dim
        self.d_v = embed_dim
        self.d_out = embed_dim
        self.n_head = 8
        self.k_proj = nn.Conv2d(embed_dim, self.d_qk, 1)

    def forward(self, q, k, v):
        k = self.k_proj(k)
        mh_q = q.split(
            self.d_qk // self.n_head,
            dim=1)  # n_head * (batch_size, d_qk/n_head, 1, tgt_seq_len)
        mh_k = k.transpose(1, 3).split(
            self.d_qk // self.n_head,
            dim=3)  # n_head * (batch_size, src_seq_len, 1, d_qk/n_head)
        mh_v = v.split(
            self.d_v // self.n_head,
            dim=1)  # n_head * (batch_size, d_v/n_head, 1, src_seq_len)
        mh_w = [
            torch.einsum('bchq,bkhc->bkhq', [qi, ki]) for qi, ki in zip(mh_q, mh_k)
        ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        attn = [
            torch.einsum('bkhq,bchk->bchq', wi, vi) for wi, vi in zip(mh_w, mh_v)
        ]  # n_head * (batch_size, d_v/n_head, 1, tgt_seq_len)
        return attn


N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = Net(512).eval()
    jit = torch.jit.trace(layer, (x, x, x))

# Flexible input shape - fails
flexible_shape = ct.Shape(shape=(1, 512, 1, ct.RangeDim(1, 448)))
mlmod_flexible_shape = ct.convert(
    jit,
    inputs=[
        ct.TensorType("q", flexible_shape),
        ct.TensorType("k", flexible_shape),
        ct.TensorType("v", flexible_shape),
    ]
)

out = layer(x, x, x)
out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                         'k': x.detach().numpy().astype(np.float32),
                                         'v': x.detach().numpy().astype(np.float32)})
np.allclose(out[0], out_dict['var_195'], rtol=0.001, atol=0.001)
```

Strangely, I also embedded print debug code and compared running with `k = self.k_proj(k)` versus without `k = self.k_proj(k)`. For some reason, the conversion behaves differently with and without the `k_proj` convolution.
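For what it's worth (my own untested suggestion, not something verified in the thread): both einsums in this minimal repro can also be written as batched `torch.matmul` with transposes, which would avoid the MIL einsum translation entirely. A quick equivalence check:

```python
import torch

b, c, h, q_len, k_len = 1, 64, 1, 10, 10
qi = torch.rand(b, c, h, q_len)   # (batch, d_qk/n_head, 1, tgt_seq_len)
ki = torch.rand(b, k_len, h, c)   # (batch, src_seq_len, 1, d_qk/n_head)

# einsum('bchq,bkhc->bkhq', qi, ki) as a batched matmul over (b, h):
w_einsum = torch.einsum('bchq,bkhc->bkhq', qi, ki)
w_matmul = torch.matmul(
    ki.permute(0, 2, 1, 3),        # (b, h, k, c)
    qi.permute(0, 2, 1, 3),        # (b, h, c, q)
).permute(0, 2, 1, 3)              # back to (b, k, h, q)
print(torch.allclose(w_einsum, w_matmul))  # True

vi = torch.rand(b, c, h, k_len)    # (batch, d_v/n_head, 1, src_seq_len)
# einsum('bkhq,bchk->bchq', w, vi) the same way:
a_einsum = torch.einsum('bkhq,bchk->bchq', w_einsum, vi)
a_matmul = torch.matmul(
    vi.permute(0, 2, 1, 3),        # (b, h, c, k)
    w_einsum.permute(0, 2, 1, 3),  # (b, h, k, q)
).permute(0, 2, 1, 3)              # back to (b, c, h, q)
print(torch.allclose(a_einsum, a_matmul))  # True
```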
🐞 Bug Description
I'm trying to export Apple's ANE optimised MultiHeadAttention layer defined here. The layer exports successfully with a fixed shape, but fails with flexible shapes.
I'm using this layer in a custom sequence model, so flexible shapes are imperative.
The error thrown is an `AssertionError: input shapes incompatible`.
Stack Trace
To Reproduce
System environment (please complete the following information):
- coremltools version: 6.2
- torch version: 1.13.1
- numpy version: 1.22.3
- OS and hardware: macOS 13.0, MacBook Pro 16-inch, 2021
Additional context
I'm quite certain that the shape error is happening as part of an einsum operation in the layer definition. While debugging, I printed out the equations and shapes of all einsum ops being converted (I did this by adding two print statements right below these lines). It appears that the error happens in one of the later einsum ops and not right away. Perhaps this issue is tangentially related: #1754
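As a related debugging aid (my own suggestion, not how the reporter did it): instead of patching print statements into coremltools, you can stop the conversion at the intermediate MIL program and print it, which shows how each traced einsum was lowered and with what input shapes. This only helps for the fixed-shape trace, since the flexible-shape conversion fails before a program is produced:

```python
import torch
import coremltools as ct
from ane_transformers.reference.multihead_attention import MultiHeadAttention

x = torch.rand(1, 512, 1, 10)
layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
jit = torch.jit.trace(layer, (x, x, x))

# Convert only to the intermediate MIL program (no .mlpackage is produced)
# and dump it; each op is printed with its inputs and shapes, so you can see
# whether a torch einsum became a MIL einsum or transpose/matmul ops.
prog = ct.convert(
    jit,
    inputs=[
        ct.TensorType("q", tuple(x.shape)),
        ct.TensorType("k", tuple(x.shape)),
        ct.TensorType("v", tuple(x.shape)),
    ],
    convert_to="milinternal",
)
print(prog)
```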