
Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Input Shape #1763

Closed
rsomani95 opened this issue Feb 9, 2023 · 23 comments · Fixed by #1867
Labels
bug (Unexpected behaviour that should be corrected), PyTorch (traced), triaged (Reviewed and examined, release has been assigned if applicable)

Comments

@rsomani95

rsomani95 commented Feb 9, 2023

🐞Bug Description

I'm trying to export Apple's ANE optimised MultiHeadAttention layer defined here

The layer exports successfully with a fixed shape, but fails with flexible shapes.
I'm using this layer in a custom sequence model, so flexible shapes are imperative.

The error thrown is an AssertionError: input shapes incompatible.

Stack Trace

Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:  73%|| 129/177 [00:00<00:00, 8531.06 o

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[28], line 2
      1 flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
----> 2 mlmod_flexible_shape = ct.convert(
      3     jit,
      4     inputs = [
      5         ct.TensorType("q", flexible_shape),
      6         ct.TensorType("k", flexible_shape),
      7         ct.TensorType("v", flexible_shape),
      8     ]
      9 )

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py:444, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug)
    441 if specification_version is None:
    442     specification_version = _set_default_specification_version(exact_target)
--> 444 mlmodel = mil_convert(
    445     model,
    446     convert_from=exact_source,
    447     convert_to=exact_target,
    448     inputs=inputs,
    449     outputs=outputs_as_tensor_or_image_types, # None or list[ct.ImageType/ct.TensorType]
    450     classifier_config=classifier_config,
    451     transforms=tuple(transforms),
    452     skip_model_load=skip_model_load,
    453     compute_units=compute_units,
    454     package_dir=package_dir,
    455     debug=debug,
    456     specification_version=specification_version,
    457 )
    459 if exact_target == 'milinternal':
    460     return mlmodel # Returns the MIL program

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:187, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
    148 @_profile
    149 def mil_convert(
    150     model,
   (...)
    154     **kwargs
    155 ):
    156     """
    157     Convert model from a specified frontend `convert_from` to a specified
    158     converter backend `convert_to`.
   (...)
    185         See `coremltools.converters.convert`
    186     """
--> 187     return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:211, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
    208     weights_dir = _tempfile.TemporaryDirectory()
    209     kwargs["weights_dir"] = weights_dir.name
--> 211 proto, mil_program = mil_convert_to_proto(
    212                         model,
    213                         convert_from,
    214                         convert_to,
    215                         registry,
    216                         **kwargs
    217                      )
    219 _reset_conversion_state()
    221 if convert_to == 'milinternal':

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:281, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, **kwargs)
    278 kwargs.setdefault("convert_to", convert_to)
    279 frontend_converter = frontend_converter_type()
--> 281 prog = frontend_converter(model, **kwargs)
    283 if convert_to.lower() != "neuralnetwork":
    284     passes = kwargs.get("transforms", list())

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:109, in TorchFrontend.__call__(self, *args, **kwargs)
    106 def __call__(self, *args, **kwargs):
    107     from .frontend.torch import load
--> 109     return load(*args, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py:57, in load(model_spec, inputs, specification_version, debug, outputs, cut_at_symbols, **kwargs)
     55 inputs = _convert_to_torch_inputtype(inputs)
     56 converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols, specification_version)
---> 57 return _perform_torch_convert(converter, debug)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py:96, in _perform_torch_convert(converter, debug)
     94 def _perform_torch_convert(converter, debug):
     95     try:
---> 96         prog = converter.convert()
     97     except RuntimeError as e:
     98         if debug and "convert function" in str(e):

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py:281, in TorchConverter.convert(self)
    278 self.convert_const()
    280 # Add the rest of the operations
--> 281 convert_nodes(self.context, self.graph)
    283 graph_outputs = [self.context[name] for name in self.graph.outputs]
    285 # An output can be None when it's a None constant, which happens
    286 # in Fairseq MT.

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py:89, in convert_nodes(context, graph)
     84     raise RuntimeError(
     85         "PyTorch convert function for op '{}' not implemented.".format(node.kind)
     86     )
     88 context.prepare_for_conversion(node)
---> 89 add_op(context, node)
     91 # We've generated all the outputs the graph needs, terminate conversion.
     92 if _all_outputs_present(context, graph):

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py:1120, in einsum(context, node)
   1118 b = context[node.inputs[1]][1]
   1119 equation = context[node.inputs[0]].val
-> 1120 x = build_einsum_mil(a, b, equation, node.name)
   1121 context.add(x)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/_utils.py:164, in build_einsum_mil(a_var, b_var, equation, name)
    162         x = mb.einsum(values=(a_var, b_var), equation=equation, name=name)
    163     else:
--> 164         x = mb.einsum(values=(b_var, a_var), equation=equation_rev, name=name)
    165 elif vec_chw_whu_chu in [parsed_vectors, parsed_vectors_rev]:
    166     if parsed_vectors == vec_chw_whu_chu:

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/registry.py:176, in SSAOpRegistry.register_op.<locals>.class_wrapper.<locals>.add_op(cls, **kwargs)
    173 else:
    174     op_cls_to_add = op_reg[op_type]
--> 176 return cls._add_op(op_cls_to_add, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/builder.py:182, in Builder._add_op(cls, op_cls, **kwargs)
    180 curr_block()._insert_op_before(new_op, before_op=before_op)
    181 new_op.build_nested_blocks()
--> 182 new_op.type_value_inference()
    183 if len(new_op.outputs) == 1:
    184     return new_op.outputs[0]

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/operation.py:253, in Operation.type_value_inference(self, overwrite_output)
    243 def type_value_inference(self, overwrite_output=False):
    244     """
    245     Perform type inference and auto_val computation based on new input Vars
    246     in kwargs. If self._output_vars is None then we generate _output_vars;
   (...)
    251     existing _output_vars
    252     """
--> 253     output_types = self.type_inference()
    254     if not isinstance(output_types, tuple):
    255         output_types = (output_types,)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/linear.py:290, in einsum.type_inference(self)
    287 print(f"x, y shapes: {x_shape, y_shape}")
    289 assert len(x_shape) == len(y_shape), "inputs not of the same rank"
--> 290 assert x_shape[-1] == y_shape[-3], "input shapes incompatible"
    291 if x_shape[-2] != 1 and y_shape[-2] != 1:
    292     assert x_shape[-2] == y_shape[-2], "input shapes incompatible"

AssertionError: input shapes incompatible

To Reproduce

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2

from ane_transformers.reference.multihead_attention import MultiHeadAttention

N = 10
x = torch.rand(1, 512, 1, N)

layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
jit = torch.jit.trace(layer, (x, x, x))


# Fixed input shape - works
mlmod_fixed_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", x.shape),
        ct.TensorType("k", x.shape),
        ct.TensorType("v", x.shape),
    ]
)


# Flexible input shape - fails
flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
mlmod_flexible_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", flexible_shape),
        ct.TensorType("k", flexible_shape),
        ct.TensorType("v", flexible_shape),
    ]
)


# Enumerated shape (not ideal, but better than fixed) also throws the same `AssertionError`
enumerated_shapes = ct.EnumeratedShapes(
    [(1, 512, 1, i) for i in np.array(list(range(1, 449)))[::4]]
)
mlmodel_enumerated_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", enumerated_shapes),
        ct.TensorType("k", enumerated_shapes),
        ct.TensorType("v", enumerated_shapes),
    ],
)

System environment (please complete the following information):

  • coremltools version: 6.2
  • torch version: 1.13.1
  • numpy version: 1.22.3
  • OS: macOS 13.0, MacBook Pro 16-inch, 2021

Additional context

I'm quite certain that the shape error is happening as part of an einsum operation in the layer definition. While debugging, I printed out the equations and shapes of all einsum ops being converted (I did this by adding two print statements right below these lines). It appears that the error happens in one of the later einsum ops and not right away.
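A sketch of roughly what those two prints look like; their exact placement, in build_einsum_mil (frontend/_utils.py) and einsum.type_inference (ops/defs/iOS15/linear.py), is an assumption based on the stack trace above:

# in build_einsum_mil, once the equation string is available
print(f"Einsum conversion equation: {equation}")

# in einsum.type_inference, once x_shape and y_shape are computed
print(f"x, y shapes: {x_shape, y_shape}")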

Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bchk,bkhq->bchq
x, y shapes: ((1, 64, 1, is149), (1, is148, 1, is147))

Perhaps this issue is tangentially related: #1754

@rsomani95 rsomani95 added the bug Unexpected behaviour that should be corrected (type) label Feb 9, 2023
@rsomani95 rsomani95 reopened this Feb 9, 2023
@rsomani95 rsomani95 changed the title Custom MultiHead Attention Block Conversion Fails With Flexible Inputs Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Inputs Feb 9, 2023
@rsomani95 rsomani95 changed the title Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Inputs Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Input Shape Feb 9, 2023
@TobyRoseman
Collaborator

@rsomani95 - when trying to run your code, the following line:

layer = MaskedMultiHeadAttentionANE(512, 8).eval()

Gives the following error:

NameError: name 'MaskedMultiHeadAttentionANE' is not defined

Please update your code.

@rsomani95
Author

@TobyRoseman apologies, I overlooked removing that line when I edited the issue to make it more concise.
I've updated the code, you should be able to reproduce now.

@rsomani95
Author

@TobyRoseman are you able to run the code now?

@TobyRoseman
Collaborator

Yes, I can now reproduce the issue (after locally cloning and using the ml-ane-transformers repository). Thanks for updating the code.

The issue here is that x_shape and y_shape contain symbols.

@TobyRoseman TobyRoseman added the triaged Reviewed and examined, release as been assigned if applicable (status) label Feb 16, 2023
@rsomani95
Author

@TobyRoseman understood, thank you.
Do you have any pointers for changing the source Python code that could alleviate this issue?

@TobyRoseman
Collaborator

Do you have any pointers for changing the source Python code that could alleviate this issue?

It's unclear how much work will be necessary here. As a starting point, you could try just removing those assertions and seeing if anything breaks elsewhere.
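A minimal sketch of a gentler variant of that experiment: instead of deleting the asserts in einsum.type_inference (coremltools/converters/mil/mil/ops/defs/iOS15/linear.py) outright, skip the equality checks when a dimension is symbolic. This assumes the is_symbolic helper from coremltools.converters.mil.mil.types.symbolic and is only an illustration, not the fix that eventually landed in #1867.

from coremltools.converters.mil.mil.types.symbolic import is_symbolic

def dims_compatible(a, b):
    # A symbolic (flexible) dimension can't be validated at conversion time,
    # so only enforce equality when both sides are concrete integers.
    return is_symbolic(a) or is_symbolic(b) or a == b

def check_einsum_shapes(x_shape, y_shape):
    # Mirrors the three asserts in einsum.type_inference, but tolerant of symbols.
    assert len(x_shape) == len(y_shape), "inputs not of the same rank"
    assert dims_compatible(x_shape[-1], y_shape[-3]), "input shapes incompatible"
    if x_shape[-2] != 1 and y_shape[-2] != 1:
        assert dims_compatible(x_shape[-2], y_shape[-2]), "input shapes incompatible"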

@Janus289

Commenting out the assert allows the conversion to finish, but where the fixed-size model uses the ANE, the flexible-shape model does not.

@aseemw
Collaborator

aseemw commented Feb 23, 2023

Commenting out the assert allows the conversion to finish, but where the fixed-size model uses the ANE, the flexible-shape model does not.

Instead of using ct.RangeDim, can you try the ct.EnumeratedShapes option? That may allow the model to use the ANE.

@vade

vade commented Feb 28, 2023

Does ct.EnumeratedShapes allow for support of up to 50,000+ possible shapes, i.e. (1, 1), (1, 2), ... (1, 51865)? We are accruing tokens from Whisper's encoder, which requires continuing to run token prediction and passing in an accrued sequence of additional tokens until you hit an end-of-transcript token or the max vocab size (51865).

Does it seem feasible to use the enumerated shapes path?

@aseemw
Collaborator

aseemw commented Feb 28, 2023

Enumerated shapes are limited to 128. So it won't be possible to use that if you need to support 50k shapes.

@rsomani95
Author

I haven't checked output correctness yet, but commenting out the three assert statements does allow it to export.
Enumerated shapes aren't practical for the reasons @aseemw pointed out.

What's odd, though, is that the Xcode benchmark suggests that all the models run on the ANE, even the flexible-shape model. Could this be happening because the default input size is 1 and the benchmark only runs on the default size?

[Screenshot: Xcode performance report, 2023-02-28]

@aseemw
Collaborator

aseemw commented Feb 28, 2023

Right, the default size model is expected to utilize the ANE, even with flexible shapes, but if you run the model with a different shape, it would likely only run on CPU/GPU.

@vade

vade commented Mar 6, 2023

Filed a Feedback Assistant report (FB12038137), FWIW.

@atiorh
Contributor

atiorh commented Mar 8, 2023

Hello, author of ml-ane-transformers here 👋 Great to hear that you are attempting to extend the reference implementation's capabilities with flexible shapes. Some notes:

  • 50k+ different shapes needed while ct.EnumeratedShapes accommodates 128 max: I recommend enumerating only sequence_length=[2**n for n in range(7,17)], advancing from one sequence_length to the next right before you overflow, and using masking (decoder_k_mask) to disable attention on unused indices (see the sketch after this list).
  • Benchmarking the flexible shape models you were able to generate: I recommend using model = coremltools.models.MLModel(path_to_mlpackage_file) and model.predict(dict_of_inputs_with_varying_sequence_length) for each sequence length with a timer around this call. There will be non-zero overhead with this way of benchmarking but the marginal overhead will be negligible for the larger variants.
  • Verifying output correctness: I recommend a PSNR check similar to this one in our unit tests.
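A rough sketch of the first two suggestions above (bucketed sequence lengths plus a per-length timing loop). The input names "q"/"k"/"v", the 512-channel layout, and the mlpackage path are assumptions for illustration, not part of the reference implementation:

import time
import numpy as np
import coremltools

BUCKETS = [2**n for n in range(7, 17)]  # 128 ... 65536, well under the 128-shape limit

def next_bucket(seq_len):
    # Pick the smallest enumerated length that fits the current sequence.
    for length in BUCKETS:
        if seq_len <= length:
            return length
    raise ValueError(f"sequence length {seq_len} exceeds the largest bucket")

model = coremltools.models.MLModel("path/to/model.mlpackage")  # hypothetical path

def timed_predict(seq_len):
    bucket = next_bucket(seq_len)
    # Pad up to the bucket size; in practice also pass a mask (cf. decoder_k_mask)
    # that disables attention on the padded indices.
    x = np.zeros((1, 512, 1, bucket), dtype=np.float32)
    start = time.perf_counter()
    out = model.predict({"q": x, "k": x, "v": x})
    return time.perf_counter() - start, out

for n in BUCKETS:
    latency, _ = timed_predict(n)
    print(f"seq_len={n}: {latency * 1000:.2f} ms")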

@vade

vade commented Mar 8, 2023

Hi! Thank you so much @atiorh, it's awesome to have your feedback and input. Over the last few days there has been a lot of interest from the larger WhisperCPP community, which is also tackling the same problem, and we are combining our efforts here: ggerganov/whisper.cpp#548

I hope folks don't mind, but I'm pinging @wangchou and @RobertRiachi, each of whom has been tackling this independently. No attempt to dog-pile, mostly just getting super smart folks on the same page to make informed decisions.

Thanks again, everyone. Really exciting to be on the cusp of some really fast on-device inference.

@vade

vade commented Mar 8, 2023

One quick question for @atiorh and friends at Apple, since it's not super clear to me: in theory, the ANE supports ct.EnumeratedShapes, assuming the layers all support it? The runtime generally won't punt to other devices? Thanks again, all.

@aseemw
Collaborator

aseemw commented Mar 9, 2023

Yes, that's right: the static-shaped model should already be ANE resident, and if you update such a model with enumerated shapes it should still run on the ANE.
Unless the process of making the model dynamic-shaped with enumerated shapes introduces some dynamic layers (e.g., converts a static reshape to a fully dynamic reshape); in that case, we may lose ANE residency.

@vade

vade commented Mar 9, 2023

Thank you very much for the clarifications @aseemw!

@atiorh
Contributor

atiorh commented Mar 10, 2023

Does ct.EnumeratedShapes allow for support of up to 50,000+ possible shapes, i.e. (1, 1), (1, 2), ... (1, 51865)? We are accruing tokens from Whisper's encoder, which requires continuing to run token prediction and passing in an accrued sequence of additional tokens until you hit an end-of-transcript token or the max vocab size (51865).

Does it seem feasible to use the enumerated shapes path?

My understanding is that (please correct me if I am mistaken):

  • Whisper's encoder has a default maximum sequence length of 1500 tokens
  • Whisper's decoder has a default maximum sequence length of 448 tokens.
  • 448 is the value @rsomani95 intended to use for flexible shapes based on the first message on this thread.
  • @vade Could you please confirm that your use case for flexible shapes is for something other than autoregressive decoding?

@anentropic

FWIW I have the exact same issue (fails on the same line) when trying to plug the translated DistilBERT model from ml-ane-transformers into the https://github.com/huggingface/exporters exporter code

I am guessing from the info above that it's because the HF exporter wraps the model in an extra module layer that has a bunch of conditional logic for models of different types, so the input shape becomes 'flexible'

@anentropic

...after a bit more digging, the culprit in HF exporters is not the Wrapper module

it's this bit: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L541 -> get_input_types (https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L139) -> get_shape (https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L86)

i.e. for some models (e.g. the one I was trying above, for sequence classification) it will use a ct.RangeDim(min_length, max_length)

which is because of this definition: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/config.py#L247

sequence_length=(1, self.maxSequenceLength),

So I am thinking the solution for me in this case is to hack the HF exporters code to use a fixed sequence length with padding instead of a flexible one (see the sketch below)

Also... I assume the only reason HF exporters isn't more generally affected by this issue is that most HF models don't use einsum, whereas ml-ane-transformers substitutes one into the attention layer?
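A minimal sketch of that fixed-length-with-padding idea, assuming a Hugging Face tokenizer feeding the traced model; the checkpoint name and max_length are placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # placeholder checkpoint

def encode_fixed(text, max_length=128):
    # Pad (and truncate) every input to the same length so the traced graph
    # never sees a flexible sequence dimension.
    return tokenizer(
        text,
        padding="max_length",
        max_length=max_length,
        truncation=True,
        return_tensors="pt",
    )

batch = encode_fixed("hello world")
print(batch["input_ids"].shape)  # torch.Size([1, 128])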

@fukatani
Contributor

fukatani commented May 21, 2023

I found that rewriting _attention_fn in MultiHeadAttention works well.
In your use case the equation is bkhc,bchq->bkhq with b = h = 1, so you can compute it with kc,cq->kq and reshape the tensors.

    def _attention_fn(self, q, k, v, qk_mask, k_mask, return_weights):
        ...
        attn_weights = [aw.softmax(dim=1) for aw in attn_weights
                        ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        mh_w = [self.dropout(aw) for aw in attn_weights
                ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        mh_w = [wi.reshape(wi.shape[1], wi.shape[3]) for wi in mh_w]
        mh_v = [vi.reshape(vi.shape[1], vi.shape[3]) for vi in mh_v]
        attn = [
            torch.einsum('kq,ck->cq', wi, vi)
            for wi, vi in zip(mh_w, mh_v)
        ]  # n_head * (d_v/n_head, tgt_seq_len); reshaped back to 4-D below
        attn = [
            a.reshape(1, a.shape[0], 1, a.shape[1]) for a in attn
        ]
        attn = torch.cat(attn, dim=1)  # (batch_size, d_v, 1, tgt_seq_len)

        if return_weights:
            return attn, attn_weights
        return attn, None

Here is my test code.

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2

from ane_transformers.reference.multihead_attention import MultiHeadAttention

N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
    jit = torch.jit.trace(layer, (x, x, x))


    # Flexible input shape
    flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
    mlmod_flexible_shape = ct.convert(
        jit,
        inputs = [
            ct.TensorType("q", flexible_shape),
            ct.TensorType("k", flexible_shape),
            ct.TensorType("v", flexible_shape),
        ]
    )

    out = layer(x, x, x)
    out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                             'k': x.detach().numpy().astype(np.float32),
                                             'v': x.detach().numpy().astype(np.float32)})
    np.allclose(out[0], out_dict['var_451'], rtol=0.001, atol=0.001)  # OK

I think the coremltools MIL einsum op has a problem with flexible shapes.
One possible solution is #1863.

@fukatani
Contributor

I found simpler code that reproduces this issue.

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.d_qk = embed_dim
        self.d_v = embed_dim
        self.d_out = embed_dim
        self.n_head = 8

        self.k_proj = nn.Conv2d(embed_dim, self.d_qk, 1)

    def forward(self, q, k, v):
        k = self.k_proj(k)
        mh_q = q.split(
            self.d_qk // self.n_head,
            dim=1)  # n_head * (batch_size, d_qk/n_head, 1, tgt_seq_len)
        mh_k = k.transpose(1, 3).split(
            self.d_qk // self.n_head,
            dim=3)  # n_head * (batch_size, src_seq_len, 1, d_qk/n_head)
        mh_v = v.split(
            self.d_v // self.n_head,
            dim=1)  # n_head * (batch_size, d_v/n_head, 1, src_seq_len)
        mh_w = [
            torch.einsum('bchq,bkhc->bkhq', [qi, ki]) for qi, ki in zip(mh_q, mh_k)
        ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        attn = [
            torch.einsum('bkhq,bchk->bchq', wi, vi) for wi, vi in zip(mh_w, mh_v)
        ]  # n_head * (batch_size, d_v/n_head, 1, tgt_seq_len)
        return attn

N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = Net(512).eval()
    jit = torch.jit.trace(layer, (x, x, x))

    # Flexible input shape - fails
    flexible_shape = ct.Shape(shape=(1, 512, 1, ct.RangeDim(1, 448)))
    mlmod_flexible_shape = ct.convert(
        jit,
        inputs=[
            ct.TensorType("q", flexible_shape),
            ct.TensorType("k", flexible_shape),
            ct.TensorType("v", flexible_shape),
        ]
    )
    out = layer(x, x, x)
    out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                             'k': x.detach().numpy().astype(np.float32),
                                             'v': x.detach().numpy().astype(np.float32)})
    np.allclose(out[0], out_dict['var_195'], rtol=0.001, atol=0.001)

Strangely, k = self.k_proj(k) appears to contribute to this issue.
If that line is commented out, the code converts successfully.

I also embedded debug code, print(x_shape, y_shape), in linear.py and compared the outputs with and without k = self.k_proj(k) commented out.

with k = self.k_proj(k)

(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, 64, 1, is0) (1, is1, 1, is0)
...

without k = self.k_proj(k)

(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, 64, 1, is0) (1, is0, 1, is0)
...

For some reason, is1 appears only when k = self.k_proj(k) is present.
