
Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Input Shape #1763

Closed
rsomani95 opened this issue Feb 9, 2023 · 23 comments · Fixed by #1867
Labels
bug (Unexpected behaviour that should be corrected), PyTorch (traced), triaged (Reviewed and examined, release has been assigned if applicable)

Comments

@rsomani95

rsomani95 commented Feb 9, 2023

🐞Bug Description

I'm trying to export Apple's ANE optimised MultiHeadAttention layer defined here

The layer exports successfully with a fixed shape, but fails with flexible shapes.
I'm using this layer in a custom sequence model, so flexible shapes are imperative.

The error thrown is an AssertionError: input shapes incompatible.

Stack Trace

Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:  73%|| 129/177 [00:00<00:00, 8531.06 o

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[28], line 2
      1 flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
----> 2 mlmod_flexible_shape = ct.convert(
      3     jit,
      4     inputs = [
      5         ct.TensorType("q", flexible_shape),
      6         ct.TensorType("k", flexible_shape),
      7         ct.TensorType("v", flexible_shape),
      8     ]
      9 )

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/_converters_entry.py:444, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug)
    441 if specification_version is None:
    442     specification_version = _set_default_specification_version(exact_target)
--> 444 mlmodel = mil_convert(
    445     model,
    446     convert_from=exact_source,
    447     convert_to=exact_target,
    448     inputs=inputs,
    449     outputs=outputs_as_tensor_or_image_types, # None or list[ct.ImageType/ct.TensorType]
    450     classifier_config=classifier_config,
    451     transforms=tuple(transforms),
    452     skip_model_load=skip_model_load,
    453     compute_units=compute_units,
    454     package_dir=package_dir,
    455     debug=debug,
    456     specification_version=specification_version,
    457 )
    459 if exact_target == 'milinternal':
    460     return mlmodel # Returns the MIL program

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:187, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
    148 @_profile
    149 def mil_convert(
    150     model,
   (...)
    154     **kwargs
    155 ):
    156     """
    157     Convert model from a specified frontend `convert_from` to a specified
    158     converter backend `convert_to`.
   (...)
    185         See `coremltools.converters.convert`
    186     """
--> 187     return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:211, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
    208     weights_dir = _tempfile.TemporaryDirectory()
    209     kwargs["weights_dir"] = weights_dir.name
--> 211 proto, mil_program = mil_convert_to_proto(
    212                         model,
    213                         convert_from,
    214                         convert_to,
    215                         registry,
    216                         **kwargs
    217                      )
    219 _reset_conversion_state()
    221 if convert_to == 'milinternal':

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:281, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, **kwargs)
    278 kwargs.setdefault("convert_to", convert_to)
    279 frontend_converter = frontend_converter_type()
--> 281 prog = frontend_converter(model, **kwargs)
    283 if convert_to.lower() != "neuralnetwork":
    284     passes = kwargs.get("transforms", list())

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/converter.py:109, in TorchFrontend.__call__(self, *args, **kwargs)
    106 def __call__(self, *args, **kwargs):
    107     from .frontend.torch import load
--> 109     return load(*args, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py:57, in load(model_spec, inputs, specification_version, debug, outputs, cut_at_symbols, **kwargs)
     55 inputs = _convert_to_torch_inputtype(inputs)
     56 converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols, specification_version)
---> 57 return _perform_torch_convert(converter, debug)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/load.py:96, in _perform_torch_convert(converter, debug)
     94 def _perform_torch_convert(converter, debug):
     95     try:
---> 96         prog = converter.convert()
     97     except RuntimeError as e:
     98         if debug and "convert function" in str(e):

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/converter.py:281, in TorchConverter.convert(self)
    278 self.convert_const()
    280 # Add the rest of the operations
--> 281 convert_nodes(self.context, self.graph)
    283 graph_outputs = [self.context[name] for name in self.graph.outputs]
    285 # An output can be None when it's a None constant, which happens
    286 # in Fairseq MT.

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py:89, in convert_nodes(context, graph)
     84     raise RuntimeError(
     85         "PyTorch convert function for op '{}' not implemented.".format(node.kind)
     86     )
     88 context.prepare_for_conversion(node)
---> 89 add_op(context, node)
     91 # We've generated all the outputs the graph needs, terminate conversion.
     92 if _all_outputs_present(context, graph):

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py:1120, in einsum(context, node)
   1118 b = context[node.inputs[1]][1]
   1119 equation = context[node.inputs[0]].val
-> 1120 x = build_einsum_mil(a, b, equation, node.name)
   1121 context.add(x)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/frontend/_utils.py:164, in build_einsum_mil(a_var, b_var, equation, name)
    162         x = mb.einsum(values=(a_var, b_var), equation=equation, name=name)
    163     else:
--> 164         x = mb.einsum(values=(b_var, a_var), equation=equation_rev, name=name)
    165 elif vec_chw_whu_chu in [parsed_vectors, parsed_vectors_rev]:
    166     if parsed_vectors == vec_chw_whu_chu:

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/registry.py:176, in SSAOpRegistry.register_op.<locals>.class_wrapper.<locals>.add_op(cls, **kwargs)
    173 else:
    174     op_cls_to_add = op_reg[op_type]
--> 176 return cls._add_op(op_cls_to_add, **kwargs)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/builder.py:182, in Builder._add_op(cls, op_cls, **kwargs)
    180 curr_block()._insert_op_before(new_op, before_op=before_op)
    181 new_op.build_nested_blocks()
--> 182 new_op.type_value_inference()
    183 if len(new_op.outputs) == 1:
    184     return new_op.outputs[0]

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/operation.py:253, in Operation.type_value_inference(self, overwrite_output)
    243 def type_value_inference(self, overwrite_output=False):
    244     """
    245     Perform type inference and auto_val computation based on new input Vars
    246     in kwargs. If self._output_vars is None then we generate _output_vars;
   (...)
    251     existing _output_vars
    252     """
--> 253     output_types = self.type_inference()
    254     if not isinstance(output_types, tuple):
    255         output_types = (output_types,)

File ~/miniconda3/envs/rosetta/lib/python3.8/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/linear.py:290, in einsum.type_inference(self)
    287 print(f"x, y shapes: {x_shape, y_shape}")
    289 assert len(x_shape) == len(y_shape), "inputs not of the same rank"
--> 290 assert x_shape[-1] == y_shape[-3], "input shapes incompatible"
    291 if x_shape[-2] != 1 and y_shape[-2] != 1:
    292     assert x_shape[-2] == y_shape[-2], "input shapes incompatible"

AssertionError: input shapes incompatible

To Reproduce

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2

from ane_transformers.reference.multihead_attention import MultiHeadAttention

N = 10
x = torch.rand(1, 512, 1, N)

layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
jit = torch.jit.trace(layer, (x, x, x))


# Fixed input shape - works
mlmod_fixed_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", x.shape),
        ct.TensorType("k", x.shape),
        ct.TensorType("v", x.shape),
    ]
)


# Flexible input shape - fails
flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
mlmod_flexible_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", flexible_shape),
        ct.TensorType("k", flexible_shape),
        ct.TensorType("v", flexible_shape),
    ]
)


# Enumerated shape (not ideal, but better than fixed) also throws the same `AssertionError`
enumerated_shapes = ct.EnumeratedShapes(
    [(1, 512, 1, i) for i in np.array(list(range(1, 449)))[::4]]
)
mlmodel_enumerated_shape = ct.convert(
    jit,
    inputs = [
        ct.TensorType("q", enumerated_shapes),
        ct.TensorType("k", enumerated_shapes),
        ct.TensorType("v", enumerated_shapes),
    ],
)

System environment (please complete the following information):

  • coremltools version: 6.2
  • torch version: 1.13.1
  • numpy version: 1.22.3
  • OS: macOS 13.0, MacBook Pro 16-inch, 2021

Additional context

I'm quite certain that the shape error is happening as part of an einsum operation in the layer definition. While debugging, I printed out the equations and shapes of all einsum ops being converted (I did this by adding two print statements right below these lines). It appears that the error happens in one of the later einsum ops and not right away.
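A sketch of roughly what those two prints look like; their exact placement, in build_einsum_mil (frontend/_utils.py) and einsum.type_inference (ops/defs/iOS15/linear.py), is an assumption based on the stack trace above:

# in build_einsum_mil, once the equation string is available
print(f"Einsum conversion equation: {equation}")

# in einsum.type_inference, once x_shape and y_shape are computed
print(f"x, y shapes: {x_shape, y_shape}")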

Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bkhc,bchq->bkhq
x, y shapes: ((1, is148, 1, 64), (1, 64, 1, is147))
Einsum conversion equation: bchk,bkhq->bchq
x, y shapes: ((1, 64, 1, is149), (1, is148, 1, is147))

Perhaps this issue is tangentially related: #1754

@rsomani95 rsomani95 added the bug Unexpected behaviour that should be corrected (type) label Feb 9, 2023
@rsomani95 rsomani95 reopened this Feb 9, 2023
@rsomani95 rsomani95 changed the title Custom MultiHead Attention Block Conversion Fails With Flexible Inputs Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Inputs Feb 9, 2023
@rsomani95 rsomani95 changed the title Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Inputs Apple's ANE Optimised MultiHeadAttention Export Fails With Flexible Input Shape Feb 9, 2023
@TobyRoseman
Collaborator

@rsomani95 - when trying to run your code, the following line:

layer = MaskedMultiHeadAttentionANE(512, 8).eval()

Gives the following error:

NameError: name 'MaskedMultiHeadAttentionANE' is not defined

Please update your code.

@rsomani95
Author

@TobyRoseman apologies, I overlooked removing that line when I edited the issue to make it more concise.
I've updated the code, you should be able to reproduce now.

@rsomani95
Author

@TobyRoseman are you able to run the code now?

@TobyRoseman
Collaborator

Yes, I can now reproduce the issue (after locally cloning and using the ml-ane-transformers repository). Thanks for updating the code.

The issue here is that x_shape and y_shape contain symbols.

@TobyRoseman TobyRoseman added the triaged Reviewed and examined, release as been assigned if applicable (status) label Feb 16, 2023
@rsomani95
Author

@TobyRoseman understood, thank you.
Do you have any pointers for changing the source Python code that could alleviate this issue?

@TobyRoseman
Collaborator

Do you have any pointers for changing the source Python code that could alleviate this issue?

It's unclear how much work will be necessary here. As a starting point, you could try just removing those assertions and seeing if anything breaks elsewhere.
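A minimal sketch of a gentler variant of that experiment: instead of deleting the asserts in einsum.type_inference (coremltools/converters/mil/mil/ops/defs/iOS15/linear.py) outright, skip the equality checks when a dimension is symbolic. This assumes the is_symbolic helper from coremltools.converters.mil.mil.types.symbolic and is only an illustration, not the fix that eventually landed in #1867.

from coremltools.converters.mil.mil.types.symbolic import is_symbolic

def dims_compatible(a, b):
    # A symbolic (flexible) dimension can't be validated at conversion time,
    # so only enforce equality when both sides are concrete integers.
    return is_symbolic(a) or is_symbolic(b) or a == b

def check_einsum_shapes(x_shape, y_shape):
    # Mirrors the three asserts in einsum.type_inference, but tolerant of symbols.
    assert len(x_shape) == len(y_shape), "inputs not of the same rank"
    assert dims_compatible(x_shape[-1], y_shape[-3]), "input shapes incompatible"
    if x_shape[-2] != 1 and y_shape[-2] != 1:
        assert dims_compatible(x_shape[-2], y_shape[-2]), "input shapes incompatible"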

@Janus289

Commenting out the assert allows the conversion to finish, but where the fixed-size model uses the ANE, the flexible-shape model does not.

@aseemw
Collaborator

aseemw commented Feb 23, 2023

Commenting out the assert allows the conversion to finish, but where the fixed-size model uses the ANE, the flexible-shape model does not.

Instead of using ct.RangeDim, can you try the ct.EnumeratedShapes option? That may allow the model to use the ANE.

@vade

vade commented Feb 28, 2023

Does ct.EnumeratedShapes allow for support of up to 50,000+ possible shapes, i.e. (1, 1), (1, 2), ... (1, 51865)? We are accruing tokens from Whisper's encoder, which requires continuing to run token prediction and passing in an accrued sequence of additional tokens until you hit an end-of-transcript token or the max vocab size (51865).

Does it seem feasible to use the enumerated shapes path?

@aseemw
Collaborator

aseemw commented Feb 28, 2023

Enumerated shapes are limited to 128. So it won't be possible to use that if you need to support 50k shapes.

@rsomani95
Author

I haven't checked output correctness yet, but commenting out the three assert statements does allow it to export.
Enumerated shapes aren't practical for the reasons @aseemw pointed out.

What's odd, though, is that the Xcode benchmark suggests that all the models run on the ANE, even the flexible-shape model. Could this be happening because the default input size is 1 and the benchmark only runs on the default size?

[Screenshot: Xcode performance report, 2023-02-28]

@aseemw
Collaborator

aseemw commented Feb 28, 2023

Right, the default size model is expected to utilize the ANE, even with flexible shapes, but if you run the model with a different shape, it would likely only run on CPU/GPU.

@vade

vade commented Mar 6, 2023

Filed a Feedback Assistant report (FB12038137), FWIW.

@atiorh
Contributor

atiorh commented Mar 8, 2023

Hello, author of ml-ane-transformers here 👋 Great to hear that you are attempting to extend the reference implementation's capabilities with flexible shapes. Some notes:

  • 50k+ different shapes needed while ct.EnumeratedShapes accommodates 128 max: I recommend enumerating only sequence_length=[2**n for n in range(7,17)], advancing from one sequence_length to the next right before you overflow, and using masking (decoder_k_mask) to disable attention on unused indices (see the sketch after this list).
  • Benchmarking the flexible shape models you were able to generate: I recommend using model = coremltools.models.MLModel(path_to_mlpackage_file) and model.predict(dict_of_inputs_with_varying_sequence_length) for each sequence length with a timer around this call. There will be non-zero overhead with this way of benchmarking but the marginal overhead will be negligible for the larger variants.
  • Verifying output correctness: I recommend a PSNR check similar to this one in our unit tests.
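A rough sketch of the first two suggestions above (bucketed sequence lengths plus a per-length timing loop). The input names "q"/"k"/"v", the 512-channel layout, and the mlpackage path are assumptions for illustration, not part of the reference implementation:

import time
import numpy as np
import coremltools

BUCKETS = [2**n for n in range(7, 17)]  # 128 ... 65536, well under the 128-shape limit

def next_bucket(seq_len):
    # Pick the smallest enumerated length that fits the current sequence.
    for length in BUCKETS:
        if seq_len <= length:
            return length
    raise ValueError(f"sequence length {seq_len} exceeds the largest bucket")

model = coremltools.models.MLModel("path/to/model.mlpackage")  # hypothetical path

def timed_predict(seq_len):
    bucket = next_bucket(seq_len)
    # Pad up to the bucket size; in practice also pass a mask (cf. decoder_k_mask)
    # that disables attention on the padded indices.
    x = np.zeros((1, 512, 1, bucket), dtype=np.float32)
    start = time.perf_counter()
    out = model.predict({"q": x, "k": x, "v": x})
    return time.perf_counter() - start, out

for n in BUCKETS:
    latency, _ = timed_predict(n)
    print(f"seq_len={n}: {latency * 1000:.2f} ms")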

@vade

vade commented Mar 8, 2023

Hi! Thank you so much @atiorh, it's awesome to have your feedback and input. Over the last few days there has been a lot of interest from the larger WhisperCPP community, which is also tackling the same problem, and we are combining our efforts here: ggerganov/whisper.cpp#548

I hope folks don't mind, but I'm pinging @wangchou and @RobertRiachi, each of whom has been tackling this independently. No attempt to dog-pile, mostly just getting super smart folks on the same page to make informed decisions.

Thanks again, everyone. Really exciting to be on the cusp of some really fast on-device inference.

@vade

vade commented Mar 8, 2023

One quick question for @atiorh and friends at Apple, since it's not super clear to me: in theory, the ANE supports ct.EnumeratedShapes, assuming the layers all support it? The runtime generally won't punt to other devices? Thanks again, all.

@aseemw
Collaborator

aseemw commented Mar 9, 2023

Yes, that's right: the static-shaped model should already be ANE resident, and if you update such a model with enumerated shapes it should still run on the ANE.
Unless the process of making the model dynamic-shaped with enumerated shapes introduces some dynamic layers (e.g., converts a static reshape to a fully dynamic reshape); in that case, we may lose ANE residency.

@vade

vade commented Mar 9, 2023

Thank you very much for the clarifications @aseemw!

@atiorh
Contributor

atiorh commented Mar 10, 2023

Does ct.EnumeratedShapes allow for support of up to 50,000+ possible shapes, i.e. (1, 1), (1, 2), ... (1, 51865)? We are accruing tokens from Whisper's encoder, which requires continuing to run token prediction and passing in an accrued sequence of additional tokens until you hit an end-of-transcript token or the max vocab size (51865).

Does it seem feasible to use the enumerated shapes path?

My understanding is that (please correct me if I am mistaken):

  • Whisper's encoder has a default maximum sequence length of 1500 tokens
  • Whisper's decoder has a default maximum sequence length of 448 tokens.
  • 448 is the value @rsomani95 intended to use for flexible shapes based on the first message on this thread.
  • @vade Could you please confirm that your use case for flexible shapes is for something other than autoregressive decoding?

@anentropic

FWIW I have the exact same issue (fails on the same line) when trying to plug the translated DistilBERT model from ml-ane-transformers into the https://github.com/huggingface/exporters exporter code

I am guessing from the info above that it's because the HF exporter wraps the model in an extra module layer that has a bunch of conditional logic for models of different types, so the input shape becomes 'flexible'

@anentropic

...after a bit more digging, the culprit in HF exporters is not the Wrapper module

it's this bit: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L541 -> get_input_types (https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L139) -> get_shape (https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/convert.py#L86)

i.e. for some models (e.g. the one I was trying above, for sequence classification) it will use a ct.RangeDim(min_length, max_length)

which is because of this definition: https://github.com/huggingface/exporters/blob/main/src/exporters/coreml/config.py#L247

sequence_length=(1, self.maxSequenceLength),

So I am thinking the solution for me in this case is to hack the HF exporters code to use a fixed sequence length with padding instead of a flexible one (see the sketch below)

Also... I assume the only reason HF exporters isn't more generally affected by this issue is that most HF models don't use einsum, whereas ml-ane-transformers substitutes one into the attention layer?
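A minimal sketch of that fixed-length-with-padding idea, assuming a Hugging Face tokenizer feeding the traced model; the checkpoint name and max_length are placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # placeholder checkpoint

def encode_fixed(text, max_length=128):
    # Pad (and truncate) every input to the same length so the traced graph
    # never sees a flexible sequence dimension.
    return tokenizer(
        text,
        padding="max_length",
        max_length=max_length,
        truncation=True,
        return_tensors="pt",
    )

batch = encode_fixed("hello world")
print(batch["input_ids"].shape)  # torch.Size([1, 128])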

@fukatani
Contributor

fukatani commented May 21, 2023

I found that rewriting _attention_fn in MultiHeadAttention works well.
In your use case the equation is bkhc,bchq->bkhq with b = h = 1, so you can compute it with kc,cq->kq and reshape the tensors.

    def _attention_fn(self, q, k, v, qk_mask, k_mask, return_weights):
        ...
        attn_weights = [aw.softmax(dim=1) for aw in attn_weights
                        ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        mh_w = [self.dropout(aw) for aw in attn_weights
                ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        mh_w = [wi.reshape(wi.shape[1], wi.shape[3]) for wi in mh_w]
        mh_v = [vi.reshape(vi.shape[1], vi.shape[3]) for vi in mh_v]
        attn = [
            torch.einsum('kq,ck->cq', wi, vi)
            for wi, vi in zip(mh_w, mh_v)
        ]  # n_head * (d_v/n_head, tgt_seq_len); reshaped back to 4-D below
        attn = [
            a.reshape(1, a.shape[0], 1, a.shape[1]) for a in attn
        ]
        attn = torch.cat(attn, dim=1)  # (batch_size, d_v, 1, tgt_seq_len)

        if return_weights:
            return attn, attn_weights
        return attn, None

Here is my test code.

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2

from ane_transformers.reference.multihead_attention import MultiHeadAttention

N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = MultiHeadAttention(512, n_head=8, dropout=0.0).eval()
    jit = torch.jit.trace(layer, (x, x, x))


    # Flexible input shape
    flexible_shape = ct.Shape(shape = (1, 512, 1, ct.RangeDim(1, 448)))
    mlmod_flexible_shape = ct.convert(
        jit,
        inputs = [
            ct.TensorType("q", flexible_shape),
            ct.TensorType("k", flexible_shape),
            ct.TensorType("v", flexible_shape),
        ]
    )

    out = layer(x, x, x)
    out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                             'k': x.detach().numpy().astype(np.float32),
                                             'v': x.detach().numpy().astype(np.float32)})
    np.allclose(out[0], out_dict['var_451'], rtol=0.001, atol=0.001)  # OK

I think the coremltools MIL einsum op has a problem with flexible shapes.
One possible solution is #1863.

@fukatani
Contributor

I found simpler code that reproduces this issue.

import torch  # 1.13.1
import numpy as np  # 1.22.3
import coremltools as ct  # 6.2
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.d_qk = embed_dim
        self.d_v = embed_dim
        self.d_out = embed_dim
        self.n_head = 8

        self.k_proj = nn.Conv2d(embed_dim, self.d_qk, 1)

    def forward(self, q, k, v):
        k = self.k_proj(k)
        mh_q = q.split(
            self.d_qk // self.n_head,
            dim=1)  # n_head * (batch_size, d_qk/n_head, 1, tgt_seq_len)
        mh_k = k.transpose(1, 3).split(
            self.d_qk // self.n_head,
            dim=3)  # n_head * (batch_size, src_seq_len, 1, d_qk/n_head)
        mh_v = v.split(
            self.d_v // self.n_head,
            dim=1)  # n_head * (batch_size, d_v/n_head, 1, src_seq_len)
        mh_w = [
            torch.einsum('bchq,bkhc->bkhq', [qi, ki]) for qi, ki in zip(mh_q, mh_k)
        ]  # n_head * (batch_size, src_seq_len, 1, tgt_seq_len)
        attn = [
            torch.einsum('bkhq,bchk->bchq', wi, vi) for wi, vi in zip(mh_w, mh_v)
        ]  # n_head * (batch_size, d_v/n_head, 1, tgt_seq_len)
        return attn

N = 10
x = torch.rand(1, 512, 1, N)

with torch.no_grad():
    layer = Net(512).eval()
    jit = torch.jit.trace(layer, (x, x, x))

    # Flexible input shape - fails
    flexible_shape = ct.Shape(shape=(1, 512, 1, ct.RangeDim(1, 448)))
    mlmod_flexible_shape = ct.convert(
        jit,
        inputs=[
            ct.TensorType("q", flexible_shape),
            ct.TensorType("k", flexible_shape),
            ct.TensorType("v", flexible_shape),
        ]
    )
    out = layer(x, x, x)
    out_dict = mlmod_flexible_shape.predict({'q': x.detach().numpy().astype(np.float32),
                                             'k': x.detach().numpy().astype(np.float32),
                                             'v': x.detach().numpy().astype(np.float32)})
    np.allclose(out[0], out_dict['var_195'], rtol=0.001, atol=0.001)

Strangely, k = self.k_proj(k) appears to contribute to this issue.
If that line is commented out, the code converts successfully.

I also embedded debug code, print(x_shape, y_shape), in linear.py and compared the outputs with and without k = self.k_proj(k) commented out.

with k = self.k_proj(k)

(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, is1, 1, 64) (1, 64, 1, is0)
(1, 64, 1, is0) (1, is1, 1, is0)
...

without k = self.k_proj(k)

(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, is0, 1, 64) (1, 64, 1, is0)
(1, 64, 1, is0) (1, is0, 1, is0)
...

For some reason, is1 appears only when k = self.k_proj(k) is present.
