masked_fill: fill value type must match tensor type #1915
Conversation
```diff
     )
-    def test_masked_fill(self, compute_unit, backend):
+    def test_masked_fill(self, compute_unit, backend, value):
```
By default, the testing infra produces fp32 inputs only. However, this change is missing the case where the input is int32, in which the `value` might be downcast. Please refer to the `converter_input_type` argument to parametrize the input dtype as well.

Also, it would be better to test the values `[10.3, 7]` instead of `[10.0, 0]`. With `10.3` we can test the case where `10.3` is downcast to int, and `7` is generally a better choice than `0` for testing purposes, given that `0` could potentially be some default value haha 😄
An example of using `converter_input_type`: https://github.com/apple/coremltools/blob/main/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py#L7030
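To make the thread self-contained, here is a rough standalone sketch of exercising the int32 path. It bypasses the coremltools test harness, and the module, shapes, and values are illustrative rather than taken from this PR:

```python
import numpy as np
import torch
import coremltools as ct

class MaskedFill(torch.nn.Module):
    """Wraps masked_fill with a constant mask so only `x` is a model input."""

    def __init__(self, mask, value):
        super().__init__()
        self.mask = mask
        self.value = value

    def forward(self, x):
        return x.masked_fill(self.mask, self.value)

shape = (2, 3)
mask = torch.tensor([[True, False, True], [False, True, False]])

for dtype, value in [(np.float32, 10.3), (np.int32, 10.3), (np.int32, 7)]:
    x = torch.from_numpy(np.arange(6).reshape(shape).astype(dtype))
    traced = torch.jit.trace(MaskedFill(mask, value), x)
    # Declaring the input dtype here plays the role that converter_input_type
    # plays in the test suite; with np.int32, the 10.3 fill value must be downcast.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="x", shape=shape, dtype=dtype)],
        convert_to="mlprogram",
    )
```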
Thanks @jakesabathia2 for your thoughtful review! I increased test coverage so input values may also be `int`. I first tried `converter_input_type`, but the declared `TensorType` would not match the actual tensors. Instead, I created the inputs inside the test using the appropriate dtype. I also agree it's a good idea to use a value such as `10.3`.

Edit: also added a test for expected results.
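Concretely, building typed inputs inside the test body (rather than taking the harness's default fp32 inputs) can be as small as this sketch; the shapes and value ranges are illustrative:

```python
import numpy as np
import torch

SHAPE = (2, 3)
# Explicit dtype so the int32 path is exercised alongside the default fp32 one.
input_data = torch.from_numpy(np.random.randint(-100, 100, SHAPE).astype(np.int32))
mask = torch.bernoulli(torch.rand(SHAPE)).to(torch.bool)
```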
Thanks for the update! This PR looks great to me, with only 2 nits.

Details on why we want `converter_input_type`: …
Thanks @YifanShenSZ, I got it now. In my previous failed attempt with `converter_input_type` I also misinterpreted the use of …
Thanks @pcuenca.
CI passed
`masked_fill` in PyTorch accepts a fill value specified as an `int` (`0`, for instance), even if the tensor to fill contains floats. The coremltools `select` op requires both inputs to have the same type, and conversion fails in this particular case. This PR tries to ensure that the fill value matches the tensor dtype.

For additional context, causal language models in the `transformers` library recently started using `masked_fill` to prepare the causal attention masks. Since version 4.30.0, many models contain code such as the following: https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_clip.py#L687
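The linked snippet is not reproduced in this thread; the pattern in question looks roughly like the following paraphrase of the transformers causal-mask helper (details may differ from the linked revision):

```python
import torch

tgt_len, dtype = 4, torch.float32  # illustrative values

# Start from a mask filled with the most negative representable value,
# then clear the lower triangle (including the diagonal).
mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min)
mask_cond = torch.arange(mask.size(-1))
# The int literal 0 fills a float tensor here -- the exact case this PR fixes.
mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
```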
Note the use of `0` as fill value, which causes conversion to fail for all these models because it's interpreted as an `int` during conversion. Current workarounds include pinning transformers to a previous version (see here, for example). It's also unfortunate that the transformers function that uses this code is not a method (https://github.com/huggingface/transformers/blob/5bb4430edc7df9f9950d412d98bbe505cc4d328b/src/transformers/models/clip/modeling_clip.py#L678), so patching cannot be easily applied.

In summary, this change makes the `masked_fill` frontend op more similar to PyTorch's, facilitates conversion of transformers models, and unlocks the use of the current version of `transformers`, which will be required for new models being added to the library.

For testing coverage, I simply adapted the existing unit test to use `0` in addition to `10.0`.
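For readers skimming the thread, the essence of the change is to cast the fill value to the input's dtype before emitting `select`. Below is a simplified sketch of what the frontend op could look like with that fix applied; it is not the literal diff, and the helper names follow coremltools conventions:

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.mil import types
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op(override=True)
def masked_fill(context, node):
    x, mask, value = _get_inputs(context, node, expected=3)
    if value.dtype != x.dtype:
        # Cast the scalar fill value (e.g. an int 0 for a fp32 tensor, or
        # 10.3 for an int32 one) so both branches of select share a dtype.
        value = mb.cast(x=value, dtype=types.builtin_to_string(x.dtype))
    res = mb.select(cond=mask, a=value, b=x, name=node.name)
    context.add(res)
```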