Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

masked_fill: fill value type must match tensor type #1915

Merged
merged 4 commits into from
Jul 25, 2023

Conversation

pcuenca
Copy link
Contributor

@pcuenca pcuenca commented Jul 15, 2023

masked_fill in PyTorch accepts a fill value specified as an int (0, for instance), even if the tensor to fill contains floats. The coremltools select op requires both inputs to have the same type, and conversion fails in this particular case. This PR tries to ensure that the fill value matches the tensor dtype.

For additional context, causal language models in the transformers library recently started using masked_fill to prepare the causal attention masks. Since version 4.30.0, many models contain code such as the following:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_clip.py#L687

Note the use of 0 as fill value, which causes conversion to fail for all these models because it's interpreted as an int during conversion. Current workarounds include pinning transformers to a previous version (see here, for example). It's also unfortunate that the transformers function that uses this code is not a method (https://github.com/huggingface/transformers/blob/5bb4430edc7df9f9950d412d98bbe505cc4d328b/src/transformers/models/clip/modeling_clip.py#L678), so patching cannot be easily applied.

In summary, this change makes the masked_fill frontend op more similar to PyTorch's, facilitates conversion of transformers models and unlocks the use of the current version of transformers, which will be required for new models being added to the library.

For testing coverage, I simply adapted the existing unit test to use 0 in addition to 10.0.

@TobyRoseman
Copy link
Collaborator

Thanks @pcuenca for another fix. This change looks good to me. I've kicked off a CI Run. Assuming that passes, I will merge this pull request.

)
def test_masked_fill(self, compute_unit, backend):
def test_masked_fill(self, compute_unit, backend, value):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the testing infra is producing the fp32 input only.
However, in this change, we are missing the case that input is int32, in which the value might be downcast.
Please refer to the converter_input_type argument,
to parametrize the input dtype as well.
Also, it would be better to test the value of [10.3, 7] instead of [10.0, 0].
With 10.3, we can test the case that 10.3 be downcasted to int,
and 7 is in generally a better choice than 0 for the testing purpose,
given that 0 could potentially be in some default value haha 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcuenca
Copy link
Contributor Author

pcuenca commented Jul 21, 2023

Thanks @jakesabathia2 for your thoughtful review!

I increased test coverage so input values may also be ints, and fill values might have to be downcast. However, I did not follow the converter_input_type method as I couldn't make it work in this case. As I understand it, the inputs are created as floats (as you mentioned) here:

input_data = generate_input_data(input_data, rand_range)
, and the conversion to Core ML would fail here
model_spec, mlmodel, coreml_inputs, coreml_results = convert_and_compare(
because the Core ML TensorType would not match the actual tensors.

Instead, I created the inputs inside the test using the appropriate dtype, which is another method I've seen used in the test suite. Would that be acceptable?

I also agree it's a good idea to use a value such as 7. I kept 0 as well, because it's an usual value people would presumably use, and I was interested in testing that particular case myself :) Please, let me know if that's ok; otherwise I'll remove it.

Edit: also added a test for expected results.

@YifanShenSZ
Copy link
Collaborator

YifanShenSZ commented Jul 25, 2023

Thanks for the update! This PR looks great to me with only 2 nits:

  1. You can check here for an example using converter_input_type.
  2. You would not have to manually invoke expected_results = model(inputs), because if no expected result provided, we will generate one by calling the torch model

Details on why we want converter_input_type:

  1. The CoreML model's input dtype does not come from PyTorch (since torch.jit.trace does not have that info provided); instead it is specified by converter_input_type (and default to np.float32 if not specified)
  2. Concretely, that means, if you do not specify converter_input_type, then you are always testing such a CoreML model that fills np.float32 value to a np.float32 tensor

@pcuenca
Copy link
Contributor Author

pcuenca commented Jul 25, 2023

Thanks @YifanShenSZ, I got it now. In my previous failed attempt with converter_input_type I had used input shapes, not data, and conversion failed because it generated float data for me that did not match the input type I supplied.

I also misinterpreted the use of expected_results when quickly stepping through the debugger; my apologies for the confusion.

@jakesabathia2
Copy link
Collaborator

Thanks @pcuenca .
Your PR now looks good to me,
let me trigger a CI

Copy link
Collaborator

@jakesabathia2 jakesabathia2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@jakesabathia2 jakesabathia2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants