masked_fill: fill value type must match tensor type #1915
Conversation
```diff
     )
-    def test_masked_fill(self, compute_unit, backend):
+    def test_masked_fill(self, compute_unit, backend, value):
```
By default, the testing infra produces fp32 inputs only. However, this change is missing the case where the input is int32, in which the `value` might be downcast. Please refer to the `converter_input_type` argument to parametrize the input dtype as well.

Also, it would be better to test the values `[10.3, 7]` instead of `[10.0, 0]`. With `10.3` we can test the case where `10.3` is downcast to int, and `7` is generally a better choice than `0` for testing purposes, given that `0` could potentially be some default value haha 😄
An example of using `converter_input_type`: https://github.com/apple/coremltools/blob/main/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py#L7030
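To make the thread self-contained, here is a rough standalone sketch of exercising the int32 path. It bypasses the coremltools test harness, and the module, shapes, and values are illustrative rather than taken from this PR:

```python
import numpy as np
import torch
import coremltools as ct

class MaskedFill(torch.nn.Module):
    """Wraps masked_fill with a constant mask so only `x` is a model input."""

    def __init__(self, mask, value):
        super().__init__()
        self.mask = mask
        self.value = value

    def forward(self, x):
        return x.masked_fill(self.mask, self.value)

shape = (2, 3)
mask = torch.tensor([[True, False, True], [False, True, False]])

for dtype, value in [(np.float32, 10.3), (np.int32, 10.3), (np.int32, 7)]:
    x = torch.from_numpy(np.arange(6).reshape(shape).astype(dtype))
    traced = torch.jit.trace(MaskedFill(mask, value), x)
    # Declaring the input dtype here plays the role that converter_input_type
    # plays in the test suite; with np.int32, the 10.3 fill value must be downcast.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="x", shape=shape, dtype=dtype)],
        convert_to="mlprogram",
    )
```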
Thanks @jakesabathia2 for your thoughtful review! I increased test coverage so input values may also be `int`. I first tried `converter_input_type`, but the declared `TensorType` would not match the actual tensors. Instead, I created the inputs inside the test using the appropriate dtype. I also agree it's a good idea to use a value such as `10.3`.

Edit: also added a test for expected results.
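Concretely, building typed inputs inside the test body (rather than taking the harness's default fp32 inputs) can be as small as this sketch; the shapes and value ranges are illustrative:

```python
import numpy as np
import torch

SHAPE = (2, 3)
# Explicit dtype so the int32 path is exercised alongside the default fp32 one.
input_data = torch.from_numpy(np.random.randint(-100, 100, SHAPE).astype(np.int32))
mask = torch.bernoulli(torch.rand(SHAPE)).to(torch.bool)
```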
Thanks for the update! This PR looks great to me, with only 2 nits.

Details on why we want `converter_input_type`: …
Thanks @YifanShenSZ, I got it now. In my previous failed attempt with `converter_input_type` I also misinterpreted the use of …
Thanks @pcuenca.
CI passed
`masked_fill` in PyTorch accepts a fill value specified as an `int` (`0`, for instance), even if the tensor to fill contains floats. The coremltools `select` op requires both inputs to have the same type, and conversion fails in this particular case. This PR tries to ensure that the fill value matches the tensor dtype.

For additional context, causal language models in the `transformers` library recently started using `masked_fill` to prepare the causal attention masks. Since version 4.30.0, many models contain code such as the following: https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_clip.py#L687
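The linked snippet is not reproduced in this thread; the pattern in question looks roughly like the following paraphrase of the transformers causal-mask helper (details may differ from the linked revision):

```python
import torch

tgt_len, dtype = 4, torch.float32  # illustrative values

# Start from a mask filled with the most negative representable value,
# then clear the lower triangle (including the diagonal).
mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min)
mask_cond = torch.arange(mask.size(-1))
# The int literal 0 fills a float tensor here -- the exact case this PR fixes.
mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
```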
Note the use of `0` as fill value, which causes conversion to fail for all these models because it's interpreted as an `int` during conversion. Current workarounds include pinning transformers to a previous version (see here, for example). It's also unfortunate that the transformers function that uses this code is not a method (https://github.com/huggingface/transformers/blob/5bb4430edc7df9f9950d412d98bbe505cc4d328b/src/transformers/models/clip/modeling_clip.py#L678), so patching cannot be easily applied.

In summary, this change makes the `masked_fill` frontend op more similar to PyTorch's, facilitates conversion of transformers models, and unlocks the use of the current version of `transformers`, which will be required for new models being added to the library.

For testing coverage, I simply adapted the existing unit test to use `0` in addition to `10.0`.
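For readers skimming the thread, the essence of the change is to cast the fill value to the input's dtype before emitting `select`. Below is a simplified sketch of what the frontend op could look like with that fix applied; it is not the literal diff, and the helper names follow coremltools conventions:

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.mil import types
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op(override=True)
def masked_fill(context, node):
    x, mask, value = _get_inputs(context, node, expected=3)
    if value.dtype != x.dtype:
        # Cast the scalar fill value (e.g. an int 0 for a fp32 tensor, or
        # 10.3 for an int32 one) so both branches of select share a dtype.
        value = mb.cast(x=value, dtype=types.builtin_to_string(x.dtype))
    res = mb.select(cond=mask, a=value, b=x, name=node.name)
    context.add(res)
```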