
Disallow operations on scalar tensors that are no-ops #794

Open
reillyeon opened this issue Nov 23, 2024 · 5 comments
reillyeon commented Nov 23, 2024

There are a number of operators, such as transpose() and reshape(), which are no-ops when applied to a scalar tensor. In addition, there are operators like gather() which expect indices, but a scalar doesn't have indices.

While implementations can work around this (typically by treating a scalar as a single-element 1D tensor), it seems better to simply disallow these cases, which add complexity without adding any expressivity.
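
To make the two cases concrete, here is a small NumPy sketch (NumPy stands in for a backend here; this is illustrative, not WebNN API code):

import numpy as np

x = np.array(42.0, dtype=np.float32)   # a rank-0 (scalar) tensor, shape ()

# transpose of a scalar is a no-op: there are no axes to permute
print(np.transpose(x).shape)           # ()

# a scalar has no indices to gather from
try:
    x[0]
except IndexError as err:
    print(err)                         # "too many indices for array ..."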

@reillyeon (Contributor, Author) commented:

As an additional data point, neither LiteRT nor Core ML directly supports applying its transpose operator to a scalar, so implementations based on these frameworks will require workarounds. DirectML only supports this operation because transposition is implemented by adjusting the dimensions and strides of a tensor as it is passed between other nodes, which is a no-op for a scalar tensor.
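
For illustration of the dimensions-and-strides point (sketched in NumPy, not DirectML; the arrays are made up):

import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)
y = x.T                                   # a view: same buffer, dims and strides swapped
print(x.strides, y.strides)               # (12, 4) (4, 12)
print(np.shares_memory(x, y))             # True

s = np.array(42.0, dtype=np.float32)      # rank 0: no dims or strides to swap
print(s.T.shape, s.strides)               # () ()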

@reillyeon (Contributor, Author) commented:

We recall that @fdwr had a reason why supporting this was useful for developers, but if so then we should include a more complete discussion of tensor operations on scalars in the specification. Right now there's not much mention that a tensor can have a rank of 0.

fdwr (Collaborator) commented Nov 26, 2024

I have no qualms about disallowing this for gather indices, and I can contentedly let go of transpose (because one must balance "what logically should happen to follow consistent semantics and is mathematically sensible" against "what is practical to implement in the real world"), but reshape, as a more fundamental operation, I really feel should follow the axiom that you can always reshape a tensor to its own shape. So if you have an input of sizes = [4,3], then it should be legal to reshape it to sizes = [4,3]; and if you have sizes = [], then you should be able to reshape it to sizes = [] - a consistent rule, orthogonal to scalarity. Granted, it seems useless at first, but these kinds of cases do come up at higher levels in generic routines.
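
As a hypothetical example of such a generic routine (the helper normalize_like and its behavior are made up for illustration, in NumPy): it restores the reference shape at the end, and only works uniformly across ranks if reshape-to-own-shape is legal for rank 0 too.

import numpy as np

def normalize_like(values, reference):
    # hypothetical generic helper: flatten, compute, then restore the reference
    # shape -- the final reshape must accept sizes == [] when reference is a scalar
    flat = np.asarray(values, dtype=np.float32).reshape(-1)
    result = flat / flat.max()
    return result.reshape(np.asarray(reference).shape)

print(normalize_like([2.0, 4.0], np.zeros((2,))).shape)   # (2,)
print(normalize_like(4.0, np.float32(0.0)).shape)         # ()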

I don't have easy access to CoreML or TFLiteRT to try, but I'd be surprised if reshape of a scalar didn't work as expected for them, as TF and PT and NumPy all agree (code at bottom)?

Right now there's not much mention that a tensor can have a rank of 0.

Yeah, we could add more mention. reshape is documented to support rank N (thanks to @inexorabletash's #657), which means 0 through the maximum backend limit ...

operand | allowed data types | allowed ranks
input   | any                | N
output  | same as input      | newShape's size

...but we could clarify the existing wording "allowed ranks ... are given as an explicit rank (e.g. 1), or N to allow any dimensionality" to explicitly include 0 when defining N. Note the general expectation is that most operators support scalars by default, except for those operators that don't (rather than treating scalars as a special case that only apply to a subset of operators).

Reshape reference code

# tensorflow
import tensorflow as tf

x = tf.constant(42, dtype=tf.float32)
y = tf.reshape(x, [])

print("value:", y)
print("shape:", y.shape)

# value: 42
# shape: []

# pip freeze torch==1.11.0+cpu
import torch

x = torch.tensor(42, dtype=torch.float32)
print("x value:", x)
print("x shape:", x.shape)
print("x dtype:", x.dtype)

y = x.reshape(())
print("y value:", y)
print("y shape:", y.shape)
print("y dtype:", y.dtype)

# x value: tensor(42.)
# x shape: torch.Size([])
# x dtype: torch.float32
# y value: tensor(42.)
# y shape: torch.Size([])
# y dtype: torch.float32

# pip freeze numpy==1.21.6
import numpy
x = numpy.array(42, dtype=numpy.float32)
y = x.reshape([])

print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)

# value: 42.0
# shape: ()
# dtype: float32

@reillyeon (Contributor, Author) commented:

I agree with the argument in favor of reshape(). It was incorrect to refer to it as a no-op since its purpose is to change the shape, rather than the value, of the tensor, and changing the shape from a scalar to a single-element tensor is valid.
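
For concreteness (NumPy once more, purely illustrative):

import numpy as np

x = np.array(42.0, dtype=np.float32)   # shape ()
y = x.reshape([1])                     # scalar -> single-element tensor, shape (1,)
z = y.reshape([])                      # and back, shape ()
print(x.shape, y.shape, z.shape)       # () (1,) ()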

I'm less sure about other operators. For example, gather() probably makes sense if you require that indices is not a scalar, as otherwise the output rank calculation would return -1. pad() only makes sense if you implicitly reshape the input to a single-element tensor first.
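
A sketch of the rank arithmetic behind that -1 (this assumes the usual gather rule, output rank = input rank - 1 + indices rank, as in ONNX's Gather; the helper below is hypothetical, not spec text):

def gather_output_rank(input_rank: int, indices_rank: int) -> int:
    # assumed rule: the gathered axis is consumed and replaced by the indices axes
    return input_rank - 1 + indices_rank

print(gather_output_rank(2, 1))  # 2
print(gather_output_rank(1, 0))  # 0  -- scalar output, still well defined
print(gather_output_rank(0, 0))  # -1 -- nothing to index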

What I'd like to see are examples of where silently tolerating scalars is useful for developers. This might end up in a similar place as #391.

To start we should go through each case where we allow a rank of N and clarify whether 0 is a valid value of N.

@inexorabletash (Member) commented:

we could clarify the existing wording "allowed ranks ... are given as an explicit rank (e.g. 1), or N to allow any dimensionality" to explicitly include 0 when defining N

+1 - if this issue turns into a PR we should include that.
