argMax/Min only support scalar axis in TFLite runtime #629

Closed
fujunwei opened this issue Apr 1, 2024 · 11 comments · Fixed by #724

@fujunwei commented Apr 1, 2024

The axis is constrained to be a scalar in the arg_min_max::Prepare() function in third_party/tflite/src/tensorflow/lite/kernels/arg_min_max.cc.

@philloooo (Contributor) commented:

Looking at other backends and frameworks: TensorFlow, PyTorch, CoreML, and ONNX all support only a scalar axis.

Only DirectML supports providing multiple axes. From the documentation, it seems that if all axes are provided, it reduces to a single value, and the indices are then calculated as if the input were a flat 1D array. I don't know what happens if you give it multiple but not all axes; e.g. for a 4D array with axes=[1,3], how does it determine the indices across dimensions if they were to be reduced together?

In general it seems strange to get a min/max index across dimensions (if that's the expected behavior). Should we consider supporting only a scalar axis?

If we keep the support for axes array, we should clarify the behavior of how reduction is done when multiple dimensions are specified.

@a-sully (Contributor) commented Apr 22, 2024

Since ONNX also only supports a scalar axis, I wouldn't be surprised if DML's multiple-axis feature is not actually used at all by WebNN currently. This is similar to the problem @fdwr explored in #645 (comment): DML supports steepness for softplus(), but ORT doesn't, so the WebNN EP doesn't take advantage of steepness.

In general, I don't think we should be plumbing through DML-specific quirks to WebNN. I'm in favor of aligning with the other frameworks and only supporting scalar axis.

@a-sully (Contributor) commented Apr 22, 2024

The issues you linked to above are about softmax(), which currently lacks any axis parameter at all (and adding it SGTM 👍). In this case we're exploring whether to turn axes (sequence) -> axis (scalar). Just to clarify, are you suggesting that you're in favor of removing multi-axes support from all operators? Just this one? Something else? :)

@philloooo (Contributor) commented Jul 10, 2024

@fdwr @huningxin gentle ping on this, I still propose to support only scalar axis.

Further, the current fallback behavior when axes is not provided is:

If not present, all dimensions are reduced. If empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor.

If we change to a scalar axis, we would only need to specify the if-not-present behavior. I propose defaulting it to 0 instead of reducing all dimensions; this aligns with ONNX and TensorFlow and is more straightforward.

WDYT?

@huningxin (Contributor) commented:

@philloooo

I still propose to support only scalar axis.

SGTM!

If we change to a scalar axis, we would only need to specify the if-not-present behavior. I propose defaulting it to 0 instead of reducing all dimensions; this aligns with ONNX and TensorFlow and is more straightforward.

PyTorch reduces all dimensions if axis is not present.

dim (int) – the dimension to reduce. If None, the argmax of the flattened input is returned.

But I guess this behavior can be emulated by reshaping the tensor to 1-D and reducing along axis 0.
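For illustration, a minimal PyTorch sketch of that emulation (just the flatten-then-reduce idea, not WebNN API):

import torch

x = torch.tensor([[1, 5, 2], [3, 4, 6]])
# Emulate dim=None (argmax over the flattened input) using only a fixed axis 0:
flat = torch.reshape(x, (-1,))
print(torch.argmax(flat, dim=0))  # tensor(5), same as torch.argmax(x)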

So, +1 to default it to 0.

@fdwr (Collaborator) commented Jul 11, 2024

Will reply here after my stack is lighter.
gentle ping on this

Sorry for the delay, Phillis.

Regardless of our final decisions here, I want to first explain how multidimensional min/max index reduction works, and also rationalize (when you look more holistically across the related functions) why ND argMin/argMax is not a DML-specific quirk but intra-family consistent behavior that is elegantly generic 😉, then get to backend impact and the API proposal.

ND argMin/argMax examples to illustrate indexing

if you give it multiple but not all axes; e.g. for a 4D array with axes=[1,3], how does it determine the indices across dimensions if they were to be reduced together?

Here are some reduction indexing examples:

Adjacent axes, 2D within 4D:

input.shape = [2,3,4,5]
# omitting input values since they would be numerous
reduction axes = [1,2]
output.shape = [2,1,1,5] # keeping dimensions
element indices of the [_,3,4,_] subset:
    [
        [0,1,2,3],
        [4,5,6,7],
        [8,9,10,11],
    ]

Non-adjacent axes, 2D within 4D:

input.shape = [2,3,4,5]
reduction axes = [1,3]
output.shape = [2,1,4,1]
element indices of the [_,3,_,5] subset:
    [
        [0,1,2,3,4],
        [5,6,7,8,9],
        [10,11,12,13,14],
    ]

Single axis in 4D:

input.shape = [2,3,4,5]
reduction axes = [3]
output.shape = [2,3,4,1]
element indices of the [_,_,_,5] subset:
    [0,1,2,3,4]

All axes reduced, 4D:

input.shape = [2,3,4,5]
reduction axes = [0,1,2,3]
output.shape = [1,1,1,1]
element indices of the [2,3,4,5] subset:
    [[[[0,1,2,3,4],[5,6,7,8,9]... [...119]]]]

No axes reduced (no-op):

input.shape = [2,3,4,5]
reduction axes = [] # explicitly present but empty
output.shape = [2,3,4,5]
element indices of the [_,_,_,_] subset:
    0 # a single scalar, effectively identity; same behavior as reduceMin with axes=[]

So it's as if you took a slice from the input using the reduced axes, then ordered element indices linearly.

Notice if you transpose those previously non-adjacent axes from above toward the back, you still get the same element indices within that reduced slice. e.g.:

input.shape = [2,4,3,5] # Transposed from [2,3,4,5] above
reduction axes = [2,3]
output.shape = [2,4,1,1]
element indices of the [_,_,3,5] subset:
    [
        [0,1,2,3,4],
        [5,6,7,8,9],
        [10,11,12,13,14],
    ]

And if you flatten those last two axes to a single axis, you still get the same indices:

input.shape = [2,4,15]
reduction axes = [2]
output.shape = [2,4,1]
indices of the [_,_,15] subset:
    [0,1,2,3,4, 5,6,7,8,9, 10,11,12,13,14]
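As a quick sanity check of that rule in NumPy (a sketch of the indexing convention only, not any backend's API; the slice position is chosen arbitrarily):

import numpy as np

x = np.random.rand(2, 3, 4, 5)
slice_ = x[0, :, 2, :]                    # the [_,3,_,5] slice at d0=0, d2=2
idx = int(np.argmin(slice_.reshape(-1)))  # linear index within that slice, 0..14
print(idx, divmod(idx, 5))                # divmod recovers the (d1, d3) coordinates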

Functional equivalence of capability

The general ND argMin/argMax is the superset form, and the single-axis form is a simpler subset of it. Conversely, a single-axis parameterization has equivalent capability to the ND form when combined with a transpose/reshape; the ND form merely skips extra steps (like squashing the width, height, or other spatial dimensions, which the caller would otherwise have to do explicitly) by operating on the tensor directly. So backends like TF and CoreML are fully capable of implementing WebNN's current form, e.g. pseudocode:

func ImplementNdArgMinViaSingleAxisBackend(input, axes)
{
    if (axes.size == 1) { // reduce single axis - just ferry the axis along
        reducedInput = backend.argmin(input, axis=axes[0])
    } else if (axes not defined) { // reduce all axes
        reshapedInput = backend.reshape(input, [input.size]);
        argMinResult = backend.argmin(reshapedInput, axis=0);
        reducedInput = backend.reshape(argMinResult, MakeShapeOfOnes(input.rank)); // [1,1,...,1], keeping dimensions
    } else { // axes.size > 1: move reduced axes to the back, flatten them, reduce, then undo
        permutation = MakePermutationThatPutsAxesAtBack(input.rank, axes);
        reversePermutation = MakePermutationThatRestoresAxes(input.rank, axes);
        flattenedShape = MakeFlattenedShape(input.shape, axes); // reduced axes collapsed into one trailing axis
        unflattenedShape = MakeUnflattenedShape(input.shape, axes); // reduced axes kept as size 1, still permuted

        reshapedInput = backend.reshape(backend.transpose(input, permutation), flattenedShape);
        argMinResult = backend.argmin(reshapedInput, axis=flattenedShape.rank - 1); // the single trailing axis
        reducedInput = backend.transpose(backend.reshape(argMinResult, unflattenedShape), reversePermutation);
    }
    return reducedInput;
}
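And a runnable NumPy version of that pseudocode (my own hypothetical helper name; assumes keepdims-style outputs like the examples above):

import numpy as np

def argmin_nd_via_single_axis(x, axes=None):
    # ND argMin emulated with only single-axis argmin plus transpose/reshape,
    # keeping reduced dimensions as size 1.
    if axes is None:                              # reduce all axes
        flat_index = np.argmin(x.reshape(-1), axis=0, keepdims=True)  # shape (1,)
        return flat_index.reshape((1,) * x.ndim)
    if len(axes) == 1:                            # single axis: ferry it along
        return np.argmin(x, axis=axes[0], keepdims=True)
    kept = [d for d in range(x.ndim) if d not in axes]
    perm = kept + list(axes)                      # reduced axes moved to the back
    reduced_size = int(np.prod([x.shape[d] for d in axes]))
    flattened = x.transpose(perm).reshape([x.shape[d] for d in kept] + [reduced_size])
    result = np.argmin(flattened, axis=-1, keepdims=True)  # indices within each slice
    unflattened = result.reshape([1 if d in axes else x.shape[d] for d in perm])
    return unflattened.transpose(np.argsort(perm))         # restore original axis order

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
print(argmin_nd_via_single_axis(x, axes=[1, 3]).shape)     # (2, 1, 4, 1)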

Front-end/backend impact

Front-end callers have no challenge with the current ND design, because if coming from a single-axis world, it's simply:

webnnGraphBuilder.argMin(input, {axes: [axis]});

Single-axis backend implementations have more of a challenge, but it's certainly possible (see ImplementNdArgMinViaSingleAxisBackend). Granted, that last code block adds a number of additional operations for a case that is unlikely to come in from callers anyway (where axes.size > 1 but less than rank), and so it's understandable from the backend implementation impact POV for TFLite/CoreML to want to avoid it. For the existing DML backend, it makes no difference either way because the single axis form is a subset of ND form.

Default axes when absent

PyTorch reduces all dimensions if axis is not present.

Yep, NumPy too. When undefined, they act as if all axes were listed (e.g. for 4D, axes = [0,1,2,3]).

And I propose to default it to 0 instead of reduce all dimensions.

🤔 That would be inconsistent with the other reduction functions like reduceMin/reduceMax, and intra-API consistency has value too; but outside WebNN, there is no clear-cut winner for omitted axes. Defaults include:

  • NumPy - all axes reduced (axes=[0,1,2, ... rank-1])
  • PyTorch - all axes reduced (axes=[0,1,2, ... rank-1])
  • TF - first axis reduced (axes=[0])
  • ONNX - first axis reduced (axes=[0])
  • CoreML - last axis reduced (axes=[rank-1])

In cases of such ambiguity, rather than favoring one, maybe we should require an explicit value. WebNN should generally be explicit anyway.

References

NumPy

import numpy
x = numpy.array([[1,5,2],[3,4,6]], dtype=numpy.int64)
y = numpy.argmax(x, keepdims=True)
print("value:\n", y, sep="")
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: [[5]]
# shape: (1, 1)
# dtype: int64

PyTorch

import torch
x = torch.tensor([[1,5,2],[3,4,6]], dtype=torch.float32)
y = torch.argmax(x, keepdim=True)
print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: tensor([[5]])
# shape: torch.Size([1, 1])
# dtype: torch.int64

TensorFlow

import tensorflow
x = tensorflow.constant([[1,5,2],[3,4,6]], dtype=tensorflow.int64)
y = tensorflow.reshape(tensorflow.argmax(x), [1,3])
print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: tf.Tensor([[1 0 1]], shape=(1, 3), dtype=int64)
# shape: (1, 3)
# dtype: <dtype: 'int64'>

ONNX

https://onnx.ai/onnx/operators/onnx__ArgMin.html

axis - INT (default is '0')

CoreML

https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.reduction.reduce_argmin

axis: const<i32> (Optional)
The dimension to reduce. Default is -1.

Effective pseudocode for each:

PyTorch and NumPy pseudocode

func argMinSingleAxisForPyTorchAndNumPy(input, axis)
    if axis defined
        argMinNd([axis])
    else
        axes = list(range(input.rank))
        argMinNd(axes)
    endif
endfunc

TensorFlow and ONNX pseudocode

func argMinSingleAxisForTensorFlowAndOnnx(input, axis)
    if axis defined
        argMinNd([axis])
    else
        argMinNd([0])
    endif
endfunc

CoreML pseudocode

func argMinSingleAxisForCoreML(input, axis)
    if axis defined
        argMinNd([axis])
    else
        argMinNd([input.rank - 1])
    endif
endfunc

Final verdict

All that said, you'll see my original proposal had a single axis, and I even defaulted it to 0 too 😅...

dictionary MLArgMinMaxOptions {
  unsigned long axis = 0;
  boolean keepDimensions = false;
  boolean selectLastIndex = false; // selects the *last* index rather than the first find along axis
  //  NAMING: Maybe an enum for scanDirection or tieBreakerFirst/Last would be clearer?
};

...and so I'm obviously not that opposed to single axis. Though, if we're now thinking of changing axes to an axis and diverging from the other reduction ops, then I'm thinking we should require an explicit axis due to the platform differences above rather than just defaulting to 0, which seems a weird default anyway (because why would you want to default to finding the min/max across batches?). Perusing a dozen models I have locally with argMax in them, I see 9 used axis=1, 2 used axis=-1, and 1 used axis=0. So if we were to pick a default, axis=1 might make more sense, but I'd still rather just be explicit at this API level.

@philloooo (Contributor) commented:

@fdwr Thanks! I understand that the multi-axes scenario can be emulated. Thanks also for the perspective of looking at it together with the other reduction methods.

Granted, that last code block adds a number of additional operations for a case that is unlikely to come in from callers anyway (where axes.size > 1 but less than rank), and so it's understandable from the backend implementation impact POV for TFLite/CoreML to want to avoid it.

Yeah, that's my perspective. If we don't have such use cases, it doesn't seem worth supporting, given that both TFLite and CoreML would need to emulate it.

On the default value, given your reasoning, not having a default value sounds good to me.

@huningxin (Contributor) commented:

@fdwr , thanks for the details!

I'm thinking we should require an explicit axis due to the platform differences above rather than just defaulting to 0

Requiring an explicit axis SGTM.

@fdwr (Collaborator) commented Jul 12, 2024

Comparing with softmax...

partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand input, unsigned long axis);
};

...any preferences for:

dictionary MLArgMinMaxOptions {
  unsigned long axis;
  boolean keepDimensions = false;
  MLOperandDataType outputDataType; // https://github.com/webmachinelearning/webnn/issues/653
};
...
MLOperand argMin(MLOperand input, optional MLArgMinMaxOptions options = {});

vs

dictionary MLArgMinMaxOptions {
  boolean keepDimensions = false;
  MLOperandDataType outputDataType; // https://github.com/webmachinelearning/webnn/issues/653
};
...
MLOperand argMin(MLOperand input, unsigned long axis, optional MLArgMinMaxOptions options = {});

?

I recall reading that required parameters are better as explicit parameters, rather than dictionary members, but I'm not finding that verbiage in the spec 🤔.

@huningxin (Contributor) commented:

@fdwr, IIUC, dictionary members are optional; if axis doesn't have a default value, I suppose it should be a parameter, so the latter is probably preferred.

MLOperand argMin(MLOperand input, unsigned long axis, optional MLArgMinMaxOptions options = {});

@huningxin (Contributor) commented:

Once the axis is required, the no-op behavior would be unsupported as well:

If empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor.

And scalar input should also be unsupported, because any axis would fail the axis < input_rank check.

CoreML doesn't support scalar input (refer to @philloooo's CL).
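For illustration, a minimal sketch of that implied validation (hypothetical helper, not spec text):

def validate_arg_min_max_axis(input_rank: int, axis: int) -> None:
    # With a required axis, a scalar (rank-0) input can never validate,
    # since no axis satisfies axis < input_rank.
    if not 0 <= axis < input_rank:
        raise ValueError(f"axis {axis} is out of range for rank {input_rank}")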
