Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

huningxin · 2023-12-01T03:03:20Z

In proposal Add support for operations needed for well-known transformers , the validation steps of indices operand are defined as:

Let axis be options.axis.

Let axisSize be input.[[descriptor]].dimensions[axis]

For each index → value of indices:
i. If index is greater than or equal to axisSize, then throw a "DataError" DOMException.

The index value of indices operand may not be available at the graph build time, e.g., when indices is an input operand or an intermediate operand produced by another operator. It should define the run-time behavior if the index value of indices is greater than or equal to axisSize.

In that proposal, @wacky6 also recommended to add a "implementation consideration" (i.e. an informative section) mentioning how out-of-bound indices should be handled by browser vendor and platform implementation with some bullet points:

Runtime out-of-bounds indices should be explicitly handled by either the browser implementation or the platform implementation, to avoid OOB memory accesses.

If the platform implementation doesn't handle out-of-bounds indices, the browser implementation should take steps to ensure the platform operator doesn't receive out-of-bound indices

Mention what caller should expect as a result of list item 2: 0, NaN, first/last indices (if implemented with clamp)

/cc @anssiko @wchao1115 @fdwr

The text was updated successfully, but these errors were encountered:

a-sully · 2024-03-11T19:16:27Z

This issue came up again in this code review. I'll summarize my thoughts here.

To review, this constraint is included in non-normative text:

indices: ... values must be ... in the range [0, N-1] where N is the size of the input dimension indexed by options.axis

And there are comments in the mojom file like this:

Note the values in indices are computed at runtime, so they can exceed the boundary of the axis dimension of input. If unchecked, such indices will cause out-of-bounds access.
Therefore, the implementation must "sanitize" indices before using it to perform memory addressing operations on input.
For some backends like DirectML, the native ML APIs have already done the "sanitization" so indices can be passed directly to the API. For other backends, the specific "sanitization" relies on the browser implementation.
Typically, the implementation clamps the values in indices to be in range of -N (inclusive) to N (exclusive), where N = input.dimensions[axis], and negative index means indexing from the end of the axis dimension.

The currently-specified behavior of validating the values of indices synchronously (step 7 in the snippet above) does not make sense, since these values are not known until compute() . This must change.

That being said, N is known synchronously. Perhaps we should specify the clamping behavior?

On backends which guarantee this clamping is performed internally (e.g. DML), user agents may be able to elide this step, whereas for other backends the user agent may need to explicitly clamp the values, similar to how user agents need to insert transposes as needed for certain operations. But of course those are just implementation details at that point

I propose we specify that values in indices will be clamped to be in range of -N (inclusive) to N (exclusive), where N = input.dimensions[axis], and negative index means indexing from the end of the axis dimension. This means:

gather() can synchronously validate its inputs
We avoid the need for special validation checks in compute() or build()
We do not expose implementation-specific details to the web (which @fdwr hints at in A conformance suite with disallowed intra-op examples would be helpful for hardening. #244 (comment))

Thoughts?

* webmachinelearning#483 points out that indices can't be validated at build-time, and clamping behavior with an implementation note is given instead. * Fix a typo in the steps. * Replace several map-like iterations over lists with list iteration. Fixes webmachinelearning#486

* webmachinelearning#486 points out that indices can't be validated at build-time, and clamping behavior with an implementation note is given instead. * Fix a typo in the steps. * Replace several map-like iterations over lists with list iteration. Fixes webmachinelearning#486

fdwr · 2024-04-16T06:49:54Z

I propose we specify that values in indices will be clamped to be in range of -N (inclusive) to N (exclusive),

👍 In lieu of any platform mitigation, that sounds a reasonable default to me. Though, I wouldn't encourage people to rely on that behavior as part of the algorithm itself and still emphasize the passed indices should be valid.

On backends which guarantee this clamping is performed internally (e.g. DML)

Note the clamping there is just a security mitigation for buffer reads/writes, which ensures the element read is within buffer bounds, and it doesn't clamp garbage indices to the dimension size (like recommended above), but rather clamps the final computed offset to stay within the linear buffer, which is a bit different. One could debate why this choice was made, but it's mainly because it's general logic that applies to multiple different operators that all share the same possibility of reading/writing to arbitrary offsets due to potentially invalid indices, and because it's not really worth optimizing for a degenerate case.

* gather(): Address indices validation and other algorithm nits * #486 points out that indices can't be validated at build-time, and clamping behavior with an implementation note is given instead. * Fix a typo in the steps. * Replace several map-like iterations over lists with list iteration. Fixes #486 * Update index.bs Co-authored-by: Ningxin Hu <ningxin.hu@intel.com> * Add note about negative indices. fixes #484 * Update index.bs Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update index.bs Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Fix grammar glitch --------- Co-authored-by: Ningxin Hu <ningxin.hu@intel.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

huningxin · 2024-09-09T05:54:25Z

There are other gather and scatter operators have been proposed to support new transformer models, #375 (comment), including gatherElements, gatherND, scatterElements and scatterND. For all scatter/gather operators, an indices tensor is used to specify specific index positions within input tensor for reading (gather) or output tensor for writing (scatter).

The out-of-bounds indices issue came up again, for example scatterND code review.

Like the "implementation consideration" of gather, d846723,

The indices parameter to gather() can not be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the required clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.

we may need to add similar note or extend the existing one for the new gather and scatter operators.

huningxin · 2024-09-09T06:28:01Z

Another open raised by @quidity in code review is whether the safe backend behavior should be specified.

Different platform APIs may have different behaviors:

As @fdwr commented in previous comment and in code review, DML's gather

clamps the final computed offset to stay within the linear buffer

for scatter,

update value is simply not written at all to the output (because it's out of bounds)

CoreML's scatter and gather operators support validate_indices parameter that the default behavior for out-of-bound indices is undefined but memory safe.

TensorFlow's gather_nd:

On GPU, if an out of bound index is found, a 0 is stored in the corresponding output value.

scatter_nd:

On GPU, if an out of bound index is found, the index is ignored.

User agent may unify the out-of-bounds indices handling by "introducing clamp operator in the compiled graph". But that may still produce different results on different hardware, i.e. for scatterND, as @fdwr mentioned

all GPU threads are trying to write to the same final location, but GPU thread writes are not deterministic across hardware.

Given that, should we expect the safe backend behavior as "a backend implementation must guarantee index values do not cause invalid reads outside the input tensor (gather) or writes outside the output tensor (scatter)"? (by extending @fdwr 's proposal)

a-sully · 2024-09-09T14:59:19Z

+1 to @quidity's comment:

am I nervous of solutions that might produce different outputs on different backends

I think we're looking at this issue a bit backwards. To quote myself over on #754 (comment):

There's a give-and-take between the abstract nature of a spec and the nitty-gritty details of its implementations. We don't want the spec to prescribe implementation details, but of course the spec is written with implementations in mind and accommodates their constraints.

Our goal should be to specify (unsurprising) behavior that can be implemented consistently across backends. How the backends comply with this spec (e.g. by adding a clamp operator) is an implementation detail. If complying with the spec is onerous, then we should consider changing the spec.

If we specify that indices must be clamped, then DML's default "ignore the index" approach would not be compliant with the spec.

Have we considered specifying the "ignore the index" approach? This seems like it should be implementable on all platforms using the where operator?

all GPU threads are trying to write to the same final location, but GPU thread writes are not deterministic across hardware.

The issue here isn't with clamping, but with collisions (which may occur as a result of clamping). To riff off of @fdwr's example, what is the expected behavior of the following?

// data    = [1, 2, 3, 4, 5, 6, 7, 8]
// indices = [[7], [7], [7], [7]]
// updates = [9, 10, 11, 12]

Possible outcomes:

// output  = [1, 2, 3, 4, 5, 6, 7, 9]   // Should it be this?
// output  = [1, 2, 3, 4, 5, 6, 7, 12]  // Or this? Or some value in between?

huningxin mentioned this issue Dec 1, 2023

Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. #478

Merged

wacky6 mentioned this issue Dec 11, 2023

implement gather webmachinelearning/webnn-baseline#61

Merged

inexorabletash mentioned this issue Feb 6, 2024

Process: Add documentation for labels, current and proposed #533

Merged

3 tasks

anssiko added the operator specific label Feb 7, 2024

inexorabletash self-assigned this Apr 8, 2024

inexorabletash mentioned this issue Apr 8, 2024

gather(): Address indices validation and other algorithm nits #642

Merged

fdwr closed this as completed in #642 Apr 16, 2024

huningxin reopened this Sep 9, 2024

huningxin changed the title ~~Add "implementation consideration" about how out-of-bound indices of Gather should be handled~~ Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled Sep 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

huningxin commented Dec 1, 2023

a-sully commented Mar 11, 2024

fdwr commented Apr 16, 2024 •

edited

Loading

huningxin commented Sep 9, 2024

huningxin commented Sep 9, 2024

a-sully commented Sep 9, 2024

Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

Comments

huningxin commented Dec 1, 2023

a-sully commented Mar 11, 2024

fdwr commented Apr 16, 2024 • edited Loading

huningxin commented Sep 9, 2024

huningxin commented Sep 9, 2024

a-sully commented Sep 9, 2024

fdwr commented Apr 16, 2024 •

edited

Loading