Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled #486

Open
huningxin opened this issue Dec 1, 2023 · 5 comments · Fixed by #642
Assignees

Comments

@huningxin
Copy link
Contributor

In proposal Add support for operations needed for well-known transformers , the validation steps of indices operand are defined as:

  1. Let axis be options.axis.
  2. Let axisSize be input.[[descriptor]].dimensions[axis]
  1. For each index → value of indices:
    i. If index is greater than or equal to axisSize, then throw a "DataError" DOMException.

The index value of indices operand may not be available at the graph build time, e.g., when indices is an input operand or an intermediate operand produced by another operator. It should define the run-time behavior if the index value of indices is greater than or equal to axisSize.

In that proposal, @wacky6 also recommended to add a "implementation consideration" (i.e. an informative section) mentioning how out-of-bound indices should be handled by browser vendor and platform implementation with some bullet points:

  • Runtime out-of-bounds indices should be explicitly handled by either the browser implementation or the platform implementation, to avoid OOB memory accesses.
  • If the platform implementation doesn't handle out-of-bounds indices, the browser implementation should take steps to ensure the platform operator doesn't receive out-of-bound indices
  • Mention what caller should expect as a result of list item 2: 0, NaN, first/last indices (if implemented with clamp)

/cc @anssiko @wchao1115 @fdwr

@a-sully
Copy link
Contributor

a-sully commented Mar 11, 2024

This issue came up again in this code review. I'll summarize my thoughts here.

To review, this constraint is included in non-normative text:

indices: ... values must be ... in the range [0, N-1] where N is the size of the input dimension indexed by options.axis

And there are comments in the mojom file like this:

Note the values in indices are computed at runtime, so they can exceed the boundary of the axis dimension of input. If unchecked, such indices will cause out-of-bounds access.
Therefore, the implementation must "sanitize" indices before using it to perform memory addressing operations on input.
For some backends like DirectML, the native ML APIs have already done the "sanitization" so indices can be passed directly to the API. For other backends, the specific "sanitization" relies on the browser implementation.
Typically, the implementation clamps the values in indices to be in range of -N (inclusive) to N (exclusive), where N = input.dimensions[axis], and negative index means indexing from the end of the axis dimension.

The currently-specified behavior of validating the values of indices synchronously (step 7 in the snippet above) does not make sense, since these values are not known until compute() . This must change.

That being said, N is known synchronously. Perhaps we should specify the clamping behavior?

On backends which guarantee this clamping is performed internally (e.g. DML), user agents may be able to elide this step, whereas for other backends the user agent may need to explicitly clamp the values, similar to how user agents need to insert transposes as needed for certain operations. But of course those are just implementation details at that point

I propose we specify that values in indices will be clamped to be in range of -N (inclusive) to N (exclusive), where N = input.dimensions[axis], and negative index means indexing from the end of the axis dimension. This means:

Thoughts?

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Apr 8, 2024
* webmachinelearning#483 points out that indices can't be validated at build-time,
    and clamping behavior with an implementation note is given
    instead.

* Fix a typo in the steps.

* Replace several map-like iterations over lists with list iteration.

Fixes webmachinelearning#486
@inexorabletash inexorabletash self-assigned this Apr 8, 2024
inexorabletash added a commit to inexorabletash/webnn that referenced this issue Apr 8, 2024
* webmachinelearning#486 points out that indices can't be validated at build-time,
    and clamping behavior with an implementation note is given
    instead.

* Fix a typo in the steps.

* Replace several map-like iterations over lists with list iteration.

Fixes webmachinelearning#486
@fdwr
Copy link
Collaborator

fdwr commented Apr 16, 2024

I propose we specify that values in indices will be clamped to be in range of -N (inclusive) to N (exclusive),

👍 In lieu of any platform mitigation, that sounds a reasonable default to me. Though, I wouldn't encourage people to rely on that behavior as part of the algorithm itself and still emphasize the passed indices should be valid.

On backends which guarantee this clamping is performed internally (e.g. DML)

Note the clamping there is just a security mitigation for buffer reads/writes, which ensures the element read is within buffer bounds, and it doesn't clamp garbage indices to the dimension size (like recommended above), but rather clamps the final computed offset to stay within the linear buffer, which is a bit different. One could debate why this choice was made, but it's mainly because it's general logic that applies to multiple different operators that all share the same possibility of reading/writing to arbitrary offsets due to potentially invalid indices, and because it's not really worth optimizing for a degenerate case.

@fdwr fdwr closed this as completed in #642 Apr 16, 2024
fdwr added a commit that referenced this issue Apr 16, 2024
* gather(): Address indices validation and other algorithm nits

* #486 points out that indices can't be validated at build-time,
    and clamping behavior with an implementation note is given
    instead.

* Fix a typo in the steps.

* Replace several map-like iterations over lists with list iteration.

Fixes #486

* Update index.bs

Co-authored-by: Ningxin Hu <ningxin.hu@intel.com>

* Add note about negative indices. fixes #484

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Fix grammar glitch

---------

Co-authored-by: Ningxin Hu <ningxin.hu@intel.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
@huningxin
Copy link
Contributor Author

There are other gather and scatter operators have been proposed to support new transformer models, #375 (comment), including gatherElements, gatherND, scatterElements and scatterND. For all scatter/gather operators, an indices tensor is used to specify specific index positions within input tensor for reading (gather) or output tensor for writing (scatter).

The out-of-bounds indices issue came up again, for example scatterND code review.

Like the "implementation consideration" of gather, d846723,

The indices parameter to gather() can not be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the required clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.

we may need to add similar note or extend the existing one for the new gather and scatter operators.

@huningxin huningxin reopened this Sep 9, 2024
@huningxin huningxin changed the title Add "implementation consideration" about how out-of-bound indices of Gather should be handled Add "implementation consideration" about how out-of-bound indices of Gather/Scatter should be handled Sep 9, 2024
@huningxin
Copy link
Contributor Author

Another open raised by @quidity in code review is whether the safe backend behavior should be specified.

Different platform APIs may have different behaviors:

As @fdwr commented in previous comment and in code review, DML's gather

clamps the final computed offset to stay within the linear buffer

for scatter,

update value is simply not written at all to the output (because it's out of bounds)

CoreML's scatter and gather operators support validate_indices parameter that the default behavior for out-of-bound indices is undefined but memory safe.

TensorFlow's gather_nd:

On GPU, if an out of bound index is found, a 0 is stored in the corresponding output value.

scatter_nd:

On GPU, if an out of bound index is found, the index is ignored.

User agent may unify the out-of-bounds indices handling by "introducing clamp operator in the compiled graph". But that may still produce different results on different hardware, i.e. for scatterND, as @fdwr mentioned

all GPU threads are trying to write to the same final location, but GPU thread writes are not deterministic across hardware.

Given that, should we expect the safe backend behavior as "a backend implementation must guarantee index values do not cause invalid reads outside the input tensor (gather) or writes outside the output tensor (scatter)"? (by extending @fdwr 's proposal)

@a-sully
Copy link
Contributor

a-sully commented Sep 9, 2024

+1 to @quidity's comment:

am I nervous of solutions that might produce different outputs on different backends

I think we're looking at this issue a bit backwards. To quote myself over on #754 (comment):

There's a give-and-take between the abstract nature of a spec and the nitty-gritty details of its implementations. We don't want the spec to prescribe implementation details, but of course the spec is written with implementations in mind and accommodates their constraints.

Our goal should be to specify (unsurprising) behavior that can be implemented consistently across backends. How the backends comply with this spec (e.g. by adding a clamp operator) is an implementation detail. If complying with the spec is onerous, then we should consider changing the spec.

If we specify that indices must be clamped, then DML's default "ignore the index" approach would not be compliant with the spec.

Have we considered specifying the "ignore the index" approach? This seems like it should be implementable on all platforms using the where operator?

all GPU threads are trying to write to the same final location, but GPU thread writes are not deterministic across hardware.

The issue here isn't with clamping, but with collisions (which may occur as a result of clamping). To riff off of @fdwr's example, what is the expected behavior of the following?

// data    = [1, 2, 3, 4, 5, 6, 7, 8]
// indices = [[7], [7], [7], [7]]
// updates = [9, 10, 11, 12]

Possible outcomes:

// output  = [1, 2, 3, 4, 5, 6, 7, 9]   // Should it be this?
// output  = [1, 2, 3, 4, 5, 6, 7, 12]  // Or this? Or some value in between?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants