Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

Merged
merged 1 commit into from
Aug 23, 2016

Conversation

MatthiasJReisinger
Copy link
Contributor

@MatthiasJReisinger MatthiasJReisinger commented Aug 16, 2016

This simplifies the array bounds check code that is emitted for
multi-dimensional array accesses that use "regular" indexing, i.e.
accesses of the form A[i1,i2,...,iN] to some N-dimensional array
A.

For example, with this change, the access A[i,j,k] to an array
with the three dimensions m, n and o now leads to bounds checks
that correspond to the following pseudo code:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (k >= o)
  out_of_bounds_error();

So far, the following more complicated bounds checks would have been
emitted:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (((k * n + j) * m + i) >= m * n * o)
  out_of_bounds_error();

Julia also allows one-dimensional and "partial" linear indexing
(see #14770), i.e. the number of indices used to access an array does
not have to match the actual number of dimensions of the accessed array.
For this case we still have use this old scheme.

One motivation for this change was the following: expressions like
((k * n + j) * m + i) are non-affine and Polly would not be able
to analyze them. This change therefore also facilitates Polly's bounds
check elimination logic, which would hoist such checks out of loops
or may remove them entirely where possible.

@blakejohnson
Copy link
Contributor

The existing code is kind of strange. What do we get by checking the total linearized index instead of just checking the last dimension?

@MatthiasJReisinger
Copy link
Contributor Author

MatthiasJReisinger commented Aug 17, 2016

I think that we could save a load with the linearized check because we do not need to access the last array dimension (m * n * o isn't actually computed but loaded from memory). But I don't know if that's the real reason. Sorry, that thought was contradictory, since m * n * o has to be loaded too we don't save a load.

@timholy
Copy link
Member

timholy commented Aug 17, 2016

@blakejohnson, that's probably "partial linear indexing" (see #14770).

@tobiasgrosser
Copy link
Contributor

Thinking about this partial indexing, we should be able to handle this case without special-casing Polly by generating the last check as:

if (last_index >= product_of_remaining_dimensions())
  out_of_bound_error();

In case dimensions and indexes match, we get the simple checks Matthias is looking for. In case of partial linear indexing, we still get checks that ensure we do not access outside of allocated memory. This would make Polly work in the common case without partial indexing, but partial indexing would continue to work.

@timholy
Copy link
Member

timholy commented Aug 17, 2016

Since there seems to be a consensus about getting rid of partial indexing (just needs someone to actually implement the change), another option would be to say you can't use it in @polly-annotated blocks.

But if it's easy to keep, that would be fine, too.

@MatthiasJReisinger
Copy link
Contributor Author

Thank you for your comments! @TobiG With this approach we would have one load for each remaining dimension, whereas with the existing linearized check we only have a single one to retrieve the linearized array length (I hope I didn't mix that up again). Would it be okay to accept this? I just wanted to ask to be one the safe side.

@tobiasgrosser
Copy link
Contributor

On Wed, Aug 17, 2016, at 09:36 PM, Matthias J. Reisinger wrote:

Thank you for your comments! @TobiG With this approach we would have one
load for each remaining dimension, whereas with the existing linearized
check we only have a single one to retrieve the linearized array length
(I hope I didn't mix that up again). Would it be okay to accept this? I
just wanted to ask to be one the safe side.

Right. I personally would be OK with this, but I am not part of the
Yulia side. However, I realized that this might not work. Do we actually
have the remaining dimensions? AFAIK the array only saves the full
length.

Best,
Tobias

@blakejohnson
Copy link
Contributor

Do we actually have the remaining dimensions?

Yes, emit_arraysize can be used to look up the dimensions.

@ViralBShah ViralBShah added the compiler:codegen Generation of LLVM IR and native code label Aug 18, 2016
@MatthiasJReisinger
Copy link
Contributor Author

Thank you for your thoughts again! I now updated the PR in the following way: As soon as we have an array access that uses (1D or partial) linear indexing we still use the existing bounds checking, but in the case of regular indexing the simpler check is used (to be on the safe side here: is the check nidxs < nd sufficient to determine the presence of partial indexing?). That also allows us to remove the special-handling for Polly and still gives us the simpler bounds checks. Then, for code parts that use @polly it would be sufficient to require the absence of partial indexing as @timholy suggested.

Do you think this is the direction we could choose?

builder.CreateCondBr(builder.CreateICmpULT(i, alen), endBB, failBB);
} else {
// We have already emitted a bounds check for each index except for
// the last one which we therefore emit here, i.e.
Copy link
Contributor

@tobiasgrosser tobiasgrosser Aug 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence matches both branches and should be moved before the if (bc). Please also add a comment discussing the difference of the two checks and why each is used. Something like:

For the last dimension there are two options of how to emit bounds checks. In case only a single dimension remains (the number of subscript expressions in the array matches the number of array dimensions), we check if the outermost subscript is smaller than the size of the outermost dimension. In case more than one array dimension is left when we reach the last subscript (partial indexing), we verify if the computed element address is within the bounds of the allocated memory range.

@tobiasgrosser
Copy link
Contributor

Hi Matthias,

thanks for the update. I like this change a lot. Two remarks:

  1. The first commit in this PR seems now unnecessary.

  2. The commit message in the second commit is not fully up-to-date. It currently states we use simpler checks when Polly is used, but we now generate them as soon as no partial indexing is used. This means, Polly is now only a motivating use for this change.

@MatthiasJReisinger MatthiasJReisinger changed the title WIP: Emit simpler array bounds checks when using Polly WIP: Simplify bounds checks for multi-dimensional array accesses Aug 18, 2016
@MatthiasJReisinger
Copy link
Contributor Author

Thank you for the feedback! I addressed it in the updated PR.

Best,
Matthias

@tobiasgrosser
Copy link
Contributor

This looks great from my side. What do Julia people think?

This simplifies the array bounds check code that is emitted for
multi-dimensional array accesses that use "regular" indexing, i.e.
accesses of the form `A[i1,i2,...,iN]` to some `N`-dimensional array
`A`.

For example, with this change, the access `A[i,j,k]` to an array
with the three dimensions `m`, `n` and `o` now leads to bounds checks
that correspond to the following pseudo code:

```
if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (k >= o)
  out_of_bounds_error();
```

So far, the following more complicated bounds checks would have been
emitted:

```
if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (((k * n + j) * m + i) >= m * n * o)
  out_of_bounds_error();
```

Julia also allows one-dimensional and "partial" linear indexing
(see JuliaLang#14770), i.e. the number of indices used to access an array does
not have to match the actual number of dimensions of the accessed array.
For this case we still have use this old scheme.

One motivation for this change was the following: expressions like
`((k * n + j) * m + i)` are non-affine and Polly would not be able
to analyze them. This change therefore also facilitates Polly's bounds
check elimination logic, which would hoist such checks out of loops
or may remove them entirely where possible.
@MatthiasJReisinger MatthiasJReisinger changed the title WIP: Simplify bounds checks for multi-dimensional array accesses RFC: Simplify bounds checks for multi-dimensional array accesses Aug 22, 2016
@vtjnash
Copy link
Member

vtjnash commented Aug 22, 2016

lgtm. this seems like a good find and a good fix. I'll merge if nanosoldier agrees (which I expect it will).

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job: RemoteException(3,CapturedException(NanosoldierError: failed to run benchmarks against primary commit: ErrorException("failed process: Process(make, ProcessExited(2)) [2]"),Any[( in pipeline_error at process.jl:612,1),( in run at process.jl:588,1),( in build_julia! at build.jl:54,1),( in execute_benchmarks! at BenchmarkJob.jl:251,1),( in run at BenchmarkJob.jl:178,1),( in #503 at multi.jl:1410,1),( in run_work_thunk at multi.jl:996,1),( in macro expansion at multi.jl:1410 [inlined],1),( in #502 at event.jl:46,1)]))
cc @jrevels

@blakejohnson
Copy link
Contributor

LGTM, too.

@jrevels
Copy link
Member

jrevels commented Aug 22, 2016

Looks like doing a clean build from master is recently broken (is that correct @tkelman?), so Nanosoldier isn't going to be able to build Julia until that gets resolved.

@kshyatt kshyatt added the arrays [a, r, r, a, y, s] label Aug 22, 2016
@tkelman
Copy link
Contributor

tkelman commented Aug 23, 2016

the nightly build hit a different failure, which may or may not happen again...

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash vtjnash merged commit d474ca3 into JuliaLang:master Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants