Checking sizes on array indices #996
Also related: issue #911.
We should certainly introduce bounds checking in all cases. It would be great if we could do it in a way that lets the compiler remove these checks when possible. I am not excited about the macro for unsafe variants.
I don't see at the moment how the compiler could decide that, but let me make the discussion more concrete with an example. Suppose we have a function:
which does the obvious thing (loop over I[i][j] and test whether it's between 1 and dims[i]). For the next two code snippets, let's also mean
It's bad because each index is of length
In the case when both

(Actually, are we in effect doing the first version now? Does arrayref check its index on each access?)

The really tough case is for an algorithm that does lots of scalar accesses, e.g., a matrix routine that has lots of

The idea behind a macro is to make it easy to write your function with full checking (checking is turned on by default for

Libraries like Eigen provide A(i,j), which checks its bounds (through assertions), and A.coeff(i,j), which does not. Defining NDEBUG at compile time turns the checks in A(i,j) off. One nice thing about the macro is that we'd be able to leave checking on in general and disable it for the parts we're confident about.
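A minimal sketch of the checked/unchecked design described in the paragraph above, written in Python rather than Julia or C++: a checked accessor, an unchecked one, and a build-time switch that decides which of the two the plain A(i, j) syntax uses. The names ENABLE_CHECKS, Matrix, ref_checked, and ref_unchecked are hypothetical and are not Eigen's or Julia's API; indices are 1-based and storage is column-major, as in Julia.

```python
# Hypothetical stand-in for a compile-time flag: it is read once, when the
# class body below executes, so changing it later has no effect.
ENABLE_CHECKS = True


class Matrix:
    """Column-major 2-D array over a flat list, with 1-based indices."""

    def __init__(self, nrows, ncols, data=None):
        self.nrows = nrows
        self.ncols = ncols
        self.data = list(data) if data is not None else [0.0] * (nrows * ncols)
        if len(self.data) != nrows * ncols:
            raise ValueError("data length must equal nrows * ncols")

    def ref_unchecked(self, i, j):
        # No validation: a bad (i, j) can silently read some other element
        # (in Python it may also raise, or wrap around for negative offsets).
        return self.data[(i - 1) + (j - 1) * self.nrows]

    def ref_checked(self, i, j):
        # Validate each index against its own dimension before reading.
        if not (1 <= i <= self.nrows and 1 <= j <= self.ncols):
            raise IndexError(f"({i}, {j}) out of bounds for {self.nrows}x{self.ncols}")
        return self.ref_unchecked(i, j)

    # The "flag": choose, once, which implementation A(i, j) dispatches to.
    __call__ = ref_checked if ENABLE_CHECKS else ref_unchecked
```

With ENABLE_CHECKS = True, A(4, 1) on a 3x4 matrix raises IndexError, while A.ref_unchecked(4, 1) quietly returns element (1, 2). Setting the flag to False before the class is defined makes every A(i, j) unchecked at once, which is the whole-program switch; turning checks off for only part of a program needs something scoped, like the macro discussed in this thread.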
It is worth pointing out that bounds checks are also inserted by the compiler in arrayref and arrayset (see https://github.com/JuliaLang/julia/blob/master/src/codegen.cpp#L847). These checks will catch certain types of out-of-bounds issues.

Compilers do eliminate bounds checks, but this probably needs more information at compile time. I just don't like having a large program where you insert @disable_bounds_check at the beginning and an end statement at the end. Since this is likely to interact with codegen at a later point anyway, a compiler switch may be a better way to go.

Anyway, I am not religious about these views and am willing to try out a few different options. In any case, we should certainly implement all the necessary safety checks. How we enable or disable them is something we can keep exploring and improving over time.

-viral

On 30-Jun-2012, at 8:14 PM, Tim Holy wrote:
Once you get down to codegen, you are beyond my knowledge of Julia. I'll be glad to hear more about how this works (and study it myself, of course, but this will take time).
@JeffBezanson, is there any way we could leverage the type system to help eliminate bounds checks? For example, invisible "dimension types" could be associated with each of an array object's indices, and type inference could pass that information along. That gets awfully close to having array dimension sizes in the types, though.
Are dimensions likely to always be available when type inference runs? Isn't this more like a compiler pass than anything specific to type inference?

@JeffBezanson has also said that we need a few other related things for array indexing performance: hoisting some of the metadata access, and inlining or optimizing the assign() call. This is of course not directly related to this issue, but presumably we should plan things out so that we can go all the way.

-viral

On 30-Jun-2012, at 10:44 PM, Stefan Karpinski wrote:
And even for direct copies, issue #190 still remains to be done. But 90% of our performance problems got cured this morning. I didn't want to add anything, like bounds checking, that would seem to reduce the benefit!
There's also this in array.c:
This would seem to suggest that bounds are being checked on each access to an element of an array.
As I understand it, either this version gets called or, if types are inferred, the fast codegen version gets called. Either way, there is currently always a check.

-viral

On 02-Jul-2012, at 4:13 PM, Tim Holy wrote:
Yes, that's the run-time version, i.e., the one that gets used when fast codegen doesn't happen.
Gotcha. Quite informative. I would love a doc on "Julia internals" someday (I know you have better things to do at the moment).
I will extend arrayref and arrayset to accept all indices, and then code generation can emit inline bounds checks for each dimension.
Is this for all integer indices? I.e.,
You got that performance by using only linear indexing, right? In general, when the expression
Not at all. You check each index upon entry to ref, before you do any looping over any of the indices. That way it's
But when you access individual elements "at random" rather than in a patterned way, then of course you do have to check each index, every time. That's when bounds checking gets nasty for performance, and why it would be nice to be able to turn it off in a well-vetted algorithm.
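The up-front strategy from the previous two comments can be sketched as follows, in Python for illustration (check_indices and ref_block are hypothetical names, not Julia functions). It resembles the function described earlier that loops over I[k] and tests each entry against dims[k]: every requested index is validated once, before the copy loop, so the checking cost scales with the total length of the index vectors rather than with the number of elements copied.

```python
def check_indices(I, dims):
    """Validate index vectors once: I[k] lists the indices requested along
    dimension k, and each entry must lie in 1..dims[k]."""
    if len(I) != len(dims):
        raise ValueError("need exactly one index vector per dimension")
    for k, (idx, n) in enumerate(zip(I, dims)):
        for i in idx:
            if not 1 <= i <= n:
                raise IndexError(f"index {i} out of bounds in dimension {k + 1} (size {n})")


def ref_block(data, dims, rows, cols):
    """Return A[rows, cols] from a column-major array stored in a flat list."""
    check_indices([rows, cols], dims)  # len(rows) + len(cols) checks, done once
    nrows = dims[0]
    # No per-element checks here: every (i, j) comes from validated vectors.
    return [[data[(i - 1) + (j - 1) * nrows] for j in cols] for i in rows]
```

For example, with data = list(range(1, 13)) laid out as a 3x4 array, ref_block(data, (3, 4), [1, 3], [2, 4]) returns [[4, 10], [6, 12]]. A routine that reads scattered scalar elements has no index vectors to validate in advance, so each of its accesses pays for its own check.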
I did say "in general". In other words, when code contains

I also believe that with some code generator improvements on our end, LLVM could hoist some of the operations out of loops. Code like your fancy indexing code is a different matter: it uses linear indexing, so checking all dimensions has to be done in a custom way, as you describe. Using an intrinsic would allow those checks to also be disabled by the compiler switch.

So we need to do a couple of different things: add the checks to
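To illustrate the loop-hoisting point above, the Python sketch below performs by hand the kind of transformation an optimizer could apply (column_sum_checked and column_sum_hoisted are hypothetical names). When the loop variable provably stays within 1..nrows, the row test can never fail, and the column test does not depend on the loop, so one check before the loop is equivalent to checking on every iteration. Whether LLVM actually does this for Julia code depends on what the code generator gives it; the sketch makes no claim about that.

```python
def column_sum_checked(data, nrows, ncols, j):
    """Sum column j, checking both indices on every access (naive lowering)."""
    total = 0
    for i in range(1, nrows + 1):
        if not (1 <= i <= nrows and 1 <= j <= ncols):  # evaluated nrows times
            raise IndexError((i, j))
        total += data[(i - 1) + (j - 1) * nrows]
    return total


def column_sum_hoisted(data, nrows, ncols, j):
    """Same result with the checks hoisted out of the loop.

    i runs over 1..nrows by construction, so the row test is always true;
    the column test is loop-invariant, so it is done once up front.
    """
    if not 1 <= j <= ncols:
        raise IndexError(j)
    base = (j - 1) * nrows  # the column offset is loop-invariant too
    total = 0
    for i in range(1, nrows + 1):
        total += data[base + (i - 1)]
    return total
```

Both functions return the same sums and reject the same bad columns; the hoisted one just performs a single comparison instead of one per element.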
That sounds like a plan. I can do the library functions (I'm sure) and the macro (I think) if you can do the compiler end. Will there be an
The macro can't be implemented as a simple code rewrite, since macros can only see surface syntax. You can't even distinguish an array access from, say, a dictionary lookup. So, all we can do is surround the macro argument in special to-be-added forms
Hmm, thanks for pointing that out; I was just beginning to wonder how I was going to do that. So just to make sure we're on the same page: I'll rename the large majority of our
which either one of us can write, but I'd let you do it just so you can pick names you like.
Oh god, no! Actually having two versions of every indexing function would be horrible. A bit of performance is not worth that kind of nightmare. I will take care of the macro.
Ah, good, thanks for clarifying: you're saying the macro can work even on the core functions to avoid the explicit call to

So the only part for me to do is write
In other words, I guess I was wondering whether all library functions can be inlined.
...of course,
Currently the following is allowed in Julia:
That's because checking is done to make sure that no invalid memory will be accessed, but the individual indices are not checked. This seems reasonable, but of course it's a little unexpected (especially compared to Matlab) and could make it harder to notice bugs in algorithms.
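The difference between the two kinds of check can be made concrete with a hypothetical example in Python (column-major, 1-based; the function names are illustrative only). For a 3x4 array, the index (4, 1) is out of range in the first dimension, yet its column-major offset is 3, well inside the 12-element buffer. A check that only guards memory accepts it and silently reads element (1, 2); a per-dimension check rejects it.

```python
def linear_offset(dims, idx):
    """0-based column-major offset of a 1-based multi-index (one index per dim)."""
    offset, stride = 0, 1
    for i, n in zip(idx, dims):
        offset += (i - 1) * stride
        stride *= n
    return offset


def memory_safe(dims, idx):
    """Weak check: does the computed offset fall inside the allocation?"""
    total = 1
    for n in dims:
        total *= n
    return 0 <= linear_offset(dims, idx) < total


def in_bounds(dims, idx):
    """Strict check: is every index within its own dimension?"""
    return len(idx) == len(dims) and all(1 <= i <= n for i, n in zip(idx, dims))
```

memory_safe((3, 4), (4, 1)) is True while in_bounds((3, 4), (4, 1)) is False; (4, 4) maps to offset 12 and fails both. The stricter check costs one comparison pair per dimension instead of one per access, which is the performance hit mentioned next.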
The downside of more checking, of course, is that there is a performance hit.
I see three options:
@disable_arrayind_check
that one can use to wrap a function definition to get more performance out of well-tested code. That macro would "just" change all cases of ref/assign for arrays to their unsafe versions (not sure how easy this would be to write).

Obviously the third is the most ambitious. Given the new array ref/assign code that just got checked in, I'm also going to test whether we can yank some of the specialized definitions (particularly abundant for assign). If so, that will make this issue easier to address.
Or perhaps there are other ways?