Fix #7642. #7671

Closed
wants to merge 2 commits into from

Conversation

quinnj
Member

@quinnj quinnj commented Jul 20, 2014

Explicitly call convert on append!/prepend!/splice! items to match the first array's element type. This causes a MethodError to be thrown before any array growing/shrinking happens. Fixes #7642.

cc: @kmsquire, in particular I think the splice! changes should do the trick, but I'd appreciate a look over to make sure I didn't miss something.

@JeffBezanson
Member

I think the performance of this is unacceptable, due to copying the items array. Other similar operations are not done atomically --- for example indexed assignment might also fail partway through copying values.

@StefanKarpinski
Member

Doesn't the array end up getting copied on assignment anyway? If it's already of the correct type, clearly we shouldn't be copying, but if it needs to be converted, there's no additional overhead.

@JeffBezanson
Member

No; copy! doesn't allocate memory (unless values need to be boxed).

@simonster
Member

One key difference between these functions and setindex! is that, if setindex! fails converting the first element, nothing happens, whereas these functions will still mutate the array. Maybe we could convert the first element before resizing the collection so that the error happens before mutation if the types are completely incompatible. I'm not sure that is the most defensible behavior, but it would make things less surprising for people playing around at the REPL. Alternatively we could use try/catch and undo the modifications on failure, if the overhead of try is not too great.
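A minimal sketch of the first-element check proposed here (Julia 0.3-era syntax; checked_append! and the direct ccall are illustrative, not this PR's actual code):

```julia
function checked_append!{T}(a::Array{T,1}, items::AbstractVector)
    n = length(items)
    n == 0 && return a
    convert(T, items[1])   # throws before the array has been resized
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end
```

Note that if items[1] converts but a later element does not, the array is still left partially mutated; the check only guards against completely incompatible types.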

@lindahua
Contributor

Making a copy of an entire array before appending it would cause substantial performance degradation.

I agree with @simonster that a reasonable tactic to address this is to try to convert the first element of the input list; if that passes, we continue the action, otherwise we raise an error.

This introduces nearly zero overhead while providing a reasonable guard against accidental mistakes.

Another reasonable approach is to roll back the operation whenever an error happens.

@JeffBezanson
Member

I guess we might as well add the first element check.

@StefanKarpinski
Member

I just proposed the same thing on the other issue.

@StefanKarpinski
Member

One issue with the first element check is that it means you can't trap this error and unconditionally delete all the undefined / junk elements since they may or may not have been inserted. I guess you can check the length of the array and do that conditionally. Would it be so bad to wrap this in a try/catch and do that automatically? The stack unwinding would only happen when there's an error anyway, which wouldn't be the common case.
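The conditional cleanup described here might look like the following at the call site (hypothetical variable names; a sketch, assuming resize! is used to shrink the array back to its recorded length):

```julia
len = length(t)
try
    append!(t, items)
catch
    resize!(t, len)   # drop any partially inserted junk elements
    rethrow()
end
```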

@JeffBezanson
Member

The try/catch approach is worth timing and seeing. Opening a try block is not all that expensive.

@StefanKarpinski
Member

Given that splice! is already resizing an array and shuffling memory around, it may be reasonable.

@quinnj
Member Author

quinnj commented Jul 22, 2014

I've been playing around with the first element check, but it ends up being quite a bit trickier than just adding in convert(T, items[1]), due to inference stuff I believe. I've got append!/prepend! working, but still working on splice! (the error shows up in a build failure when inference kicks in).

@quinnj
Member Author

quinnj commented Jul 22, 2014

Ok, I think I figured out the splice! issues I was having. This implements a call to convert on the first element before growing the original arrays. I can do a quick mock of the try-method and compare performance if people are interested. I kind of like this approach though since it mirrors what we're already doing in the dequeue functions (calling convert before growing).

@quinnj
Member Author

quinnj commented Jul 22, 2014

Though obviously with just the first element check, we're still going to have cases like:

In  [13]: t = [1,2,3]

Out [13]: 3-element Array{Int64,1}:
 1
 2
 3

In  [14]: r = [1.0,1.2,1.3]

Out [14]: 3-element Array{Float64,1}:
 1.0
 1.2
 1.3

In  [15]: append!(t,r)

Out [15]: ERROR: InexactError()
while loading In[15], in expression starting on line 1
 in copy! at abstractarray.jl:197
 in append! at array.jl:477


In  [16]: t

Out [16]: 6-element Array{Int64,1}:
       1
       2
       3
       1
 3616176
 3616176

@quinnj
Member Author

quinnj commented Jul 23, 2014

So I just pushed a new version utilizing try-catch blocks, but I'm running into some weird build/inference errors. I can define the functions at the repl and run them, no problem, tests pass, etc. But when I modify array.jl and do a make, I run into build errors when inference kicks in and I've gotten 3 or 4 different error messages. Can anyone shed light on what's going on here? I'm not really familiar with what's different between defining a function at the repl and what happens to Base function definitions; something with precompiling?

 ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
-copy!(a, length(a)-n+1, items, 1, n)
+try
+    copy!(a, d-n+1, items, 1, n)
Member

This is incorrect --- array_grow_end changes the length of the array. d+1 should be the correct offset.

Member Author

Hmmm... turns out d = length(a) was causing one of the build failures, so I've taken it out, but

copy!(a, length(a)-n+1, items, 1, n)

is actually from the original implementation; I didn't change it at all.

Member

You moved the call to length to before jl_array_grow_end, at which point the length is different.

Member Author

Got it. Ok. That's fixed on my local branch. Any idea on my splice! build failure?

error during bootstrap: LoadError(at "sysimg.jl" line 68: LoadError(at "osutils.jl" line 38: UndefVarError(var=:string)))
Basic Block in function 'julia_splice!;2728' does not have terminator!
label %if26
LLVM ERROR: Broken module, no Basic Block terminator!
Stack dump:
0.      Running pass 'Function Pass Manager' on module 'julia'.
1.      Running pass 'Preliminary module verification' on function '@"julia_splice!;2728"'
Makefile:83: recipe for target '/c/Users/karbarcca/julia/usr/lib/julia/sys0.o' failed
make[1]: *** [/c/Users/karbarcca/julia/usr/lib/julia/sys0.o] Error 1
Makefile:32: recipe for target 'release' failed
make: *** [release] Error 2

My current version of splice!

function splice!{T<:Integer}(a::Vector, r::UnitRange{T}, ins::AbstractArray=_default_splice)
    v = a[r]
    m = length(ins)
    if m == 0
        deleteat!(a, r)
        return v
    end

    n = length(a)
    f = first(r)
    l = last(r)
    d = length(r)

    if m < d
        delta = d - m
        if f-1 < n-l
            _deleteat_beg!(a, f, delta)
        else
            _deleteat_end!(a, l-delta+1, delta)
        end
    elseif m > d
        delta = m - d
        if f-1 < n-l
            _growat_beg!(a, f, delta)
        else
            _growat_end!(a, l+1, delta)
        end
    end
    try
        for k = 1:m
            a[f+k-1] = ins[k]
        end
    catch e
        if m < d
            splice!(a,f,v)
        elseif m > d
            a[r] = v
            _deleteat!(a, l+1, delta)
        else
            a[r] = v
        end
        rethrow(e)
    end
    return v
end

@JeffBezanson
Member

I found the next error and will fix it on master in a minute.

@JeffBezanson
Member

Ok should be fixed now.

@quinnj
Member Author

quinnj commented Jul 23, 2014

Ok, pushed with updated changes/tests. All seems to work fine on my machine.

@StefanKarpinski
Member

So now how's the performance?

@quinnj
Member Author

quinnj commented Jul 23, 2014

I'm not sure I'm quite getting a performance benchmark right, because I can't notice any difference. I've tried a few different ways, my latest is:

function old_append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end
function new_append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    try
        copy!(a, length(a)-n+1, items, 1, n)
    catch e
        Base._deleteat!(a, length(a)+1, n)
        rethrow(e)
    end
    return a
end

function old_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:10000]
        old_append!(A,B);
    end
end
@time old_test(10000);

function new_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:10000]
        new_append!(A,B);
    end
end
@time new_test(10000);

What's a better way to perf this?

@JeffBezanson
Member

The difference might not be measurable --- all try does is call a small function that moves a couple words of memory around.

@JeffBezanson
Member

I see a bit of difference with these parameters:

function old_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:1000]
        old_append!(B,A);
    end
end
@time old_test(100000);

function new_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:1000]
        new_append!(B,A);
    end
end
@time new_test(100000);

giving

julia> @time old_test(100000);
elapsed time: 1.010989101 seconds (2417600080 bytes allocated, 72.83% gc time)

julia> @time new_test(100000);
elapsed time: 1.114201672 seconds (2417600080 bytes allocated, 69.68% gc time)

about 10%.

@JeffBezanson
Member

To remove some allocation time from the test:

A = [1,2,3]
B = [1:1000]

function old_test(A,B,n)
    for i = 1:n
        old_append!(B,A);
    end
end
function new_test(A,B,n)
    for i = 1:n
        new_append!(B,A);
    end
end

julia> @time old_test(A,B,10000000);
elapsed time: 0.285410279 seconds (1070428864 bytes allocated, 11.13% gc time)

julia> @time new_test(A,B,10000000);
elapsed time: 0.438251569 seconds (2472970784 bytes allocated, 6.75% gc time)

Now we see about a 50% slowdown.

@StefanKarpinski
Member

Ok, so now the question: is a 50% slowdown acceptable here?

@quinnj
Member Author

quinnj commented Jul 23, 2014

I see about 40-43% consistently with Jeff's latest test.

@simonster
Member

@JeffBezanson isn't it a problem that the vector is much longer by the time you run new_test (which seems to account for the difference in bytes allocated)? But even if I pass in copy(B) I still see a 1/3 slowdown on my MBP (although the overhead is smaller as A gets longer).

I'm not sure of the real-world situations where append! is the performance bottleneck, few items are being appended on each call, and it's not possible to preallocate, but it seems to me that this slowdown might be acceptable if we define a separate method without try that is called when the element types of a and items match.

@JeffBezanson
Member

Honestly I still think this whole thing is a non-issue. It seems highly implausible to me that you'd write a program that recovers from trying to append values of the wrong type, expecting the original array to be unmodified in that case.

@ivarne
Member

ivarne commented Jul 23, 2014

It does not seem that unlikely to make such a typo at the REPL and want to try again to append! the correct array. Since we can have a different method for conversion-free behaviour, the cost does not seem high.

@JeffBezanson
Member

The REPL argument does not convince me.

@quinnj
Member Author

quinnj commented Jul 24, 2014

How about one more stab at this by restricting the arguments to be arrays of the same element type? It seems better to force the user to do their own conversion anyway (for those actually trying to append arrays of different types, which, as Jeff mentioned, I don't think really happens very often). And this way they get a MethodError upfront.

I'm not sure if this would break much in packages; there was only one case in REPLCompletions with an ASCIIString/UTF8String mismatch.

One last question I have, though: was there any particular reason for splice! to accept AbstractArray replacements, whereas append! and prepend! both took AbstractVector as the source arrays?

@quinnj
Member Author

quinnj commented Jul 25, 2014

@kmsquire, do you have any ideas on my question above?

If there's not any interest in the restricted signature version, we can probably just close as a "won't fix".

@quinnj
Member Author

quinnj commented Jul 28, 2014

Ok, a final bump here. My thoughts are:

  • I personally think restricting the signatures to arrays of the same type is the best way to go. 99% of the time, this is what's probably happening anyway, and it not only avoids the lack of atomicity but also some possible subtleties of automatic convert.
  • I also think this can wait for 0.4. It's not a serious "bug" by any means, and it has the potential to break some packages (see the few places in Base that had to be corrected in my commits). The changes are mainly places where an Array{UTF8String} is appended to an Array{ASCIIString}, which is planned to go away in the future anyway (as I've understood various discussions of string overhauling).

If there are no other opinions by the end of the day though, I'll just close.

@ivarne
Member

ivarne commented Jul 28, 2014

Why do you need to restrict the behaviour? In my opinion we can have two methods for append! (and family): one fast method for when the element types of the arrays are the same, and a (slow) version that converts all the elements and fails gracefully (without changing the array) if one of the conversions fails.

@StefanKarpinski
Member

I think that's a good compromise. If you really need speed, make sure that the types already match.

…e for atomicity. Adds further method definitions when array element types don't match to try to convert the second array to the element type of the first. Fixes #7642
@quinnj
Member Author

quinnj commented Jul 31, 2014

Ok, changes made. The signatures of the original methods are now restricted to arrays of the same types and additional methods have been added that call convert when the array element types don't match.
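A sketch of the resulting method pair (Julia 0.3-era syntax; the whole-array convert in the fallback is illustrative and assumes items is convertible to Array{T,1}):

```julia
# Fast path: element types already match, so no conversion can fail.
function append!{T}(a::Array{T,1}, items::AbstractVector{T})
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end

# Fallback: convert up front, so a failed conversion leaves `a` untouched.
append!{T}(a::Array{T,1}, items::AbstractVector) =
    append!(a, convert(Array{T,1}, items))
```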

@@ -466,7 +466,7 @@ function push!(a::Array{Any,1}, item::ANY)
     return a
 end
 
-function append!{T}(a::Array{T,1}, items::AbstractVector)
+function append!{T}(a::Array{T,1}, items::AbstractVector{T})
     if is(T,None)
         error(_grow_none_errmsg)
Member

Isn't this check pretty useless now, since with the restricted signature it can only trigger when you append an array of None to an array of None? This error needs to be raised in the other method now.

Member Author

Good call. Changed.

@ivarne
Member

ivarne commented Jul 31, 2014

How much less efficient will something like this be?

function append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    sizehint(a, length(a) + length(items))
    for it in items
        push!(a, it)
    end
    return a
end

push! can probably be replaced by a version that does not check bounds in order to be a little bit faster.

@JeffBezanson
Member

To me that example helps illustrate the ill-defined nature of this situation: is it more correct to append all items up to the error point, or none of them?

@quinnj
Member Author

quinnj commented Jul 31, 2014

I think the idea is for append! et al. to be atomic. A version built on push! would violate that.

Honestly, I think the real value of this is for REPL/IJulia/interactive usage. If you're running a script and run into a convert error, ok, big deal: fix it and run again. At the REPL, though, you run into a convert error and all of a sudden your original array is junk. I have to quit julia, fire it back up, and rerun who knows how much to get back to the state I was in.

If the tradeoff is an occasional performance hiccup because my arrays are differently typed, in exchange for nice, atomic functions, I'll take the latter. We can even add a snippet to the performance tips if needed. Not having this be atomic just feels sloppy to me.

@JeffBezanson
Member

Ok, I'll add a performance tip that says "if you need append! to be fast, write your own." Really?

This is favoring a single erroneous append operation over millions of appends running in production.

@quinnj
Member Author

quinnj commented Aug 1, 2014

Alright, fair enough. Will close.


Successfully merging this pull request may close these issues.

append!, prepend!, and splice! should not modify arrays if convert throws
7 participants