Fix #7642. #7671

Closed
wants to merge 2 commits into from

Conversation

quinnj
Member

@quinnj quinnj commented Jul 20, 2014

Explicitly call convert on append!/prepend!/splice! items to match the first array's element type. This causes a MethodError to be thrown before any array growing/shrinking happens. Fixes #7642.

cc: @kmsquire, in particular I think the splice! changes should do the trick, but I'd appreciate a look over to make sure I didn't miss something.

@JeffBezanson
Member

I think the performance of this is unacceptable, due to copying the items array. Other similar operations are not done atomically --- for example indexed assignment might also fail partway through copying values.

@StefanKarpinski
Member

Doesn't the array end up getting copied on assignment anyway? If it's already of the correct type, clearly we shouldn't be copying, but if it needs to be converted, there's no additional overhead.

@JeffBezanson
Member

No; copy! doesn't allocate memory (unless values need to be boxed).

@simonster
Member

One key difference between these functions and setindex! is that, if setindex! fails converting the first element, nothing happens, whereas these functions will still mutate the array. Maybe we could convert the first element before resizing the collection so that the error happens before mutation if the types are completely incompatible. I'm not sure that is the most defensible behavior, but it would make things less surprising for people playing around at the REPL. Alternatively we could use try/catch and undo the modifications on failure, if the overhead of try is not too great.
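A minimal sketch of the first-element check proposed here (Julia 0.3-era syntax; checked_append! and the direct ccall are illustrative, not this PR's actual code):

```julia
function checked_append!{T}(a::Array{T,1}, items::AbstractVector)
    n = length(items)
    n == 0 && return a
    convert(T, items[1])   # throws before the array has been resized
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end
```

Note that if items[1] converts but a later element does not, the array is still left partially mutated; the check only guards against completely incompatible types.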

@lindahua
Contributor

Making a copy of an entire array before appending it would cause substantial performance degradation.

I agree with @simonster that a reasonable tactic to address this is to try to convert the first element of the input list; if that passes, we continue the action, otherwise we raise an error.

This introduces nearly zero overhead while providing a reasonable guard against accidental mistakes.

Another reasonable approach is to roll back the operation whenever an error happens.

@JeffBezanson
Member

I guess we might as well add the first element check.

@StefanKarpinski
Member

I just proposed the same thing on the other issue.

@StefanKarpinski
Member

One issue with the first element check is that it means you can't trap this error and unconditionally delete all the undefined / junk elements since they may or may not have been inserted. I guess you can check the length of the array and do that conditionally. Would it be so bad to wrap this in a try/catch and do that automatically? The stack unwinding would only happen when there's an error anyway, which wouldn't be the common case.
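The conditional cleanup described here might look like the following at the call site (hypothetical variable names; a sketch, assuming resize! is used to shrink the array back to its recorded length):

```julia
len = length(t)
try
    append!(t, items)
catch
    resize!(t, len)   # drop any partially inserted junk elements
    rethrow()
end
```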

@JeffBezanson
Member

The try/catch approach is worth timing and seeing. Opening a try block is not all that expensive.

@StefanKarpinski
Member

Given that splice! is already resizing an array and shuffling memory around, it may be reasonable.

@quinnj
Member Author

quinnj commented Jul 22, 2014

I've been playing around with the first element check, but it ends up being quite a bit trickier than just adding in convert(T, items[1]), due to inference stuff I believe. I've got append!/prepend! working, but still working on splice! (the error shows up in a build failure when inference kicks in).

@quinnj
Member Author

quinnj commented Jul 22, 2014

Ok, I think I figured out the splice! issues I was having. This implements a call to convert on the first element before growing the original arrays. I can do a quick mock of the try-method and compare performance if people are interested. I kind of like this approach though since it mirrors what we're already doing in the dequeue functions (calling convert before growing).

@quinnj
Member Author

quinnj commented Jul 22, 2014

Though obviously with just the first element check, we're still going to have cases like:

In  [13]: t = [1,2,3]

Out [13]: 3-element Array{Int64,1}:
 1
 2
 3

In  [14]: r = [1.0,1.2,1.3]

Out [14]: 3-element Array{Float64,1}:
 1.0
 1.2
 1.3

In  [15]: append!(t,r)

Out [15]: ERROR: InexactError()
while loading In[15], in expression starting on line 1
 in copy! at abstractarray.jl:197
 in append! at array.jl:477


In  [16]: t

Out [16]: 6-element Array{Int64,1}:
       1
       2
       3
       1
 3616176
 3616176

@quinnj
Member Author

quinnj commented Jul 23, 2014

So I just pushed a new version utilizing try-catch blocks, but I'm running into some weird build/inference errors. I can define the functions at the repl and run them, no problem, tests pass, etc. But when I modify array.jl and do a make, I run into build errors when inference kicks in and I've gotten 3 or 4 different error messages. Can anyone shed light on what's going on here? I'm not really familiar with what's different between defining a function at the repl and what happens to Base function definitions; something with precompiling?

 ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
-copy!(a, length(a)-n+1, items, 1, n)
+try
+    copy!(a, d-n+1, items, 1, n)
Member

This is incorrect --- array_grow_end changes the length of the array. d+1 should be the correct offset.

Member Author

Hmmm... turns out d = length(a) was causing one of the build failures, so I've taken it out, but

copy!(a, length(a)-n+1, items, 1, n)

is actually from the original implementation; I didn't change it at all.

Member

You moved the call to length to before jl_array_grow_end, at which point the length is different.

Member Author

Got it. Ok. That's fixed on my local branch. Any idea on my splice! build failure?

error during bootstrap: LoadError(at "sysimg.jl" line 68: LoadError(at "osutils.jl" line 38: UndefVarError(var=:string)))
Basic Block in function 'julia_splice!;2728' does not have terminator!
label %if26
LLVM ERROR: Broken module, no Basic Block terminator!
Stack dump:
0.      Running pass 'Function Pass Manager' on module 'julia'.
1.      Running pass 'Preliminary module verification' on function '@"julia_splice!;2728"'
Makefile:83: recipe for target '/c/Users/karbarcca/julia/usr/lib/julia/sys0.o' failed
make[1]: *** [/c/Users/karbarcca/julia/usr/lib/julia/sys0.o] Error 1
Makefile:32: recipe for target 'release' failed
make: *** [release] Error 2

My current version of splice!

function splice!{T<:Integer}(a::Vector, r::UnitRange{T}, ins::AbstractArray=_default_splice)
    v = a[r]
    m = length(ins)
    if m == 0
        deleteat!(a, r)
        return v
    end

    n = length(a)
    f = first(r)
    l = last(r)
    d = length(r)

    if m < d
        delta = d - m
        if f-1 < n-l
            _deleteat_beg!(a, f, delta)
        else
            _deleteat_end!(a, l-delta+1, delta)
        end
    elseif m > d
        delta = m - d
        if f-1 < n-l
            _growat_beg!(a, f, delta)
        else
            _growat_end!(a, l+1, delta)
        end
    end
    try
        for k = 1:m
            a[f+k-1] = ins[k]
        end
    catch e
        if m < d
            splice!(a,f,v)
        elseif m > d
            a[r] = v
            _deleteat!(a, l+1, delta)
        else
            a[r] = v
        end
        rethrow(e)
    end
    return v
end

@JeffBezanson
Member

I found the next error and will fix it on master in a minute.

@JeffBezanson
Member

Ok should be fixed now.

@quinnj
Member Author

quinnj commented Jul 23, 2014

Ok, pushed with updated changes/tests. All seems to work fine on my machine.

@StefanKarpinski
Member

So now how's the performance?

@quinnj
Member Author

quinnj commented Jul 23, 2014

I'm not sure I'm quite getting a performance benchmark right, because I can't notice any difference. I've tried a few different ways, my latest is:

function old_append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end
function new_append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    try
        copy!(a, length(a)-n+1, items, 1, n)
    catch e
        Base._deleteat!(a, length(a)+1, n)
        rethrow(e)
    end
    return a
end

function old_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:10000]
        old_append!(A,B);
    end
end
@time old_test(10000);

function new_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:10000]
        new_append!(A,B);
    end
end
@time new_test(10000);

What's a better way to perf this?

@JeffBezanson
Member

The difference might not be measurable --- all try does is call a small function that moves a couple words of memory around.

@JeffBezanson
Member

I see a bit of difference with these parameters:

function old_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:1000]
        old_append!(B,A);
    end
end
@time old_test(100000);

function new_test(n)
    for i = 1:n
        A = [1,2,3]
        B = [1:1000]
        new_append!(B,A);
    end
end
@time new_test(100000);

giving

julia> @time old_test(100000);
elapsed time: 1.010989101 seconds (2417600080 bytes allocated, 72.83% gc time)

julia> @time new_test(100000);
elapsed time: 1.114201672 seconds (2417600080 bytes allocated, 69.68% gc time)

about 10%.

@JeffBezanson
Member

To remove some allocation time from the test:

A = [1,2,3]
B = [1:1000]

function old_test(A,B,n)
    for i = 1:n
        old_append!(B,A);
    end
end
function new_test(A,B,n)
    for i = 1:n
        new_append!(B,A);
    end
end

julia> @time old_test(A,B,10000000);
elapsed time: 0.285410279 seconds (1070428864 bytes allocated, 11.13% gc time)

julia> @time new_test(A,B,10000000);
elapsed time: 0.438251569 seconds (2472970784 bytes allocated, 6.75% gc time)

Now we see about a 50% slowdown.

@StefanKarpinski
Member

Ok, so now the question: is a 50% slowdown acceptable here?

@quinnj
Member Author

quinnj commented Jul 23, 2014

I see about 40-43% consistently with Jeff's latest test.

@simonster
Member

@JeffBezanson isn't it a problem that the vector is much longer by the time you run new_test (which seems to account for the difference in bytes allocated)? But even if I pass in copy(B) I still see a 1/3 slowdown on my MBP (although the overhead is smaller as A gets longer).

I'm not sure of the real-world situations where append! is the performance bottleneck, few items are being appended on each call, and it's not possible to preallocate, but it seems to me that this slowdown might be acceptable if we define a separate method without try that is called when the element types of a and items match.

@JeffBezanson
Member

Honestly I still think this whole thing is a non-issue. It seems highly implausible to me that you'd write a program that recovers from trying to append values of the wrong type, expecting the original array to be unmodified in that case.

@ivarne
Member

ivarne commented Jul 23, 2014

It does not seem that unlikely to make such a typo at the REPL and want to try again to append! the correct array. Since we can have a different method for conversion-free behaviour, the cost does not seem high.

@JeffBezanson
Member

The REPL argument does not convince me.

@quinnj
Member Author

quinnj commented Jul 24, 2014

How about one more stab at this by restricting the arguments to be arrays of the same element type? It seems better to force the user to do their own conversion anyway (for those actually trying to append arrays of different types, which, as Jeff mentioned, I don't think really happens very often). And this way they get a MethodError upfront.

I'm not sure if this would break much in packages; there was only one case in REPLCompletions with an ASCIIString/UTF8String mismatch.

One last question I have, though: was there any particular reason for splice! to accept AbstractArray replacements, whereas append! and prepend! both took AbstractVector as the source arrays?

@quinnj
Member Author

quinnj commented Jul 25, 2014

@kmsquire, do you have any ideas on my question above?

If there's not any interest in the restricted signature version, we can probably just close as a "won't fix".

@quinnj
Member Author

quinnj commented Jul 28, 2014

Ok, a final bump here. My thoughts are:

  • I personally think restricting the signatures to arrays of the same type is the best way to go. 99% of the time, this is what's probably happening anyway, and it not only avoids the lack of atomicity but also some possible subtleties of automatic convert.
  • I also think this can wait for 0.4. It's not a serious "bug" by any means, and it has the potential to break some packages (see the few places in Base that had to be corrected in my commits). The changes are mainly places where an Array{UTF8String} is appended to an Array{ASCIIString}, which is planned to go away in the future anyway (as I've understood various discussions of string overhauling).

If there are no other opinions by the end of the day though, I'll just close.

@ivarne
Member

ivarne commented Jul 28, 2014

Why do you need to restrict the behaviour? In my opinion we can have two methods for append! (and family): one fast method for when the element types of the arrays are the same, and a (slow) version that converts all the elements and fails gracefully (without changing the array) if one of the conversions fails.

@StefanKarpinski
Member

I think that's a good compromise. If you really need speed, make sure that the types already match.

…e for atomicity. Adds further method definitions when array element types don't match to try to convert the second array to the element type of the first. Fixes #7642
@quinnj
Member Author

quinnj commented Jul 31, 2014

Ok, changes made. The signatures of the original methods are now restricted to arrays of the same types and additional methods have been added that call convert when the array element types don't match.
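A sketch of the resulting method pair (Julia 0.3-era syntax; the whole-array convert in the fallback is illustrative and assumes items is convertible to Array{T,1}):

```julia
# Fast path: element types already match, so no conversion can fail.
function append!{T}(a::Array{T,1}, items::AbstractVector{T})
    n = length(items)
    ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
    copy!(a, length(a)-n+1, items, 1, n)
    return a
end

# Fallback: convert up front, so a failed conversion leaves `a` untouched.
append!{T}(a::Array{T,1}, items::AbstractVector) =
    append!(a, convert(Array{T,1}, items))
```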

@@ -466,7 +466,7 @@ function push!(a::Array{Any,1}, item::ANY)
     return a
 end
 
-function append!{T}(a::Array{T,1}, items::AbstractVector)
+function append!{T}(a::Array{T,1}, items::AbstractVector{T})
     if is(T,None)
         error(_grow_none_errmsg)
Member

Isn't this check pretty useless now, since with the restricted signature it can only trigger when you append an array of None to an array of None? This error needs to be raised in the other method now.

Member Author

Good call. Changed.

@ivarne
Member

ivarne commented Jul 31, 2014

How much less efficient will something like this be?

function append!{T}(a::Array{T,1}, items::AbstractVector)
    if is(T,None)
        error(_grow_none_errmsg)
    end
    sizehint(a, length(a) + length(items))
    for it in items
        push!(a, it)
    end
    return a
end

push! can probably be replaced by a version that does not check bounds in order to be a little bit faster.

@JeffBezanson
Member

To me that example helps illustrate the ill-defined nature of this situation: is it more correct to append all items up to the error point, or none of them?

@quinnj
Member Author

quinnj commented Jul 31, 2014

I think the idea is for append! et al. to be atomic. A version built on push! would violate that.

Honestly, I think the real value of this is for REPL/IJulia/interactive usage. If you're running a script and run into a convert error, ok, big deal: fix it and run again. At the REPL, though, you run into a convert error and all of a sudden your original array is junk. I have to quit julia, fire it back up, and rerun who knows how much to get back to the state I was in.

If the tradeoff is an occasional performance hiccup because my arrays are differently typed, in exchange for nice, atomic functions, I'll take the latter. We can even add a snippet to the performance tips if needed. Not having this be atomic just feels sloppy to me.

@JeffBezanson
Member

Ok, I'll add a performance tip that says "if you need append! to be fast, write your own." Really?

This is favoring a single erroneous append operation over millions of appends running in production.

@quinnj
Member Author

quinnj commented Aug 1, 2014

Alright, fair enough. Will close.


Successfully merging this pull request may close these issues.

append!, prepend!, and splice! should not modify arrays if convert throws
7 participants