-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #7642. #7671
Fix #7642. #7671
Conversation
I think the performance of this is unacceptable, due to copying the items array. Other similar operations are not done atomically --- for example indexed assignment might also fail partway through copying values. |
Doesn't the array end up getting copied on assignment anyway? If it's already of the correct type, clearly we shouldn't be copying, but if it needs to be converted, there's no additional overhead. |
No; |
One key difference between these functions and |
Making a copy of an entire array before appending it would cause substantial performance degradation. I agree with @simonster that a reasonable tactic to address this is to try to convert the first element of the input list, and if that passes, we continue the action, otherwise, we raise an error. This introduces nearly zero overhead, while providing a reasonable guard of accidental mistakes. Another reasonable approach is to rollback the operation whenever error happens. |
I guess we might as well add the first element check. |
I just proposed the same thing in on the other issue. |
One issue with the first element check is that it means you can't trap this error and unconditionally delete all the undefined / junk elements since they may or may not have been inserted. I guess you can check the length of the array and do that conditionally. Would it be so bad to wrap this in a try/catch and do that automatically? The stack unwinding would only happen when there's an error anyway, which wouldn't be the common case. |
The try/catch approach is worth timing and seeing. Opening a |
Given that splice is already resizing and array and shuffling memory around, it may be reasonable. |
I've been playing around with the first element check, but it ends up being quite a bit trickier than just adding in |
Ok, I think I figured out the splice! issues I was having. This implements a call to convert on the first element before growing the original arrays. I can do a quick mock of the try-method and compare performance if people are interested. I kind of like this approach though since it mirrors what we're already doing in the dequeue functions (calling convert before growing). |
Though obviously with just the first element check, we're still going to have cases like: In [13]: t = [1,2,3]
Out [13]: 3-element Array{Int64,1}:
1
2
3
In [14]: r = [1.0,1.2,1.3]
Out [14]: 3-element Array{Float64,1}:
1.0
1.2
1.3
In [15]: append!(t,r)
Out [15]: ERROR: InexactError()
while loading In[15], in expression starting on line 1
in copy! at abstractarray.jl:197
in append! at array.jl:477
In [16]: t
Out [16]: 6-element Array{Int64,1}:
1
2
3
1
3616176
3616176 |
So I just pushed a new version utilizing try-catch blocks, but I'm running into some weird build/inference errors. I can define the functions at the repl and run them, no problem, tests pass, etc. But when I modify array.jl and do a |
ccall(:jl_array_grow_end, Void, (Any, Uint), a, n) | ||
copy!(a, length(a)-n+1, items, 1, n) | ||
try | ||
copy!(a, d-n+1, items, 1, n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect --- array_grow_end
changes the length of the array. d+1
should be the correct offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.....turns out d = length(a)
was causing one of the build failures, so I've taken it out, but
copy!(a, length(a)-n+1, items, 1, n)
Is actually from the original implementation, I didn't change it at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You moved the call to length
to before jl_array_grow_end
, at which point the length is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Ok. That's fixed on my local branch. Any idea on my splice! build failure?
error during bootstrap: LoadError(at "sysimg.jl" line 68: LoadError(at "osutils.
jl" line 38: UndefVarError(var=:string)))
Basic Block in function 'julia_splice!;2728' does not have terminator!
label %if26
LLVM ERROR: Broken module, no Basic Block terminator!
Stack dump:
0. Running pass 'Function Pass Manager' on module 'julia'.
1. Running pass 'Preliminary module verification' on function '@"julia_spli
ce!;2728"'
Makefile:83: recipe for target '/c/Users/karbarcca/julia/usr/lib/julia/sys0.o' f
ailed
make[1]: *** [/c/Users/karbarcca/julia/usr/lib/julia/sys0.o] Error 1
Makefile:32: recipe for target 'release' failed
make: *** [release] Error 2
My current version of splice!
function splice!{T<:Integer}(a::Vector, r::UnitRange{T}, ins::AbstractArray=_default_splice)
v = a[r]
m = length(ins)
if m == 0
deleteat!(a, r)
return v
end
n = length(a)
f = first(r)
l = last(r)
d = length(r)
if m < d
delta = d - m
if f-1 < n-l
_deleteat_beg!(a, f, delta)
else
_deleteat_end!(a, l-delta+1, delta)
end
elseif m > d
delta = m - d
if f-1 < n-l
_growat_beg!(a, f, delta)
else
_growat_end!(a, l+1, delta)
end
end
try
for k = 1:m
a[f+k-1] = ins[k]
end
catch e
if m < d
splice!(a,f,v)
elseif m > d
a[r] = v
_deleteat!(a, l+1, delta)
else
a[r] = v
end
rethrow(e)
end
return v
end
I found the next error and will fix it on master in a minute. |
Ok should be fixed now. |
Ok, pushed with updated changes/tests. All seems to work fine on my machine. |
So now how's the performance? |
I'm not sure I'm quite getting a performance benchmark right, because I can't notice any difference. I've tried a few different ways, my latest is: function old_append!{T}(a::Array{T,1}, items::AbstractVector)
if is(T,None)
error(_grow_none_errmsg)
end
n = length(items)
ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
copy!(a, length(a)-n+1, items, 1, n)
return a
end
function new_append!{T}(a::Array{T,1}, items::AbstractVector)
if is(T,None)
error(_grow_none_errmsg)
end
n = length(items)
ccall(:jl_array_grow_end, Void, (Any, Uint), a, n)
try
copy!(a, length(a)-n+1, items, 1, n)
catch e
Base._deleteat!(a, length(a)+1, n)
rethrow(e)
end
return a
end
function old_test(n)
for i = 1:n
A = [1,2,3]
B = [1:10000]
old_append!(A,B);
end
end
@time old_test(10000);
function new_test(n)
for i = 1:n
A = [1,2,3]
B = [1:10000]
new_append!(A,B);
end
end
@time new_test(10000); What's a better way to perf this? |
The difference might not be measurable --- all |
I see a bit of difference with these parameters:
giving
about 10%. |
To remove some allocation time from the test:
Now we see about a 50% slowdown. |
Ok, so now the question – is a 50% slowdown acceptable here? |
I see about 40-43% consistently with Jeff's latest test. |
@JeffBezanson isn't it a problem that the vector is much longer by the time you run I'm not sure of the real-world situations where |
Honestly I still think this whole thing is a non-issue. It seems highly implausible to me that you'd write a program that recovers from trying to append values of the wrong type, expecting the original array to be unmodified in that case. |
It does not seem that unlikely to have such a typo at the REPL, and want to try again to append! the correct array. As we can have a different method for conversion free behaviour, the cost does not seem high. |
The REPL argument does not convince me. |
How about one more stab at this by restricting the argument types to be of the same type? It seems better to force the user to do their own conversion anyway (for those actually trying to append arrays of different types, which as Jeff mentioned, I don't think really happens very often). And this way they get a no method error upfront. I'm not sure if this would break much in packages; there was only one case in REPLCompletions with a ASCIIString/UTF8String mismatch. One last question I have though is if there was any particular reason for having |
@kmsquire, do you have any ideas on my question above? If there's not any interest in the restricted signature version, we can probably just close as a "won't fix". |
Ok, a final bump here. My thoughts are:
If there are no other opinions by the end of the day though, I'll just close. |
Why do you need to restrict the behaviour? In my opinion we can have two methods for |
I think that's a good compromise. If you really need speed, make sure that the types already match. |
…e for atomicity. Adds further method definitions when array element types don't match to try to convert the second array to the element type of the first. Fixes #7642
Ok, changes made. The signatures of the original methods are now restricted to arrays of the same types and additional methods have been added that call convert when the array element types don't match. |
@@ -466,7 +466,7 @@ function push!(a::Array{Any,1}, item::ANY) | |||
return a | |||
end | |||
|
|||
function append!{T}(a::Array{T,1}, items::AbstractVector) | |||
function append!{T}(a::Array{T,1}, items::AbstractVector{T}) | |||
if is(T,None) | |||
error(_grow_none_errmsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this pretty useless now that we know that you tried to append an array of None to an array of None. This warning needs to be raised in the other method now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Changed.
How much less efficient will something like this be? function append!{T}(a::Array{T,1}, items::AbstractVector)
if is(T,None)
error(_grow_none_errmsg)
end
sizehint(a, length(a) + length(items))
for it in items
push!(a, it)
end
end
|
To me that example helps illustrate the ill-defined nature of this situation: is it more correct to append all items up to the error point, or none of them? |
I think the idea is for Honestly, I think the real value for this is in REPL/IJulia/interactive usage. If you're running a script and run into a convert error; ok, big deal, fix it and run again. At the REPL though, you run into a convert error and all the sudden your original array is junk. I have to quit julia and fire it back up again and rerun who knows how much to get back to the state I was in. If the tradeoff is an occasional performance hiccup because my arrays are differently typed and nice, atomic functions, I'll take the latter. We can even add a snippet in the performance tips if needed. Not having this be atomic just feels sloppy to me. |
Ok, I'll add a performance tip that says "if you need This is favoring a single erroneous append operation over millions of appends running in production. |
Alright, fair enough. Will close. |
Explicitly call convert on append/prepend/splice items to match 1st array elements. This causes a MethodError to be thrown before any array growing/shrinking happens. Fixes #7642.
cc: @kmsquire, in particular I think the
splice!
changes should do the trick, but I'd appreciate a look over to make sure I didn't miss something.