Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

append!(Int64[], [1.1, 0.1]) results in unwanted array growth #15868

Closed
gasagna opened this issue Apr 13, 2016 · 17 comments · Fixed by #51903
Closed

append!(Int64[], [1.1, 0.1]) results in unwanted array growth #15868

gasagna opened this issue Apr 13, 2016 · 17 comments · Fixed by #51903
Labels
arrays [a, r, r, a, y, s] needs decision A decision on this change is needed

Comments

@gasagna
Copy link
Contributor

gasagna commented Apr 13, 2016

Consider the following code demonstrating the bug in append!

julia> a = Int64[]
0-element Array{Int64,1}

julia> append!(a, [1.1, 2.2])
ERROR: InexactError()
 in copy! at abstractarray.jl:344
 in append! at array.jl:447

julia> a
2-element Array{Int64,1}:
 13128344792
  4521590816

Now, in this case the program terminates with an error, but if the append statement is within a try/catch block the growth of the array can go unnoticed.

I actually discovered this bug in testing code, where I was testing that some code involving an append! raised an appropriate error. As soon as I added this test, all other tests in my code started to fail miserably.

@StefanKarpinski StefanKarpinski added the bug Indicates an unexpected problem or unintended behavior label Apr 13, 2016
@StefanKarpinski
Copy link
Member

I believe this came up once before and this behavior was chosen because it's technically acceptable since the operation errored and this avoids and expensive try-catch, but I still think it would be better to at least try to make append! behave atomically w.r.t. known failure modes.

@mbauman
Copy link
Member

mbauman commented Apr 13, 2016

Ref #7671

@JeffBezanson
Copy link
Member

In the Midori terminology this would be an unrecoverable error. As in #7671 I still don't think operations need to be atomic with respect to these.

@nalimilan
Copy link
Member

Could we at least reset the array to its original size before raising the error? I don't see any drawback in doing so, and it completely hides the bug.

@ivarne ivarne added needs decision A decision on this change is needed and removed bug Indicates an unexpected problem or unintended behavior labels Apr 14, 2016
@ivarne
Copy link
Member

ivarne commented Apr 14, 2016

The previous discussion in #7642 and #7671 has all most relavant arguments.

This is a choice between safety and the speed of appending arrays of different types.

Suggested solutions:

  1. Document that append! consideres conversion errors as an unrecoverable error and that the orignal array will be left in a undefined state. The current documentation is wrong, because append(A,b) isn't equivalent to push!(A,b...) in this regard.
  2. Limit append to only work on arrays with the same element type (MethodError if the types don't match)
  3. Implement the missing method after adding the restriction in (2.). This will be a slower method, because it uses try...catch to restore the array to after a failed conversion.
  4. Various ideas that tries to catch a limited set of issues.

Off topic: Should probably consider #7798 (extending append! to iterables)

@StefanKarpinski
Copy link
Member

In the Midori terminology this would be an unrecoverable error. As in #7671 I still don't think operations need to be atomic with respect to these.

Right, except that we don't have unrecoverable errors.

@StefanKarpinski
Copy link
Member

Doesn't @nalimilan's suggesting solve this issue without introducing any overhead?

@ivarne
Copy link
Member

ivarne commented Apr 14, 2016

I don't see how to implement @nalimilan's suggestion without a try...catch.

@StefanKarpinski
Copy link
Member

Hmm. That's a fair point. If we could pass in custom error handlers, then it would be straightforward. But we can't do that currently.

@JeffBezanson
Copy link
Member

One question that still interests me is why append! needs to be error-atomic, but multi-element setindex! doesn't. It seems to me the only difference is cost, which is part of what makes me think this is not really a safety issue.

Or imagine that append! really were implemented with repeated push!, as indeed it might be for some data structures (e.g. merge! on dicts). Then the array would end up with elements appended up to the point of the error. I suspect this would strike fewer people as a bug, since the resulting array wouldn't look weird. So I think at least part of this is not a need for atomicity, but just the uninitialized array looking weird.

@StefanKarpinski
Copy link
Member

Could this be handled by calling sizehint on the array first and then pushing individual elements? That way the array would only end up containing successfully converted and pushed elements.

@nalimilan
Copy link
Member

My bad. Unless we introduce a can_convert() -> Bool function, my suggestion doesn't work.

Pushing elements one-by-one is an interesting idea, but it probably incurs some performance hit too, and in the end the array is still modified (even if the result is much better). Only benchmarks will tell.

@non-Jedi
Copy link
Contributor

non-Jedi commented Jan 22, 2020

Are there really that many situations where one might want to append an iterable that HasLength() or HasShape() but doesn't HasEltype()? Currently the definition of append!(::AbstractVector, iter) requires the former; couldn't we just require the latter as well?

Of course, this would be a breaking change so would require waiting until 2.0.

Edit: I guess the downside is that you lose the ability to append! an Any eltype iterable that happens to contain only items of the appropriate type.

@rapus95
Copy link
Contributor

rapus95 commented Jan 22, 2020

As this is still open I'm curious to ask why the following doesn't solve the original problem:

function append!(a, b)
  oldlen = length(a)
  resize!(a, length(a)+length(b)) #ensures enough space & contiguous memory
  a, newb = split!(a, oldlen) #splits into two arrays
  #a now is equivalent to the original a
  newb::typeof(similar(b, eltype(a))) #-> newb is uninitialized
  copyto!(newb, b) #errors on convert error
  contiguousmerge!(a, newb) #fast path which atomically forgets newb and resizes a
  return a
end

Here we only need to guarantee that the gc won't delete newb in mid of the function (which currently is guaranteed as I understand it) and that a and newb are not moved meanwhile. But that's already guaranteed aswell I guess.
It should also be noted that calling contiguousmerge! out of those very restricted circumstances can render the whole instance unrecoverable. Maybe even better replace it with some intrinsics.

Phrased differently: Why can't we allocate the targetarray in the right place in the first place?

@MasonProtter
Copy link
Contributor

One option would be to do something like this:

function append!(x, y)
    if eltype(y) <: eltype(x)
        unsafe_append!(x, y)
    else
        try_append!(x, y)
    end
end

where unsafe_append! is basically what we do now, but try_append! has the try; catch block and will autoresize if it hits an error.

@StefanKarpinski
Copy link
Member

I like it! Would you make a PR?

@MasonProtter
Copy link
Contributor

Yeah I can give it a go.

@giordano giordano added the arrays [a, r, r, a, y, s] label May 20, 2023
vtjnash added a commit that referenced this issue Oct 27, 2023
First we add an optional API parameter for `sizehint!` that controls
whether it is for `push!` (default) or `pushfirst!`. Secondly, we make
the offset zero when using `sizehint!` to shrink an array from the end,
or the trailing size zero when using it to shring from the beginning.

Then we replace the prior implementations of `prepend!` and `append!`
with ones that are safe even if the iterator changes length during the
operation or if convert fails. The result of `prepend!` may be in an
undefined order (because of the `reverse!` call) in the presence of
concurrent modifications or errors, but at least all of the elements
will be present and valid afterwards.

Replaces and closes #49905
Replaces and closes #47391
Fixes #15868

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
Co-authored-by: MasonProtter <mason.protter@icloud.com>
vtjnash added a commit that referenced this issue Oct 27, 2023
First we add an optional API parameter for `sizehint!` that controls
whether it is for `push!` (default) or `pushfirst!`. Secondly, we make
the offset zero when using `sizehint!` to shrink an array from the end,
or the trailing size zero when using it to shring from the beginning.

Then we replace the prior implementations of `prepend!` and `append!`
with ones that are safe even if the iterator changes length during the
operation or if convert fails. The result of `prepend!` may be in an
undefined order (because of the `reverse!` call) in the presence of
concurrent modifications or errors, but at least all of the elements
will be present and valid afterwards.

Replaces and closes #49905
Replaces and closes #47391
Fixes #15868

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
Co-authored-by: MasonProtter <mason.protter@icloud.com>
vtjnash added a commit that referenced this issue Nov 8, 2023
First we add an optional API parameter for `sizehint!` that controls
whether it is for `push!` (default) or `pushfirst!`. Secondly, we make
the offset zero when using `sizehint!` to shrink an array from the end,
or the trailing size zero when using it to shring from the beginning.

Then we replace the prior implementations of `prepend!` and `append!`
with ones that are more safe, even if the iterator changes length during
the operation or if convert fails. The result of `prepend!` may be in an
undefined order (because of the `reverse!` call) in the presence of
concurrent modifications or errors, but at least all of the elements
will be present and all entries will be valid afterwards.

Replaces and closes #49905
Replaces and closes #47391
Fixes #15868

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
Co-authored-by: MasonProtter <mason.protter@icloud.com>
vtjnash added a commit that referenced this issue Nov 8, 2023
First we add an optional API parameter for `sizehint!` that controls
whether it is for `push!` (default) or `pushfirst!`. Secondly, we make
the offset zero when using `sizehint!` to shrink an array from the end,
or the trailing size zero when using it to shring from the beginning.

Then we replace the prior implementations of `prepend!` and `append!`
with ones that are more safe, even if the iterator changes length during
the operation or if convert fails. The result of `prepend!` may be in an
undefined order (because of the `reverse!` call) in the presence of
concurrent modifications or errors, but at least all of the elements
will be present and all entries will be valid afterwards.

Replaces and closes #49905
Replaces and closes #47391
Fixes #15868

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
Co-authored-by: MasonProtter <mason.protter@icloud.com>
vtjnash added a commit that referenced this issue Nov 10, 2023
First we add an optional API parameter for `sizehint!` that controls
whether it is for `push!` (default) or `pushfirst!`. Secondly, we make
the offset zero when using `sizehint!` to shrink an array from the end,
or the trailing size zero when using it to shring from the beginning.

Then we replace the prior implementations of `prepend!` and `append!`
with ones that are safe even if the iterator changes length during the
operation or if convert fails. The result of `prepend!` may be in an
undefined order (because of the `reverse!` call) in the presence of
concurrent modifications or errors, but at least all of the elements
will be present and valid afterwards.

Replaces and closes #49905
Replaces and closes #47391
Fixes #15868

Benchmarks show that repeated `push!` performance (with sizehint) is
nearly equivalent to the old append performance:
```
julia> @benchmark append!(x, 1:1000) setup=x=Vector{Float64}(undef,0)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.027 μs … 72.871 μs  ┊ GC (min … max): 0.00% … 94.57%
 Time  (median):     1.465 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.663 μs ±  1.832 μs  ┊ GC (mean ± σ):  6.20% ±  5.67%

             ▂▃▅▆█▇▇▆▄▂▁                                      
  ▂▁▁▂▂▂▂▃▄▅▇█████████████▇▆▅▅▅▅▅▅▄▅▄▅▅▅▆▇███▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂ ▄
  1.03 μs        Histogram: frequency by time        2.31 μs <

 Memory estimate: 19.69 KiB, allocs estimate: 0.

julia> @benchmark append!(x, 1:1000) setup=x=Vector{Int}(undef,0)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  851.900 ns … 76.757 μs  ┊ GC (min … max): 0.00% … 91.59%
 Time  (median):       1.181 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.543 μs ±  1.972 μs  ┊ GC (mean ± σ):  6.75% ±  5.75%

     ▆█▇▃                                                       
  ▂▃██████▇▅▅▄▅▅▃▂▂▂▃▃▃▂▃▃▃▂▂▂▂▂▁▂▁▂▁▂▂▂▁▁▂▂▁▁▁▁▁▁▁▂▂▂▃▃▃▃▂▂▂▂ ▃
  852 ns          Histogram: frequency by time         4.07 μs <

 Memory estimate: 19.69 KiB, allocs estimate: 0.
```

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
Co-authored-by: MasonProtter <mason.protter@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] needs decision A decision on this change is needed
Projects
None yet
10 participants