Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Intermediate API for updating collections, especially AbstractDict #31199

Open
chethega opened this issue Feb 28, 2019 · 8 comments
Labels
collections Data structures holding multiple items, e.g. sets

Comments

@chethega
Copy link
Contributor

chethega commented Feb 28, 2019

One of the most common operations on AbstractDict is updating values in-place, like e.g. d[k] += v or haskey(d, k) ? d[k]+= v : d[k] = v.

It is unlikely that user-code with that idiom will ever get optimized to perform only a single look-up, and previous discussions got stalled on what is the preferred user-facing API.

I propose to introduce a new un-exported but stable and documented intermediate API. Design goal is maximal flexibility, at the cost of arbitrary uglyness. Then, package authors who implement an AbstractDict can specialize this intermediate API to accelerate updates, and package authors who use AbstractDict can implement their favorite beautiful user-facing API that uses the intermediate API.

edit: Excised preliminary proposal to separate comment. If you vote on the issue, use this comment for expressing feelings about the general idea of an intermediate ugly API and the below comment for expressing feelings about the specific preliminary proposal.

@chethega
Copy link
Contributor Author

To get the ball rolling, an example how this could possibly look like:

#combinator: returns new value, and whether to skip update (0x00), perform update (0x01) or delete entry (0x02).
#_update_elem returns new value, and whether iterators have become invalid.
function _update_elem(combinator::Function, d::AbstractDict, k::key, v)
    if haskey(d, k)
        nv, up = combinator(k, d[k], v)
        pres = true
    else
        nv, up = combinator(k, v)
        pres = false
    end
    if up == 0x00
        #don't perform updates
        #iterators are not invalidated.
        return nv, false
    elseif up == 0x01
        d[k] = nv
        #update key. iterators are not invalidated if key was present before.
        return nv, pres
    elseif up == 0x02
        if pres
            delete!(d, k)
            #delete key. iterators are invalidated.
            return nv, true
        else
            #nothing to delete, nothing to invalidate.
            return nv, false
        end
    else
        throw(ArgumentError())
    end
end

@inline function _update_elem(combinator::Function, d::Dict, k, v)
    age0 = d.age
    #note: Base.ht_keyindex2! wants to be @inline, rehash! wants to be @noinline
    #Why does `_setindex!` have a `rehash!`-path? `ht_keyindex2!` already has a path for that. 
    idx = Base.ht_keyindex2!(d, k)

    if idx > 0
        vold = @inbounds d.vals[idx]
        nv, up = combinator(k, vold, v)
    else
        nv, up = combinator(k, v)
    end
    if up == 0x00
        #don't perform updates
        return nv, age0 != d.age
    elseif up == 0x01 && idx > 0
        #update key. 
        @inbounds d.vals[idx] = nv
        return nv, false
    elseif up == 0x01 && idx < 0
        @inbounds Base._setindex!(d, nv, k, -idx)
        return nv, true
    elseif up == 0x02
        inv = (age0 != d.age)
        if idx > 0 
            Base._delete!(d, idx)
        end
        return nv, inv
    else 
        throw(ArgumentError())
    end
end

function increment_old!(d::AbstractDict, k, v)
    if haskey(d, k)
        d[k] += v
    else
        d[k] = v
    end
end


#bad because the anon-function does not get inlined this way
function bad_increment_new!(d::AbstractDict, k, v)
    nv, inv = _update_elem(d, k, v) do args...
            if length(args) == 2
                return args[2], 0x01 
            elseif lenth(args) == 3
                return args[2]+args[3], 0x01
            else
                throw(ArgumentError())
            end
        end
    return nv
end

_inc(k,vold, v) = vold+v, 0x01
_inc(k, v) = v, 0x01

function increment_new!(d::AbstractDict, k, v)
    nv, inv = _update_elem(_inc, d, k, v)
    return nv
end

julia> d=Dict(3=>4.0, 5=>1.1);
julia> @btime haskey($d, $5);
  7.459 ns (0 allocations: 0 bytes)
julia> @btime increment_new!($d, $5, $1.1);
  11.814 ns (0 allocations: 0 bytes)
julia> @btime increment_old!($d, $5, $1.1);
  29.706 ns (0 allocations: 0 bytes)

Most of the timing discrepancy between haskey and increment_new! is due to failure of inlining of ht_keyindex2!. I would consider that almost a bug: ht_keyindex2! and ht_keyindex should either both get inlined, or neither.

The above intermediate API is too ugly to use for end-users, but not too hard for package authors. Obvious simplifications: Skip the "invalidate iterators" rigamarole, skip the "skip updates? delete entries?" rigmarole.

The reason I am interested in delete entries is that this is what's needed for sparse vectors that are backed by dictionaries.

The reason I am interested in iterator invalidation is that I need to iterate over key-value pairs, and possibly update the value (or delete it alltogether), without touching any except for the current one.

@JeffBezanson JeffBezanson added the collections Data structures holding multiple items, e.g. sets label Feb 28, 2019
@JeffBezanson
Copy link
Member

I would call this update!, and make it work a lot like get! except the passed function gets the current value if there is one.

@chethega
Copy link
Contributor Author

chethega commented Feb 28, 2019

In order to facilitate do syntax, one could pass combinator(key, (v,)) or combinator(key, (vold,v)). That way, the anonymous function does not need a vararg, and hopefully inlining and inference work as expected.

That is almost good enough to be user facing (update! is a very nice name btw, but can't be exported atm, due to breakage), and is flexible enough to implement generic get! in 3 lines, using update!.

Is there a non-ugly way of including a result-flag in the combinator that permits deletion of present elements?

Cramming a skip_update flag in there is probably a bad idea. My reason for wanting it was that it would be useful to also have an update!(combinator::Function, d::AbstractDict) that updates all values, potentially deleting some (valid for Dict, because _delete! can never trigger a rehash). In that context, skip_update could be relevant to avoid writing all the values again (e.g. if you only want to filter: combinator instructs to delete unneeded elements and otherwise can have needed side-effects like e.g. putting removed pairs into a different collection; in that case, one would never return 0x00, constant-prop detects this and deletes the unneeded branch; voila, zero-cost flexibility). On the other hand, constant-prop can probably also detect that combinator(k, (vold, v))[1]== vold and skip the write altogether.

@andyferris
Copy link
Member

Simply on bikeshedding the name, I feel we should probabably provide seperate functions for insert, update and “upsert” (current setindex!) operations for dictionaries. (An insert must throw an exception if key already exists, while an update must throw an exception if key does not already exist)

Just something to consider if ever we were to want an update! function with update semantics.

@ndinsmore
Copy link
Contributor

In #31362 I put forth the idea of using a 0-Dimension array, make something very similar work. Right now the best way I had to make an AbstractArray{T,0} was to use view. Which is relatively fast but there are still overhead in the lowering. I tried to do an unsafe_wrap of the view but that ended up being even slower.
If we could get a very fast 0-Dim array, called say get_pointercould something like this happen during lowering?

d[k] += v gets lowered to temp=get_pointer(d,k); temp[]=temp[]+v . The latter portion I this is already optimized.

@tkf
Copy link
Member

tkf commented Apr 12, 2020

xref older discussion #24454

@stevengj
Copy link
Member

stevengj commented Apr 25, 2023

Can't we just export a 4-argument get! with the semantics:

get!(f::Function, collection, key, default) =
    collection[key] = f(get(collection, key, default))

?? i.e. like get! but passing a default value, so that get(collection, key, default) can always be passed to f, which then updates the collection?

This would not be breaking (it would not export any new symbols), would be easily discoverable as another method of get!, and would handle a lot of these dictionary updates. e.g. d[k] = get(d, k, 0) + 1 would become get(v -> v+1, d, k, 0).

I'm tired of not having an API for this in Base, and adding another get! method seems like low-hanging fruit.

@oscardssmith
Copy link
Member

oscardssmith commented Apr 25, 2023

I think that makes sense. This is similar (modulo name) to #33758

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

7 participants