Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeakKeyDict based on object ids #3002

Open
tanmaykm opened this issue May 3, 2013 · 13 comments
Open

WeakKeyDict based on object ids #3002

tanmaykm opened this issue May 3, 2013 · 13 comments

Comments

@tanmaykm
Copy link
Member

tanmaykm commented May 3, 2013

Is this the correct behaviour?

julia> v1 = [1,2]
2-element Int64 Array:
 1
 2

julia> v2 = copy(v1)
2-element Int64 Array:
 1
 2

julia> 

julia> w1 = WeakRef(v1)
WeakRef([1, 2])

julia> w2 = WeakRef(v2)
WeakRef([1, 2])

julia> 

julia> object_id(w1.value) == object_id(w2.value)
false

julia> object_id(w1) == object_id(w2)
false

julia> 

julia> w1 == w2
true

julia> hash(w1) == hash(w2)
true

Since v1 and v2 are distinct, shouldn't hash(w1) and hash(w2) be different for WeakKeyDict to work correctly?

@JeffBezanson
Copy link
Member

== and === (is) are different equality functions. The functions used by Dict and WeakKeyDict are == and hash. ObjectIdDict uses === and object_id.

@tanmaykm
Copy link
Member Author

tanmaykm commented May 3, 2013

Yes. And I think that is what my concern is about, in WeakKeyDict.

As I understand, WeakKeyDict entries are supposed to be removed when the keys (particular instances of a type) get garbage collected. I think that would not work reliably if WeakRefs to two different instances produce the same hash value.

Shouldn't the hash value of WeakRef be based on object ids of the referenced value instead of the value itself? Or may be WeakKeyDict should be implemented differently?

@JeffBezanson
Copy link
Member

I think both combinations make sense. One case where you want the current behavior of WeakKeyDict is hash consing, where you construct a new object and then see if something equal to it already exists in a WeakKeyDict.

I could modify ObjectIdDict to work correctly with WeakRef entries, and then it would be easy to define a WeakKeyObjectIdDict.

@StefanKarpinski
Copy link
Member

I feel like this should ideally be more composable:

  • WeakKeyDict{K,V} should just be Dict{WeakRef{K},V}
  • WeakKeyObjectIdDict should just be ObjectIdDict{WeakRef,Any}

etc. Not entirely sure if this is possible at the moment, but it seems like it's pretty close to doable at least.

@StefanKarpinski
Copy link
Member

I realize this may make weak-key dicts less convenient to use since you'd have to both look each item up and dereference the actual value from the WeakRef container, but I feel like that's acceptable – it's not like WeakKeyDicts are something that ought to be used every day.

@tanmaykm
Copy link
Member Author

tanmaykm commented May 3, 2013

But then that results in the following scenario:

ulia> wd = WeakKeyDict{Vector, Int}();

julia> v1 = [1,2];

julia> v2 = copy(v1);

julia> wd[v1] = 1;

julia> wd[v2] = 2;

julia> length(wd)
1

julia> v1 = v2 = [1,2];

julia> gc()
ERROR: key not found: [1, 2]
 in delete! at dict.jl:488
 in anonymous at dict.jl:518
 in gc at base.jl:92

I also noticed the usage of WeakKeyDict in multi.jl for RemoteRefs relies on the current behavior. But I wonder if it is correct... It seems to be using this behavior to get the same RemoteRef object instance in the constructor.

@JeffBezanson
Copy link
Member

The RemoteRef constructor works because it uses getkey(). This is just the classic hash-consing pattern. I agree that other usage patterns should be supported too though.

The error is an implementation bug. It can be fixed several ways; simplest is just to tolerate missing keys in the finalizer that calls delete!.

@tanmaykm
Copy link
Member Author

tanmaykm commented May 3, 2013

Thanks. That sounds good.

I think then a WeakKeyObjectIdDict is what would be useful as you suggested.

I would be using this in the ChainedVectors package... where I try to create sub vectors of larger buffers without making a copy and need the larger buffer to hang around till any of the sub vectors are still being used.

@tanmaykm
Copy link
Member Author

tanmaykm commented May 7, 2013

Here's an implementation of WeakObjectIdKeyDict, in the same lines as WeakKeyDict. Does it look good?

type WeakObjectIdKeyDict{K,V} <: Associative{K,V}
    ht::Dict{Uint,Tuple}

    WeakObjectIdKeyDict() = new((Uint=>Tuple)[])
end
WeakObjectIdKeyDict() = WeakObjectIdKeyDict{Any,Any}()

function setindex!{K,V}(wih::WeakObjectIdKeyDict{K,V}, v, key)      
    oid = object_id(key::K)
    wih.ht[oid] = (WeakRef(key), v::V)
    finalizer(key, x->delete!(wih.ht, oid, nothing))
end

function getkey{K}(wih::WeakObjectIdKeyDict{K}, kk, deflt)
    k = getkey(wih.ht, object_id(kk::K), secret_table_token)
    if is(k, secret_table_token)
        return deflt
    end
    kk
end

function get{K}(wih::WeakObjectIdKeyDict{K}, key, def)
    v = get(wih.ht, object_id(key::K), def)
    is(v, def) && return v
    v[2]
end
delete!{K}(wih::WeakObjectIdKeyDict{K}, key) = delete!(wih.ht, object_id(key::K))[2]
function delete!{K}(wih::WeakObjectIdKeyDict{K}, key, def)
    v = delete!(wih.ht, object_id(key::K), def)
    is(v, def) && return v
    v[2]
end
empty!(wih::WeakObjectIdKeyDict)  = (empty!(wih.ht); wih)
haskey{K}(wih::WeakObjectIdKeyDict{K}, key) = haskey(wih.ht, object_id(key::K))
getindex{K}(wih::WeakObjectIdKeyDict{K}, key) = getindex(wih.ht, object_id(key::K))[2]
isempty(wih::WeakObjectIdKeyDict) = isempty(wih.ht)

start(wih::WeakObjectIdKeyDict) = start(wih.ht)
done(wih::WeakObjectIdKeyDict, i) = done(wih.ht, i)
function next{K}(wih::WeakObjectIdKeyDict{K}, i)
    kv, i = next(wih.ht, i)
    vv = kv[2]
    ((vv[1].value::K, vv[2]), i)
end
length(wih::WeakObjectIdKeyDict) = length(wih.ht)

[pao: syntax highlighting]

@JeffBezanson
Copy link
Member

Looks good. It's interesting to note that for most purposes the key doesn't actually need to be stored in the dictionary; it's only needed in next. If you didn't need iteration, it could be implemented much more efficiently.

@tanmaykm
Copy link
Member Author

tanmaykm commented May 7, 2013

Yes. Do you feel we should skip the iterator in favor of efficiency?

@mauro3
Copy link
Contributor

mauro3 commented Dec 6, 2017

Another thing which seems surprising to me is:

julia> w1,w2 = WeakRef([1]), WeakRef([1])                                                                                                                                
(WeakRef([1]), WeakRef([1]))                                                                                                                                             

julia> w1==w2                                                                                                                                                            
true                                                                                                                                                                     

julia> r1,r2 = Ref([1]), Ref([1])                                                                                                                                        
(Base.RefArray{Int64,Array{Int64,1},Void}([1], 1, nothing), Base.RefArray{Int64,Array{Int64,1},Void}([1], 1, nothing))                                                   

julia> r1==r2                                                                                                                                                            
false                                                                                                                                                                    

WeakRef compares their values with ==, Ref compares their values with ===.

I would argue that it would be clearer for WeakRef to behave like Ref, i.e. use === and object_id (and WeakKeyDict equally). Presumably, we'd then want to define a WeakKeyDictValue (needs a better name) to take up the current WeakKeyDict functionality.

EDIT: x-ref I came across #12198 (comment)

@mauro3
Copy link
Contributor

mauro3 commented Dec 6, 2017

I just tried bulding Julia with WeakRef using === and object_id. This is the diff:

diff --git a/base/gcutils.jl b/base/gcutils.jl
index 0ff53bceb6..0f26be2af7 100644
--- a/base/gcutils.jl
+++ b/base/gcutils.jl
@@ -1,8 +1,8 @@
 # This file is a part of Julia. License is MIT: https://julialang.org/license
 
-==(w::WeakRef, v::WeakRef) = isequal(w.value, v.value)
-==(w::WeakRef, v) = isequal(w.value, v)
-==(w, v::WeakRef) = isequal(w, v.value)
+# ==(w::WeakRef, v::WeakRef) = isequal(w.value, v.value)
+# ==(w::WeakRef, v) = isequal(w.value, v)
+# ==(w, v::WeakRef) = isequal(w, v.value)
 
 """
     finalizer(f, x)
diff --git a/base/hashing.jl b/base/hashing.jl
index e2847342a8..a415f85a5b 100644
--- a/base/hashing.jl
+++ b/base/hashing.jl
@@ -14,7 +14,7 @@ Typically, any type that implements `hash` should also implement its own `==` (h
 `isequal`) to guarantee the property mentioned above.
 """
 hash(x::Any) = hash(x, zero(UInt))
-hash(w::WeakRef, h::UInt) = hash(w.value, h)
+# hash(w::WeakRef, h::UInt) = hash(w.value, h)
 
 ## hashing general objects ##

It builds fine and the only error in the test-set is from deserializing a function:

    JULIA test/serialize                        
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
serialize: Test Failed at /home/mauro/julia/julia-master/test/serialize.jl:320
  Expression: deserialize(ds) === g2
   Evaluated: getfield(Base.Serializer.__deserialized_types__, Symbol("##976"))() === getfield(Base.Serializer.__deserialized_types__, Symbol("#g#57"))()
Stacktrace:
 [1] (::getfield(, Symbol("##53#56")))(::Base.GenericIOBuffer{Array{UInt8,1}}) at /home/mauro/julia/julia-master/test/serialize.jl:320
 [2] create_serialization_stream(::getfield(, Symbol("##53#56"))) at /home/mauro/julia/julia-master/test/serialize.jl:12
 [3] top-level scope at /home/mauro/julia/julia-master/test/serialize.jl:309
 [4] eval(::Module, ::Expr) at /home/mauro/julia/julia-master/test/testdefs.jl:13
 [5] top-level scope
 [6] include_relative(::Module, ::String) at ./loading.jl:509
 [7] include at ./sysimg.jl:15 [inlined]
 [8] include(::String) at /home/mauro/julia/julia-master/test/testdefs.jl:13
 [9] macro expansion at /home/mauro/julia/julia-master/test/testdefs.jl:22 [inlined]
 [10] macro expansion at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Test/src/Test.jl:953 [inlined]
 [11] macro expansion at /home/mauro/julia/julia-master/test/testdefs.jl:21 [inlined]
 [12] macro expansion at util.jl:310 [inlined]
 [13] top-level scope at /home/mauro/julia/julia-master/test/testdefs.jl:19
Worker 1 failed running test serialize:
Some tests did not pass: 104 passed, 1 failed, 0 errored, 0 broken.serialize: Test Failed at /home/mauro/julia/julia-master/test/serialize.jl:320
  Expression: deserialize(ds) === g2
   Evaluated: getfield(Base.Serializer.__deserialized_types__, Symbol("##976"))() === getfield(Base.Serializer.__deserialized_types__, Symbol("#g#57"))()
Stacktrace:
 [1] record(::Test.DefaultTestSet, ::Test.Fail) at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Test/src/Test.jl:660
 [2] (::getfield(, Symbol("##42#46")))() at /home/mauro/julia/julia-master/test/runtests.jl:184
 [3] cd(::getfield(, Symbol("##42#46")), ::String) at ./file.jl:70
 [4] top-level scope
 [5] include_relative(::Module, ::String) at ./loading.jl:509
 [6] include(::Module, ::String) at ./sysimg.jl:15
 [7] process_options(::Base.JLOptions) at ./client.jl:340
 [8] _start() at ./client.jl:406

Test Summary: | Pass  Fail  Total
  Overall     |  104     1    105
    serialize |  104     1    105
    FAILURE

So at least a WeakRef test should be added, which explicitly tests the == behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants