Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deepcopy semantics for GLOBAL_RNG and ENV #42899

Closed
cscherrer opened this issue Nov 1, 2021 · 19 comments
Closed

deepcopy semantics for GLOBAL_RNG and ENV #42899

cscherrer opened this issue Nov 1, 2021 · 19 comments
Labels
randomness Random number generation and the Random stdlib

Comments

@cscherrer
Copy link

cscherrer commented Nov 1, 2021

This was very surprising to me, and took me a very long time to figure out:

julia> using Random

julia> rng = Random.GLOBAL_RNG
_GLOBAL_RNG()

julia> deepcopy(rng) === rng
true

Even more surprising is

julia> copy(rng) === rng
false

There's similar behavior with ENV in place of GLOBAL_RNG.

There are a few issues here:

  1. The behavior of deepcopy on GLOBAL_RNG is very different than for other RNGs. It's not clear that the former should be "special" in this way.
  2. I had always understood deepcopy to be a stronger version of copy, in the sense that copy may still result in shared mutable sub-structures, but that deepcopy would replicate these, so the result of deepcopy would have no mutable substructures in common with the original.

I had some discussion about this on Zulip with @Seelengrab and @fredrikekre here. Fredrik pointed out that

deepcopy is really a very different function; it almost belongs in the Serialization package or something of that sort.
(from #42796 (comment))

If this is really the intent, it seems it ought to be well-documented, with guidance on when to use which one.

In any case, I can't imagine a use case for the current arrangement where deepcopy can result in strictly more sharing than copy.

@JeffBezanson
Copy link
Member

Great example of the inherent problems with deepcopy :) One can easily see how deepcopy can copy "too much", but it can also copy too little! In either case the problem is that deepcopy cannot obey the abstractions that objects present; it can only traverse the object graph and duplicate the representations of objects. Note that adding an overload for deepcopy might seem like it could fix the problem, but it can't, because you might have

mutable struct MyType
    rng::_GLOBAL_RNG
end

When deepcopy tries to recursively copy the field, it would get something of a different type (due to the overload), resulting in a type error or an invalid object.

Of course, _GLOBAL_RNG is also a hack, and it's not ideal for copy to return an object of a different type.

@JeffBezanson
Copy link
Member

julia> copy(rng) === rng
false

This would not be expected to be true, since an rng has mutable state.

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 1, 2021

mutable struct MyType
rng::_GLOBAL_RNG
end

IMO it's questionable whether that type existing can be relied upon at all - GLOBAL_RNG itself is not documented and aside from various docstrings of rand as a default argument and in UUID, isn't mentioned at all. Much less _GLOBAL_RNG. In contrast, ENV is documented to be a EnvDict, though I now question if that is good..

I wouldn't consider the _GLOBAL_RNG type to be a part of the public API (though admittedly I'm more and more confused about what exactly that means and when something is considered API), so I think an overload for deepcopy is appropriate.

@vtjnash
Copy link
Member

vtjnash commented Nov 1, 2021

RNG might be an even weirder example, since for a PRNG, we'd expect a copy to change the object identity, but for a real RNG (such as RandomDevice), we'd expect it not to change the identity.

@Seelengrab
Copy link
Contributor

Under the hood, _GLOBAL_RNG is just a dispatch trick to extract task local RNG state, which happens to be equivalent to a Xoshiro (and is in fact what copy returns right now - extracted state with a julia-native Xoshiro wrapper).

@JeffBezanson
Copy link
Member

I wouldn't consider the _GLOBAL_RNG type to be a part of the public API (though admittedly I'm more and more confused about what exactly that means and when something is considered API), so I think an overload for deepcopy is appropriate.

Fair point, but whether it is public doesn't really matter. The key feature is that you can have a field with that type, and since deepcopy is recursive it will break if an object is deepcopy'd to a different type. That stems directly from the idea of a function that generically traverses fields without knowing what their semantics are.

@JeffBezanson
Copy link
Member

For comparison, what would you expect to happen when serializing and deserializing GLOBAL_RNG?

@cscherrer
Copy link
Author

I'd expect it to give a Xoshiro, but I thought it was a type alias for that

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 1, 2021

The key feature is that you can have a field with that type, and since deepcopy is recursive it will break if an object is deepcopy'd to a different type. That stems directly from the idea of a function that generically traverses fields without knowing what their semantics are.

Well, from my POV GLOBAL_RNG really only guarantees you a given stream of random numbers - in a sense, the RNG is defined by the stream of numbers it produces, since any given PRNG can/should be treated as a black box in regards to how it generates numbers. Continuing that thought would require deepcopy to only return an AbstractRNG that produces the same random numbers, which in this case would be a Xoshiro with the same internal state. Incidentally, that would make deepcopy and copy have to behave the same under this interpretation. Of course we can't dispatch on that, which sadly brings me back to what is documented about our guarantees here..

For comparison, what would you expect to happen when serializing and deserializing GLOBAL_RNG?

I'd expect to get the same stream of random numbers from the deserialized version as I would get from the non-serialized version, while not affecting the original, when passed as the first argument to a rand call. It is, after all, an entirely new object created from somewhat structured data.

serialize already documents that a read-back value will be as identical as possible to the original - it doesn't cover a bit-for-bit or total behavioral correspondence. As such, I don't think that covers the dispatch mechanism for hooking into task-local RNGs. After all, that doesn't affect the stream of random numbers, only the internal state of the RNG at time of serialization does (assuming no child RNGs are created during runtime, for simplicities sake... but even that can be emulated on the serialized copy, if required).

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 1, 2021

As such, I don't think that covers the dispatch mechanism for hooking into task-local RNGs. After all, that doesn't affect the stream of random numbers, only the internal state of the RNG at time of serialization does.

In other words (even though I know internally it's not/only sort-of implemented like that): We can picture a Task holding/wrapping a Xoshiro internally, to which we get access through GLOBAL_RNG. When copying the RNG, we don't copy the task (the wrapper) with it - that would require a inversecopy (is there a wrapper-equivalent for deepcopy? like "meta" vs. "mesa" or "inter" vs "intra"?) which copies its wrapper(s) as well.

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 1, 2021

but for a real RNG (such as RandomDevice), we'd expect it not to change the identity.

Unifying this with the above - I think a RandomDevice can't ever be deepcopied, because we can't serialize it - its state depends on things external to the PRNG, depending on what kind of RNG it is (let's assume some lava lamp based statistical process for simplicities sake). We can copy it, but that's almost useless to do since that requires linking the two copies in the streams of numbers they produce, meaning we could have passed the original RNG object around in the first place.

@JeffBezanson
Copy link
Member

It would be nice to get rid of the _GLOBAL_RNG type. It seems to cause more confusion than it's worth. TaskLocalRNG has a similar problem, since it also doesn't encapsulate its state, but I don't see any way to fix that. It seems to me any application that depends on "object contents => same random stream" would need to manually use a Xoshiro object. The notion of deepcopying or serializing an object is fully incompatible with task-local RNG state AFAICT. All you can do is manually save the state, and manually copy! it back to task-local storage.

@JeffBezanson JeffBezanson added the randomness Random number generation and the Random stdlib label Nov 2, 2021
@Seelengrab
Copy link
Contributor

The notion of deepcopying or serializing an object is fully incompatible with task-local RNG state AFAICT.

Right - the problem is the interaction with the task machinery. One way out of this would be to have copy retain the task-intertwinedness (by just being a shallow reference to another task, which would be horrible in case that task gets GCed..) and have deepcopy produce a new Xoshiro with the same internal state (but no connection to the task it originated from). Per se I don't see why having deepcopy available would require to get the exact same type (_GLOBAL_RNG or TaskLocalRNG) back, as long as it's documented that deepcopy on that may not produce an === value, but will produce an object with the same behavior as far as RNG in isolation goes.

@JeffBezanson
Copy link
Member

When an object has a reference to TaskLocalRNG(), it's impossible to know whether that means "use whatever the current task-local stream is" or "use the exact RNG state of the current task". If deepcopy or serialize changes the object graph, we would just be guessing which one is meant. Therefore we want to have the simple invariant that you get the same object graph back. Similar cases arise for example with files: when we deepcopy an IOStream, should we include the current contents of the file?

Having deepcopy(::TaskLocalRNG) return a Xoshiro is just a superficial fix in my opinion. It doesn't actually give you something that behaves the same, except in simple cases. Anyway, you can easily imagine code breaking if deserialize gave back something different than what I saved with serialize. What if I have typeasserts in my code?

@cscherrer Can you say more about how this case arose? Did you need to deepcopy some larger structure?

As many of us have long said, deepcopy is a strange function. It should maybe be called duplicate_object_graph or something like that. It comes up because people sometimes want a function that "copies everything", but the problem is you can't conclusively specify what that means. If we changed this, eventually somebody would want "copy everything except TaskLocalRNG". And so on.

@cscherrer
Copy link
Author

Hi Jeff,

The problem I was working on was to be able to build a reproducible infinite stream of values. Ideally, this would use a purely functional RNG, but all the infrastructure seems to assume mutability, so I wasn't sure I could get that to work for arbitrary samplers.

So as a hack, I made a ResettableRNG that holds an RNG and a seed, and made it so constructing one of these makes a copy. You have to be really careful, because if other things try to use the same RNG it throws off the sequence and you lose reproducibility.

I guess a more rigorous mutable approach would have something like Rust's borrow checker, so you can guarantee that only one other object can hold the reference to the RNG. But you know, baby steps :)

Anyway, that why I needed to copy RNGs. And I always just thought deepcopy is "safer", and would guarantee there's no mutable shared state.

@JeffBezanson
Copy link
Member

Thanks, makes sense. In this case the solution is certainly to copy the RNG, since that will do the right thing. We should make it clear that if an object has mutable state, copy is all you need. There aren't really different levels or depths of copying; either an object has mutable state or it doesn't, and if it does, copy copies it. I'm not sure what deepcopy is for. My recollection is that Stefan and I pushed back strongly against the function, on the grounds that it's not clear what it means, but then the meaning "similar to deserialize(serialize(x))" was proposed, which is sufficiently clear, so we went ahead with that.

@cscherrer
Copy link
Author

Thanks, I guess the case that's not clear to me is if you have something like a vector of tuples of arrays. I could imagine copy stopping when it hits the immutable layer, so the inner arrays end up shared. Haven't tested this though.

@JeffBezanson
Copy link
Member

Our Arrays don't own their elements, so copy only copies the outer array and does not look into the elements, whether immutable or not. The right way to copy multiple levels is to either (1) do it explicitly, e.g. map(copy, array), or (2) define a new data type represented as e.g. a vector of tuples of arrays, with its own copy method.

@vtjnash vtjnash closed this as completed Mar 3, 2022
@vtjnash
Copy link
Member

vtjnash commented Mar 3, 2022

tl;dr don't use deepcopy (except in simple cases where it is useful)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib
Projects
None yet
Development

No branches or pull requests

4 participants