-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thoughts on using a more efficient ordering in ArcStr::drop #16
Comments
I think it's pretty conclusive that we can't do better here. Closing to avoid confusing anyone in the future. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As-is, our ArcStr::clone and ArcStr::drop code is more or less taken verbatim from the stdlib's
std::sync::Arc<T>
's implementation*. There are a lot of reasons for this, mainly the fact that it was, at one point, proven correct. However,ArcStr
offers some meaningfully stronger guarantees which could perhaps allow us improve our implementation ofDrop
.Right now Drop (for non-literal ArcStrs) is conceptually:
However, given that we're immutable, it might give you pause that we need that
Ordering::Acquire
, which only ever executes on this thread (so it's purpose can't be to ensure other threads see our changes that we just made infetch_sub
). This tracks with discussion in the bug report linked in the footnote, which describe it as essentially ensuring that the actual destructor sees the final view of the type, e.g. ensuring that our thread gets changes written by other threads. Well, we don't actually need those, sinceArcStr
is fully immutable.In practice, it may also serve as an optimization barrier to ensure writes from
drop_slow
(e.g writes from inside the memory allocator's dealloc) aren't moved beforeself.count.fetch_sub
. This would be pretty wild, given that there's a conditional, but I guess in theory nothing strictly prevents it...Further, while https://en.cppreference.com/w/cpp/atomic/memory_order says that
memory_order_release
provides the following guarantee:It's not 100% clear to me which part of the C or C++ standard here that this is using to back this up. In practice, it does seem like a given, but most of
Release
s semantics seem to be defined around the interaction withAcquire
. I can almost see it given the definition of release sequences (in conjunction with happens before / dependency on / etc), but it's hard for me to say anything that concretely. Anyway, this matters because if we can't guarantee it and don't have an Acquire, there's nothing preventing other threads from doing their decref, and then continuing to read from theArcStr
despite a remote thread being allowed to deallocate the str's memory.Do I think we might be able to remove the Acquire fence? Well, kinda:
It feels pretty likely that cppreference's interpretation that reads on a thread cannot be moved after a
fetch_sub(1, Release)
on the same thread is correct. In practice it's impossible to implementMutex<T>
without a guarantee of this. (In order to change the code I would need more concrete evidence, and perhaps this ultimately is going to be the downfall, but... yeah).Additionally, while it's true that without the
Acquire
fence there's no ordering individual threads have with each other, it is guaranteed that thefetch_sub
s are still atomic, and that at most one thread will ever reach 0, and be allowed entry into the conditional block, and be allowed to free the memory.That said, neither of these are really strong guarantees under the memory model, and so I'd need something stronger to convince me. Hell, even with obvious mistakes in the ordering AFAICT
loom
happily accepts my code... So maybe I need to install coq or whatever theorem prover has a specification of the C memory model's atomics that I can easily use... That said, it seems likely to be a while before I can do something like this.Anyway, this was a bit rambling, and I probably repeated myself, but I wanted to get it down somewhere so that I can later take a look and say "ah no, I already thought about that" and such.
* Okay, not verbatim for
ArcStr::drop
. We take a small modification to replace an explicitatomic::fence(Acquire)
with by a.load(Acquire)
of the refcount. The stdlib actually does this too, but only when running under thread-sanitizer. At one point there was a PR for this change (rust-lang/rust#41714), which AFAICT mainly didn't merge because the policy at the time was to only accept PRs like this with benchmarks showing improvements, which will be hard given that the distinction here matters mainly in cases of high contention, and is specific to how the cache is being used.However, it's possibly worth noting that this means the variant we use, while (IMO) "obviously" equivalent to using a fence, is not actually identical to what was proven correct in RustBelt.
The text was updated successfully, but these errors were encountered: