Make PyClassBorrowChecker thread safe #4544
Conversation
Hmm, not sure I follow why taking an …
I realized today that if I make …
CodSpeed Performance Report: Merging #4544 will not alter performance.
(force-pushed from ab32156 to fb19488)
Ping @colesbury - I'd appreciate a code review from you if you have some spare cycles.
I feel quite strongly that we shouldn't merge this PR. The semantics in #4265 (comment) make much more sense to me.
@mejrs I'm very much in agreement with you that keeping these "refcell-like" semantics on the free-threaded build is likely to be completely unusable in real-world conditions. Would you be prepared to accept this PR merging in 0.23, with the understanding that we will follow up in 0.24 with something akin to #4265 (comment), i.e. a more realistic design? I think this allows us to make PyO3 sound for downstream testing of free-threading without needing to introduce breaking changes (of which there are already enough collected for this release).
I disagree strongly that these are unusable in real-world situations. I did a review of pyca/cryptography, and these semantics would be appropriate for all of our usage of non-frozen pyclasses. The reason is that while these types require mutable access, there's no particularly coherent behavior for concurrent multi-threaded usage. These types are, for example, iterators and hashers; there's no real-world use case for concurrent mutation.
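To make the kind of type being described concrete, here is a minimal sketch (hypothetical, not code from pyca/cryptography or this PR) of an "algorithm helper" pyclass: each call needs `&mut self`, but there is no meaningful behaviour for two threads mutating it at once, so a runtime borrow error on a racing mutable access is an acceptable outcome.

```rust
use pyo3::prelude::*;

// Hypothetical "algorithm helper" pyclass; names are illustrative only.
#[pyclass]
struct StreamingHasher {
    state: u64,
}

#[pymethods]
impl StreamingHasher {
    #[new]
    fn new() -> Self {
        StreamingHasher { state: 0 }
    }

    /// `&mut self` goes through the pyclass borrow checker. Under the
    /// refcell-like semantics, a second thread racing into `update` gets a
    /// runtime "already borrowed" error rather than silently racing on `state`.
    fn update(&mut self, data: &[u8]) {
        for byte in data {
            self.state = self.state.wrapping_mul(31).wrapping_add(u64::from(*byte));
        }
    }

    fn finalize(&self) -> u64 {
        self.state
    }
}
```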
I'm OK with that.
They may work in your situation, but not in general. Imagine a List-like pyclass with a …
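For contrast, a hypothetical sketch of the list-like case (the comment above is truncated, so the details here are assumed): with refcell-like semantics, two Python threads calling `append` at the same time do not block each other; one of them fails the mutable borrow and the call raises at runtime.

```rust
use pyo3::prelude::*;

// Hypothetical list-like pyclass; not code from the PR.
#[pyclass]
struct MyList {
    items: Vec<i64>,
}

#[pymethods]
impl MyList {
    #[new]
    fn new() -> Self {
        MyList { items: Vec::new() }
    }

    /// On the free-threaded build, a racing second call to `append` fails the
    /// mutable borrow and raises, which is surprising for an ordinary container.
    fn append(&mut self, item: i64) {
        self.items.push(item);
    }

    fn __len__(&self) -> usize {
        self.items.len()
    }
}
```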
Thanks, that's interesting to see. I guess that makes a lot of sense: if most of your types are either frozen or "algorithm" helpers, then these semantics are workable. I'm still quite worried that the runtime crashes will trip up users, but maybe that's better than silently blocking and degrading throughput. Still, I think a lot of PyO3 users might not be so careful in designing their types! 👍 Thanks all for the agreement; it seems we at least have a plan for 0.23 and a rough roadmap of where we're going after that.
I've finally read the code & test - very concise change in the end. I have just a few thoughts, and then once we've discussed those points let's move ahead with a merge. Thanks!
(force-pushed from d92e733 to 27eaf02)
So I've reviewed the replacement without `SeqCst` and have a different opinion on what the correct implementation is. 🙈
Agreed on the reviews, thanks! I missed the …
I think there are still possible changes warranted for the … Also, now that we're venturing more out of my comfort zone with the less strict memory orderings, it would be great to come up with some tests which give us confidence the implementation is correct. I came up with this one, which fails on …

```rust
#[test]
fn test_thread_safety() {
    #[pyclass]
    struct MyClass {
        x: u64,
    }

    Python::with_gil(|py| {
        let inst = Py::new(py, MyClass { x: 0 }).unwrap();

        let total_modifications = py.allow_threads(|| {
            std::thread::scope(|s| {
                // Spawn a bunch of threads all racing to write to
                // the same instance of `MyClass`.
                let threads = (0..10)
                    .map(|_| {
                        s.spawn(|| {
                            Python::with_gil(|py| {
                                // Each thread records its own view of how many writes it made
                                let mut local_modifications = 0;
                                for _ in 0..100 {
                                    if let Ok(mut i) = inst.try_borrow_mut(py) {
                                        i.x += 1;
                                        local_modifications += 1;
                                    }
                                }
                                local_modifications
                            })
                        })
                    })
                    .collect::<Vec<_>>();

                // Sum up the total number of writes made by all threads
                threads.into_iter().map(|t| t.join().unwrap()).sum::<u64>()
            })
        });

        // If the implementation is free of data races, the total number of writes
        // should match the final value of `x`.
        assert_eq!(total_modifications, inst.borrow(py).x);
    });
}
```
src/pycell/impl_.rs (outdated)

```rust
// On success the read is synchronized to ensure other
// threads don't get a reference before this thread checks
// that it can get one
Ordering::Acquire,
```
I'm not totally convinced it's necessary to establish a happens-before relationship with other threads when they observe the successfully stored flag (as they won't be allowed to read the accompanying data), but I think this also cannot hurt.
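For orientation on where the orderings matter, here is a minimal RwLock-style sketch of an atomic borrow flag. It is illustrative only and is not the actual `BorrowFlag`/`BorrowChecker` implementation in this PR: the writer publishes its data with a Release store, and a reader's Acquire on a successful compare-exchange is what gives it a happens-before edge to those writes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative only: a minimal borrow flag, not PyO3's actual code.
const UNUSED: usize = 0;
const HAS_MUTABLE_BORROW: usize = usize::MAX;

struct SimpleBorrowFlag(AtomicUsize);

impl SimpleBorrowFlag {
    fn try_borrow(&self) -> Result<(), ()> {
        let mut value = self.0.load(Ordering::Relaxed);
        loop {
            if value == HAS_MUTABLE_BORROW {
                return Err(());
            }
            // Acquire on success: pairs with the Release in `release_borrow_mut`,
            // so this reader sees every write made under the previous mutable borrow.
            match self.0.compare_exchange_weak(
                value,
                value + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => value = observed,
            }
        }
    }

    fn release_borrow(&self) {
        // Release: pairs with the Acquire in `try_borrow_mut`, so a later writer
        // only starts after this reader's reads are complete.
        self.0.fetch_sub(1, Ordering::Release);
    }

    fn try_borrow_mut(&self) -> Result<(), ()> {
        self.0
            .compare_exchange(UNUSED, HAS_MUTABLE_BORROW, Ordering::Acquire, Ordering::Relaxed)
            .map(|_| ())
            .map_err(|_| ())
    }

    fn release_borrow_mut(&self) {
        // Release: publishes the data written under the mutable borrow.
        self.0.store(UNUSED, Ordering::Release);
    }
}
```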
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
Ok, here's yet another test, which fails if the atomic orderings are all relaxed:

```rust
#[test]
fn test_thread_safety_2() {
    struct SyncUnsafeCell<T>(UnsafeCell<T>);
    unsafe impl<T> Sync for SyncUnsafeCell<T> {}

    impl<T> SyncUnsafeCell<T> {
        fn get(&self) -> *mut T {
            self.0.get()
        }
    }

    let data = SyncUnsafeCell(UnsafeCell::new(0));
    let data2 = SyncUnsafeCell(UnsafeCell::new(0));
    let borrow_checker = BorrowChecker(BorrowFlag(AtomicUsize::new(BorrowFlag::UNUSED)));

    std::thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..1_000_000 {
                if borrow_checker.try_borrow_mut().is_ok() {
                    // thread 1 writes to both values during the mutable borrow
                    unsafe { *data.get() += 1 };
                    unsafe { *data2.get() += 1 };
                    borrow_checker.release_borrow_mut();
                }
            }
        });
        s.spawn(|| {
            for _ in 0..1_000_000 {
                if borrow_checker.try_borrow().is_ok() {
                    // if the borrow checker is working correctly, it should be impossible
                    // for thread 2 to observe a difference in the two values
                    assert_eq!(unsafe { *data.get() }, unsafe { *data2.get() });
                    borrow_checker.release_borrow();
                }
            }
        });
    });
}
```
Added that test; I can also reproduce the failure if I make the orderings relaxed.
Thanks for all the work here; I'm now reasonably convinced that what we have is correct. We might yet dream up new tests later, though I don't think we need to wait for those to merge.
Ah, we probably need to gate the new tests with …
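The comment above is truncated, but since the new tests rely on `#[pyclass]`, the gate presumably looks something like the sketch below; the exact cfg attribute is an assumption, not taken from the PR.

```rust
// Hypothetical gating (the exact attribute used in the PR is not shown above):
// only compile the test in configurations where the macros it needs exist.
#[cfg(feature = "macros")]
#[test]
fn gated_thread_safety_test() {
    // test body using `#[pyclass]` / the free-threaded borrow checker goes here
}
```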
Ref #4265 (comment) and replies from @alex.
I tried doing this with an `AtomicUsize`, but because `try_borrow` accepts an immutable reference, I couldn't figure out a way to get that to work without keeping the `Cell`. A mutex seemed like a more natural choice for the existing code structure. I think in principle we could use a mutex on the GIL-enabled build as well, since the lock is only held very briefly in Rust to update the borrow checker state.
Is the new test that only triggers on the free-threaded build OK? I could also write it to test that there isn't an exception on the GIL-enabled build.
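As a rough sketch of the mutex-based shape described in this description (field names, constants, and error types are assumed for illustration and are not the PR's exact code), the borrow state can live behind a `std::sync::Mutex` that is locked only long enough to check and update the counter:

```rust
use std::sync::Mutex;

// Rough sketch of a mutex-protected borrow counter; illustrative only.
const HAS_MUTABLE_BORROW: isize = -1;

struct MutexBorrowChecker {
    // -1 = mutably borrowed, 0 = unused, n > 0 = n shared borrows
    flag: Mutex<isize>,
}

impl MutexBorrowChecker {
    fn try_borrow(&self) -> Result<(), ()> {
        let mut flag = self.flag.lock().unwrap();
        if *flag == HAS_MUTABLE_BORROW {
            Err(())
        } else {
            *flag += 1;
            Ok(())
        }
    }

    fn release_borrow(&self) {
        *self.flag.lock().unwrap() -= 1;
    }

    fn try_borrow_mut(&self) -> Result<(), ()> {
        let mut flag = self.flag.lock().unwrap();
        if *flag == 0 {
            *flag = HAS_MUTABLE_BORROW;
            Ok(())
        } else {
            Err(())
        }
    }

    fn release_borrow_mut(&self) {
        *self.flag.lock().unwrap() = 0;
    }
}
```

The lock is held only for the duration of the check-and-update, which is why the description argues this would also be cheap enough for the GIL-enabled build.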