
Immutable Event implementation #3198

Merged: 18 commits merged into microsoft:master on Aug 16, 2024

Conversation

@lifers (Contributor, author) commented Aug 12, 2024

Implement the Event struct without mutating methods, using RwLock. I think it's safe from deadlock because:

  • RwLock can block when reader and writer threads are contesting access
  • When swap lock is acquired, we either have:
    • an exclusive write access from one of add, remove, or clear
    • a shared read access from call and one of add, remove, or clear
  • Otherwise,
    • when a change lock is acquired, we have a single read access from one of add, remove, or clear
    • when no lock is acquired, there is no access
  • Therefore delegates shouldn't deadlock

Please let me know if there are other considerations for this feature. Thank you!

Fixes: #1833
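
To make the reasoning concrete, here is a minimal sketch of the scheme described above (hedged: the field names, the Vec-based storage, and the closure-taking call are placeholders, not the actual PR code, which uses a custom Array):

use std::cell::UnsafeCell;
use std::sync::{Mutex, RwLock};

pub struct Event<T> {
    delegates: UnsafeCell<Vec<T>>, // the delegate array, guarded by the locks below
    swap: RwLock<()>,              // write-locked only while the array is replaced
    change: Mutex<()>,             // serializes add/remove/clear against each other
}

impl<T: Clone> Event<T> {
    pub fn add(&self, delegate: T) {
        let _change = self.change.lock().unwrap(); // single mutator at a time
        // Shared read of the array: may overlap with `call`, never with a swap,
        // because any swap requires the exclusive `swap` lock below.
        let mut new_list = unsafe { (*self.delegates.get()).clone() };
        new_list.push(delegate);
        // Exclusive access held only for the brief moment of the swap.
        let _swap = self.swap.write().unwrap();
        unsafe { *self.delegates.get() = new_list };
    }

    pub fn call(&self, invoke: impl Fn(&T)) {
        // Shared read access: blocks only while a swap is in progress.
        let snapshot = {
            let _swap = self.swap.read().unwrap();
            unsafe { (*self.delegates.get()).clone() }
        };
        // Invoke the handlers with no lock held.
        for delegate in &snapshot {
            invoke(delegate);
        }
    }
}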

@lifers (Contributor, author) commented Aug 12, 2024

@microsoft-github-policy-service agree

@kennykerr (Collaborator)

Additional synchronization should not be required above the swap and change locks. For reference, I wrote the Rust implementation based on the implementation I wrote for C++/WinRT here:

https://github.com/microsoft/cppwinrt/blob/master/strings/base_events.h

That in turn is loosely based on the WRL implementation.

@kennykerr (Collaborator)

Basically all we need to do here is make add, remove, and clear take a &self. The only tricky bit is convincing Rust that Array::swap is thread safe.

Assuming we have exclusive access to the array
@lifers (Contributor, author) commented Aug 13, 2024

Since we have exclusive access to the delegate array when we hold the swap lock, would you rather have the swap done like this? (e.g. lines 104-105, 160-167)

@kennykerr (Collaborator)

Looks good - is it necessary to use a pointer rather than a mutable reference?

use core::cell::UnsafeCell.
@lifers (Contributor, author) commented Aug 13, 2024

I figured out a way to use a mutable reference using UnsafeCell.

Once the class design is complete, I guess we should impl Send and Sync, right? 🤔

@kennykerr (Collaborator)

Yes, it can be Send and Sync.
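
For the UnsafeCell-based design those marker impls must be written by hand; a hedged sketch (the exact bounds depend on the final design):

// SAFETY (sketch): every access to the UnsafeCell contents is serialized by
// the swap/change locks, so the type can be shared across threads as long as
// the delegates themselves can be sent and shared.
unsafe impl<T: Send> Send for Event<T> {}
unsafe impl<T: Send + Sync> Sync for Event<T> {}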

@kennykerr (Collaborator)

Would appreciate some more eyes on this. The change here just makes the Event type - part of the supporting type system for implementing WinRT events - no longer require mutable references to self, since the implementation is otherwise entirely thread safe and is meant to be used in the context of a multi-threaded COM object that can receive concurrent requests for adding, removing, and calling event handlers.

@ChrisDenton @dpaoliello @sivadeilra

@dpaoliello (Collaborator)

Seems reasonable to me. @sivadeilra thoughts?

@sivadeilra (Collaborator)

Reading.

@sivadeilra (Collaborator) left a review comment

I don't see how this code is safe (sound). It accesses the delegates list while acquiring different locks in different pub methods.

Can you explain why this is safe? If delegates is associated with exactly one lock, then why is it not contained within one of the Mutex instances? Since this code acquires different locks in different paths, I don't understand how it is sound.

@lifers (Contributor, author) commented Aug 13, 2024

From my understanding:

  • We don't want concurrent add/remove/clear, so we have the self.change Mutex to prevent it from happening.
  • When add/remove/clear is reading delegates and hasn't mutated anything, we don't want to block call since it only needs an immutable reference.
  • However, when add/remove/clear is writing to delegates, we do need an exclusive mutable reference so we have another lock self.swap for it.
  • The only time call can't read delegates is while they are being written, in other words while self.swap is locked for writing. So we simply make call acquire self.swap (shared) in order to read.

@sivadeilra (Collaborator) left a review comment

Ok, I re-read this with Chris Denton's comments in mind.

I hate to say it, but I think this code makes the wrong trade-off of safety vs. performance. First, this code may currently be sound, but it is dangerously close to the edge. And there are safe abstractions that already provide the semantics that are wanted.

From my reading of this code, RwLock would provide exactly the same level of synchronization, and would do so safely. The only scenario where RwLock would have different behavior would be that writers would briefly block reads (call paths), but I consider that acceptable for this design because modifying the list should be extremely infrequent, compared to reading the list.

So I think this PR needlessly uses unsafe code; there is no advantage here to using unsafe code. Even worse, this PR mixes the implementation of a R/W lock with a particular usage of that R/W lock, so it mixes two things together. That makes verification harder and makes it more likely that a future PR that looks innocent will actually break the safety invariants. And there's just no advantage over using RwLock.

I'm also troubled by the performance implication of cloning the delegate list every time call runs. That's a lot of allocator calls.

I think this would be improved by replacing all of this with RwLock<Arc<Array<T>>>. The writer paths (add, remove, etc.) would acquire it in exclusive mode and do the obvious thing. The call path would acquire the RwLock in shared mode, call Arc::clone (which is a single interlocked increment), then drop the RwLock guard. This would eliminate all of the allocator calls in call.

This implementation would be 100% safe code, and obviously correct from a borrow-checker and concurrency point of view.

@kennykerr (Collaborator)

Thanks for the feedback. As I mentioned in #3198 (comment), this implementation is copied from C++/WinRT which is based on WRL. The choice around performance tradeoffs was done before my time but it was stressed that C++/WinRT must provide the same performance tradeoffs as WRL as they were calculated at some point to be preferable for the OS. Having said that, I don't mind making a different set of performance tradeoffs for Rust if it means the code is simpler and thus easier to reason about. I will say that this implementation (at least the C++ implementations) has stood the test of time and very heavy usage.

However, there are some constraints that must be preserved. I suspect RwLock<Arc<Array<T>>> could preserve them but I thought I'd just be explicit about that here so that we're on the same page. That will also help folks understand the existing implementation. The constraints are essentially:

  • No locks can be held while invoking delegates (event handlers). This is to avoid COM reentrancy issues and deadlocks.
  • No locks can be held indefinitely in a way that would prevent add or remove calls from acquiring the lock and completing. This is to avoid DoS for unrelated code.

If a simpler implementation is possible that preserves these constraints, then I'm not opposed to it. 😊
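
To make constraint 1 concrete, here is a sketch of a conforming call path under the proposed RwLock<Arc<[T]>> layout (the `invoke` parameter is illustrative, not the actual API):

pub fn call(&self, invoke: impl Fn(&T)) {
    // Hold the read lock only long enough to clone the Arc; the temporary
    // guard is dropped at the end of this statement.
    let snapshot: Arc<[T]> = self.delegates.read().unwrap().clone();
    // Invoke handlers with no lock held: no COM reentrancy deadlocks, and
    // add/remove are never blocked for the duration of a handler.
    for delegate in snapshot.iter() {
        invoke(delegate);
    }
}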

@sivadeilra (Collaborator) commented Aug 14, 2024

Both of those constraints would be met by using RwLock<Arc<Array<T>>>. I understand the need for them.

I think a simpler, provably-safe implementation is the best place to start. If there is quantitative evidence that this implementation can't meet the same needs as the equivalent C++, then I'm open to adjusting that, but not without some evidence.

If it would help, I can sketch out the implementation.

@kennykerr (Collaborator)

Hey @lifers I realize this is probably more than you bargained for. 😊 Would you like someone else to handle this?

@lifers (Contributor, author) commented Aug 15, 2024

Hi, this has gotten more interesting and I'd like to figure it out, unless the maintainers need to finish this ASAP with their background knowledge 😄. It's taking me a while to work out what else is happening inside the initial code. From what I've seen:

  • Do we need to keep the fallible allocation that returns E_OUTOFMEMORY instead of Rust's default abort?
  • Using standard library types will simplify a lot of things but might increase the memory footprint (e.g. std::sync::Arc instead of imp::RefCount for reference counting, RwLock to guard the Array). Is this a concern?
  • Are there restrictions on where types can come from? Can we use things freely from std?

@kennykerr (Collaborator)

A few thoughts.

  • Consistent error handling via Result (and HRESULT) is important.
  • The downside to using RwLock<Arc<Array<T>>> is that it would result in two allocations. The existing implementations handle this by storing the array and the reference count in a single allocation. This is also how the Rust event's internal Array works.
  • Some of the reasoning behind the existing implementations rests on the observation that events almost always have zero or one handlers, so that is the case to optimize for. Needing two allocations to store a single event handler therefore seems wasteful.

I wonder if we can use RwLock<Array<T>> instead of RwLock<Arc<Array<T>>>, given that Array already provides thread-safe reference counting.
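
One supporting detail (an aside, not from the thread itself): Arc<[T]> already stores its reference counts and its elements in a single heap block, so the two-allocation cost applies to Arc<Array<T>> (the Arc block plus Array's own buffer), not to the Arc<[T]> form:

use std::sync::Arc;

fn single_allocation_demo() {
    // Converting Vec<T> into Arc<[T]> copies the elements into one fresh
    // allocation that also holds the strong and weak counts.
    let delegates: Arc<[i32]> = vec![1, 2, 3].into();
    assert_eq!(delegates.len(), 3);
}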

@sivadeilra (Collaborator) commented Aug 15, 2024

So here's a sketch of what I propose:

use std::sync::{Arc, RwLock};

pub struct Event<T> {
    delegates: RwLock<Arc<[T]>>,
}

impl<T: Clone + PartialEq> Event<T> {
    pub fn add(&self, delegate: T) {
        let mut guard = self.delegates.write().unwrap();
        let mut new_list = Vec::with_capacity(guard.len() + 1);
        for old_delegate in guard.iter() {
            new_list.push(old_delegate.clone());
        }
        new_list.push(delegate);
        *guard = new_list.into(); // <-- converts Vec<T> into Arc<[T]>
    }

    pub fn clear(&self) {
        let mut guard = self.delegates.write().unwrap();
        *guard = Vec::new().into(); // <-- swap in an empty Arc<[T]>
    }

    pub fn remove(&self, delegate: &T) {
        let mut guard = self.delegates.write().unwrap();
        if let Some(i) = guard.iter().position(|old_delegate| old_delegate == delegate) {
            let mut new_list = Vec::with_capacity(guard.len() - 1);
            new_list.extend(guard[..i].iter().cloned());
            new_list.extend(guard[i + 1..].iter().cloned());
            *guard = new_list.into();
        } else {
            // We never found it, so don't modify the list.
        }
    }

    pub fn call(&self) {
        let delegates: Arc<[T]> = {
            let guard = self.delegates.read().unwrap();
            guard.clone()
            // <-- lock is released here
        };
        for delegate in delegates.iter() {
            // ... invoke each delegate, with no lock held
        }
    }
}

Obviously this is just a sketch, but the idea should be clear. The paths that modify the list use RwLock::write. The path that reads the list (call) uses RwLock::read, and only for a very brief time, just long enough to increment the refcount on the list.

@lifers (Contributor, author) commented Aug 15, 2024

If the fallible allocation is still needed, we could use a simplified Array just to allocate like:

/// A thread-safe reference-counted array of delegates.
struct Array<T: Interface> {
    buffer: *mut Delegate<T>,
    len: usize,
}

impl<T: Interface> Default for Array<T> {
    fn default() -> Self {
        Self::with_capacity(0).unwrap()
    }
}

impl<T: Interface> Array<T> {
    /// Creates a new, empty `Array<T>` with the specified capacity.
    fn with_capacity(capacity: usize) -> Result<Self> {
        if capacity != 0 {
            let alloc_size = capacity * size_of::<Delegate<T>>();
            let buffer = heap_alloc(alloc_size)? as *mut Delegate<T>;

            Ok(Self { buffer, len: 0 })
        } else {
            Ok(Self {
                buffer: null_mut(),
                len: 0,
            })
        }
    }
    // ...
}

which will be used as

pub struct Event<T: Interface> {
    delegates: RwLock<Arc<Array<T>>>,
}

impl<T: Interface> Event<T> {
    pub fn add(&self, delegate: T) -> Result<()> {
        let mut guard = self.delegates.write().unwrap();
        let mut new_list = Array::with_capacity(guard.len() + 1)?;
        for old_delegate in guard.as_slice().iter() {
            new_list.push(old_delegate.clone());
        }
        new_list.push(Delegate::new(&delegate)?);
        *guard = new_list.into(); // <-- converts Array<T> into Arc<Array<T>>
        Ok(())
    }

    pub fn clear(&self) -> Result<()> {
        let mut guard = self.delegates.write().unwrap();
        *guard = Array::with_capacity(0)?.into();
        Ok(())
    }
...
    pub fn call(&self) {
        let delegates = {
            let guard = self.delegates.read().unwrap();
            guard.clone()
            // <-- lock is released here
        };
        for delegate in delegates.as_slice().iter() {
            // ...
        }
    }

@kennykerr (Collaborator)

Don't worry about allocation failure.

@sivadeilra (Collaborator)

Honestly, I think providing the "fallible" allocation pathway is worse. Just stick with the standard Rust approach. If you are implementing a memory-allocating container, the appropriate response to OOM is to call std::alloc::handle_alloc_error, which diverges instead of returning an error (by default it aborts the process).
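
A minimal sketch of that standard approach, using only std APIs:

use std::alloc::{alloc, handle_alloc_error, Layout};

// Allocate a raw buffer, diverging on OOM rather than returning an error
// code. By default `handle_alloc_error` aborts the process.
fn allocate_bytes(len: usize) -> *mut u8 {
    assert!(len > 0, "zero-sized allocations are not valid for `alloc`");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    // SAFETY: `layout` has non-zero size, as asserted above.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    ptr
}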

Using `Option` to wrap `Arc<[Delegate<T>]>` lets us store `None` instead of allocating an empty `Arc::new([])`, with no space penalty.
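
That works because Arc is non-null, so the compiler uses the null pointer as the None representation; a quick compile-time check of the no-space-penalty claim:

use std::mem::size_of;
use std::sync::Arc;

// Option<Arc<[T]>> is exactly the size of Arc<[T]> (a fat pointer):
// None costs no extra space.
const _: () = assert!(size_of::<Option<Arc<[u8]>>>() == size_of::<Arc<[u8]>>());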
@sivadeilra (Collaborator)

Looks a lot simpler -- thank you!

The Clippy suggestion about a Default impl is a good one.

@kennykerr kennykerr requested review from sivadeilra and removed request for sivadeilra August 16, 2024 13:52
@lifers (Contributor, author) commented Aug 16, 2024

By the way, now that Event<T> can be used concurrently, should we have tests for that?

@kennykerr (Collaborator)

Concurrency tests can be challenging - fortunately the code is now a lot simpler, so perhaps just a basic smoke test confirming that it is in fact Send and Sync would be helpful, e.g. that it can be used with thread::spawn or similar.
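
The cheapest form of that smoke test is a compile-time assertion; a sketch, assuming the Event<EventHandler<i32>> type exercised in the test below:

// This helper only compiles if the type argument is Send + Sync, so the
// test fails at build time if the marker impls ever regress.
fn assert_send_sync<T: Send + Sync>() {}

#[test]
fn event_is_send_and_sync() {
    assert_send_sync::<Event<EventHandler<i32>>>();
}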

@lifers (Contributor, author) commented Aug 16, 2024

Just an idea for a new test:

#[test]
fn across_threads() -> Result<()> {
    let event = Arc::new(Event::<EventHandler<i32>>::new());
    let check = Arc::new(AtomicI32::new(0));
    let other = Arc::new(AtomicI32::new(0));

    // Register an event handler on a different thread.
    let event1 = event.clone();
    let check1 = check.clone();
    let _ = std::thread::spawn(move || {
        event1.add(&EventHandler::<i32>::new(move |_, args| {
            check1.fetch_add(*args, Ordering::Relaxed);
            Ok(())
        })).unwrap();
    }).join().unwrap();

    // Raise the event on the main thread.
    assert_eq!(check.load(Ordering::Relaxed), 0);
    event.call(|delegate| delegate.Invoke(None, 1));
    assert_eq!(check.load(Ordering::Relaxed), 1);

    let event2 = event.clone();
    let other2 = other.clone();
    let check2 = check.clone();
    std::thread::scope(move |s| {
        // Call the `check` event handler on another thread.
        let event3 = event2.clone();
        s.spawn(move || {
            assert_eq!(check2.load(Ordering::Relaxed), 1);
            event3.call(|delegate| delegate.Invoke(None, 2));
            assert_eq!(check2.load(Ordering::Relaxed), 3);
        });

        // Register and call the `other` event handler on another thread.
        let event3 = event2.clone();
        s.spawn(move || {
            let other3 = other2.clone();

            event3.add(&EventHandler::<i32>::new(move |_, args| {
                other3.fetch_add(*args, Ordering::Relaxed);
                Ok(())
            })).unwrap();

            assert_eq!(other2.load(Ordering::Relaxed), 0);
            event.call(|delegate| delegate.Invoke(None, 4));
            assert_eq!(other2.load(Ordering::Relaxed), 4);
        });
    });

    // Check the final values after the thread scope ends.
    assert_eq!(check.load(Ordering::Relaxed), 7);
    assert_eq!(other.load(Ordering::Relaxed), 4);

    Ok(())
}

@sivadeilra (Collaborator)

I'm not opposed to that test, but from experience in testing concurrent algorithms, I don't think it actually has much value. I know, more testing is always supposed to be better, but when it comes to concurrency, tests can often be flaky, or even if they aren't flaky, they don't exercise meaningful properties and so they just give a false sense of security.

If you think this test is valuable, and it runs in a very short period of time, I'm fine with it. However, don't drop a thread joiner; that's nearly always a code smell. If you think you've already synchronized with the thread, then retain the thread joiner and join on it at the end of the test function. It should be a no-op, but it gives me higher confidence in the correctness of the test.

@kennykerr (Collaborator)

Yep, I'd not worry about testing the concurrency itself and just test that the compiler can actually move the references around - the thing that wouldn't work if Send and Sync weren't implemented - in the simplest way possible.

Co-authored-by: Kenny Kerr <kenny@kennykerr.ca>
@kennykerr (Collaborator) left a review comment

Appreciate all the work on this - looks great!

@kennykerr kennykerr merged commit f19edde into microsoft:master Aug 16, 2024
79 checks passed
@lifers (Contributor, author) commented Aug 16, 2024

Glad to help! 😁 Thanks everyone for following through.

Successfully merging this pull request may close these issues:

Event should not be mutable (#1833)