Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplaySubject race condition #228

Closed
nepolak opened this issue Jun 24, 2024 · 7 comments
Closed

ReplaySubject race condition #228

nepolak opened this issue Jun 24, 2024 · 7 comments

Comments

@nepolak
Copy link

nepolak commented Jun 24, 2024

I've spotted a bug with ReplaySubject, which may leave newly-subscribed observer without some new notifications in rare cases in a current version (v. 1.1.12)

When an observer subscribes to ReplaySubject, it receives stored notifications under a lock, but then it is registered to ReplaySubject without the lock. If in between these lines, OnNext is called with a new notification from another thread, the observer will miss the new notification because its emission is not synchronized with new subscriptions.

dotnet/reactive implements ReactiveSubject with synchronization of both notification emission and subscription.

Easy way to fix that is to put subscription (SubscribeCore) and notification emission (OnNext) under the lock too.

@neuecc
Copy link
Member

neuecc commented Jun 25, 2024

Thanks for the details, this issue is important and I will check it out right away.

neuecc added a commit that referenced this issue Jun 25, 2024
@ninjaoxygen
Copy link

Just to confirm we are also seeing what sounds like the same bug. It looks like a really short window of race only on the first OnNext/Value call. We have a test case with just ReactiveProperty -> Select -> Switch. In production we only see the bug around 1 in 200.

In our case, the Select constructs a new ReactiveProperty, subscribes over the network and calls OnNext within ~2ms. If we add a 10ms delay before the OnNext, it solves the problem, but then we need to add more locks to avoid out-of-order delivery.

If we swap the Select for SelectAwait before the Switch the bug goes away.

@nepolak
Copy link
Author

nepolak commented Jun 25, 2024

Just to confirm we are also seeing what sounds like the same bug. It looks like a really short window of race only on the first OnNext/Value call. We have a test case with just ReactiveProperty -> Select -> Switch. In production we only see the bug around 1 in 200.

In our case, the Select constructs a new ReactiveProperty, subscribes over the network and calls OnNext within ~2ms. If we add a 10ms delay before the OnNext, it solves the problem, but then we need to add more locks to avoid out-of-order delivery.

If we swap the Select for SelectAwait before the Switch the bug goes away.

Seems like race condition also, because OnNext is not synchronized with subscriptions. Current value of ReactiveProperty is emitted to a new observer without a lock, and only subscription is under the lock. It is possible to call OnNext, which doesn't seem to have any synchronization at all, in between these lines and have a notification missing.

@nepolak
Copy link
Author

nepolak commented Jun 25, 2024

If we swap the Select for SelectAwait before the Switch the bug goes away.

I'd rather reimplement ReactiveProperty with locking in OnNext and SubscribeCore and see if it goes away. I think the problem is in ReactiveProperty and not in Select/SelectAwait or Switch.

@nepolak
Copy link
Author

nepolak commented Jun 25, 2024

If we swap the Select for SelectAwait before the Switch the bug goes away.

I'd rather reimplement ReactiveProperty with locking in OnNext and SubscribeCore and see if it goes away. I think the problem is in ReactiveProperty and not in Select/SelectAwait or Switch.

Just investigated Switch, it seems fine, I don't see any possibility for race condition there.

@neuecc
Copy link
Member

neuecc commented Jun 25, 2024

Thank you, once we release the current fix code that includes the ReplaySubject issue, we will look into this issue as soon as possible.

nepolak added a commit to nepolak/R3 that referenced this issue Jun 25, 2024
nepolak added a commit to nepolak/R3 that referenced this issue Jun 25, 2024
@nepolak
Copy link
Author

nepolak commented Jun 25, 2024

Thank you, once we release the current fix code that includes the ReplaySubject issue, we will look into this issue as soon as possible.

I've created a pull request for the ReactiveProperty issue

@neuecc neuecc closed this as completed Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants