
Deadlock on beginWriteTransaction #4896

Closed
o15a3d4l11s2 opened this issue Apr 24, 2017 · 42 comments

@o15a3d4l11s2

Goals

Begin a write transaction

Expected Results

The transaction is started

Actual Results

Deadlock (not every time, but always at the same place)

(Screenshots: stack traces captured 2017-04-25 at 01:12:55 and 01:14:14.)

Steps to Reproduce

Code Sample

```objc
[[RLMRealm defaultRealm] transactionWithBlock:block error:nil];
```

The block itself contains a change to a single property.

Version of Realm and Tooling

```
ProductName:    Mac OS X
ProductVersion: 10.12.4
BuildVersion:   16E195

/Applications/Xcode.app/Contents/Developer
Xcode 8.3.2
Build version 8E2002

/usr/local/bin/pod
1.2.1
Realm (2.6.2)

/bin/bash
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin16)

/usr/local/bin/carthage
0.16.2
(not in use here)

/usr/local/bin/git
git version 2.10.1
```

Unfortunately it is close to impossible to send a sample project, but if it would help, I am available for any tests or debugging.

Might be similar to #4428

@jpsim jpsim added the T-Help label Apr 25, 2017
@jpsim
Contributor

jpsim commented Apr 25, 2017

The stack trace you shared strongly indicates that there's more involved in your app than just a single write transaction. Looks like you also have some notification blocks, which would be significant. What happens when you remove those notification blocks?

Also, are you running more than one process accessing this Realm file at a time?

@kevincador

We're experiencing the same issue here: an unusually long lock when beginning a transaction. It appeared when we upgraded to Realm 2.6.2.
The easy fix was to downgrade to 2.6.1: the bug is not present on 2.6.1, but it is on 2.6.2 (same code base).

@jpsim
Contributor

jpsim commented Apr 25, 2017

Have you set up any notification blocks that may be observing changes to objects with circular links? If so, this may be caused by realm/realm-object-store#432

@kevincador

By circular link you mean an object linking to another object which itself links back to the first one, right? I think that is the case: we have an object A referencing an object B that itself references object A.
We have a lot of notification blocks observing a lot of things (with Rx), so it is possible that we are observing those objects, yes.
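
For illustration, a minimal sketch of that kind of cycle (hypothetical model and property names, Realm Swift 2.x-era API):

```swift
import RealmSwift

// Hypothetical models: A links to B, and B links back to A.
class ObjectA: Object {
    dynamic var name = ""
    dynamic var partner: ObjectB?   // A -> B
}

class ObjectB: Object {
    dynamic var name = ""
    dynamic var owner: ObjectA?     // B -> A, closing the cycle
}

// Observing either end registers a notification block whose change
// computation has to traverse the A <-> B cycle.
let realm = try! Realm()
if let a = realm.objects(ObjectA.self).first {
    let token = a.addNotificationBlock { change in
        print(change)
    }
    _ = token   // keep the token alive for as long as updates are wanted
}
```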
What would be your recommendation?

@jpsim
Contributor

jpsim commented Apr 25, 2017

> What would be your recommendation?

To give us a way to reproduce the "unusual long lock" ourselves so we can identify why this is happening.

@TimOliver
Contributor

Hi @o15a3d4l11s2 and @kevincador!

A sample app to reproduce this issue would be really valuable to us. Would you be able to isolate your Realm models to show us how you've set up those circular links?

Thanks!

@semireg

semireg commented May 9, 2017

We can confirm that 2.7.0 and 2.6.2 affect us in the same way and that, yes, 2.6.1 works fine. We are downgrading to 2.6.1 and re-evaluating our uses of Realm notifications; we will remove them where necessary. This has caused major heartburn with our clients.

@austinzheng
Contributor

We're deeply sorry that this issue has caused you problems. Would you be willing to share any information about what the apps are doing when the deadlock occurs, any stack traces, etc.? I will also attempt to get this escalated with the appropriate engineers.

@semireg

semireg commented May 10, 2017

(Screenshots: 2017-05-10 at 9:17 am and 9:09 am.)

This use case certainly fits the mold of the cyclical parent <--> child relationships. We think we're managing it properly, but the deadlock issue with notifications makes us less confident.

@austinzheng
Contributor

This is great. Is there any chance you can create a reproduction case based on the subset of the code that is causing the problem, so we can poke at a live example?

@semireg

semireg commented May 10, 2017

Unfortunately, no. We're in the same boat as in #4428 (comment).

@austinzheng
Contributor

Alright, I see you provided some suggestions for creating a similar project. I'll put something together and see what happens.

@austinzheng austinzheng added T-Bug and removed T-Help labels May 10, 2017
@austinzheng
Contributor

@semireg Thanks for the tips in the linked comment. I put together a test application earlier today and have been playing around with it. Haven't reproduced the deadlock yet, although I understand this type of problem can be tricky to replicate. If I uploaded the project, would you be willing to take a look and see if there is anything blindingly obvious that I need to change to more closely match the conditions of your apps?

@semireg

semireg commented May 11, 2017 via email

@bdash bdash self-assigned this Jun 16, 2017
@BartBM

BartBM commented Jun 19, 2017

We experience a temporary deadlock using Realm 2.6.1 (a lock of about 13 seconds); the workaround is to remove all addNotificationBlock calls.
What I wanted to add is that when we upgrade from Realm 2.6.1 to >= 2.6.2, we experience a permanent deadlock and can't get out of that state.

@austinzheng
Contributor

Hi @antonkiselev. Thanks again for providing a working reproduction case. We're trialling a new program: to express our appreciation for your time and effort, if you'd like a free Realm T-shirt please email your address and T-shirt size to help@realm.io, then post back here so we know it's you.

@antonkiselev

@austinzheng I sent the email a few minutes ago. Thanks for supporting the community with nice gifts. Waiting for a bug fix in an upcoming release :)

@bdash
Contributor

bdash commented Jun 20, 2017

As an update, I'm currently investigating this.

@bdash
Contributor

bdash commented Jun 20, 2017

@antonkiselev, your issue appears to be different from the problem that was originally reported here.

In your test case, the deadlock occurs when a background thread uses NSAttributedString's HTML parsing functionality from within a Realm write transaction. This HTML parsing delegates to WebKit, which is a main-thread-only API, and waits for the work to complete before returning. At the same time, your main thread is blocked attempting to begin a Realm write transaction. It's unable to do so as a background thread is already within a write transaction. The background thread is also unable to make progress as it requires the main thread to make progress in order to parse the HTML.

My immediate suggestion would be to avoid using NSAttributedString's HTML parsing functionality within Realm write transactions, as you have the potential to hit a deadlock like this whenever you perform a write transaction on the main thread.
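
For concreteness, a minimal sketch of that two-sided wait (illustrative only; assumes an iOS app using RealmSwift):

```swift
import UIKit
import RealmSwift

// Background thread: takes Realm's write lock, then calls
// NSAttributedString's HTML importer, which hands the parsing to
// WebKit on the main thread and blocks until it completes.
DispatchQueue.global().async {
    let realm = try! Realm()
    try! realm.write {
        let html = Data("<b>hello</b>".utf8)
        _ = try? NSAttributedString(
            data: html,
            options: [.documentType: NSAttributedString.DocumentType.html],
            documentAttributes: nil)
    }
}

// Main thread: tries to begin its own write transaction and blocks
// on the write lock held above. Neither thread can make progress.
let realm = try! Realm()
try! realm.write { /* any write */ }
```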


If anyone else is still able to reproduce the deadlock as originally described and can share a reproducible case with us, we'd be very grateful for it. Feel free to share it privately via email to help@realm.io.

@BartBM

BartBM commented Jul 4, 2017

@bdash I've emailed a reproducible scenario. Hope it helps!

@weibel

weibel commented Jul 21, 2017

We have had this issue for a while and have tried different approaches to solve it.

The TL;DR is that we think we have found a way to circumvent it.
The root cause of the deadlock should still be identified and dealt with inside Realm.

We would have Realm randomly lock up in RealmCoordinator::wait_for_notifiers, on either beginWriteTransaction or refresh, whenever a Realm notification listener was attached to any object involved in those operations.

First, we mitigated the deadlock on beginWriteTransaction by dispatching write transactions to a background queue, using an autoreleasepool inside the background work as suggested by the docs. Then, after the transaction had finished, we would dispatch back to the main queue in case we needed to process any results from the transaction, roughly as in the sketch below.
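
Something like this (the wrapper name is hypothetical, not a Realm API):

```swift
import RealmSwift

// Hypothetical wrapper: write on a background queue inside an
// autoreleasepool (as the Realm docs suggest), then hop back to the
// main queue to process the results.
func performWriteInBackground(_ block: @escaping (Realm) -> Void,
                              completion: @escaping () -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        autoreleasepool {
            let realm = try! Realm()
            try! realm.write {
                block(realm)
            }
        }
        DispatchQueue.main.async {
            completion()   // often needs a refresh first; see below
        }
    }
}
```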

Back on the main dispatch queue we would often need to refresh to see the results of the transaction. This is backed up by the following, from #4837:

> […] when dispatch_async'ing to the main thread immediately after a write transaction it's possible for the async block to be dequeued and invoked before Realm's commit notification makes it to the main thread. In this case you may see the Realm as not having refreshed to the latest version, and calling Realm.refresh() is the correct way to deal with that.

This suggests that the auto refresh feature is not synced with the main queue, which could well be the case; see the timing experiments at https://blackpixel.com/writing/2013/11/performselectoronmainthread-vs-dispatch-async.html and https://stackoverflow.com/questions/10440412/perform-on-next-run-loop-whats-wrong-with-gcd

At this point, refresh would still deadlock at random intervals.

This led us to try starting execution on the main thread only after we knew for sure the Realm had been auto-refreshed.

Instead of dispatch_async to the main queue, we used the performSelector family of methods, which are guaranteed to run on the next run loop cycle. So far it's working for us.

This should also remove the need to refresh explicitly, since auto refresh will have run by then.
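
A sketch of the scheduling difference (RunLoop.main.perform shown here as the block-based sibling of the performSelector approach):

```swift
import Foundation

// dispatch_async can dequeue the block before the run loop processes
// Realm's commit notification, so the Realm may not be refreshed yet:
DispatchQueue.main.async {
    // may observe a stale Realm version here
}

// Run-loop based scheduling waits for the next run loop pass, by
// which time autorefresh has typically already run:
RunLoop.main.perform {
    // the Realm should be auto-refreshed by now
}
```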

@kharmabum

kharmabum commented Jul 26, 2017

Thanks @weibel, much appreciated. FWIW, I also alleviated this issue (and improved Realm-notification-related performance) by replacing my object references with primary key references. Here's the NSThread category we use for run-loop-based scheduling:

```objc
#import "NSThread-PerformBlock.h"

@implementation NSThread (PerformBlock)

+ (void)performBlockOnMainThread:(void (^)())block {
    [[NSThread mainThread] performBlock:block];
}

// Trampoline so a block can be invoked via performSelector:onThread:.
+ (void)runBlock:(void (^)())block {
    block();
}

- (void)performBlock:(void (^)())block {
    if ([[NSThread currentThread] isEqual:self]) {
        block();
    } else {
        [self performBlock:block waitUntilDone:NO];
    }
}

- (void)performBlock:(void (^)())block waitUntilDone:(BOOL)wait {
    [NSThread performSelector:@selector(runBlock:)
                     onThread:self
                   withObject:[block copy]
                waitUntilDone:wait];
}

@end
```
And the primary-key-backed property that replaced a direct object link (the backing `_creatorPrimaryKey` storage and the `User.fetch(_:)` helper are elided here):

```swift
fileprivate(set) var creator: User? {
    get {
        if isInvalidated { return nil }
        guard let key = _creatorPrimaryKey else { return nil }
        return User.fetch(key)   // look the object up by primary key
    }
    set {
        _creatorPrimaryKey = newValue?.key
    }
}
```

@nikolsky2

nikolsky2 commented Aug 9, 2017

I've got the same deadlock in a write transaction on Realm Swift 2.9.1. Moving all Realm-accessing code to the main thread, as @antonkiselev suggested, did not fix the problem. When I tried moving everything to background threads, as @weibel described, I got a lock inside realm.refresh.

(Screenshots: stack traces captured 2017-08-09 at 4:03 pm and 4:18 pm.)

I've used this function for all write transactions on the main thread of the app. The first Realm write goes through successfully; the second one gets stuck in try self.write(block):

```swift
extension Realm {
    func performWriteTransaction(_ block: (() throws -> Void)) {
        precondition(!isInWriteTransaction, "realm.write() was called when isInWriteTransaction == true")
        dispatchPrecondition(condition: .onQueue(.main))

        do {
            try self.write(block)
        } catch {
            print("Realm write error")
        }
    }
}
```

I can also add that when I disable all notification blocks, it doesn't deadlock.

@plm75

plm75 commented Aug 12, 2017

I'm facing a similar issue where the app "freezes" for quite some time, apparently triggered by a batch write operation.

It must be noted that I have never hit this issue when running the app in the simulator, only on real devices.

Can other people in this issue confirm that as well, or is this maybe another issue?

@kharmabum

@plm75 I did notice that what appeared to be a deadlock was in fact Realm notification logic (apparently triggered by the circular references) that would sometimes eventually complete after a very long period of time, O(minutes).

@nikolsky2

I'm not getting any issues on 2.10.0 anymore. Thank you!

@bdash
Contributor

bdash commented Aug 22, 2017

There were changes in v2.10.0 that should mitigate this issue in many cases. I'd love to hear from anyone that's still seeing this after updating to v2.10.0 or newer.

@kharmabum

Will do! Thank you.

@emuye

emuye commented Oct 27, 2017

Just ran into a deadlock with several Realm threads trying to write in v2.10.10; I've attached the backtrace. It isn't something we can reproduce on demand, but I'm happy to share our schema if that's useful.

backtrace.txt.zip

@bdash
Contributor

bdash commented Oct 27, 2017

@emuye, your problem appears to be unrelated to the issue as originally reported here.

Your backtraces show that thread 24 has an open Realm write transaction, and is blocked on a call to dispatch_sync. Similarly, your main thread is blocked on a call to dispatch_sync. Two other background threads appear to be waiting to acquire Realm's write lock, which they cannot do until thread 24 releases it. At least one of these threads looks like it may be executing in the context of the dispatch queue that the various dispatch_sync calls are attempting to target.

This seems like a classic case of inconsistent lock ordering leading to a deadlock. One part of your code acquires Realm's write lock then attempts to dispatch_sync to a specific serial queue, while another part of your code attempts to acquire Realm's write lock while already running on the serial queue.
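
A minimal sketch of that anti-pattern (the queue and function names are hypothetical, reconstructed from the backtraces):

```swift
import RealmSwift

let queue = DispatchQueue(label: "com.example.serial")  // hypothetical

// Path A: acquire Realm's write lock first, then block on the queue.
func pathA() {
    let realm = try! Realm()
    try! realm.write {
        queue.sync {
            // work that must run on the serial queue
        }
    }
}

// Path B: start from the queue, then try to acquire the write lock.
func pathB() {
    queue.async {
        let realm = try! Realm()
        // Blocks until path A commits; path A is waiting on this very
        // queue, so neither side can proceed: a lock-ordering deadlock.
        try! realm.write { /* ... */ }
    }
}
```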

@emuye

emuye commented Oct 27, 2017

Thanks so much @bdash! I missed the open Realm write on thread 24. I agree this is unrelated (and completely an issue in our code) :)

@bdash bdash removed their assignment Mar 21, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 15, 2024