
Deadlock on beginWriteTransaction #4896

Closed
o15a3d4l11s2 opened this issue Apr 24, 2017 · 42 comments

@o15a3d4l11s2

Goals

Begin a write transaction

Expected Results

The transaction is started

Actual Results

Deadlock (not every time, but always at the same place)

(Screenshots: stack traces captured 2017-04-25 at 01:12:55 and 01:14:14.)

Steps to Reproduce

Code Sample

```objc
[[RLMRealm defaultRealm] transactionWithBlock:block error:nil];
```

The block itself contains a change to a single property.

Version of Realm and Tooling

```
ProductName:    Mac OS X
ProductVersion: 10.12.4
BuildVersion:   16E195

/Applications/Xcode.app/Contents/Developer
Xcode 8.3.2
Build version 8E2002

/usr/local/bin/pod
1.2.1
Realm (2.6.2)

/bin/bash
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin16)

/usr/local/bin/carthage
0.16.2
(not in use here)

/usr/local/bin/git
git version 2.10.1
```

Unfortunately it is close to impossible to send a sample project, but if it would help, I am available for any tests or debugging.

Might be similar to #4428

@jpsim jpsim added the T-Help label Apr 25, 2017
@jpsim
Contributor

jpsim commented Apr 25, 2017

The stack trace you shared strongly indicates that there's more involved in your app than just a single write transaction. Looks like you also have some notification blocks, which would be significant. What happens when you remove those notification blocks?

Also, are you running more than one process accessing this Realm file at a time?

@kevincador

We're experiencing the same issue here: an unusually long lock when beginning a transaction. It appeared when we upgraded to Realm 2.6.2.
The easy fix was to downgrade to 2.6.1: the bug is not present on 2.6.1, but it is on 2.6.2 (same code base).

@jpsim
Contributor

jpsim commented Apr 25, 2017

Have you set up any notification blocks that may be observing changes to objects with circular links? If so, this may be caused by realm/realm-object-store#432

@kevincador

By circular link you mean an object linking to another object which itself links back to the first one, right? I think that is the case: we have an object A referencing an object B that itself references object A.
We have a lot of notification blocks observing a lot of things (with Rx), so it is possible that we are observing those objects, yes.
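
For illustration, a minimal sketch of that kind of cycle (hypothetical model and property names, Realm Swift 2.x-era API):

```swift
import RealmSwift

// Hypothetical models: A links to B, and B links back to A.
class ObjectA: Object {
    dynamic var name = ""
    dynamic var partner: ObjectB?   // A -> B
}

class ObjectB: Object {
    dynamic var name = ""
    dynamic var owner: ObjectA?     // B -> A, closing the cycle
}

// Observing either end registers a notification block whose change
// computation has to traverse the A <-> B cycle.
let realm = try! Realm()
if let a = realm.objects(ObjectA.self).first {
    let token = a.addNotificationBlock { change in
        print(change)
    }
    _ = token   // keep the token alive for as long as updates are wanted
}
```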
What would be your recommendation?

@jpsim
Contributor

jpsim commented Apr 25, 2017

> What would be your recommendation?

To give us a way to reproduce the "unusual long lock" ourselves so we can identify why this is happening.

@TimOliver
Contributor

Hi @o15a3d4l11s2 and @kevincador!

A sample app to reproduce this issue would be really valuable to us. Would you be able to isolate your Realm models to show us how you've set up those circular links?

Thanks!

@semireg

semireg commented May 9, 2017

We can confirm that 2.7.0 and 2.6.2 affect us in the same way and that, yes, 2.6.1 works fine. We are downgrading to 2.6.1 and re-evaluating our uses of Realm notifications; we will remove them where necessary. This has caused major heartburn with our clients.

@austinzheng
Contributor

We're deeply sorry that this issue has caused you problems. Would you be willing to share any information about what the apps are doing when the deadlock occurs, any stack traces, etc.? I will also attempt to get this escalated with the appropriate engineers.

@semireg

semireg commented May 10, 2017

(Screenshots: 2017-05-10 at 9:17 am and 9:09 am.)

This use case certainly fits the mold of the cyclical parent <--> child relationships. We think we're managing it properly, but the deadlock issue with notifications makes us less confident.

@austinzheng
Contributor

This is great. Is there any chance you can create a reproduction case based on the subset of the code that is causing the problem, so we can poke at a live example?

@semireg

semireg commented May 10, 2017

Unfortunately, no. We're in the same boat as in #4428 (comment).

@austinzheng
Contributor

Alright, I see you provided some suggestions for creating a similar project. I'll put something together and see what happens.

@austinzheng austinzheng added T-Bug and removed T-Help labels May 10, 2017
@austinzheng
Contributor

@semireg Thanks for the tips in the linked comment. I put together a test application earlier today and have been playing around with it. Haven't reproduced the deadlock yet, although I understand this type of problem can be tricky to replicate. If I uploaded the project, would you be willing to take a look and see if there is anything blindingly obvious that I need to change to more closely match the conditions of your apps?

@semireg

semireg commented May 11, 2017 via email

@bdash bdash self-assigned this Jun 16, 2017
@BartBM

BartBM commented Jun 19, 2017

We experience a temporary deadlock using Realm 2.6.1 (a lock of about 13 seconds); the workaround is to remove all addNotificationBlock calls.
What I wanted to add is that when we upgrade from Realm 2.6.1 to >= 2.6.2, we experience a permanent deadlock and can't get out of that state.

@austinzheng
Contributor

Hi @antonkiselev. Thanks again for providing a working reproduction case. We're trialling a new program: to express our appreciation for your time and effort, if you'd like a free Realm T-shirt please email your address and T-shirt size to help@realm.io, then post back here so we know it's you.

@antonkiselev

@austinzheng I sent the email a few minutes ago. Thanks for supporting the community with nice gifts. Waiting for a bug fix in an upcoming release :)

@bdash
Contributor

bdash commented Jun 20, 2017

As an update, I'm currently investigating this.

@bdash
Contributor

bdash commented Jun 20, 2017

@antonkiselev, your issue appears to be different from the problem that was originally reported here.

In your test case, the deadlock occurs when a background thread uses NSAttributedString's HTML parsing functionality from within a Realm write transaction. This HTML parsing delegates to WebKit, which is a main-thread-only API, and waits for the work to complete before returning. At the same time, your main thread is blocked attempting to begin a Realm write transaction. It's unable to do so as a background thread is already within a write transaction. The background thread is also unable to make progress as it requires the main thread to make progress in order to parse the HTML.

My immediate suggestion would be to avoid using NSAttributedString's HTML parsing functionality within Realm write transactions, as you have the potential to hit a deadlock like this whenever you perform a write transaction on the main thread.
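
For concreteness, a minimal sketch of that two-sided wait (illustrative only; assumes an iOS app using RealmSwift):

```swift
import UIKit
import RealmSwift

// Background thread: takes Realm's write lock, then calls
// NSAttributedString's HTML importer, which hands the parsing to
// WebKit on the main thread and blocks until it completes.
DispatchQueue.global().async {
    let realm = try! Realm()
    try! realm.write {
        let html = Data("<b>hello</b>".utf8)
        _ = try? NSAttributedString(
            data: html,
            options: [.documentType: NSAttributedString.DocumentType.html],
            documentAttributes: nil)
    }
}

// Main thread: tries to begin its own write transaction and blocks
// on the write lock held above. Neither thread can make progress.
let realm = try! Realm()
try! realm.write { /* any write */ }
```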


If anyone else is still able to reproduce the deadlock as originally described and can share a reproducible case with us, we'd be very grateful for it. Feel free to share it privately via email to help@realm.io.

@BartBM

BartBM commented Jul 4, 2017

@bdash I've emailed a reproducible scenario. Hope it helps!

@weibel

weibel commented Jul 21, 2017

We have had this issue for a while and have tried different approaches to solve it.

The TL;DR is that we think we have found a way to circumvent it.
The root cause of the deadlock should still be identified and dealt with inside Realm.

We would have Realm randomly lock up in RealmCoordinator::wait_for_notifiers, on either beginWriteTransaction or refresh, whenever a Realm notification listener was attached to any object involved in those operations.

First, we mitigated the deadlock on beginWriteTransaction by dispatching write transactions to a background queue, using an autoreleasepool inside the background work as suggested by the docs. Then, after the transaction had finished, we would dispatch back to the main queue in case we needed to process any results from the transaction, roughly as in the sketch below.
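
Something like this (the wrapper name is hypothetical, not a Realm API):

```swift
import RealmSwift

// Hypothetical wrapper: write on a background queue inside an
// autoreleasepool (as the Realm docs suggest), then hop back to the
// main queue to process the results.
func performWriteInBackground(_ block: @escaping (Realm) -> Void,
                              completion: @escaping () -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        autoreleasepool {
            let realm = try! Realm()
            try! realm.write {
                block(realm)
            }
        }
        DispatchQueue.main.async {
            completion()   // often needs a refresh first; see below
        }
    }
}
```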

Back on the main dispatch queue we would often need to refresh to see the results of the transaction. This is backed up by the following, from #4837:

> […] when dispatch_async'ing to the main thread immediately after a write transaction it's possible for the async block to be dequeued and invoked before Realm's commit notification makes it to the main thread. In this case you may see the Realm as not having refreshed to the latest version, and calling Realm.refresh() is the correct way to deal with that.

This suggests that the auto refresh feature is not synced with the main queue, which could well be the case; see the timing experiments at https://blackpixel.com/writing/2013/11/performselectoronmainthread-vs-dispatch-async.html and https://stackoverflow.com/questions/10440412/perform-on-next-run-loop-whats-wrong-with-gcd

At this point, refresh would still deadlock at random intervals.

This led us to try starting execution on the main thread only after we knew for sure the Realm had been auto-refreshed.

Instead of dispatch_async to the main queue, we used the performSelector family of methods, which are guaranteed to run on the next run loop cycle. So far it's working for us.

This should also remove the need to refresh explicitly, since auto refresh will have run by then.
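
A sketch of the scheduling difference (RunLoop.main.perform shown here as the block-based sibling of the performSelector approach):

```swift
import Foundation

// dispatch_async can dequeue the block before the run loop processes
// Realm's commit notification, so the Realm may not be refreshed yet:
DispatchQueue.main.async {
    // may observe a stale Realm version here
}

// Run-loop based scheduling waits for the next run loop pass, by
// which time autorefresh has typically already run:
RunLoop.main.perform {
    // the Realm should be auto-refreshed by now
}
```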

@kharmabum

kharmabum commented Jul 26, 2017

Thanks @weibel, much appreciated. FWIW, I also alleviated this issue (and improved Realm-notification-related performance) by replacing my object references with primary key references. Here's the NSThread category we use for run-loop-based scheduling:

```objc
#import "NSThread-PerformBlock.h"

@implementation NSThread (PerformBlock)

+ (void)performBlockOnMainThread:(void (^)())block {
    [[NSThread mainThread] performBlock:block];
}

// Trampoline so a block can be invoked via performSelector:onThread:.
+ (void)runBlock:(void (^)())block {
    block();
}

- (void)performBlock:(void (^)())block {
    if ([[NSThread currentThread] isEqual:self]) {
        block();
    } else {
        [self performBlock:block waitUntilDone:NO];
    }
}

- (void)performBlock:(void (^)())block waitUntilDone:(BOOL)wait {
    [NSThread performSelector:@selector(runBlock:)
                     onThread:self
                   withObject:[block copy]
                waitUntilDone:wait];
}

@end
```
And the primary-key-backed property that replaced a direct object link (the backing `_creatorPrimaryKey` storage and the `User.fetch(_:)` helper are elided here):

```swift
fileprivate(set) var creator: User? {
    get {
        if isInvalidated { return nil }
        guard let key = _creatorPrimaryKey else { return nil }
        return User.fetch(key)   // look the object up by primary key
    }
    set {
        _creatorPrimaryKey = newValue?.key
    }
}
```

@nikolsky2

nikolsky2 commented Aug 9, 2017

I've got the same deadlock in a write transaction on Realm Swift 2.9.1. Moving all Realm-accessing code to the main thread, as @antonkiselev suggested, did not fix the problem. When I tried moving everything to background threads, as @weibel described, I got a lock inside realm.refresh.

(Screenshots: stack traces captured 2017-08-09 at 4:03 pm and 4:18 pm.)

I've used this function for all write transactions on the main thread of the app. The first Realm write goes through successfully; the second one gets stuck in try self.write(block):

```swift
extension Realm {
    func performWriteTransaction(_ block: (() throws -> Void)) {
        precondition(!isInWriteTransaction, "realm.write() was called when isInWriteTransaction == true")
        dispatchPrecondition(condition: .onQueue(.main))

        do {
            try self.write(block)
        } catch {
            print("Realm write error")
        }
    }
}
```

I can also add that when I disable all notification blocks, it doesn't deadlock.

@plm75

plm75 commented Aug 12, 2017

I'm facing a similar issue where the app "freezes" for quite some time, apparently triggered by a batch write operation.

It must be noted that I have never hit this issue when running the app in the simulator, only on real devices.

Can other people in this issue confirm that as well, or is this maybe another issue?

@kharmabum

@plm75 I did notice that what appeared to be a deadlock was in fact Realm notification logic (apparently triggered by the circular references) that would sometimes eventually complete after a very long period of time, O(minutes).

@nikolsky2

I'm not getting any issues on 2.10.0 anymore. Thank you!

@bdash
Contributor

bdash commented Aug 22, 2017

There were changes in v2.10.0 that should mitigate this issue in many cases. I'd love to hear from anyone that's still seeing this after updating to v2.10.0 or newer.

@kharmabum

Will do! Thank you.

@emuye

emuye commented Oct 27, 2017

Just ran into a deadlock with several Realm threads trying to write in v2.10.10; I've attached the backtrace. It isn't something we can reproduce on demand, but I'm happy to share our schema if that's useful.

backtrace.txt.zip

@bdash
Contributor

bdash commented Oct 27, 2017

@emuye, your problem appears to be unrelated to the issue as originally reported here.

Your backtraces show that thread 24 has an open Realm write transaction, and is blocked on a call to dispatch_sync. Similarly, your main thread is blocked on a call to dispatch_sync. Two other background threads appear to be waiting to acquire Realm's write lock, which they cannot do until thread 24 releases it. At least one of these threads looks like it may be executing in the context of the dispatch queue that the various dispatch_sync calls are attempting to target.

This seems like a classic case of inconsistent lock ordering leading to a deadlock. One part of your code acquires Realm's write lock then attempts to dispatch_sync to a specific serial queue, while another part of your code attempts to acquire Realm's write lock while already running on the serial queue.
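
A minimal sketch of that anti-pattern (the queue and function names are hypothetical, reconstructed from the backtraces):

```swift
import RealmSwift

let queue = DispatchQueue(label: "com.example.serial")  // hypothetical

// Path A: acquire Realm's write lock first, then block on the queue.
func pathA() {
    let realm = try! Realm()
    try! realm.write {
        queue.sync {
            // work that must run on the serial queue
        }
    }
}

// Path B: start from the queue, then try to acquire the write lock.
func pathB() {
    queue.async {
        let realm = try! Realm()
        // Blocks until path A commits; path A is waiting on this very
        // queue, so neither side can proceed: a lock-ordering deadlock.
        try! realm.write { /* ... */ }
    }
}
```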

@emuye

emuye commented Oct 27, 2017

Thanks so much @bdash! I missed the open Realm write on thread 24. I agree this is unrelated (and completely an issue in our code) :)

@bdash bdash removed their assignment Mar 21, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 15, 2024