Native libc crash on commit transaction in multithreaded environment #2224
Comments
Hm... unfortunately, the stacktrace here doesn't contain a whole lot of meaningful information other than that the crash indeed happens when committing a transaction. Can you share more information about your use case? Is this a release build - if so, are you using the AOT compiler for Android? How large is the Realm file and how much free space/memory is on the device at the time of the crash? You mention that the crash happens in a multithreaded environment - can you share some code samples that show how you open a Realm file, write to it, and close it - maybe there's some smoking gun there. |
Hi @nirinchev. Thanks for the quick response. I am gathering more information and plan to publish it shortly. |
Hi @nirinchev. First of all, my name is Gino (nicknames are nice, but a real name is perhaps better). Below you can find the requested information. The app was built in release mode. During its lifecycle, the app connects to a gateway and exchanges information with it. Below is the code involved in this flow (I hope it's enough):
private static async Task SaveSystemFunctions(
SFDiscoveryResponse response,
Gateway.GatewayTypes gatewayTypes,
string sfCategory,
bool deleteSfs = false)
{
await DataManager.Instance.UpdateSystemFunctions(
response,
gatewayTypes,
sfCategory,
Gateway.IsVideoGateway(gatewayTypes),
deleteSfs).ConfigureAwait(false);
}
public async Task UpdateSystemFunctions(
SFDiscoveryResponse sfdr,
Gateway.GatewayTypes gatewayType,
string sfCategory,
bool isVideoentryphoneSf = false,
bool deleteSfs = false)
{
DiffAndRemove(sfdr, gatewayType, sfCategory, deleteSfs);
**await PersistSourceItems(sfdr, isVideoentryphoneSf, gatewayType).ConfigureAwait(false);
await RealmAsyncServiceProvider.Connections
.AsyncContextExec(UpdateAggregatorsCount, true)
.ConfigureAwait(false);**
UpdateHueEntries();
}
private async Task PersistSourceItems(SFDiscoveryResponse sfdr, bool isVideoentryphoneSf, Gateway.GatewayTypes gatewayType)
{
var toAdd = new List<ISystemFunctionData>();
var r = ByMeRealmProvider.GetRealm();
{
foreach (var result in sfdr.Result)
{
var sourceSfs = result.Sf;
var idAmbient = result.IdAmbient;
if (sourceSfs.Count > 0)
{
foreach (var x in sourceSfs)
{
switch (x.Sftype)
{
case SystemFunction.Scene:
{
var sceneryData = CreateSceneryDataObject(x, sfdr.Source, idAmbient);
toAdd.Add(sceneryData);
break;
}
case SystemFunction.SceneActivator:
{
var sceneActivatorData = CreateSceneActivatorObject(x, sfdr.Source);
toAdd.Add(sceneActivatorData);
break;
}
default:
{
if (isVideoentryphoneSf)
{
var oldIdAmbientUsers = Queries.GetAllVideoentryphone(r)
.Where(s => s.Sftype != VG2FHelper.DigitalCamera)
.ToList()
.ToDictionary(s =>
s.Idsf, s => s.IdAmbientUser);
oldIdAmbientUsers.TryGetValue(x.Idsf, out var oldIdAmbientUser);
var item = GetVideoEntryphoneSf(sfdr, x, idAmbient, gatewayType, oldIdAmbientUser);
toAdd.Add(item);
}
else if (IsKnxSf(x))
{
var knx = GetKnxSf(x, idAmbient, sfdr.Source);
toAdd.Add(knx);
}
else
{
var sfd = CreateSystemFunctionDataObject(x, idAmbient, sfdr.Source);
toAdd.Add(sfd);
if (x.Sstype == ClimaRealization.ClimaZone ||
x.Sstype == ClimaRealization.ClimaControl)
{
var climaOnOff = GetClimaOnOfModalityData(x.Idsf, sfdr.Source, x.Sstype == ClimaRealization.ClimaControl);
if (climaOnOff != null)
r.Write(() => r.Add(climaOnOff));
}
}
break;
}
}
}
}
else
{
var sfd = CreateSystemFunctionDataObject(result, sfdr.Source);
toAdd.Add(sfd);
}
}
r.TryDispose();
if (toAdd.Count > 0)
{
var toBePersisted = toAdd.OfType<RealmObject>().ToList();
**await RealmAsyncServiceProvider.Connections.PersistAsync(toBePersisted).ConfigureAwait(false);**
}
}
}
public async Task PersistAsync(List<RealmObject> toBePersisted)
{
var f = new Func<Realm, bool>(r =>
{
r.Write(() =>
{
foreach (var x in toBePersisted)
**Realm.Add**(x, true);
});
return true;
});
await AsyncContextExec(f).ConfigureAwait(false);
}
private RealmAsyncServiceProvider()
{
AcThread = new AsyncContextThread();
AcThread.Factory.Run(() =>
{
**Realm = ByMeRealmProvider.GetRealm();**
RealmInit.Set();
});
} |
Hey Gino, thanks for the code samples - for the most part, things seem fine. In |
Hi @nirinchev, thanks for the reply. Below is the AsyncContextExec implementation:
public async Task<T> AsyncContextExec<T>(Func<Realm, T> toExec, bool isTransaction = false)
{
var f = new Func<Task<T>>(async () =>
{
RealmInit.WaitOne();
Task<T> t = default;
T ret = default;
try
{
if (toExec == null)
{
await Task.FromException(new ArgumentException("Execution object is null")).ConfigureAwait(false);
}
else
{
if (isTransaction)
{
t = AcThread.Factory.Run(() =>
{
T res = default;
Realm.Write(() => res = toExec.Invoke(Realm));
return res;
});
}
else
{
t = AcThread.Factory.Run(() => toExec.Invoke(Realm));
}
ret = await t.ConfigureAwait(false);
}
}
catch (Exception ex)
{
if (t?.Exception != null)
{
LogBroker.Instance.TraceDebug("aggregate exception caught during async context execution!");
var e = t.Exception.Flatten().InnerExceptions;
foreach (var x in e)
Utils.TraceException(x);
}
else
{
Utils.TraceException(ex);
}
}
return ret;
});
return await Scheduler.Exec(f).ConfigureAwait(false);
} |
Actually, it should be 'r' as you mention, my bad. It should not be a problem, since the 'r' reference is set to 'Realm' inside "AsyncContextExec()". |
Hm... the async code is a bit complicated to read - am I correct in assuming that the purpose of the code is to serialize all Realm access on a single thread? If that's the case, then it's hard for me to see how this code can correlate with the stacktrace posted in #2207. The crash there happens when the app calls |
Your assumption is right: we serialized Realm reads/writes via a singleton holding a private Realm that belongs to an AsyncContextThread, believing that the problem we were facing (the mutex crash) was due to parallel Realm writes on different Realm instances in concurrent threads. Even so, the error persists. By error I mean the one outlined in the stacktrace below, where you can see a native crash during pthread_mutex_lock(), which most likely happens during the execution of the code posted by @lagmac, and which shares the same thread description (Thread Pool Wor) as the one from which the discussion started.
Am I right in assuming that by making read/write calls to the same Realm instance via an AsyncContextThread, the calls are serialized and safe to perform? Thanks in advance! |
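For readers following the serialization question above, here is a minimal sketch of the general idea of funneling all work through one dedicated thread. It is not the code used in this thread and does not use Nito.AsyncEx or Realm; `SingleThreadExecutor` and its members are hypothetical names built only on standard .NET types. A thread-confined object would have to be created inside `Run` as well, in the same way the constructor above opens the Realm inside `AcThread.Factory.Run`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Realm or Nito.AsyncEx):
// runs every queued delegate on one dedicated worker thread.
public sealed class SingleThreadExecutor : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _worker;

    public SingleThreadExecutor()
    {
        _worker = new Thread(() =>
        {
            // Blocks until work arrives; the loop ends after CompleteAdding()
            // has been called and the queue has drained.
            foreach (var work in _queue.GetConsumingEnumerable())
                work();
        })
        { IsBackground = true };
        _worker.Start();
    }

    // Managed id of the thread that executes all queued work.
    public int WorkerThreadId => _worker.ManagedThreadId;

    // Queues func and returns a task that completes with its result or exception.
    public Task<T> Run<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // stop accepting work
        _worker.Join();          // wait for queued work to finish
        _queue.Dispose();
    }
}
```

With this shape, delegates queued from any number of caller threads execute one at a time, in queue order, on the worker thread. Note that this only serializes what goes through `Run`; any instance opened or disposed outside the executor is not covered by it.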
I'll keep looking at it, but in the meantime, I have an idea about something we could try - if you can reproduce that occasionally in your development environment, I can prepare a debug build of the native binaries for you to run with, which will give us a more meaningful stacktrace. |
This would be perfect, we can replicate the issue easily. |
@fede-marsiglia hm, now that I look at it - you seem to be using version 5.1.1 of the .NET SDK, which is somewhat old. Would you mind trying to upgrade to 10.1.0? If you're only using the local database, it should be a fairly seamless upgrade and should bring a bunch of bug fixes and performance improvements. While I don't believe it will fix the mutex issue, it will be much easier for me to produce a debug build of the 10.x line of releases and when we do identify the issue and fix it, the fix will be released as a 10.x release, so you'll need to upgrade anyway. |
Just upgraded to 10.1.0, waiting for the debug build. Thanks again! |
Here's a link to an archive containing the nupkg-es with the debug binaries. I hope that running the app with these would give us a more meaningful stacktrace for the crash 🤞 |
Thank you @nirinchev, will let you know as soon as we reproduce the issue. |
We just reproduced the issue, stacktrace below:
|
Very helpful, thank you! I'll ping some folks from the Core database team to look at it and take over. |
@nirinchev could you please share updates as soon as you have them? Unfortunately it is mandatory that we have this issue fixed in order to release our app, and deadline is short. Again, we really appreciate your prompt and careful support. |
I transferred this to the Core team and they are looking into it, but we don't have a timeline for a fix yet. |
Understandable. Can I ask whether the issue is 100% Realm related, or whether there is a possibility of bad API usage on our part? |
Okay, so it looks like the Realm instance you're getting is invalid. This should be impossible, but apparently we have a bug somewhere. So, a few more clarifying questions:
|
@nirinchev this seems similar to the problem I had with caching giving me a Realm that should be closed. |
We weren't able to reproduce the problem in a separate test environment, and since it arises in a code section that gets triggered during communication between our app and our client's gateways, we can't share a self-contained project that exhibits the same issue. As for the second request, I'll soon write a separate post with all the details you need. |
Hi @nirinchev, here is a simplified, pseudocode version of what we actually do when the issue arises. I focused only on the parts that make explicit use of Realm; hopefully it will be easier for you to identify suspicious code patterns. I've also unfolded all the called methods so that, even if the result is a wall of text, following the code should be straightforward. Let me know if something is unclear and I'll dive deeper into implementation details. Thanks!
Update(realm)
{
realm.Write(() =>
{
if (c1)
{
...
foreach (var x in list1)
{
var type = GetSystemFunctionType(x);
var items = realm.All<type>();
foreach (var y in items)
realm.Remove(y)
}
...
}
if (c2)
{
...
if (c3)
{
var item = realm.All<t1>.FirstOrDefault(c4);
if (item != null)
realm.Remove(item);
}
...
}
if (c5)
{
var item = realm.All<t2>().FirstOrDefault(c6);
if (item != null && c7)
{
...
var sfs = realm.All<t3>();
...
foreach (x in list2)
{
var item = realm.All<t3>().FirstOrDefault(c8);
if (item != null)
realm.Remove(x)
}
...
}
}
foreach (var x in list3)
{
...
var type = GetSystemFunctionType(x);
var items = realm.All<type>();
foreach (var y in items)
realm.Remove(y)
...
}
foreach (var x in list4)
{
...
var obj = CreateRealmObject(x);
realm.Add(obj, update: true)
...
}
foreach (var x in list5)
{
if (x is RealmObject obj)
{
if (c9)
{
...
obj.Property = (int )n;
...
}
else
{
...
var value = realm.All<t4>().Count() +
realm.All<t5>().Count() +
realm.All<t6>().Count() +
realm.All<t7>().Count();
x.Property = (int )value;
...
}
}
}
}
}
This same method gets called every time we attach to a gateway, so if multiple attaches happen at the same time, it could theoretically run in parallel with other copies. |
Hey Federico, I'm sorry - I must have not been very clear with my request. The code you posted seems just fine and is unlikely to be the source of the issue. The stacktrace points to an issue when opening a realm file (i.e. when calling GetInstance) - instead of obtaining a valid native instance, it looks like we're getting an invalid one. So if you do get to a point where you're using the Realm, it's unlikely that this will cause issues. My best guess is that you're hitting some corner case with the caching mechanism where the cache thinks it has a valid Realm instance to return, but in reality it has already been disposed/invalidated. So can you try and look at your code for all the places where you're opening and closing a Realm file. Namely, it's interesting to know:
Thanks again for your help tracking that down! |
|
Okay, so let me try to summarize some of your answers and see if I understand things correctly:
|
IpcClient.SaveSystemFunctions() is the method in which we perform all the transactions needed to store the information coming from the gateway. Internally it uses DataManager.Instance.UpdateSystemFunctions(), which contains the pseudocode shared in my previous post. I confirm that at that point the synchronization context is null.
|
Okay, that's great! One follow up question - what does
the runtime will guarantee that |
The purpose behind TryDispose() is to get meaningful information if an exception occurs while disposing of a Realm instance; the relevant part is as follows:
try
{
if (obj is Realm re)
{
var disposed = false;
try
{
if (!re.IsClosed)
{
re.Dispose();
disposed = true;
Interlocked.Decrement(ref Utils.RealmCurrentInstances);
}
}
catch (Exception)
{
MainThread.BeginInvokeOnMainThread(() =>
{
try
{
if (!re.IsClosed)
{
re.Dispose();
disposed = true;
Interlocked.Decrement(ref Utils.RealmCurrentInstances);
}
}
catch (Exception ex)
{
Utils.TraceTryDisposeException(ex, callingMethod, callingFilePath, callingFileLineNumber);
disposed = false;
}
});
}
return disposed;
}
}
catch (Exception ex)
{
Utils.TraceTryDisposeException(ex, callingMethod, callingFilePath, callingFileLineNumber);
}
We will certainly try the using pattern to see if it has a positive impact. Thanks again! |
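As an illustration of the `using` pattern mentioned above, here is a minimal sketch with a stand-in resource rather than a real Realm instance. `TrackedResource` and `UsingDemo` are hypothetical names, and nothing here calls into the Realm SDK. It shows that a synchronous `using` block calls `Dispose()` on the thread that is executing the block, even when the body throws, with no hand-off to another thread.

```csharp
using System;

// Hypothetical stand-in for a thread-confined resource such as a Realm instance.
public sealed class TrackedResource : IDisposable
{
    // Thread that created the resource.
    public int OwnerThreadId { get; } = Environment.CurrentManagedThreadId;

    // Thread that disposed the resource, or null while it is still open.
    public int? DisposedOnThreadId { get; private set; }

    public bool IsDisposed => DisposedOnThreadId.HasValue;

    public void Dispose()
    {
        if (!IsDisposed)
            DisposedOnThreadId = Environment.CurrentManagedThreadId;
    }
}

public static class UsingDemo
{
    // Opens a resource, fails inside the using block, and returns the
    // resource so the caller can inspect where it was disposed.
    public static TrackedResource RunAndThrow()
    {
        TrackedResource captured = null;
        try
        {
            using (var resource = new TrackedResource())
            {
                captured = resource;
                throw new InvalidOperationException("simulated failure inside the block");
            }
        }
        catch (InvalidOperationException)
        {
            // Dispose() has already run by the time this handler executes.
        }
        return captured;
    }
}
```

The same-thread behavior holds for a synchronous block like this one. If the body contains an `await`, the continuation, and therefore the `Dispose()` call, may resume on a different thread depending on the synchronization context.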
@fede-marsiglia sorry to bother you but checking in - did replacing that with a |
Hi @nirinchev, unfortunately, due to other priorities, I wasn't able to do extensive testing with the tweak you suggested. I will surely take some time today to do that, will let you know what I find. Thanks! |
No worries, take as much time as you need! I was checking in mostly to make sure we were on the same page and that you weren't waiting on us for something without me realizing it 😄 |
Hi @nirinchev, sorry for not showing up until now, but here I am. After setting up automated testing to trigger the described behavior on 2 different app versions, one with the suggested tweak and one without it, we weren't able to reproduce the issue with either of them. At this point, it could be caused by some random variables dependent upon the specific characteristics of our client's system. We will do more testing and let you know. |
Hi @nirinchev, do you have any news regarding the issue? Thanks in advance. |
Hi @fede-marsiglia we don't have any updates. In your previous message you said:
so I've been waiting for updates from your team. |
Ok, I was referring specifically to this line
Is there any news? |
Oh, for that one, my next message summed up the results from their investigation:
Essentially, the Realm reference appears to be invalidated even though the Realm itself is still in use. Unfortunately, without a somewhat reliable way to repro, it's next to impossible to say what's causing it to be invalidated. I did a thorough review of the code that handles the Realm instance lifecycle and couldn't find any smoking guns there, so if the bug is in the Realm SDK, then it's certainly not obvious. |
Ok, thanks again for your support, we'll try to find a consistent way to reproduce the issue |
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further. |
As per discussion #2207, opened a few days ago by my colleague (and now closed), a similar problem has now also occurred on a Samsung A40 device. This issue concerns a crash while committing a transaction, as per the crash report below.
My questions are:
Goals
Successfully commit the transaction
Expected Results
The transaction commit must complete successfully
Actual Results
Native libc crash
--------- beginning of crash
01-18 09:01:49.253 917 917 F linker : CANNOT LINK EXECUTABLE "/system/bin/sec_diag_uart_log": library "libdiag_system.so" not found
01-18 15:37:01.267 9030 3102 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x72616d69725024 in tid 3102 (Thread Pool Wor), pid 9030 (com.vimar.view)
01-18 15:37:01.634 3566 3566 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-18 15:37:01.634 3566 3566 F DEBUG : Build fingerprint: 'samsung/a70qeea/a70q:10/QP1A.190711.020/A705FNXXU5CTK4:user/release-keys'
01-18 15:37:01.634 3566 3566 F DEBUG : Revision: '14'
01-18 15:37:01.634 3566 3566 F DEBUG : ABI: 'arm64'
01-18 15:37:01.645 3566 3566 F DEBUG : Timestamp: 2021-01-18 15:37:01+0100
01-18 15:37:01.645 3566 3566 F DEBUG : pid: 9030, tid: 3102, name: Thread Pool Wor >>> com.vimar.view <<<
01-18 15:37:01.645 3566 3566 F DEBUG : uid: 10319
01-18 15:37:01.645 3566 3566 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x72616d69725024
01-18 15:37:01.645 3566 3566 F DEBUG : x0 00000079d5cd1800 x1 0000000000000029 x2 0000000000000000 x3 00000079f9441ed4
01-18 15:37:01.645 3566 3566 F DEBUG : x4 0000007a090efce0 x5 0000007a090efd00 x6 00000079d616ae80 x7 00000079c6682850
01-18 15:37:01.645 3566 3566 F DEBUG : x8 0000000000000029 x9 7972616d69725014 x10 0000000000000000 x11 0000000000000004
01-18 15:37:01.645 3566 3566 F DEBUG : x12 00000079cb4b0040 x13 0000000000002587 x14 00000079c0000000 x15 0000007b01d43560
01-18 15:37:01.645 3566 3566 F DEBUG : x16 00000079f9b20420 x17 0000007b01257ffc x18 0000000000000008 x19 00000079d6038048
01-18 15:37:01.645 3566 3566 F DEBUG : x20 0000000000000000 x21 00000079d5cd1d88 x22 00000079c57fc020 x23 00000079c57fc020
01-18 15:37:01.645 3566 3566 F DEBUG : x24 0000000000000000 x25 0000007a02a77b48 x26 0000007a02aca4f8 x27 0000000000000000
01-18 15:37:01.645 3566 3566 F DEBUG : x28 00000079c57fc020 x29 00000079c57fa840
01-18 15:37:01.645 3566 3566 F DEBUG : sp 00000079c57fa6f0 lr 00000079f979c1f8 pc 00000079f96d2e2c
01-18 15:37:01.918 3566 3566 F DEBUG :
01-18 15:37:01.918 3566 3566 F DEBUG : backtrace:
01-18 15:37:01.918 3566 3566 F DEBUG : #00 pc 00000000003e5e2c /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.918 3566 3566 F DEBUG : #1 pc 00000000004af1f4 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.918 3566 3566 F DEBUG : #2 pc 0000000000457d20 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.918 3566 3566 F DEBUG : #3 pc 000000000045ec20 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.918 3566 3566 F DEBUG : #4 pc 00000000001c7948 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.918 3566 3566 F DEBUG : #5 pc 00000000001a9e94 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.919 3566 3566 F DEBUG : #6 pc 0000000000161820 /data/app/com.vimar.view-EpmnC9DopvgAw2qLIES2NA==/lib/arm64/librealm-wrappers.so (shared_realm_commit_transaction+44) (BuildId: 464c0f025e0b843293e1a31d06965fd887cc188c)
01-18 15:37:01.919 3566 3566 F DEBUG : #7 pc 000000000004b134 anonymous:79fc9f8000
Steps to Reproduce
Not reproducible; it happens in random contexts.
Version of Realm and Tooling
Realm 5.1.1 on Android/iOS
Xamarin.Forms