Crash in SyncManager::unregister_session #5752

Closed
nirinchev opened this issue Aug 16, 2022 · 9 comments · Fixed by #5886

@nirinchev
Member

SDK and version

SDK : .NET
Version: 10.15.1
Core: 12.5.0

Observations

  • How frequently does the crash occur?
    • Rarely
  • Does it happen in production or during dev/test?
    • dev/test
  • Can the crash be reproduced by you?
    • Occasionally
  • Can you provide instructions for how we can reproduce it?
    • No, there's no consistent pattern

Crash log / stacktrace

#0  0x000001a83a27cc in pthread_mutex_lock
#1  0xdb348001a82f91a8 in std::__1::mutex::lock()
#2  0x75630002f014a9c0 in realm::SyncManager::unregister_session(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
#3  0x000002f01586a0 in std::__1::__shared_ptr_emplace<realm::SyncSession::ExternalReference, std::__1::allocator<realm::SyncSession::ExternalReference> >::__on_zero_shared()
#4  0x000002f0048730 in realm_syncsession_destroy
#5  0x000002fb03adac in  (wrapper managed-to-native) Realms.Sync.SessionHandle/NativeMethods:destroy (intptr) [{0x2f5156460} + 0x6c]  (0x2fb03ad40 0x2fb03ae0c) [0x130f02a80 - Unity Child Domain]
#6  0x000002fb03ab10 in  Realms.Sync.SessionHandle:Unbind () [{0x2f57e4988} + 0x20]  (0x2fb03aaf0 0x2fb03ab20) [0x130f02a80 - Unity Child Domain]
#7  0x000002fb03a990 in  Realms.SharedRealmHandle:UnbindLockedList () [{0x2b221bf00} + 0x68]  (0x2fb03a928 0x2fb03aa20) [0x130f02a80 - Unity Child Domain]
#8  0x000002fb03a4c4 in  Realms.SharedRealmHandle:ReleaseHandle () [{0x2e6c681a0} + 0xa4]  (0x2fb03a420 0x2fb03a5ec) [0x130f02a80 - Unity Child Domain]
#9  0x000002f9662108 in  System.Runtime.InteropServices.SafeHandle:DangerousReleaseInternal (bool) [{0x14238f6f8} + 0x218]  (0x2f9661ef0 0x2f9662128) [0x130f02a80 - Unity Child Domain]
#10 0x000002faf5c4c4 in  System.Runtime.InteropServices.SafeHandle:InternalFinalize () [{0x141cad768} + 0x24]  (0x2faf5c4a0 0x2faf5c4d4) [0x130f02a80 - Unity Child Domain]
#11 0x000002faeb7bf0 in  System.Runtime.InteropServices.SafeHandle:Dispose (bool) [{0x14238ea38} + 0x30]  (0x2faeb7bc0 0x2faeb7c00) [0x130f02a80 - Unity Child Domain]
#12 0x000002faf5c434 in  System.Runtime.InteropServices.SafeHandle:Finalize () [{0x14238e9c0} + 0x24]  (0x2faf5c410 0x2faf5c46c) [0x130f02a80 - Unity Child Domain]
#13 0x000002f9739704 in  (wrapper runtime-invoke) object:runtime_invoke_virtual_void__this__ (object,intptr,intptr,intptr) [{0x141959e98} + 0x74]  (0x2f9739690 0x2f97397a4) [0x130f02a80 - Unity Child Domain]
#14 0x00000135a74728 in mono_gc_run_finalize
#15 0x00000135a763d4 in finalizer_thread
#16 0x00000135a3d570 in start_wrapper_internal
#17 0x00000135a3d41c in start_wrapper
#18 0x00000135abccb0 in GC_inner_start_routine
#19 0x00000135abcc38 in GC_start_routine
#20 0x000001a83a826c in _pthread_start

Steps & Code to Reproduce

This happens in Unity when I reload the assembly. The reload causes the finalizer to finalize all C# instances, which in turn release any native shared pointers they were holding. Based on the stack trace, destroying a sync session's external reference calls SyncManager::unregister_session, and the crash occurs while locking the manager's mutex. I don't know what could cause that, but I figured someone on the Core team might.
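A minimal C++ sketch of the hazard described above, under loud assumptions: Manager and SessionHandle are hypothetical stand-ins, not Realm Core types, and the sketch only assumes (as frame #3 suggests) that releasing the last external reference runs a destructor that locks a mutex owned by the manager. If that destructor held a raw pointer or reference to a manager that had already been torn down, pthread_mutex_lock would operate on freed memory, which would be consistent with a segfault in that frame. Holding a std::weak_ptr instead lets the destructor skip the unregister call once the manager is gone. The actual fix in #5886 may take a different approach.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a manager that tracks live sessions by path.
class Manager {
public:
    void register_session(const std::string& path) {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_sessions[path];
    }
    void unregister_session(const std::string& path) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_sessions.find(path);
        if (it != m_sessions.end() && --it->second == 0)
            m_sessions.erase(it);
    }
    std::size_t session_count() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_sessions.size();
    }

private:
    std::mutex m_mutex;
    std::unordered_map<std::string, int> m_sessions;
};

// Hypothetical external handle; its destructor plays the role of the
// ExternalReference destruction seen in frame #3.
class SessionHandle {
public:
    SessionHandle(std::weak_ptr<Manager> manager, std::string path)
        : m_manager(std::move(manager)), m_path(std::move(path)) {
        if (auto mgr = m_manager.lock())
            mgr->register_session(m_path);
    }
    ~SessionHandle() {
        // Only touch the manager (and therefore its mutex) if it is still
        // alive. lock() yields an empty shared_ptr once the manager is gone.
        if (auto mgr = m_manager.lock())
            mgr->unregister_session(m_path);
    }
    SessionHandle(const SessionHandle&) = delete;
    SessionHandle& operator=(const SessionHandle&) = delete;

private:
    std::weak_ptr<Manager> m_manager;
    std::string m_path;
};
```

With this design the handle never extends the manager's lifetime, so teardown order no longer decides whether the destructor is safe; the cost is that a session released after the manager is simply never unregistered.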

@jbreams
Contributor

jbreams commented Aug 17, 2022

What kind of crash happens in pthread_mutex_lock? Does it segfault or abort or something else? I suspect things are just not being torn down in the right order somewhere. Would it be possible to use the thread sanitizer when reproducing this? That would almost certainly show us what's being destroyed in the wrong order, but I have no idea if/how you'd use it with .NET.

@nirinchev
Member Author

It's a segfault as far as I can tell. I'm also not sure how to use the thread sanitizer, but I'll see if @fealebenpae can help.

@nirinchev
Member Author

nirinchev commented Aug 19, 2022

We also saw this in a Dart CI run:

2022-08-19T10:26:20.1408150Z [INFO] Realm: Connection[1]: Disconnected
2022-08-19T10:26:20.3234710Z [INFO] Realm: Connection[1]: Session[1]: client_reset_config = false, Realm exists = true, client reset = false
2022-08-19T10:26:20.3235450Z [INFO] Realm: Connected to endpoint '52.200.206.35:443' (from '10.213.7.9:49430')
2022-08-19T10:26:20.7315140Z [INFO] Realm: Connection[1]: Session[1]: Integrated 1 changesets from pending bootstrap for query version 0, producing client version 10. 0 changesets remaining in bootstrap
2022-08-19T10:26:21.0197120Z [INFO] Realm: Connection[1]: Session[1]: Integrated 1 changesets from pending bootstrap for query version 1, producing client version 14. 0 changesets remaining in bootstrap
2022-08-19T10:26:21.1716550Z [INFO] Realm: Connection[1]: Session[1]: client_reset_config = false, Realm exists = true, client reset = false
2022-08-19T10:26:21.1717330Z [INFO] Realm: Connected to endpoint '52.200.206.35:443' (from '10.213.7.9:49432')
2022-08-19T10:26:21.7141040Z [INFO] Realm: Connection[1]: Session[1]: Integrated 1 changesets from pending bootstrap for query version 0, producing client version 10. 0 changesets remaining in bootstrap
2022-08-19T10:26:21.8808930Z [INFO] Realm: Connection[1]: Session[1]: Integrated 1 changesets from pending bootstrap for query version 1, producing client version 14. 0 changesets remaining in bootstrap
2022-08-19T10:26:23.5372260Z 
2022-08-19T10:26:23.5473760Z 00:51 +232 ~10 -1: test/session_test.dart: SyncSession.getProgressStream forCurrentlyOutstandingWork [E]
2022-08-19T10:26:23.5475590Z ===== CRASH =====
2022-08-19T10:26:23.5574960Z   Expected: <3695>
2022-08-19T10:26:23.5577450Z si_signo=Segmentation fault: 11(11), si_code=0, si_addr=0x0
2022-08-19T10:26:23.5682320Z     Actual: <3348>
2022-08-19T10:26:23.5684040Z version=2.17.6 (stable) (Tue Jul 12 12:54:37 2022 +0200) on "macos_x64"
2022-08-19T10:26:23.5801630Z   
2022-08-19T10:26:23.5804120Z pid=1838, thread=11811, isolate_group=main(0x7fd4cd01b000), isolate=main(0x7fd4cd426400)
2022-08-19T10:26:23.5905770Z isolate_instructions=1045ecd20, vm_instructions=1045ecd20
2022-08-19T10:26:23.6012300Z   pc 0x00007fff2067ed86 fp 0x00007000034665b0 pthread_mutex_lock+0x4
2022-08-19T10:26:23.6136380Z   pc 0x00007fff2061a3d9 fp 0x00007000034665c0 std::__1::mutex::lock()+0x9
2022-08-19T10:26:23.6239050Z   pc 0x00000001103ebe51 fp 0x00007000034665f0 realm::SyncSession::unregister_progress_notifier(unsigned long long)+0x21
2022-08-19T10:26:23.6344780Z   pc 0x00000001102af82d fp 0x0000700003466600 realm_sync_session_unregister_progress_notifier+0xd
2022-08-19T10:26:23.6446170Z   pc 0x0000000106f05b6b fp 0x0000700003466628 Unknown symbol
2022-08-19T10:26:23.6547470Z   pc 0x000000010f19b988 fp 0x0000700003466660 Unknown symbol
2022-08-19T10:26:23.6584060Z   pc 0x000000010f19b547 fp 0x00007000034666c0 Unknown symbol
2022-08-19T10:26:23.6591360Z   pc 0x000000010f19b3f4 fp 0x0000700003466708 Unknown symbol
2022-08-19T10:26:23.6593340Z   pc 0x000000010f19b29b fp 0x0000700003466750 Unknown symbol
2022-08-19T10:26:23.6594940Z   pc 0x000000010f19b12a fp 0x0000700003466788 Unknown symbol
2022-08-19T10:26:23.6596570Z   pc 0x000000010d5aae63 fp 0x00007000034667e8 Unknown symbol
2022-08-19T10:26:23.6597380Z   pc 0x000000010f19af78 fp 0x0000700003466828 Unknown symbol
2022-08-19T10:26:23.6599510Z   pc 0x000000010f19a56b fp 0x0000700003466870 Unknown symbol
2022-08-19T10:26:23.6600920Z   pc 0x000000010f125637 fp 0x00007000034668b0 Unknown symbol
2022-08-19T10:26:23.6603440Z   pc 0x000000010f1253d2 fp 0x00007000034668f8 Unknown symbol
2022-08-19T10:26:23.6605270Z   pc 0x000000010f124f67 fp 0x0000700003466948 Unknown symbol
2022-08-19T10:26:23.6606340Z   pc 0x000000010f614b5f fp 0x0000700003466980 Unknown symbol
2022-08-19T10:26:23.6606720Z   pc 0x000000010f610ee1 fp 0x00007000034669d0 Unknown symbol
2022-08-19T10:26:23.6608520Z   pc 0x000000010f610b46 fp 0x0000700003466a18 Unknown symbol
2022-08-19T10:26:23.6611050Z   pc 0x000000010d5d3ec3 fp 0x0000700003466a60 Unknown symbol
2022-08-19T10:26:23.6616750Z   pc 0x000000010d5d3b3b fp 0x0000700003466aa0 Unknown symbol
2022-08-19T10:26:23.6619120Z   pc 0x000000010d5d3a69 fp 0x0000700003466ac8 Unknown symbol
2022-08-19T10:26:23.6646270Z   pc 0x000000010d5cc9ee fp 0x0000700003466b08 Unknown symbol
2022-08-19T10:26:23.6647360Z   pc 0x000000010d59ea88 fp 0x0000700003466b48 Unknown symbol
2022-08-19T10:26:23.6650830Z   pc 0x0000000106f02a0c fp 0x0000700003466bc0 Unknown symbol
2022-08-19T10:26:23.6652100Z   pc 0x000000010475723d fp 0x0000700003466c60 dart::DartEntry::InvokeCode(dart::Code const&, unsigned long, dart::Array const&, dart::Array const&, dart::Thread*)+0x14d
2022-08-19T10:26:23.6653260Z   pc 0x0000000104757078 fp 0x0000700003466cc0 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)+0x128
2022-08-19T10:26:23.6654290Z   pc 0x00000001047592c2 fp 0x0000700003466d10 dart::DartLibraryCalls::HandleMessage(long long, dart::Instance const&)+0x132
2022-08-19T10:26:23.6667260Z   pc 0x0000000104780ee2 fp 0x0000700003466df0 dart::IsolateMessageHandler::HandleMessage(std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> >)+0x322
2022-08-19T10:26:23.6669310Z   pc 0x00000001047abe5d fp 0x0000700003466e60 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)+0x12d
2022-08-19T10:26:23.6670330Z   pc 0x00000001047ac56f fp 0x0000700003466eb0 dart::MessageHandler::TaskCallback()+0x1df
2022-08-19T10:26:23.6671390Z   pc 0x00000001048e1b87 fp 0x0000700003466f30 dart::ThreadPool::WorkerLoop(dart::ThreadPool::Worker*)+0x147
2022-08-19T10:26:23.6672350Z   pc 0x00000001048e1fee fp 0x0000700003466f60 dart::ThreadPool::Worker::Main(unsigned long)+0x6e
2022-08-19T10:26:23.6673720Z   pc 0x000000010485001f fp 0x0000700003466fb0 dart::OSThread::GetMaxStackSize()+0xaf
2022-08-19T10:26:23.6674750Z   pc 0x00007fff206838fc fp 0x0000700003466fd0 _pthread_start+0xe0
2022-08-19T10:26:23.6675600Z   pc 0x00007fff2067f443 fp 0x0000700003466ff0 thread_start+0xf

@nirinchev
Member Author

We also got a user report of this: realm/realm-dart#858

@nirinchev
Member Author

I tried downgrading through various versions, all the way back to Core 11.14, and the issue still seems to exist. As far as I can tell, it happens when we destroy the .NET session while an upload is in progress: destroying it releases the shared_ptr to the native session, and the crash occurs.
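A hedged C++ sketch of one consequence of this, assuming (not confirmed from Core's source) that an in-flight upload keeps its own shared_ptr to the session. When the .NET side releases its reference first, the session's destructor runs on whichever thread drops the last reference, which is not necessarily the finalizer thread. Session and destroyed_on_upload_thread are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Hypothetical session that records which thread ran its destructor.
struct Session {
    explicit Session(std::thread::id* destroyed_on) : m_destroyed_on(destroyed_on) {}
    ~Session() { *m_destroyed_on = std::this_thread::get_id(); }
    std::thread::id* m_destroyed_on;
};

// Simulates an owner (for example a finalizer) releasing its reference while a
// hypothetical "upload" still holds its own copy of the same shared_ptr.
// Returns true if the session ended up being destroyed on the upload thread.
bool destroyed_on_upload_thread() {
    std::thread::id destroyed_on;
    std::promise<void> owner_released;
    auto session = std::make_shared<Session>(&destroyed_on);

    std::thread upload([keep_alive = session,
                        released = owner_released.get_future()]() mutable {
        released.wait();     // the upload stays "in progress" until the owner lets go
        keep_alive.reset();  // drops the last reference on this thread
    });
    const std::thread::id upload_id = upload.get_id();

    session.reset();             // owner releases first; the upload keeps the session alive
    owner_released.set_value();  // allow the upload to finish
    upload.join();               // join also makes the write to destroyed_on visible here

    return destroyed_on == upload_id;
}
```

Because the final release can land on a different thread from the one that started teardown, any mutex the destructor locks must still be alive at that moment, which is the ordering question raised earlier in this thread.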

@sync-by-unito

sync-by-unito bot commented Aug 22, 2022

➤ Nikola Irinchev commented:

The Dart user reporting this says it only started happening after upgrading from Core 12.1.0 to 12.5.1. Since I can reproduce the Unity crash with 12.1.0, either these are different bugs or the older version of the Dart SDK does not exercise the buggy code path for some reason.

@nielsenko
Contributor

@jbreams
Contributor

jbreams commented Aug 25, 2022

@danieltabacaru, can you take a look? We now have more than a few reports of this. Maybe we can add more logging to track down what's being destroyed in the wrong order.
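One way to add the kind of logging suggested here is an RAII tracer that records construction and destruction events in the order they happen. This is a hypothetical C++ sketch: LifecycleLog and LifetimeTracer are illustrative names, not part of Realm Core or its logger.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical process-wide log of lifecycle events. It is guarded by a mutex
// because teardown may happen on finalizer or sync worker threads.
class LifecycleLog {
public:
    static LifecycleLog& instance() {
        // Intentionally leaked so the log itself is never destroyed during
        // static teardown, which would reintroduce an ordering problem.
        static LifecycleLog* log = new LifecycleLog();
        return *log;
    }
    void record(std::string event) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push_back(std::move(event));
    }
    std::vector<std::string> snapshot() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_events;
    }

private:
    LifecycleLog() = default;
    std::mutex m_mutex;
    std::vector<std::string> m_events;
};

// Hypothetical helper: embedded as the first member of a class, it is
// constructed first and destroyed last, so its entries bracket the owner.
class LifetimeTracer {
public:
    explicit LifetimeTracer(std::string name) : m_name(std::move(name)) {
        LifecycleLog::instance().record("construct " + m_name);
    }
    ~LifetimeTracer() { LifecycleLog::instance().record("destroy " + m_name); }
    LifetimeTracer(const LifetimeTracer&) = delete;
    LifetimeTracer& operator=(const LifetimeTracer&) = delete;

private:
    std::string m_name;
};
```

In a log captured from a crashing run, a "destroy manager" entry that appears before the matching "destroy session" entry would point to the teardown-order problem suspected above.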

@nicola-cab
Member

This is a duplicate of #5493. I'm going to close #5493 and use this issue to track both.

The github-actions bot locked this as resolved and limited conversation to collaborators on Mar 21, 2024.
5 participants