[monovm] Assertion at object.c:657, condition `lock->done' not met #96872

jeromelaban · 2024-01-11T22:29:38Z

Description

When running some threaded code in a MAUI app (android in this case), the app may fail with the following error:

[nyname.UnoApp16] * Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met

runtime/src/mono/mono/metadata/object.c

Line 657 in cf34ef0

g_assert (lock->done);

This happens when connecting in parallel to multiple websockets, see the StartConnection() in MainPage.xaml.cs.

Reproduction Steps

Build the following app: 96872-threading-crash.zip
Start the app with a debugger

Expected behavior

The app does not crash

Actual behavior

the app crashes immediately most of the time

Regression?

Yes, was working fine in 8.0.100.

Known Workarounds

Yes, don't execute any threaded code.

Configuration

dotnet --version
8.0.101

dotnet workload update --print-rollback
==workloadRollbackDefinitionJsonOutputStart==
{
  "microsoft.net.sdk.android": "34.0.52/8.0.100",
  "microsoft.net.sdk.ios": "17.0.8490/8.0.100",
  "microsoft.net.sdk.maccatalyst": "17.0.8490/8.0.100",
  "microsoft.net.sdk.macos": "14.0.8490/8.0.100",
  "microsoft.net.sdk.maui": "8.0.3/8.0.100",
  "microsoft.net.sdk.tvos": "17.0.8490/8.0.100",
  "microsoft.net.workload.mono.toolchain.current": "8.0.1/8.0.100",
  "microsoft.net.workload.emscripten.current": "8.0.1/8.0.100",
  "microsoft.net.workload.emscripten.net6": "8.0.1/8.0.100",
  "microsoft.net.workload.emscripten.net7": "8.0.1/8.0.100",
  "microsoft.net.workload.mono.toolchain.net6": "8.0.1/8.0.100",
  "microsoft.net.workload.mono.toolchain.net7": "8.0.1/8.0.100",
  "microsoft.net.sdk.aspire": "8.0.0-preview.2.23619.3/8.0.100"
}
==workloadRollbackDefinitionJsonOutputEnd==

Other information

No response

The text was updated successfully, but these errors were encountered:

lambdageek · 2024-01-11T23:45:29Z

Probably I messed something up in #93875

lambdageek · 2024-01-11T23:55:26Z

Oh... yea...

runtime/src/mono/mono/metadata/object.c

Lines 615 to 657 in cf34ef0

    
           		if (!lock->done) { 
        
           			int timeout_ms = 500; 
        
           			int wait_result = mono_coop_cond_timedwait (&lock->cond, &lock->mutex, timeout_ms); 
        
           			if (wait_result == -1) { 
        
           				/* timed out - go around again from the beginning.  If we got here 
        
           				 * from the "is_blocked = FALSE" case, above (another thread was 
        
           				 * blocked on the current thread, but on a lock that was already 
        
           				 * done but it didn't get to wake up yet), then it might still be 
        
           				 * the case that the current thread cannot proceed even if the other 
        
           				 * thread got to wake up - there might be a new deadlock.  We need 
        
           				 * to re-evaluate. 
        
           				 * 
        
           				 * This can happen if two threads A and B need to call the cctors 
        
                                            * for classes X and Y but in opposite orders, and also call a cctor 
        
                                            * for a third class Z.  (That is thread A wants to init: X, Z, Y; 
        
                                            * thread B wants to init: Y, Z, X.)  In that case, if B is waiting 
        
                                            * for A to finish initializing Z, and A (the current thread ) 
        
                                            * already finished Z and wants to init Y. In A, control will come 
        
                                            * here with "lock" being Y's lock.  But we will time out because B 
        
                                            * will see that A is responsible for initializing X and will also 
        
                                            * block.  So A is waiting for B to finish Y and B is waiting for A 
        
                                            * to finish X.  So the fact that A allowed B to wait for Z to 
        
                                            * finish didn't actually let us make progress.  Thread A must go 
        
                                            * around to the top once more and try to init Y - and detect that 
        
                                            * there is now a deadlock between X and Y. 
        
           				 */ 
        
           				mono_type_init_unlock (lock); 
        
           				// clean up blocked thread hash and lock refcount. 
        
           				mono_type_initialization_lock (); 
        
           				g_hash_table_remove (blocked_thread_hash, GUINT_TO_POINTER (tid)); 
        
           				gboolean deleted = unref_type_lock (lock); 
        
           				if (deleted) 
        
           					g_hash_table_remove (type_initialization_hash, vtable); 
        
           				mono_type_initialization_unlock (); 
        
           				goto retry_top; 
        
           			} else if (wait_result == 0) { 
        
           				/* Success: we were signaled that the other thread is done.  Proceed */ 
        
           			} else { 
        
           				g_assert_not_reached (); 
        
           			} 
        
           		} 
        
           		mono_type_init_unlock (lock); 
        
           		g_assert (lock->done);

on line 615 we know !lock->done
then on 650-651 we find out that wait_result == 0 - ie we were signaled on lock->cond. And then we fall through to 656. But we probably need a barrier so that we actually see lock->done (although I thought mutex locking/unlocking is a barrier.)

~~alternately a thread that is doing the initialization is dying and we're calling mono_release_type_locks which doesn't set lock->done~~ nope... release_type_locks also sets lock->done.

I'll need to try this in a debugger after all. Trying to guess what is wrong isn't working

lambdageek · 2024-01-12T00:14:26Z

Just noticed another thing that is wrong. when we goto retry_top, we repeat HANDLE_FUNCTION_ENTER() without doing a corresponding HANDLE_FUNCTION_RETURN, so we're incidentally messing up the coop handle stack.

lambdageek · 2024-01-12T00:17:20Z

Oh. It's probably a spurious wakeup. We need to actually re-evaluate lock->done if wait_result == 0

When using condition variables there is always a Boolean
predicate involving shared variables associated with each
condition wait that is true if the thread should proceed.
Spurious wakeups from the pthread_cond_timedwait() or
pthread_cond_wait() functions may occur. Since the return from
pthread_cond_timedwait() or pthread_cond_wait() does not imply
anything about the value of this predicate, the predicate should
be re-evaluated upon such return.

https://man7.org/linux/man-pages/man3/pthread_cond_wait.3p.html

lambdageek · 2024-01-12T15:52:32Z

Can repro the crash on a real Pixel 6a, but not in an emulator

the condition variable may be signaled even if the initialization by the other thread is not done yet. Handle spurious wakeups the same way as timeouts: go around once more from the beginning. Fixes dotnet#96872

the condition variable may be signaled even if the initialization by the other thread is not done yet. Handle spurious wakeups the same way as timeouts: go around once more from the beginning. Fixes #96872

* mono_runtime_class_init_full: handle spurious wakeups the condition variable may be signaled even if the initialization by the other thread is not done yet. Handle spurious wakeups the same way as timeouts: go around once more from the beginning. Fixes #96872 and #96804 * fix unbalanced handle frames if we goto retry_top, don't set up a new handle frame that lacks a matching HANDLE_FUNCTION_RETURN_VAL. Instead setup the handle frame once upfront

lambdageek · 2024-01-12T22:37:35Z

Reopening until the 8.0.x backport is merged

#96905) * mono_runtime_class_init_full: handle spurious wakeups the condition variable may be signaled even if the initialization by the other thread is not done yet. Handle spurious wakeups the same way as timeouts: go around once more from the beginning. Fixes #96872 and #96804 * fix unbalanced handle frames if we goto retry_top, don't set up a new handle frame that lacks a matching HANDLE_FUNCTION_RETURN_VAL. Instead setup the handle frame once upfront --------- Co-authored-by: Aleksey Kliger <alklig@microsoft.com>

lambdageek · 2024-01-13T04:57:32Z

Backport has been merged, too.

ChristopherStephan · 2024-01-16T10:23:13Z

We were also able to reproduce this error with Android SDK 31, 33 and 34.

liejuntao001 · 2024-01-17T17:21:49Z

When will the fix be released? We are all stuck by this issue.
Both 8.0.101 and 8.0.100 not working.

lambdageek · 2024-01-17T18:53:17Z

When will the fix be released? We are all stuck by this issue.

@liejuntao001 see #96804 (comment)
there is a suggestion in there for how to enable additional logging that may help you to work around the issue.

servicing releases for .NET come out approximately monthly. The fix for this issue is queued up for the February servicing release.

Both 8.0.101 and 8.0.100 not working.

This assertion is new in 8.0.101. What is the issue in 8.0.100? Could you open a new GitHub issue for it.

jeromelaban · 2024-01-17T18:55:53Z

This assertion is new in 8.0.101. What is the issue in 8.0.100? Could you open a new GitHub issue for it.

@lambdageek as rolling back workloads is not easy (it requires rollback files), changing the active version of the SDK in global.json does not have any effect on the active workload, which can cause confusion. I hope that dotnet/designs#294 will clear a lot of this confusion...

The fix for this issue is queued up for the February servicing release

As for the monthly servicing, February is quite far away for an issue that arises when updating visual studio, and for which rolling back is difficult.

lambdageek · 2024-01-17T19:38:43Z

This assertion is new in 8.0.101. What is the issue in 8.0.100? Could you open a new GitHub issue for it.

@lambdageek as rolling back workloads is not easy (it requires rollback files), changing the active version of the SDK in global.json does not have any effect on the active workload, which can cause confusion. I hope that dotnet/designs#294 will clear a lot of this confusion...

You're right. that would explain it.

The fix for this issue is queued up for the February servicing release

As for the monthly servicing, February is quite far away for an issue that arises when updating visual studio, and for which rolling back is difficult.

As a temporary workaround you might be able to force the 8.0.101 SDK to use the 8.0.0 runtime pack with something like this:

<Target Name="UpdateRuntimePackVersion" BeforeTargets="ProcessFrameworkReferences">
  <ItemGroup>
    <KnownRuntimePack Condition="%(RuntimePackLabels) == 'Mono'" LatestRuntimeFrameworkVersion="8.0.0" />
  </ItemGroup>
</Target>

liejuntao001 · 2024-01-17T22:28:07Z

This workaround works for me.

* mono_runtime_class_init_full: handle spurious wakeups the condition variable may be signaled even if the initialization by the other thread is not done yet. Handle spurious wakeups the same way as timeouts: go around once more from the beginning. Fixes dotnet#96872 and dotnet#96804 * fix unbalanced handle frames if we goto retry_top, don't set up a new handle frame that lacks a matching HANDLE_FUNCTION_RETURN_VAL. Instead setup the handle frame once upfront

orimslala · 2024-01-27T03:39:09Z

Hi, has the fix been released. i am getting crashes as well..
Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition `lock->done' not met

BeepBeepBopBop · 2024-01-30T20:30:54Z

Hello, I am still experiencing this issue despite being up to date with MAUI latest version and also including the workaround in the csproj.

mlyrstad · 2024-01-31T07:34:16Z

Hello, I am still experiencing this issue despite being up to date with MAUI latest version and also including the workaround in the csproj.

I don't think the fix is released yet. These guys don't believe in hotfixes, it seems, even for critical issues like this. May have something to do with the fact that the error is so far down in the system, I dunno. We have to wait for the Februrary push.

bitzeal-johan · 2024-02-03T11:09:05Z

When is the February push out? It is February now. We're waiting. Our customers projects are crashing.

gdamino · 2024-02-06T21:06:12Z

This bug broken our production App.
Our app starts some threads to do some stuff with tcp sockets, it crash on every user devices and even on emulators.
* Assertion at /__w/1/s/src/mono/mono/metadata/object.c:657, condition lock->done not met
It's a critical issue, no workarounds works here when it will fixed?

bcaceiro · 2024-02-09T06:01:33Z

So when will this be available in MAUI? I am getting this crashes all the time

filipnavara · 2024-02-10T12:00:10Z

When is the February push out?

As always, on patch Tuesday - the second Tuesday of a month - so February 13th.

paulwroe · 2024-02-13T14:22:17Z

Any joy with the new update, has anyone had a chance to try it (if it's available)

blackmesacode · 2024-02-13T19:26:24Z

Upgrading to VS 17.9.0 fixed the issue for me 👍

paulwroe · 2024-02-13T19:48:51Z

I'm on Visual Studio for Mac 2022 they've got a new update out today so I'll try that tomorrow, I also use Rider and there's a Mono update for VS, which I think is where the issue is, fingers crossed!

gdamino · 2024-02-14T13:34:55Z

Sdk 8.0.200 fixed the issue on Debug configuration for me, but still persist in Release for android. No way

ghost added the untriaged New issue has not been triaged by the area owner label Jan 11, 2024

dotnet-issue-labeler bot added the area-Build-mono label Jan 11, 2024

jeromelaban mentioned this issue Jan 11, 2024

[Hot Reload] metadata.c:1326, condition `GINT_TO_UINT32(idx) < table_info_get_rows (t)' not met #93860

Closed

lambdageek self-assigned this Jan 11, 2024

lambdageek added this to the 8.0.x milestone Jan 11, 2024

ghost removed the untriaged New issue has not been triaged by the area owner label Jan 11, 2024

lambdageek added area-VM-threading-mono and removed area-Build-mono labels Jan 11, 2024

jeromelaban mentioned this issue Jan 12, 2024

[Epic] Opened Microsoft issues tracking unoplatform/uno#982

Open

lambdageek mentioned this issue Jan 12, 2024

[mono] Fix class initialization spurious wakeups #96903

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 12, 2024

lambdageek mentioned this issue Jan 12, 2024

[release/8.0-staging] [mono] Fix class initialization spurious wakeups #96905

Merged

lambdageek closed this as completed in #96903 Jan 12, 2024

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 12, 2024

lambdageek reopened this Jan 12, 2024

lambdageek closed this as completed Jan 13, 2024

borrmann mentioned this issue Jan 26, 2024

Android App crashes on startup dotnet/maui#20176

Closed

georgiuk mentioned this issue Jan 27, 2024

Maui app crashes on startup Missionware/cognibase-public#2

Closed

lambdageek mentioned this issue Feb 26, 2024

Random SIGSEGVs after porting from Xamarin to net8-android #98933

Closed

github-actions bot locked and limited conversation to collaborators Mar 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[monovm] Assertion at object.c:657, condition `lock->done' not met #96872

[monovm] Assertion at object.c:657, condition `lock->done' not met #96872

jeromelaban commented Jan 11, 2024 •

edited

Loading

lambdageek commented Jan 11, 2024

lambdageek commented Jan 11, 2024 •

edited

Loading

lambdageek commented Jan 12, 2024

lambdageek commented Jan 12, 2024 •

edited

Loading

lambdageek commented Jan 12, 2024

lambdageek commented Jan 12, 2024

lambdageek commented Jan 13, 2024

ChristopherStephan commented Jan 16, 2024

liejuntao001 commented Jan 17, 2024

lambdageek commented Jan 17, 2024

jeromelaban commented Jan 17, 2024 •

edited

Loading

lambdageek commented Jan 17, 2024

liejuntao001 commented Jan 17, 2024

orimslala commented Jan 27, 2024 •

edited

Loading

BeepBeepBopBop commented Jan 30, 2024

mlyrstad commented Jan 31, 2024

bitzeal-johan commented Feb 3, 2024

gdamino commented Feb 6, 2024

bcaceiro commented Feb 9, 2024

filipnavara commented Feb 10, 2024

paulwroe commented Feb 13, 2024

blackmesacode commented Feb 13, 2024

paulwroe commented Feb 13, 2024

gdamino commented Feb 14, 2024

[monovm] Assertion at object.c:657, condition `lock->done' not met #96872

[monovm] Assertion at object.c:657, condition `lock->done' not met #96872

Comments

jeromelaban commented Jan 11, 2024 • edited Loading

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

lambdageek commented Jan 11, 2024

lambdageek commented Jan 11, 2024 • edited Loading

lambdageek commented Jan 12, 2024

lambdageek commented Jan 12, 2024 • edited Loading

lambdageek commented Jan 12, 2024

lambdageek commented Jan 12, 2024

lambdageek commented Jan 13, 2024

ChristopherStephan commented Jan 16, 2024

liejuntao001 commented Jan 17, 2024

lambdageek commented Jan 17, 2024

jeromelaban commented Jan 17, 2024 • edited Loading

lambdageek commented Jan 17, 2024

liejuntao001 commented Jan 17, 2024

orimslala commented Jan 27, 2024 • edited Loading

BeepBeepBopBop commented Jan 30, 2024

mlyrstad commented Jan 31, 2024

bitzeal-johan commented Feb 3, 2024

gdamino commented Feb 6, 2024

bcaceiro commented Feb 9, 2024

filipnavara commented Feb 10, 2024

paulwroe commented Feb 13, 2024

blackmesacode commented Feb 13, 2024

paulwroe commented Feb 13, 2024

gdamino commented Feb 14, 2024

jeromelaban commented Jan 11, 2024 •

edited

Loading

lambdageek commented Jan 11, 2024 •

edited

Loading

lambdageek commented Jan 12, 2024 •

edited

Loading

jeromelaban commented Jan 17, 2024 •

edited

Loading

orimslala commented Jan 27, 2024 •

edited

Loading