Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrency issues on TransactionScope timeout #3483

Merged
merged 22 commits into from
Mar 5, 2024

Conversation

fredericDelaporte
Copy link
Member

@fredericDelaporte fredericDelaporte commented Feb 5, 2024

fix #3355

Possible breaking changes:

  • A thread synchronization timeout may now occur in case of transaction scope timeout, throwing an additional exception. The additional throw can be disabled through the new setting transaction.ignore_session_synchronization_failures.
  • The default value of transaction.system_completion_lock_timeout has been lowered from 5000 (5 seconds) to 1000 (1 second).

@fredericDelaporte fredericDelaporte added this to the next minor milestone Feb 5, 2024
@fredericDelaporte fredericDelaporte force-pushed the GH3355 branch 2 times, most recently from 9503506 to 591ef23 Compare February 5, 2024 21:05
while (context == null && timeOutGuard.ElapsedMilliseconds < _systemTransactionCompletionLockTimeout)
{
// Na�ve yield.
Thread.Sleep(10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this somehow without spin lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could, if we were putting thread locking mechanisms directly in the session, probably whatever the transaction mechanism, instead of keeping them in the transaction factory and applied only on transaction scope finalization.

The Wait done by BeginProcess actually belongs to the transaction factory and does nothing by default: it locks only during transaction scope finalization. This was done that way to minimize the impact of locks required to handle the transaction scope end processing occurring on concurrent threads.

As a result, when the session initiates some processing inside an ongoing transaction scope, no locks are applied. So, when a transaction timeout occurs concurrently to the session processing, we have no locking mechanism enabled against which we could wait for the end of the session processing.

@fredericDelaporte fredericDelaporte force-pushed the GH3355 branch 2 times, most recently from 233e291 to d3d1f14 Compare February 11, 2024 17:39
@fredericDelaporte
Copy link
Member Author

Rebased and forced pushed to have all the Teamcity build and maybe fix the WIP status which seems to have some failures for whatever reason.

@fredericDelaporte fredericDelaporte changed the title Fix concurrency issues on TransactionScope timeout WIP - Fix concurrency issues on TransactionScope timeout Feb 13, 2024
@fredericDelaporte fredericDelaporte changed the title WIP - Fix concurrency issues on TransactionScope timeout Fix concurrency issues on TransactionScope timeout Feb 13, 2024
@fredericDelaporte fredericDelaporte force-pushed the GH3355 branch 2 times, most recently from eb0224c to cff5d98 Compare February 13, 2024 23:55
hazzik
hazzik previously approved these changes Feb 21, 2024
hazzik
hazzik previously approved these changes Feb 21, 2024
@fredericDelaporte
Copy link
Member Author

The PostGreSQL build has failed with error:

======= Failed test run #1 ==========
  Test didn't clean up after itself. session closed: False; database cleaned: True; connection closed: True

  TearDown : NUnit.Framework.AssertionException : Test didn't clean up after itself. session closed: False; database cleaned: True; connection closed: True
     at NHibernate.Test.TestCase.TearDown() in D:\BuildAgent\work\30546188361a242\src\NHibernate.Test\TestCase.cs:line 214
  --TearDown
     at NUnit.Framework.Assert.ReportFailure(String message)
     at NUnit.Framework.Assert.Fail(String message, Object[] args)
     at NUnit.Framework.Assert.Fail(String message)
     at NHibernate.Test.TestCase.TearDown() in D:\BuildAgent\work\30546188361a242\src\NHibernate.Test\TestCase.cs:line 214
======= Failed test run #2 ==========
  Test didn't clean up after itself. session closed: False; database cleaned: True; connection closed: True
  TearDown : NUnit.Framework.AssertionException : Test didn't clean up after itself. session closed: False; database cleaned: True; connection closed: True
  at NHibernate.Test.TestCase.TearDown() in D:\BuildAgent\work\30546188361a242\src\NHibernate.Test\TestCase.cs:line 214
  --TearDown
     at NUnit.Framework.Assert.ReportFailure(String message)
     at NUnit.Framework.Assert.Fail(String message, Object[] args)
     at NUnit.Framework.Assert.Fail(String message)
     at NHibernate.Test.TestCase.TearDown() in D:\BuildAgent\work\30546188361a242\src\NHibernate.Test\TestCase.cs:line 214

I do not reproduce it locally. There are no trace of an unhandled exception, which would have prevented the session disposal.

I am relaunching that build to check if that is systematic or flaky.

Comment on lines +664 to +666
// A concurrency issue exists with the legacy setting allowing to use the session from transaction completion, which
// may cause session leaks. Ignore them.
FailOnNotClosedSession = !UseConnectionOnSystemTransactionPrepare;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PostgreSQL CI has again failed with the session leak. I am adding a condition to ignore that leak.

@fredericDelaporte
Copy link
Member Author

As this sometimes happens, the Oracle build had a timeout, taking more than one hour to complete. On relaunch, it had worked.

@fredericDelaporte fredericDelaporte enabled auto-merge (squash) March 3, 2024 20:07
@fredericDelaporte fredericDelaporte merged commit 0c3c705 into nhibernate:master Mar 5, 2024
23 checks passed
@hazzik hazzik added the r: Fixed label Mar 5, 2024
@fredericDelaporte fredericDelaporte deleted the GH3355 branch March 6, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

InvalidOperationException on SequencedHashMap.OrderedEnumerator.MoveNext
2 participants