This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Break hangs on HashSet when a loop is formed on entries due to a concurrent operation #28225

Merged
merged 3 commits into from
Mar 26, 2018

Conversation

Member

@safern safern commented Mar 19, 2018


if (collisionCount >= _slots.Length)
{
// The chain of entries forms a loop; which means a concurrent update has happened.
Member

Nit: ";" => ","


Member

@stephentoub stephentoub Mar 19, 2018

Ditto (for all such cases)
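
As a rough illustration of the check being reviewed here, the sketch below models a chained hash set whose lookup counts the entries it visits and throws once the count reaches the slot array's length, since a valid chain can never be longer than that. `TinyIntSet`, `CorruptWithLoop`, and the message string are hypothetical names made up for this sketch; it is a simplified model, not the CoreFX `HashSet<T>` source.

```csharp
using System;

// Hypothetical, simplified model of a chained hash set (fixed capacity, no duplicate check).
public sealed class TinyIntSet
{
    private struct Slot { public int Value; public int Next; }

    private readonly int[] _buckets;   // holds slot index + 1; 0 means "empty bucket"
    private readonly Slot[] _slots;
    private int _count;

    public TinyIntSet(int capacity)
    {
        _buckets = new int[capacity];
        _slots = new Slot[capacity];
    }

    public void Add(int value)
    {
        if (_count == _slots.Length) throw new InvalidOperationException("Sketch has a fixed capacity.");
        int bucket = (value & 0x7FFFFFFF) % _buckets.Length;
        int index = _count++;
        _slots[index].Value = value;
        _slots[index].Next = _buckets[bucket] - 1; // link to the previous head of the chain (-1 if none)
        _buckets[bucket] = index + 1;
    }

    public bool Contains(int value)
    {
        int collisionCount = 0;
        int bucket = (value & 0x7FFFFFFF) % _buckets.Length;
        for (int i = _buckets[bucket] - 1; i >= 0; i = _slots[i].Next)
        {
            if (_slots[i].Value == value) return true;

            if (collisionCount >= _slots.Length)
            {
                // The chain of entries forms a loop, which means a concurrent update has happened.
                throw new InvalidOperationException("Concurrent operations are not supported.");
            }
            collisionCount++;
        }
        return false;
    }

    // Sketch-only hook: simulates the corruption a racing writer could leave behind.
    public void CorruptWithLoop(int slotIndex) => _slots[slotIndex].Next = slotIndex;
}
```

With a capacity of 4, adding `1` and then calling `CorruptWithLoop(0)` makes slot 0 point at itself; `Contains(5)` hashes to the same bucket, never finds a match, and throws `InvalidOperationException` instead of spinning forever.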

if (collisionCount >= _slots.Length)
{
// The chain of entries forms a loop; which means a concurrent update has happened.
throw new InvalidOperationException(SR.InvalidOperation_ConcurrentOperationsNotSupported);
Member

We should probably use a ThrowHelper scheme instead of throw for these as some of these loops are particularly hot paths. There is not a ThrowHelper.cs in CoreFX common (probably should be). You can copy one of the several sprinkled around CoreFX sources.

Member

There is not a ThrowHelper.cs in CoreFX common (probably should be)

#26636 (comment)

@danmoseley
Member

@Anipik Alpine build failed:

  Traceback (most recent call last):
    File "/mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/Tools/DumplingHelper.py", line 131, in <module>
      main(sys.argv)
    File "/mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/Tools/DumplingHelper.py", line 113, in main
      install_dumpling()
    File "/mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/Tools/DumplingHelper.py", line 34, in install_dumpling
      print("An unexpected error was encountered while installing dumpling.py: " + sys.exc_info()[0])
  TypeError: cannot concatenate 'str' and 'type' objects
/mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/Tools/Dumpling.targets(30,5): error MSB3073: The command "python /mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/Tools/DumplingHelper.py install_dumpling" exited with code 1. [/mnt/j/workspace/dotnet_corefx/master/alpine-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/tests.builds]

From a quick look, I think sys.exc_info()[0] is the exception type rather than a string, so it cannot be concatenated to the message as-is. A better way seems to be to add import traceback and do

print("An unexpected error was encountered while installing dumpling.py: " + traceback.format_exc())

Could you please fix this in DumplingHelper.py?

{
internal static class ThrowHelper
{
internal static void ThrowInvalidOperationException_ConcurrentOperationsNotSupported() => throw CreateInvalidOperationException_ConcurrentOperationsNotSupported();
Member

This can be just:

internal static void ThrowInvalidOperationException_ConcurrentOperationsNotSupported() => throw new InvalidOperationException(SR.InvalidOperation_ConcurrentOperationsNotSupported);

The complex throw helper pattern with an intermediate CreateInvalidOperationException is necessary only for cases where the code needs to be compatible with old JITs that do not perform the right optimizations. That is not the case for System.Collections.
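
To make the two shapes being compared concrete, here is a minimal sketch of both. The class name, the shortened method names, and the message constant are placeholders invented for this example (the real code reads the text from `SR.InvalidOperation_ConcurrentOperationsNotSupported`), and the `NoInlining` attribute on the factory is an assumption about how such two-step helpers are usually written, not something shown in this diff.

```csharp
using System;
using System.Runtime.CompilerServices;

// Illustrative only; names and message are placeholders, not the CoreFX ThrowHelper.
internal static class ThrowHelperSketch
{
    private const string Message = "Concurrent operations are not supported.";

    // Simple form suggested in the review: a single method whose body is just the throw.
    internal static void ThrowConcurrentOperationsNotSupported() =>
        throw new InvalidOperationException(Message);

    // Two-step form from the diff: the throwing method delegates construction to a factory.
    internal static void ThrowConcurrentOperationsNotSupported_TwoStep() =>
        throw CreateConcurrentOperationsNotSupported();

    // Assumption: the factory is kept out of line so exception construction stays off the hot path.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static Exception CreateConcurrentOperationsNotSupported() =>
        new InvalidOperationException(Message);
}
```

Both methods throw the same `InvalidOperationException`; they differ only in where the exception object is constructed.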

Member Author

I see. Thought it was for all cases. Thanks, will update the PR.

Member

@jkotas jkotas Mar 19, 2018

This is a drop in the bucket since the rest of HashSet is not using the ThrowHelper. Using it in one place only helps very little.

These manual throw helpers are a pain to deal with for the Mono folks who are trying to reuse the code. cc @marek-safar

Unless this has a measurable performance impact for something that matters, we should avoid introducing these manual throw helpers.

If we care about the performance benefits, we should build the plugin for IL linker that autogenerates them everywhere.

Member

If we care about the performance benefits, we should build the plugin for IL linker that autogenerates them everywhere.

👍 I'd be very happy with that :)

@safern
Member Author

safern commented Mar 20, 2018

This is a drop in the bucket since the rest of HashSet is not using the ThrowHelper

So would you suggest updating HashSet to use ThrowHelper in all the other places where it throws, measuring whether it is worth it, or not using ThrowHelper at all?

@jkotas
Member

jkotas commented Mar 20, 2018

I would suggest not using the manual ThrowHelper at all.

@@ -265,6 +266,13 @@ public bool Contains(T item)
{
return true;
}

if (collisionCount >= _slots.Length)
Member

Take _slots to a local ref for its multiple uses?
Slot[] slots = _slots;

Member Author

Just curious, and to learn: what is the benefit of doing that?

Member

It keeps the array reference in a register rather than re-reading it from a memory location; the JIT can also apply extra optimisations because the local cannot change between uses, unlike the field in memory.
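
A small sketch of the suggestion, assuming a made-up `SlotScanner` class with an `int[]` field standing in for `Slot[] _slots`. The comments restate the explanation given in this thread rather than measured JIT behaviour; both methods compute the same result.

```csharp
// Hypothetical type for illustration; not part of HashSet<T>.
public sealed class SlotScanner
{
    private int[] _slots;

    public SlotScanner(int[] slots) => _slots = slots;

    // Reads the field on every use. Per the explanation above, the field could be
    // reassigned between reads, so less can be assumed about it inside the loop.
    public int SumViaField()
    {
        int sum = 0;
        for (int i = 0; i < _slots.Length; i++)
        {
            sum += _slots[i];
        }
        return sum;
    }

    // Reads the field once into a local. The local cannot be changed by anyone else,
    // so its length and identity are stable for the whole loop.
    public int SumViaLocal()
    {
        int[] slots = _slots;
        int sum = 0;
        for (int i = 0; i < slots.Length; i++)
        {
            sum += slots[i];
        }
        return sum;
    }
}
```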

Member Author

Makes sense. Thanks for explaining :)

Member

@benaadams is this something the JIT might be able to do itself in future? Always a shame to write code in an artificial way.

Member

@benaadams benaadams Mar 20, 2018

Possibly, though bounds checks have already regressed to become even more conservative when the array comes from memory dotnet/coreclr#15756

@@ -293,6 +301,7 @@ public bool Remove(T item)
int hashCode = InternalGetHashCode(item);
int bucket = hashCode % _buckets.Length;
int last = -1;
int collisionCount = 0;
for (int i = _buckets[bucket] - 1; i >= 0; last = i, i = _slots[i].next)
Member

Take _slots to a local ref for its multiple uses?
Slot[] slots = _slots;

@safern
Member Author

safern commented Mar 20, 2018

I would suggest not using the manual ThrowHelper at all.

Ok. Will update to throw from the class itself. Would adding a private method that throws the exception help, instead of throwing inline at every call site?

@jkotas
Member

jkotas commented Mar 20, 2018

This file does not use throw helpers at all. I would keep the style consistent: throw in place without any helper methods, like the vast majority of the code out there.

@safern
Member Author

safern commented Mar 21, 2018

@jkotas looks good now?

@safern
Member Author

safern commented Mar 21, 2018

@dotnet-bot test this please

Member

@jkotas jkotas left a comment

Thanks

}
else
{
-if (_lastIndex == _slots.Length)
+if (_lastIndex == slots.Length)
{
IncreaseCapacity();
Member

Need to grab slots again here as IncreaseCapacity() will change it. So

IncreaseCapacity();
slots = _slots;
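
The hazard described here can be sketched with a hypothetical `GrowableBuffer` class (an `int[]` stands in for `Slot[]`, and `Array.Resize` stands in for whatever `IncreaseCapacity()` really does). The point is only that a local copy taken before the resize keeps referring to the old array.

```csharp
using System;

// Hypothetical type for illustration; not the CoreFX HashSet<T> implementation.
public sealed class GrowableBuffer
{
    private int[] _slots = new int[2];
    private int _lastIndex;

    public int Count => _lastIndex;

    public int this[int index] => _slots[index];

    private void IncreaseCapacity()
    {
        // Allocates a larger array, copies the contents, and reassigns the field.
        Array.Resize(ref _slots, _slots.Length * 2);
    }

    public void Add(int value)
    {
        int[] slots = _slots;
        if (_lastIndex == slots.Length)
        {
            IncreaseCapacity();
            slots = _slots; // re-read: the local still referenced the old, full array
        }
        slots[_lastIndex++] = value;
    }
}
```

Without the `slots = _slots;` line, the third `Add` call would index past the end of the old two-element array and throw `IndexOutOfRangeException`.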

Member Author

Ah, you're right, thanks for the heads up.

@safern
Member Author

safern commented Mar 22, 2018

test this please

@safern
Member Author

safern commented Mar 22, 2018

test Windows x64 Debug Build
test OSX x64 Debug Build

@safern
Member Author

safern commented Mar 22, 2018

test Linux x64 Debug Build
test Linux x64 Release Build

@safern
Member Author

safern commented Mar 23, 2018

test Linux x64 Debug Build
test Windows x86 Release Build

@safern
Member Author

safern commented Mar 23, 2018

@mmitche it seems like there is a bug in the regex job trigger. I triggered 2 specific jobs and it restarted 3, as you can see above. This has happened the last 2 times.

@safern
Member Author

safern commented Mar 26, 2018

test Linux x64 Debug Build please

@safern
Member Author

safern commented Mar 26, 2018

@danmosemsft the failures here don't look related; good to merge? The Linux builds seem to be in a pretty flaky state.

@benaadams
Member

Not sure how to get the 1 failure

Fedora.26.Amd64.Open - x64 - Debug
[Pipeline] [Fedora.26.Amd64.Open - x64 - Debug (7dc02b32-3fe0-4f33-a236-8d33812aa46b)] httpRequest
HttpMethod: GET
URL: https://helix.dot.net/api/2017-04-14/aggregate/jobs?groupBy=job.name&maxResultSets=1&filter.name=7dc02b32-3fe0-4f33-a236-8d33812aa46b
Sending request to url: https://helix.dot.net/api/2017-04-14/aggregate/jobs?groupBy=job.name&maxResultSets=1&filter.name=7dc02b32-3fe0-4f33-a236-8d33812aa46b
Response Code: HTTP/1.1 200 OK
Response: 
[{"Key":{"job.name":"7dc02b32-3fe0-4f33-a236-8d33812aa46b"},"Data":{"Analysis":[{"Name":"xunit","Status":{"pass":301186,"fail":1,"skip":384}}],"WorkItemStatus":{"pass":232}}}]
Success code from [100‥399]
] echo
Info: Failed 1/301571 (384 skipped)

@danmoseley danmoseley merged commit be48e96 into dotnet:master Mar 26, 2018
@danmoseley
Member

Please open an issue for the sockets failure you hit here.

@benaadams
Member

Fedora.26.Amd64.Open-x64-Debug

System.Net.Sockets.Tests
 System.Net.Sockets.Tests.ExecutionContextFlowTest/
  ExecutionContext_FlowsOnlyOnceAcrossAsyncOperations  
Assert.InRange() Failure
Range:  (1 - 60)
Actual: 117
  at System.Net.Sockets.Tests.ExecutionContextFlowTest.<>c.<<ExecutionContext_FlowsOnlyOnceAcrossAsyncOperations>b__11_0>d.MoveNext() in /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Sockets/tests/FunctionalTests/ExecutionContextFlowTest.netcoreapp.cs:line 47
--- End of stack trace from previous location where exception was thrown ---
--- End of stack trace from previous location where exception was thrown ---

@safern
Member Author

safern commented Mar 26, 2018

Thanks @benaadams

@safern safern deleted the HangsHashSet branch March 27, 2018 01:57
@karelz karelz added this to the 2.1.0 milestone Mar 27, 2018
mynkow added a commit to Elders/Cronus that referenced this pull request Mar 21, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…urrent operation (dotnet/corefx#28225)

* Break hangs on HashSet when a loop is formed on entries due to a concurrent operation

* PR Feedback, copy _slots to local ref

* Get a copy of -slots after IncreaseCapacity changes it


Commit migrated from dotnet/corefx@be48e96