Managed memory leak of ActivationId #6929
Comments
Hi Tal, I accidentally deleted your Gitter thread while trying to expand it - my apologies. Thank you for the investigation! I've opened #6930 with the one-liner you suggested. Regarding the logistics of getting a fix into production, are you able to upgrade to 3.4.x?
Okay, thanks Reuben.
I saw your response regarding a private build. I think that is possible for us, as we already do it with the OrleansDashboard (since it stopped supporting netstandard2.0).
I meant you cherry-picking the 3.4.1 commit on top of 3.3.0 and building it yourself, @talweiss1982.
If it is planned for release this week, then I'll wait. I thought it might take a while until it is released.
@talweiss1982 v3.4.1 is up on nuget.org now. Release notes here: https://github.com/dotnet/orleans/releases/tag/v3.4.1
I'll close this issue for now.
Hi guys,
I have been investigating a memory dump of one of our services that had grown to 13 GB of memory.
We are using Orleans 3.3.0 running on the net472 framework.
The bottom of the heap statistics in the dump shows (WinDbg):
00007ff909f4aef8    67509  450187368 System.Collections.Generic.HashSet`1+Slot[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]][]
00007ff9095b6f20 21165530  507972720 Orleans.Runtime.ActivationId
00007ff9099e2618 21237239  509693736 System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]]
00007ff9099e2e38 21237238 1019387424 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]], mscorlib]]
00007ff9096539b0 21761379 1218637224 Orleans.Runtime.UniqueKey
000001c02523d1b0 13356465 7171352622 Free
Let's ignore the fragmentation (although it's probably related). We have 1.2 GB of UniqueKey, 1 GB of dictionary nodes, 0.5 GB of WeakReference and 0.5 GB of ActivationId.
I checked the GCRoot of some of those UniqueKey instances, and here are the roots:
Root #1
...
-> 000001c125a9ef40 Orleans.Runtime.ActivationDirectory
-> 000001c125a9ef80 System.Collections.Concurrent.ConcurrentDictionary`2[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions],[Orleans.Runtime.ActivationData, Orleans.Runtime]]
-> 000001c125bf80b8 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions],[Orleans.Runtime.ActivationData, Orleans.Runtime]]
-> 000001c265be8078 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions],[Orleans.Runtime.ActivationData, Orleans.Runtime]][]
-> 000001c125bf75e0 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions],[Orleans.Runtime.ActivationData, Orleans.Runtime]]
-> 000001c0e5af57e8 Orleans.Runtime.ActivationData
-> 000001c0e5af5900 System.Collections.Generic.HashSet`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]]
-> 000001c22cf46690 System.Collections.Generic.HashSet`1+Slot[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]][]
-> 000001c02635d5f8 Orleans.Runtime.ActivationId
-> 000001c02635d5c0 Orleans.Runtime.UniqueKey
Root #2
000000a64a57ca70 00007ff9091ec0f3 Orleans.Interner`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindOrCreate(System.__Canon, System.Func`2<System.__Canon,System.__Canon>)
rsp+30: 000000a64a57caa0
-> 000001c125a935d8 Orleans.Interner`2[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]]
-> 000001c125a93600 System.Collections.Concurrent.ConcurrentDictionary`2[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]], mscorlib]]
-> 000001c37ce19b08 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]], mscorlib]]
-> 000001c000001020 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]], mscorlib]][]
-> 000001c306ecc9c0 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[Orleans.Runtime.UniqueKey, Orleans.Core.Abstractions],[System.WeakReference`1[[Orleans.Runtime.ActivationId, Orleans.Core.Abstractions]], mscorlib]]
-> 000001c02635d5c0 Orleans.Runtime.UniqueKey
So root #2 originates from https://github.com/dotnet/orleans/blob/v3.3.0/src/Orleans.Core.Abstractions/IDs/ActivationId.cs#L16
I must say, I think that using the Interner class to hold unique keys defeats the purpose of the class (to act as a GC-safe object pool), but let's not dwell on that. If the ActivationIds were not rooted, the interner should not have blown up (in the dump it had 30 million dictionary slots).
At first I couldn't locate the HashSet holding the activation IDs, as it seems you deleted the 3.3.0 branch and I was looking at master at a commit titled "update changelog for 3.3.0".
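To make the interner point concrete, here is a minimal sketch of a weak-reference interner. It is not the Orleans implementation: the class name `WeakInternerSketch` and the `RemoveDeadEntries` method are hypothetical, and only the overall shape (a `ConcurrentDictionary` of `WeakReference` values with a `FindOrCreate(key, factory)` method) is taken from the type names visible in Root #2. It shows why such a pool can only shrink when nothing else holds the interned values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch of a weak-reference interner; NOT the Orleans.Interner source.
public sealed class WeakInternerSketch<TKey, TValue> where TValue : class
{
    private readonly ConcurrentDictionary<TKey, WeakReference<TValue>> cache =
        new ConcurrentDictionary<TKey, WeakReference<TValue>>();

    public int Count => cache.Count;

    // Returns the live interned value for the key, creating one if none exists.
    public TValue FindOrCreate(TKey key, Func<TKey, TValue> factory)
    {
        while (true)
        {
            if (cache.TryGetValue(key, out var weak))
            {
                if (weak.TryGetTarget(out var existing)) return existing;

                // The old value was collected: swap in a fresh one.
                var replacement = factory(key);
                if (cache.TryUpdate(key, new WeakReference<TValue>(replacement), weak))
                    return replacement;
                continue; // lost a race with another thread, retry
            }

            var created = factory(key);
            if (cache.TryAdd(key, new WeakReference<TValue>(created))) return created;
        }
    }

    // Drops entries whose value has been collected. An entry whose value is
    // still strongly referenced elsewhere (for example from a HashSet) is never
    // dead, so its key, dictionary node and WeakReference all stay in memory.
    public int RemoveDeadEntries()
    {
        var entries = (ICollection<KeyValuePair<TKey, WeakReference<TValue>>>)cache;
        var removed = 0;
        foreach (var pair in cache)
        {
            // Remove(pair) only succeeds if the key still maps to this exact WeakReference.
            if (!pair.Value.TryGetTarget(out _) && entries.Remove(pair)) removed++;
        }
        return removed;
    }
}
```

In this sketch, every ActivationId kept alive by another object pins one dictionary node, one key and one WeakReference in the pool, which matches the roughly equal instance counts of those types in the dump.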
I decompiled the assembly and found the renegade HashSet to be https://github.com/dotnet/orleans/blob/v3.3.0/src/Orleans.Runtime/Catalog/ActivationData.cs#L115
Notice that at https://github.com/dotnet/orleans/blob/v3.3.0/src/Orleans.Runtime/Catalog/ActivationData.cs#L305
you add the ActivationId of whoever sent the message to the above HashSet.
However, when a request is marked as handled, the ActivationId is not removed; see:
https://github.com/dotnet/orleans/blob/v3.3.0/src/Orleans.Runtime/Catalog/ActivationData.cs#L319
This is causing our memory usage to blow up until the service eventually crashes.
I think this happens when you have grains that are never deactivated because they are very active, but that are invoked by many different grains with shorter lifespans, each of which has a unique ActivationId. In our case there are 15 silos with 65k grains on each silo, so it is very easy to accumulate a large number of unique IDs in a relatively short time (within days).
The fix is a one-liner. At https://github.com/dotnet/orleans/blob/v3.3.0/src/Orleans.Runtime/Catalog/ActivationData.cs#L320 add:
RunningRequestsSenders.Remove(message.SendingActivation);
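The leak pattern and the effect of that one line can be illustrated with a small stand-alone sketch. Everything in it is a hypothetical stand-in (`SketchActivation`, `RecordRunning`, `ResetRunning`, `LeakSketch`, and plain `long` sender IDs); it is not the Orleans `ActivationData` code, and it assumes each sender has at most one request in flight.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a long-lived activation; NOT the Orleans type.
public sealed class SketchActivation
{
    // Plays the role of the HashSet of sender IDs described above.
    private readonly HashSet<long> runningRequestsSenders = new HashSet<long>();
    private readonly bool removeSenderOnCompletion;

    public SketchActivation(bool removeSenderOnCompletion)
    {
        this.removeSenderOnCompletion = removeSenderOnCompletion;
    }

    public int TrackedSenders => runningRequestsSenders.Count;

    // Called when a request starts running: remember who sent it.
    public void RecordRunning(long sendingActivation)
    {
        runningRequestsSenders.Add(sendingActivation);
    }

    // Called when the request is done. Without the Remove call the set only grows.
    public void ResetRunning(long sendingActivation)
    {
        if (removeSenderOnCompletion)
        {
            runningRequestsSenders.Remove(sendingActivation);
        }
    }
}

public static class LeakSketch
{
    // Sends one request from each of `senders` distinct short-lived callers
    // and returns how many sender IDs the activation still holds afterwards.
    public static int Simulate(bool removeSenderOnCompletion, int senders)
    {
        var activation = new SketchActivation(removeSenderOnCompletion);
        for (long id = 0; id < senders; id++)
        {
            activation.RecordRunning(id);
            activation.ResetRunning(id);
        }
        return activation.TrackedSenders;
    }
}
```

With these definitions, `LeakSketch.Simulate(false, 1000)` returns 1000 because every sender ID is retained, while `LeakSketch.Simulate(true, 1000)` returns 0.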
I would have submitted a PR, but I'm not sure how to add this to a tag, and master doesn't have this code. We really need this fix in Orleans 3.3.x; our production is suffering from this.