
Memory leak in DistributedData on cross-node replication #5553

Closed
Tavriets opened this issue Jan 31, 2022 · 5 comments · Fixed by #5556

Comments

@Tavriets

Version Information
Version: Akka.NET 1.4.32
Akka.NET Modules: Akka.DistributedData
Environment: Windows 11, .NET 6.0, no containers

Akka.DistributedData module leaks memory while replicating DData updates between nodes.

Steps to reproduce the behavior:

  1. Create new project using Petabridge Akka.Cluster Starter App template (pb-akka-cluster, v.1.2.0).
  2. Add Akka.DistributedData module.
  3. Modify existing AkkaService class to perform periodic updates of the LWWRegister items through the DData replicator with WriteAll consistency.
  4. Run 2 or more instances of the test app to form a cluster with data replication among nodes.
  5. Each DData update operation leaves behind a GC non-collectable LWWRegister instance, pinned via ActorRefResolveCache's ThreadLocal instance, which eventually floods Gen2.

Reproduction: Akka.DistributedData-MemLeak
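For context, step 3 above can be sketched roughly as below. This is a minimal illustration based on the public Akka.DistributedData API; the actor name `PeriodicUpdater`, the `counter` key, and the tick interval are assumptions, not taken from the reproduction repo:

```csharp
using System;
using Akka.Actor;
using Akka.Cluster;
using Akka.DistributedData;

public sealed class PeriodicUpdater : ReceiveActor
{
    private sealed class Tick { public static readonly Tick Instance = new Tick(); }

    private readonly IActorRef _replicator = DistributedData.Get(Context.System).Replicator;
    private readonly UniqueAddress _selfAddress = Cluster.Get(Context.System).SelfUniqueAddress;
    private readonly LWWRegisterKey<int> _key = new LWWRegisterKey<int>("counter");
    private int _value;

    public PeriodicUpdater()
    {
        var writeAll = new WriteAll(TimeSpan.FromSeconds(3));
        Receive<Tick>(_ =>
        {
            _value++;
            // Each Update with WriteAll consistency spawns a short-lived
            // write-aggregator actor per node; it is these actors whose
            // payloads were observed to be retained after the write completed.
            _replicator.Tell(Dsl.Update(
                _key,
                new LWWRegister<int>(_selfAddress, _value),
                writeAll,
                old => old.WithValue(_selfAddress, _value)));
        });
    }

    protected override void PreStart() =>
        Context.System.Scheduler.ScheduleTellRepeatedly(
            TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1),
            Self, Tick.Instance, Self);
}
```

Running two or more cluster nodes with an actor like this is enough to watch retained instances accumulate in a memory profiler.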

Expected behavior
DData replicator should not leave zombie instances of LocalActorRef type after DData replication is done.

Actual behavior
[screenshot: memory profiler showing retained instances]

The number of retained ActorRefResolveCache entries grows proportionally to the number of DData updates:
[screenshot: retained ActorRefResolveCache growth over time]

A typical retention tree of the objects that survived GC:
[screenshot: retention tree]

Some additional observations:

  • The problem seems to be reproducible only within a multi-node cluster. A single-node cluster without cross-node DData replication behaves well.
  • The problem is not limited to LWWRegister; other CRDT types get stuck in Gen2 in the same way.
  • Retrieving values stored in DData does not cause the memory leak, so it appears to be related to update replication only.
@Arkatufus
Contributor

@Tavriets Thank you for the bug report, we'll be investigating it.

@Arkatufus
Contributor

Arkatufus commented Jan 31, 2022

This issue is not limited to DData; it affects any local actor that is messaged through Akka.Remote. It shows up very prominently in DData because the WriteAggregator actors carry a large payload and are created and removed in rapid succession.

The real issue lies in two places, the ActorRefResolveCache and the LocalActorRef class:

  • ActorRefResolveCache holds a reference to a LocalActorRef when it is addressed by a remote node. This blocks the LocalActorRef from being GC-ed until the cache fills up and the entry is evicted.
  • LocalActorRef holds a reference to the original Props instance but has no mechanism to release it; releasing the Props instance when the actor is killed/stopped is the ActorCell's responsibility.
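The retention chain described above can be modeled in miniature. The types below are simplified stand-ins, not actual Akka.NET internals:

```csharp
using System.Collections.Generic;

// Stand-in for Akka's Props: may capture a large payload in its arguments.
class Props { public object[] Args; }

// Stand-in for LocalActorRef: the Props field is never cleared when the
// actor stops, so whatever Props captured stays reachable (the bug).
class LocalActorRef
{
    public Props Props;
    public void Stop() { /* actor dies, but Props stays referenced */ }
}

// Stand-in for ActorRefResolveCache: maps actor paths to resolved refs,
// and only drops entries once the cache fills up and evicts them.
class ActorRefResolveCache
{
    private readonly Dictionary<string, LocalActorRef> _cache = new();
    public void Cache(string path, LocalActorRef r) => _cache[path] = r;
}

// Every remote write resolves the short-lived aggregator actor's path and
// caches its LocalActorRef; the ref pins the Props, the Props pins the
// payload, so the payload survives into Gen2 long after the actor stopped.
```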

@Aaronontheweb
Member

So to clarify, the issue here is that these LocalActorRefs have already been killed and are not being GCed?

@Arkatufus
Contributor

Non-GC-ed LocalActorRef instances are fine by themselves since they're quite small; the problem is that each one holds a reference to the Props of a dead actor.

@Aaronontheweb
Member

The fix will be available in our next nightly build: https://getakka.net/community/getting-access-to-nightly-builds.html
