
Memory leak in DistributedData on cross-node replication #5553

Closed
Tavriets opened this issue Jan 31, 2022 · 5 comments · Fixed by #5556

Comments

@Tavriets

Version Information
Version: Akka.NET 1.4.32
Akka.NET Modules: Akka.DistributedData
Environment: Windows 11, .NET 6.0, no containers

Akka.DistributedData module leaks memory while replicating DData updates between nodes.

Steps to reproduce the behavior:

  1. Create new project using Petabridge Akka.Cluster Starter App template (pb-akka-cluster, v.1.2.0).
  2. Add Akka.DistributedData module.
  3. Modify existing AkkaService class to perform periodic updates of the LWWRegister items through the DData replicator with WriteAll consistency.
  4. Run 2 or more instances of the test app to form a cluster with data replication among nodes.
  5. Each DData update operation leaves behind a GC non-collectable LWWRegister instance, pinned via ActorRefResolveCache's ThreadLocal instance, which eventually floods Gen2.

Reproduction: Akka.DistributedData-MemLeak
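For context, step 3 above can be sketched roughly as below. This is a minimal illustration based on the public Akka.DistributedData API; the actor name `PeriodicUpdater`, the `counter` key, and the tick interval are assumptions, not taken from the reproduction repo:

```csharp
using System;
using Akka.Actor;
using Akka.Cluster;
using Akka.DistributedData;

public sealed class PeriodicUpdater : ReceiveActor
{
    private sealed class Tick { public static readonly Tick Instance = new Tick(); }

    private readonly IActorRef _replicator = DistributedData.Get(Context.System).Replicator;
    private readonly UniqueAddress _selfAddress = Cluster.Get(Context.System).SelfUniqueAddress;
    private readonly LWWRegisterKey<int> _key = new LWWRegisterKey<int>("counter");
    private int _value;

    public PeriodicUpdater()
    {
        var writeAll = new WriteAll(TimeSpan.FromSeconds(3));
        Receive<Tick>(_ =>
        {
            _value++;
            // Each Update with WriteAll consistency spawns a short-lived
            // write-aggregator actor per node; it is these actors whose
            // payloads were observed to be retained after the write completed.
            _replicator.Tell(Dsl.Update(
                _key,
                new LWWRegister<int>(_selfAddress, _value),
                writeAll,
                old => old.WithValue(_selfAddress, _value)));
        });
    }

    protected override void PreStart() =>
        Context.System.Scheduler.ScheduleTellRepeatedly(
            TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1),
            Self, Tick.Instance, Self);
}
```

Running two or more cluster nodes with an actor like this is enough to watch retained instances accumulate in a memory profiler.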

Expected behavior
DData replicator should not leave zombie instances of LocalActorRef type after DData replication is done.

Actual behavior
[screenshot: memory profiler showing retained instances]

The number of retained ActorRefResolveCache entries grows proportionally to the number of DData updates:
[screenshot: retained ActorRefResolveCache growth over time]

A typical retention tree of the objects that survived GC:
[screenshot: retention tree]

Some additional observations:

  • The problem seems to be reproducible only within a multi-node cluster. A single-node cluster without cross-node DData replication behaves well.
  • The problem is not limited to LWWRegister; other CRDT types get stuck in Gen2 in the same way.
  • Retrieving values stored in DData does not cause the memory leak, so it appears to be related to update replication only.
@Arkatufus
Contributor

@Tavriets Thank you for the bug report, we'll be investigating it.

@Arkatufus
Contributor

Arkatufus commented Jan 31, 2022

This issue is not limited to DData; it affects any local actor that is messaged through Akka.Remote. It shows up very prominently in DData because the WriteAggregator actors carry a large payload and are created and removed in rapid succession.

The real issue lies in two places, the ActorRefResolveCache and the LocalActorRef class:

  • ActorRefResolveCache holds a reference to a LocalActorRef when it is addressed by a remote node. This blocks the LocalActorRef from being GC-ed until the cache fills up and the entry is evicted.
  • LocalActorRef holds a reference to the original Props instance but has no mechanism to release it; releasing the Props instance when the actor is killed/stopped is the ActorCell's responsibility.
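The retention chain described above can be modeled in miniature. The types below are simplified stand-ins, not actual Akka.NET internals:

```csharp
using System.Collections.Generic;

// Stand-in for Akka's Props: may capture a large payload in its arguments.
class Props { public object[] Args; }

// Stand-in for LocalActorRef: the Props field is never cleared when the
// actor stops, so whatever Props captured stays reachable (the bug).
class LocalActorRef
{
    public Props Props;
    public void Stop() { /* actor dies, but Props stays referenced */ }
}

// Stand-in for ActorRefResolveCache: maps actor paths to resolved refs,
// and only drops entries once the cache fills up and evicts them.
class ActorRefResolveCache
{
    private readonly Dictionary<string, LocalActorRef> _cache = new();
    public void Cache(string path, LocalActorRef r) => _cache[path] = r;
}

// Every remote write resolves the short-lived aggregator actor's path and
// caches its LocalActorRef; the ref pins the Props, the Props pins the
// payload, so the payload survives into Gen2 long after the actor stopped.
```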

@Aaronontheweb
Member

So to clarify, the issue here is that these LocalActorRefs have already been killed and are not being GCed?

@Arkatufus
Contributor

Non-GC-ed LocalActorRef instances are fine by themselves since they're quite small; the problem is that each one holds a reference to the Props of a dead actor.

@Aaronontheweb
Member

The fix will be available in our next nightly build: https://getakka.net/community/getting-access-to-nightly-builds.html
