Akka.Cluster.Sharding ShardRegion - DurableData bottleneck #5190

Closed
Aaronontheweb opened this issue Aug 10, 2021 · 1 comment

Labels: akka-cluster-sharding, akka-ddata-durable (LMDB implementation for persisting durable data), perf

@Aaronontheweb (Member)

Version Information
Version of Akka.NET? v1.4.23 (and also reproduced with v1.4.22)
Which Akka.NET Modules? Akka.Cluster.Sharding + DData

Occurs on Linux and Windows

Describe the performance issue
In a reproduction sample I created while testing our solution for #5174, I used the following configuration and code:

akka {
  actor {
    provider = cluster
  }
  
  remote {
    dot-netty.tcp {
      public-hostname = "localhost"
      hostname = "0.0.0.0"
      port = 4051
    }
  }            

  cluster {
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
    split-brain-resolver {
      active-strategy = keep-majority
    }
    
    sharding {
      state-store-mode = ddata
      remember-entities = on
    }
    
    seed-nodes = [] 
    roles = []
  }
}
And the corresponding startup code:

var sharding = ClusterSharding.Get(ClusterSystem);
var shardRegion = sharding.Start("entity", s => Props.Create<EntityActor>(s),
    ClusterShardingSettings.Create(ClusterSystem),
    new EntityRouter(100)); // message extractor capped at 100 shards

// form a single-node cluster by joining ourselves
var cluster = Cluster.Get(ClusterSystem);
cluster.Join(cluster.SelfAddress);

Cluster.Get(ClusterSystem).RegisterOnMemberUp(() =>
{
    // 25 messages every 100ms = 250 msg/s, aimed at random entity ids
    ClusterSystem.Scheduler.Advanced.ScheduleRepeatedly(TimeSpan.FromMilliseconds(100),
        TimeSpan.FromMilliseconds(100),
        () =>
        {
            for (var i = 0; i < 25; i++)
            {
                shardRegion.Tell(new EntityCmd(ThreadLocalRandom.Current.Next().ToString()));
            }
        });
});

The sharding configuration is vanilla: a basic setup with no frills.
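
For context, EntityRouter in the snippet above is the demo's message extractor; its actual implementation lives in the linked repo. A minimal sketch of what it plausibly looks like, assuming it builds on Akka.NET's built-in HashCodeMessageExtractor (the EntityCmd shape here is likewise an assumption):

using Akka.Cluster.Sharding;

// Hypothetical reconstruction -- the real EntityRouter/EntityCmd live in the demo repo.
public sealed class EntityCmd
{
    public EntityCmd(string entityId) => EntityId = entityId;
    public string EntityId { get; }
}

public sealed class EntityRouter : HashCodeMessageExtractor
{
    // maxNumberOfShards = 100, matching `new EntityRouter(100)` above
    public EntityRouter(int maxNumberOfShards) : base(maxNumberOfShards) { }

    // route each command to the entity named by its id; the shard id is
    // derived by HashCodeMessageExtractor from a hash of the entity id
    public override string EntityId(object message)
        => message is EntityCmd cmd ? cmd.EntityId : null;
}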

Data and Specs
This configuration allowed for:

  • Up to 100 shards
  • A potentially unbounded number of entities
  • 250 ShardRegion msg/s

Within a few seconds of starting up the solution, I began receiving messages along the lines of:

[WARNING][8/10/2021 2:58:19 AM][Thread 0012][akka.tcp://ClusterSys@desktop-13cpqtr:4051/system/sharding/entity] entity: Requested shard homes [0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 22, 23, 24, 25, 26, 27, 28, 29, 3, 31, 34, 37, 39, 4, 40, 41, 42, 43, 44, 45, 46, 47, 50, 56, 57, 59, 61, 62, 63, 64, 65, 68, 69, 7, 70, 71, 74, 75, 76, 77, 79, 8, 82, 84, 85, 86, 87, 88, 89, 9, 92, 93, 95, 96, 97, 98, 99] from coordinator at [[akka://ClusterSys/system/sharding/entityCoordinator/singleton/coordinator#208095769]]. [2790] total buffered messages.

The issue here appears to be that the messages we're attempting to route to the entity actors starve out the shard-allocation messages that determine where those entity actors will live. That's a problem.
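
For scale, the "[2790] total buffered messages" in the warning counts against the ShardRegion's fixed buffer for messages whose shard home is not yet known. A sketch of the relevant setting, assuming Akka.NET keeps the usual default:

# Messages for unresolved shards are buffered up to this limit;
# once the buffer is full, the ShardRegion starts dropping messages.
akka.cluster.sharding.buffer-size = 100000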

This issue does not occur when:

  • remember-entities = off
  • state-store-mode = persistence - although I haven't tried this with a real Akka.Persistence store yet, only the defaults (in-memory journal, file-system snapshot store). See the configuration sketch after this list.
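
A minimal configuration sketch of those two workarounds; either setting on its own avoids the behavior described above:

akka.cluster.sharding {
  # Workaround 1: stop tracking the set of live entities entirely
  remember-entities = off

  # Workaround 2: keep remember-entities = on, but store sharding state
  # via Akka.Persistence instead of durable DData
  # state-store-mode = persistence
}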

This leads me to believe that the performance issue here is likely caused by how we interact with the DurableStore when using DData mode.
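
For reference, the DurableStore in question is the LMDB-backed store configured under akka.cluster.distributed-data.durable. A hedged sketch of the knobs involved, assuming Akka.NET mirrors the upstream Akka defaults here (write-behind-interval = off would mean every remember-entities update is flushed to LMDB immediately, which is consistent with the bottleneck described above):

akka.cluster.distributed-data.durable {
  lmdb {
    # directory where the LMDB files live
    dir = "ddata"
    # "off" (the default) flushes every update to LMDB immediately;
    # a small interval batches writes, trading crash-durability of the
    # most recent updates for throughput
    write-behind-interval = 200ms
  }
}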

You can see the full demo here: https://github.com/Aaronontheweb/Akka.Cluster.Sharding.DDataDemo

Expected behavior
I'd expect the ShardRegion to support the creation of thousands of entities per second, particularly at node startup when remember-entities = on. It should still be able to allocate shards while doing that and while processing new messages intended for those entities.

Actual behavior
Instead, the system locked up and the message buffer filled indefinitely.

Environment
.NET Core 3.1, Windows 10 bare metal
.NET Core 3.1, Ubuntu 20.04, WSL2

@Aaronontheweb Aaronontheweb added akka-cluster-sharding perf akka-ddata-durable LMDB implementation for persisting durable data labels Aug 10, 2021
@Aaronontheweb Aaronontheweb changed the title Akka.Cluster.Sharding ShardRegion bottleneck Akka.Cluster.Sharding ShardRegion - DurableData bottleneck Aug 10, 2021
@Aaronontheweb (Member Author)

Mostly resolved via the changes introduced in Akka.NET v1.5
