Akka.Streams: GroupBy shouldn't require users to pre-calculate max number of groups #7514

@Aaronontheweb

Description

Version Information
Version of Akka.NET? v1.5.38
Which Akka.NET Modules? Akka.Streams

Describe the bug

Referencing my code sample from #7512 again (you can run it here: https://share.linqpad.net/bkgp72uf.linq):

// Generate a stable set of data records.
var records = GenerateRecords(numEntities: 50, numPeriods: 10, recordsPerPeriod: 3);

// Build the stream graph.
// Group by EntityId with a maximum of 5 substreams.
var stream = Source
	.From(records)
	.GroupBy(5, r => r.EntityId)

This fails with a TooManySubstreamsOpenException - my test data set has 50 unique EntityIds, but I've only allowed a maximum of 5 substreams. This seems like a really poor design choice to me - why should users have to pre-calculate how many groups their stream is going to produce?
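
For context, the only way to make the call above safe with the current API is to count the distinct keys up front - exactly the pre-calculation this issue objects to. A minimal sketch, assuming the same in-memory records collection from the linked sample (plus System.Linq for Distinct/Count):

// Pre-compute the number of distinct groups so maxSubstreams can never be exceeded.
var maxGroups = records.Select(r => r.EntityId).Distinct().Count();

var stream = Source
	.From(records)
	.GroupBy(maxGroups, r => r.EntityId)
	// ... rest of the graph as in the original sample.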

A much better approach would be for it to work the way my hack does:

// GetHashCode() can be negative in .NET, so % 5 produces bucket keys in the range -4..4 -
// at most 9 distinct values, which is why the cap of 10 substreams is never hit.
.GroupBy(10, r => r.EntityId.GetHashCode() % 5)

Rather than blowing up with a stupid exception, the stage should just shove each entity into the appropriate group and treat the maximum as a partitioning strategy instead.
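
For what it's worth, here's a slightly safer version of that hack, sketched against the same sample: GetHashCode() can return negative values, so masking the sign bit keeps every bucket key in the range 0..4 and lets the max-substreams argument match the actual bucket count.

const int maxBuckets = 5;

var stream = Source
	.From(records)
	// Bucket by a bounded, non-negative hash of the key instead of the raw key,
	// so the number of substreams can never exceed maxBuckets.
	.GroupBy(maxBuckets, r => (r.EntityId.GetHashCode() & int.MaxValue) % maxBuckets)
	// ... downstream processing as in the linked sample.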

Additional context

If people really depend on the current GroupBy behavior and want to keep it - fine, we can keep it, even though I think it sucks. Maybe I'll call the new behavior .Partition instead (although there's already another equally bad stream stage with that name too).
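
Purely as a strawman for what that could look like from user code - PartitionBy is a hypothetical name, not an existing Akka.Streams operator; it just captures the proposed semantics of bucketing by hash instead of throwing:

// Hypothetical operator (does not exist today): always produces exactly 5 substreams,
// hashing each EntityId into a bucket rather than failing once a 6th distinct key appears.
var stream = Source
	.From(records)
	.PartitionBy(5, r => r.EntityId)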

Labels

DX (developer experience issues - papercuts, footguns, and other non-bug problems), akka-streams, discussion
