add groupByOrdered #346

bbrehm · 2025-09-17T10:42:30Z

groupBy is a really useful operation.

Unfortunately, we cannot really use the old groupBy in the commercial product: We go to a lot of effort to have reproducible results, and the iterator order of unordered hashmaps is generally not reproducible (because the identity-hash of objects is not reproducible).

One would be tempted to just change groupBy to return an ordered immutable Map. This won't work in practice: We cannot reliably override the scala stdlib groupBy (people can write List(a,b,c).groupBy, which doesn't hit our implicit code at all).
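To illustrate the point about the stdlib method winning, here is a minimal sketch. The names `MemberWinsDemo` and `OrderedGroupBy` are hypothetical and not part of this PR; the sketch only relies on the Scala rule that an implicit extension is consulted only when the receiver has no applicable member of that name.

```scala
import scala.collection.immutable.VectorMap

object MemberWinsDemo {
  // Hypothetical extension that tries to "replace" groupBy with an ordered variant.
  implicit class OrderedGroupBy[A](private val xs: List[A]) {
    def groupBy[K](f: A => K): VectorMap[K, List[A]] =
      xs.foldLeft(VectorMap.empty[K, List[A]]) { (acc, a) =>
        val k = f(a)
        // Updating an existing key keeps its original position in a VectorMap.
        acc.updated(k, acc.getOrElse(k, Nil) :+ a)
      }
  }

  // Resolves to List's own member method; the extension above is never consulted.
  val viaMember: Map[Int, List[Int]] = List(3, 1, 2).groupBy(_ % 2)

  // Only an explicit wrap reaches the extension.
  val viaExtension: VectorMap[Int, List[Int]] =
    new OrderedGroupBy(List(3, 1, 2)).groupBy(_ % 2)
}
```

Because the call site looks identical in both cases, nothing in the source tells a reader which variant ran, which is the argument for a separate name.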

Hence, a new name is needed, in order to make it obvious from our source code whether the groupBy operation respects order or not. However, the new groupByOrdered is a drop-in replacement for the old groupBy (it has exactly the same signature).
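For readers skimming the thread, a rough sketch of what an insertion-ordered groupBy with the same result type could look like. This is an assumption-laden illustration, not the code in this PR: the stand-alone helper and the object name `GroupByOrderedSketch` are hypothetical, and the real step lives on the traversal in Language.scala.

```scala
import scala.collection.immutable.VectorMap
import scala.collection.mutable

object GroupByOrderedSketch {
  // Hypothetical stand-alone helper; the PR adds this as a traversal step instead.
  def groupByOrdered[A, K](iterator: Iterator[A])(f: A => K): Map[K, List[A]] = {
    // LinkedHashMap remembers the order in which keys were first inserted.
    val res = mutable.LinkedHashMap.empty[K, mutable.Builder[A, List[A]]]
    while (iterator.hasNext) {
      val item = iterator.next()
      res.getOrElseUpdate(f(item), List.newBuilder[A]).addOne(item)
    }
    // VectorMap is an immutable Map that iterates in insertion order.
    val out = VectorMap.newBuilder[K, List[A]]
    res.foreach { case (k, builder) => out.addOne(k -> builder.result()) }
    out.result()
  }
}
```

Keys come out in the order they were first seen, and each group keeps the order of the input, so the result no longer depends on hash values.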

Once we have replaced most uses of groupBy with groupByOrdered, we should then deprecate groupBy.

PS. cc fabs and malte because git blame shows that you use groupBy a lot, to inform you about the issue and possible solution.

mpollmeier · 2025-09-17T11:08:04Z

core/src/main/scala/flatgraph/traversal/Language.scala

-  def groupBy[K](f: A => K): Map[K, List[A]]                                       = l.groupBy(f)
+  /** Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged. */
+  @Doc(info =
+    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged."


Suggested change

-    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. Use is discouraged."
+    "Execute the traversal and group elements by a given transformation function, ignoring the iterator order. If you need reproducible results, please use `groupByOrdered` instead."

I added an explanation of the issue. But use is really discouraged: This is a giant footgun that has blown us up multiple times. (I still shudder at the bug with iteration order in the legacy occurenceHash...)

maltek

bike-shedding on the name: what do you think about groupByStable?

maltek · 2025-09-17T11:05:34Z

core/src/main/scala/flatgraph/traversal/Language.scala

+    while (iterator.hasNext) {
+      val item = iterator.next
+      val key  = f(item)
+      res.getOrElseUpdate(key, List.newBuilder[A]).addOne(item)


If we're going to have our own variant, we might as well change it to not use the slowest of all the data structures (linked lists). I.e. return a Vector or an ArraySeq instead.

That makes it slightly less of a drop-in replacement (pattern matching on the result requires different operators), but IMO that's an acceptable cost.
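A small sketch of the pattern-matching difference mentioned here, assuming nothing beyond the standard library (the object and method names are made up for illustration): `::` only matches a `List`, while `+:` works on any `Seq`, including `Vector` and `ArraySeq`.

```scala
import scala.collection.immutable.ArraySeq

object PatternMatchSketch {
  // List-specific: the cons extractor `::` is only defined for List.
  def firstOfList(xs: List[Int]): Option[Int] = xs match {
    case head :: _ => Some(head)
    case Nil       => None
  }

  // Works for Vector, ArraySeq and List alike via the generic `+:` extractor.
  def firstOfSeq(xs: Seq[Int]): Option[Int] = xs match {
    case head +: _ => Some(head)
    case _         => None
  }
}
```

Call sites that destructure groups with `::` would need to switch to `+:` if the value type changed from `List` to another `Seq`.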

👍🏻 for Vector or ArraySeq

Vector or ArraySeq requires us to copy the data once more. I prepared a version with LinkedHashMap[K, ArrayBuffer[A]]; the java.util.LinkedHashMap implementation is basically the same as scala.collection.mutable.LinkedHashMap, and writing our own to support even faster groupBy is way overkill.

Immutable data structures are way overrated compared to "just don't mutate the data structure".

Vector is an amazing feat of engineering. The cool thing is not "immutable", the cool thing is "O(1) snapshots", plus niche applications like good write performance for ZFS on tape drives and hard-drives (writes are sequential!) and SSDs (no write amplification because non-overwriting).

That being said, we only very rarely make active use of O(1) snapshots, and we are running on SRAM/DRAM that supports overwriting, as opposed to flash (which cannot be overwritten).

core/src/main/scala/flatgraph/traversal/Language.scala

bbrehm · 2025-09-17T11:36:58Z

Yeah, this could be faster if we didn't need it to be a drop-in replacement. That was the version in the first commit. The VectorMap is probably even worse than the List part for the drop-in version.

Indeed, the fastest variant would be groupByStable: Iterable[(K, mutable.ArrayBuffer[V])] (because LinkedHashMap is also a bit of a performance horror show). A quick review of uses shows that most uses would agree with that (i.e. almost all of the time, we end up iterating over the thing and nothing else).

I tend to agree with you malte.

maltek · 2025-09-22T11:52:31Z

I would vote for groupByStable: scala.collection.Map[K, scala.collection.Seq[V]] then: the result can only be used as an immutable value, but the actually returned implementation can be a mutable implementation.
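A hedged sketch of that proposal, to make the type trick concrete. The helper below is hypothetical and only illustrates the signature under discussion, backed by the `LinkedHashMap[K, ArrayBuffer[A]]` mentioned earlier in the thread; it is not necessarily what the final commit does.

```scala
import scala.collection.mutable

object GroupByStableSketch {
  // Hypothetical stand-alone helper. The static type exposes no mutators,
  // while the runtime value is the mutable map that was filled in place.
  def groupByStable[A, K](iterator: Iterator[A])(
    f: A => K
  ): scala.collection.Map[K, scala.collection.Seq[A]] = {
    val res = mutable.LinkedHashMap.empty[K, mutable.ArrayBuffer[A]]
    iterator.foreach { item =>
      res.getOrElseUpdate(f(item), mutable.ArrayBuffer.empty[A]).addOne(item)
    }
    res // no copy: upcast to the read-only collection.Map interface
  }
}
```

The upcast compiles because `scala.collection.Map` is covariant in its value type and `ArrayBuffer` is a `scala.collection.Seq`, so no extra copy of the data is needed.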

bbrehm added 3 commits September 17, 2025 11:25

add groupByOrdered step

6a096f6

change API to return LinkedHashMap[K, List[A]], in order to make this…

f33e726

… more of a drop-in replacement for groupBy

muhaha, VectorMap does the job!

fd8e6a5

bbrehm requested review from mpollmeier, maltek, fabsx00 and ml86 September 17, 2025 10:43

mpollmeier approved these changes Sep 17, 2025

maltek reviewed Sep 17, 2025

rename to groupByStable; add a fast version

7c66b1e
