[query] eliminate optimization that can blow RAM #13619

danking · 2023-09-13T21:30:17Z

CHANGELOG: On some pipelines, since at least 0.2.58 (commit 23813af), Hail could use essentially unbounded amounts of memory. This change removes "optimization" rules that accidentally caused that.

Closes #13606


patrick-schultz

These optimizations are too important to just disable. I think a better quick fix is to apply them only when the outer ToStream does not require memory management:

    case ToStream(ToArray(s), false) if s.typ.isInstanceOf[TStream] => s

    case ToStream(Let(name, value, ToArray(x)), false) if x.typ.isInstanceOf[TStream] =>
      Let(name, value, x)
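To make the guard in the quick-fix rules concrete, here is a minimal Python sketch on a toy IR. The classes StreamRange, ToArray, ToStream, Let and the function simplify_to_stream are hypothetical stand-ins, not Hail's Scala classes. The only point shown is that the rewrite fires when the outer ToStream has its memory-management flag set to false and is skipped otherwise.

```python
from dataclasses import dataclass

# Hypothetical toy IR nodes, loosely modeled on the Hail IR in this thread.
@dataclass(frozen=True)
class StreamRange:
    start: int
    stop: int
    requires_mm: bool  # per-element memory management flag

@dataclass(frozen=True)
class ToArray:
    child: object

@dataclass(frozen=True)
class ToStream:
    child: object
    requires_mm: bool

@dataclass(frozen=True)
class Let:
    name: str
    value: object
    body: object

def is_stream(node) -> bool:
    # Simplification: in this toy model only StreamRange and ToStream are streams.
    return isinstance(node, (StreamRange, ToStream))

def simplify_to_stream(node):
    """Apply the guarded quick-fix rules; return node unchanged otherwise."""
    if isinstance(node, ToStream) and not node.requires_mm:
        inner = node.child
        # ToStream(ToArray(s), false) => s
        if isinstance(inner, ToArray) and is_stream(inner.child):
            return inner.child
        # ToStream(Let(name, value, ToArray(x)), false) => Let(name, value, x)
        if (isinstance(inner, Let) and isinstance(inner.body, ToArray)
                and is_stream(inner.body.child)):
            return Let(inner.name, inner.value, inner.body.child)
    # Flag set (or no match): leave the ToArray/ToStream round trip in place.
    return node

s = StreamRange(0, 10, requires_mm=False)
simplified = simplify_to_stream(ToStream(ToArray(s), False))
kept = simplify_to_stream(ToStream(ToArray(s), True))
```

With the flag set, the node comes back unchanged, so the materialize-then-restream round trip (and the memory management it implies) survives.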

The right fix is probably to add a requiresMemoryManagementPerElement flag to StreamMap (which could be generally useful when the producer doesn't need memory management, but the map body allocates a lot and wants to free after each row), and then use that flag to make these rules smarter:

    case ToStream(ToArray(s), false) if s.typ.isInstanceOf[TStream] => s
    case ToStream(ToArray(s), true) if s.typ.isInstanceOf[TStream] =>
      StreamMap(s, uid, Ref(uid), requiresMemoryManagementPerElement = true)

    case ToStream(Let(name, value, ToArray(x)), false) if x.typ.isInstanceOf[TStream] =>
      Let(name, value, x)
    case ToStream(Let(name, value, ToArray(x)), true) if x.typ.isInstanceOf[TStream] =>
      Let(name, value, StreamMap(x, uid, Ref(uid), requiresMemoryManagementPerElement = true))
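The flag-aware variant can be sketched the same way. This is a toy Python model under stated assumptions: StreamMap's requires_mm field stands in for the proposed (not yet existing) requiresMemoryManagementPerElement argument tracked in #13623, and gen_uid is a hypothetical fresh-name generator playing the role of uid. When the outer ToStream requires memory management, the rewrite still removes the ToArray but wraps the stream in an identity StreamMap that keeps per-element freeing.

```python
import itertools
from dataclasses import dataclass

_counter = itertools.count()

def gen_uid() -> str:
    # Hypothetical fresh-name generator standing in for `uid` in the rules.
    return f"__iruid_{next(_counter)}"

@dataclass(frozen=True)
class Ref:
    name: str

@dataclass(frozen=True)
class StreamRange:
    start: int
    stop: int
    requires_mm: bool

@dataclass(frozen=True)
class StreamMap:
    child: object
    name: str
    body: object
    requires_mm: bool = False  # hypothetical flag proposed in #13623

@dataclass(frozen=True)
class ToArray:
    child: object

@dataclass(frozen=True)
class ToStream:
    child: object
    requires_mm: bool

@dataclass(frozen=True)
class Let:
    name: str
    value: object
    body: object

STREAMS = (StreamRange, StreamMap, ToStream)

def identity_map(s):
    # StreamMap(s, uid, Ref(uid), requiresMemoryManagementPerElement = true)
    uid = gen_uid()
    return StreamMap(s, uid, Ref(uid), requires_mm=True)

def simplify_to_stream(node):
    """Drop ToArray/ToStream round trips but preserve the memory flag."""
    if not isinstance(node, ToStream):
        return node

    def wrap(s):
        return identity_map(s) if node.requires_mm else s

    inner = node.child
    if isinstance(inner, ToArray) and isinstance(inner.child, STREAMS):
        return wrap(inner.child)
    if (isinstance(inner, Let) and isinstance(inner.body, ToArray)
            and isinstance(inner.body.child, STREAMS)):
        return Let(inner.name, inner.value, wrap(inner.body.child))
    return node

s = StreamRange(0, 10, requires_mm=False)
out = simplify_to_stream(ToStream(ToArray(s), True))
```

The design choice here is that the optimization never silently drops the flag: the false case reduces to the bare stream, and the true case keeps an explicit marker that the consumer should free after each element.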

And I still don't fully understand why these were deoptimizing the gnomAD pipeline.

danking · 2023-09-14T19:28:10Z

FWIW, I finally found a simpler reproducer. It really takes some doing to convince the simplifier to apply this rule.

This operation should use a constant ~1GiB of RAM (a non-broken pipeline actually uses closer to 8GiB, but still a constant amount), but instead memory use grows with each row processed:

    import hail as hl
    ht = hl.utils.range_table(1)
    ht = ht.key_by()  # drop the key so the simplifier can apply the rule
    ht = ht.select(rows=hl.range(10))
    ht = ht.explode('rows')
    ht = ht.annotate(garbage=hl.range(1024 ** 3))
    ht.write('/tmp/foo.ht', overwrite=True)
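The contrast between constant memory and memory that grows per row can be illustrated with a toy arena allocator. This is a Python simulation, not Hail's region implementation: Region, process_rows, and the byte counts are assumptions. It shows that when the consumer frees after each element, peak usage is one row's allocation, and when it does not, peak usage is the sum over all rows in the partition.

```python
class Region:
    """Toy arena: allocations accumulate until clear() is called."""

    def __init__(self) -> None:
        self.used = 0
        self.peak = 0

    def allocate(self, nbytes: int) -> None:
        self.used += nbytes
        self.peak = max(self.peak, self.used)

    def clear(self) -> None:
        self.used = 0


def process_rows(n_rows: int, bytes_per_row: int, per_element_mm: bool) -> int:
    """Return the peak bytes held while streaming n_rows rows."""
    region = Region()
    for _ in range(n_rows):
        region.allocate(bytes_per_row)  # e.g. one row's `garbage` array
        # ... the row would be written out here ...
        if per_element_mm:
            region.clear()  # free this row's allocations before the next row
    region.clear()  # everything is released when the partition finishes
    return region.peak


bounded = process_rows(10, 1024 ** 3, per_element_mm=True)
unbounded = process_rows(10, 1024 ** 3, per_element_mm=False)
```

With 10 rows (as produced by exploding hl.range(10)), the first call peaks at one row's worth of allocation and the second at ten rows' worth; with more rows per partition the second number keeps growing.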

The simplifier cannot apply the rule while the key is still present, so keeping the key is enough to restore normal memory usage:

    import hail as hl
    ht = hl.utils.range_table(1)  # keyed by 'idx'; the key is kept
    ht = ht.select(rows=hl.range(10))
    ht = ht.explode('rows')
    ht = ht.annotate(garbage=hl.range(1024 ** 3))
    ht.write('/tmp/foo.ht', overwrite=True)

The "bad" WritePartition body IR looks like this:

    (StreamFlatMap __iruid_447
      (StreamRange -1 True
        (GetField start (Ref __iruid_446))
        (GetField end (Ref __iruid_446))
        (I32 1))
      (StreamMap __iruid_448
        (StreamRange 1 False (I32 0) (I32 10) (I32 1))
        (InsertFields
          (Literal Struct{} <literal value>)
          ("rows" "garbage")
          (rows (Ref __iruid_448))
          (garbage
            (ToArray
              (StreamRange 2 False
                (I32 0)
                (I32 1073741824)
                (I32 1)))))))

The "good" IR looks like this:

    (StreamFlatMap __iruid_480
      (StreamRange -1 True
        (GetField start (Ref __iruid_479))
        (GetField end (Ref __iruid_479))
        (I32 1))
      (Let __iruid_481
        (MakeStruct
          (idx (Ref __iruid_480))
          (rows
            (ToArray
              (StreamRange 1 False (I32 0) (I32 10) (I32 1)))))
        (StreamMap __iruid_482
          (ToStream True (GetField rows (Ref __iruid_481)))
          (InsertFields
            (Ref __iruid_481)
            ("idx" "rows" "garbage")
            (rows (Ref __iruid_482))
            (garbage
              (ToArray
                (StreamRange 2 False
                  (I32 0)
                  (I32 1073741824)
                  (I32 1))))))))

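Notice, in particular, that in the good IR the StreamMap inside the StreamFlatMap uses memory management because its originating stream, (ToStream True ...), requires it. In the bad IR, that ToStream has been simplified away, and the StreamMap's source is (StreamRange 1 False ...), which does not require memory management.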
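The relationship described above can be sketched as a small function over a toy tree: a StreamMap frees per element exactly when its source stream does, while StreamRange and ToStream carry their own flag. The class and function names are hypothetical, and this models only the reading of the two IRs above, not Hail's code generator.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamRange:
    requires_mm: bool

@dataclass(frozen=True)
class ToStream:
    requires_mm: bool

@dataclass(frozen=True)
class StreamMap:
    child: object

def frees_per_element(stream) -> bool:
    """Does this toy stream free each element's memory before the next one?"""
    if isinstance(stream, StreamMap):
        # Toy model: StreamMap has no flag of its own; it inherits its source's.
        return frees_per_element(stream.child)
    if isinstance(stream, (StreamRange, ToStream)):
        return stream.requires_mm
    raise TypeError(f"unknown stream node: {stream!r}")

# Inner StreamMap of the "bad" IR: its source is (StreamRange 1 False ...).
bad = StreamMap(StreamRange(requires_mm=False))
# Inner StreamMap of the "good" IR: its source is (ToStream True ...).
good = StreamMap(ToStream(requires_mm=True))
```

Under this model, removing the ToStream True node also removes the only thing telling the consumer to free the per-row garbage array.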

danking · 2023-09-14T19:37:11Z

Issue for StreamMap memory management: #13623

done.

danking · 2023-09-14T22:09:57Z

@patrick-schultz bump for tomorrow

patrick-schultz

That's a very clear example, thanks!

[query] eliminate optimization that can blow RAM

083af9f


danking assigned patrick-schultz Sep 13, 2023

patrick-schultz previously requested changes Sep 14, 2023


optimize only if memory management is not required

b8e7054

danking mentioned this pull request Sep 14, 2023

[query] StreamMap should accept a requiresMemoryManagementPerElement argument #13623

Open

patrick-schultz approved these changes Sep 14, 2023


danking merged commit 43c7597 into hail-is:main Sep 15, 2023
5 checks passed
