[query] gnomAD pipeline blows RAM if using split_multi #13606

Closed
danking opened this issue Sep 11, 2023 · 1 comment · Fixed by #13619
danking commented Sep 11, 2023

What happened?

I'm working on a simple, reproducible pipeline now. The problem does not depend on the use of filter_changed_loci.

Here's a tail of the log showing the rapidly increasing RAM use.

2023-09-11 16:22:59.815 : INFO: RegionPool: REPORT_THRESHOLD: 1.0G allocated (662.3M blocks / 363.4M chunks), regions.size = 3, 0 current java objects, thread 24: Thread-3
2023-09-11 16:23:01.488 : INFO: executing D-Array [table_scan_prefix_sums_singlestage] with 1 tasks, contexts size = 430.00 B, globals size = 2.52 MiB
2023-09-11 16:23:01.540 : INFO: RegionPool: initialized for thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.567 : INFO: RegionPool: REPORT_THRESHOLD: 2.2M allocated (64.0K blocks / 2.1M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.572 : INFO: RegionPool: REPORT_THRESHOLD: 4.2M allocated (64.0K blocks / 4.1M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 4.3M allocated (64.0K blocks / 4.2M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 4.3M allocated (128.0K blocks / 4.2M chunks), regions.size = 2, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 12.3M allocated (192.0K blocks / 12.1M chunks), regions.size = 3, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.579 : INFO: RegionPool: REPORT_THRESHOLD: 12.4M allocated (192.0K blocks / 12.2M chunks), regions.size = 3, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.582 : INFO: RegionPool: REPORT_THRESHOLD: 35.3M allocated (768.0K blocks / 34.5M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.588 : INFO: RegionPool: REPORT_THRESHOLD: 57.7M allocated (768.0K blocks / 56.9M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.604 : INFO: RegionPool: REPORT_THRESHOLD: 74.5M allocated (768.0K blocks / 73.7M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.715 : INFO: RegionPool: REPORT_THRESHOLD: 139.5M allocated (1.0M blocks / 138.5M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:54.351 : INFO: RegionPool: REPORT_THRESHOLD: 264.3M allocated (1.7M blocks / 262.6M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:55.562 : INFO: RegionPool: REPORT_THRESHOLD: 513.1M allocated (2.3M blocks / 510.8M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:55.799 : INFO: RegionPool: REPORT_THRESHOLD: 1.0G allocated (3.1M blocks / 1.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:56.277 : INFO: RegionPool: REPORT_THRESHOLD: 2.0G allocated (4.4M blocks / 2.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:57.231 : INFO: RegionPool: REPORT_THRESHOLD: 4.0G allocated (7.3M blocks / 4.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:59.198 : INFO: RegionPool: REPORT_THRESHOLD: 8.0G allocated (12.9M blocks / 8.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:24:40.382 : INFO: RegionPool: REPORT_THRESHOLD: 16.0G allocated (4.6G blocks / 11.4G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:25:11.138 : INFO: RegionPool: REPORT_THRESHOLD: 32.0G allocated (9.2G blocks / 22.8G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)

Version

0.2.120

Relevant log output

No response

danking added the bug label Sep 11, 2023
danking changed the title from "[query] gnomAD pipeline blows RAM if using split_multi(..., filter_changed_loci=True)" to "[query] gnomAD pipeline blows RAM if using split_multi" Sep 11, 2023

danking commented Sep 13, 2023

Here is a straight-line pipeline that reproduces the high memory use: https://gist.github.com/danking/3432deabd997ce08515b2088e202a039. In my experience it can use up to 100 GiB of RAM. A sketch of the general shape of such a pipeline follows the next-steps list below.

The VDS file is privileged. Next steps:

  • replicate on a public VDS like the HGDP/1KG VDS.
  • delete as much code as possible from this file to reduce the possible causes.
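
For orientation, here is a minimal sketch of the general shape of such a pipeline. It is not the contents of the gist: the VDS path is hypothetical and the downstream densify/count steps are assumptions; it only illustrates reading a VDS and applying the split_multi call named in the original issue title.

# Minimal sketch only; the path is hypothetical and the steps after split_multi are
# assumptions, not the actual reproduction in the linked gist.
import hail as hl

hl.init()

vds = hl.vds.read_vds('gs://example-bucket/example.vds')   # hypothetical VDS path
vds = hl.vds.split_multi(vds, filter_changed_loci=True)    # call named in the original issue title
mt = hl.vds.to_dense_mt(vds)                               # densify to a MatrixTable
print(mt.count_rows())                                     # force execution; memory grows during this action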

danking added a commit that referenced this issue Sep 15, 2023
CHANGELOG: On some pipelines, since at least 0.2.58 (commit 23813af),
Hail could use essentially unbounded amounts of memory. This change
removes "optimization" rules that accidentally caused that.

Closes #13606