[query] gnomAD pipeline blows RAM if using split_multi #13606

Closed
danking opened this issue Sep 11, 2023 · 1 comment · Fixed by #13619
danking commented Sep 11, 2023

What happened?

I'm working on a simple, reproducible pipeline now. The problem does not depend on the use of filter_changed_loci.

Here's a tail of the log showing the rapidly increasing RAM use.

2023-09-11 16:22:59.815 : INFO: RegionPool: REPORT_THRESHOLD: 1.0G allocated (662.3M blocks / 363.4M chunks), regions.size = 3, 0 current java objects, thread 24: Thread-3
2023-09-11 16:23:01.488 : INFO: executing D-Array [table_scan_prefix_sums_singlestage] with 1 tasks, contexts size = 430.00 B, globals size = 2.52 MiB
2023-09-11 16:23:01.540 : INFO: RegionPool: initialized for thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.567 : INFO: RegionPool: REPORT_THRESHOLD: 2.2M allocated (64.0K blocks / 2.1M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.572 : INFO: RegionPool: REPORT_THRESHOLD: 4.2M allocated (64.0K blocks / 4.1M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 4.3M allocated (64.0K blocks / 4.2M chunks), regions.size = 1, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 4.3M allocated (128.0K blocks / 4.2M chunks), regions.size = 2, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.573 : INFO: RegionPool: REPORT_THRESHOLD: 12.3M allocated (192.0K blocks / 12.1M chunks), regions.size = 3, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.579 : INFO: RegionPool: REPORT_THRESHOLD: 12.4M allocated (192.0K blocks / 12.2M chunks), regions.size = 3, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.582 : INFO: RegionPool: REPORT_THRESHOLD: 35.3M allocated (768.0K blocks / 34.5M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.588 : INFO: RegionPool: REPORT_THRESHOLD: 57.7M allocated (768.0K blocks / 56.9M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.604 : INFO: RegionPool: REPORT_THRESHOLD: 74.5M allocated (768.0K blocks / 73.7M chunks), regions.size = 12, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:01.715 : INFO: RegionPool: REPORT_THRESHOLD: 139.5M allocated (1.0M blocks / 138.5M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:54.351 : INFO: RegionPool: REPORT_THRESHOLD: 264.3M allocated (1.7M blocks / 262.6M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:55.562 : INFO: RegionPool: REPORT_THRESHOLD: 513.1M allocated (2.3M blocks / 510.8M chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:55.799 : INFO: RegionPool: REPORT_THRESHOLD: 1.0G allocated (3.1M blocks / 1.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:56.277 : INFO: RegionPool: REPORT_THRESHOLD: 2.0G allocated (4.4M blocks / 2.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:57.231 : INFO: RegionPool: REPORT_THRESHOLD: 4.0G allocated (7.3M blocks / 4.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:23:59.198 : INFO: RegionPool: REPORT_THRESHOLD: 8.0G allocated (12.9M blocks / 8.0G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:24:40.382 : INFO: RegionPool: REPORT_THRESHOLD: 16.0G allocated (4.6G blocks / 11.4G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)
2023-09-11 16:25:11.138 : INFO: RegionPool: REPORT_THRESHOLD: 32.0G allocated (9.2G blocks / 22.8G chunks), regions.size = 16, 0 current java objects, thread 115: Executor task launch worker for task 0.0 in stage 37.0 (TID 442)

Version

0.2.120

Relevant log output

No response

danking added the bug label Sep 11, 2023
danking changed the title from "[query] gnomAD pipeline blows RAM if using split_multi(..., filter_changed_loci=True)" to "[query] gnomAD pipeline blows RAM if using split_multi" Sep 11, 2023

danking commented Sep 13, 2023

Here is a straight-line pipeline that reproduces the high memory use: https://gist.github.com/danking/3432deabd997ce08515b2088e202a039. In my experience it can use up to 100 GiB of RAM. A sketch of the general shape of such a pipeline follows the next-steps list below.

The VDS file is privileged. Next steps:

  • replicate on a public VDS like the HGDP/1KG VDS.
  • delete as much code as possible from this file to reduce the possible causes.
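
For orientation, here is a minimal sketch of the general shape of such a pipeline. It is not the contents of the gist: the VDS path is hypothetical and the downstream densify/count steps are assumptions; it only illustrates reading a VDS and applying the split_multi call named in the original issue title.

# Minimal sketch only; the path is hypothetical and the steps after split_multi are
# assumptions, not the actual reproduction in the linked gist.
import hail as hl

hl.init()

vds = hl.vds.read_vds('gs://example-bucket/example.vds')   # hypothetical VDS path
vds = hl.vds.split_multi(vds, filter_changed_loci=True)    # call named in the original issue title
mt = hl.vds.to_dense_mt(vds)                               # densify to a MatrixTable
print(mt.count_rows())                                     # force execution; memory grows during this action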

danking added a commit that referenced this issue Sep 15, 2023
CHANGELOG: On some pipelines, since at least 0.2.58 (commit 23813af),
Hail could use essentially unbounded amounts of memory. This change
removes "optimization" rules that accidentally caused that.

Closes #13606