GenomicsDBImport extremely inefficient output #6487

Open
mishaploid opened this issue Mar 5, 2020 · 10 comments
@mishaploid

Bug Report

Affected tool(s) or class(es)

GenomicsDBImport

Affected version(s)

  • Public release version 4.1.4.1

Description

When running GenomicsDBImport on an HPC cluster managed by SLURM, our admin noticed that the jobs are writing very inefficiently to shared storage (@spikebike will follow up with HPC specifics and logs).

Steps to reproduce

Using GATK jar /share/apps/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx60g -Xms60g -jar /share/apps/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar GenomicsDBImport --genomicsdb-workspace-path data/interim/combined_database_bpres/0004 --batch-size 50 --reader-threads 6 --sample-name-map data/processed/sample_map --intervals data/processed/scattered_intervals/0004-scattered.intervals --tmp-dir /scratch/sdturner/genomicsdbimport/0004

Expected behavior

My understanding is that it would be more efficient to use a small buffer and write each block of the final database only once, in full.

Actual behavior

Again, my (limited) understanding is that the tool is writing the same output multiple times and throwing out all but the last write. Here is an example log for a 2.6 Mb region and 295 samples:

07:24:39.198 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 28, 2020 7:24:39 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
07:24:39.616 INFO  GenomicsDBImport - ------------------------------------------------------------
07:24:39.617 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.4.1
07:24:39.617 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
07:24:39.617 INFO  GenomicsDBImport - Executing as sdturner@c6-74 on Linux v4.15.0-65-generic amd64
07:24:39.617 INFO  GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_20-b26
07:24:39.617 INFO  GenomicsDBImport - Start Date/Time: February 28, 2020 7:24:39 AM PST
07:24:39.617 INFO  GenomicsDBImport - ------------------------------------------------------------
07:24:39.617 INFO  GenomicsDBImport - ------------------------------------------------------------
07:24:39.618 INFO  GenomicsDBImport - HTSJDK Version: 2.21.0
07:24:39.618 INFO  GenomicsDBImport - Picard Version: 2.21.2
07:24:39.618 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
07:24:39.618 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:24:39.618 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:24:39.618 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:24:39.618 INFO  GenomicsDBImport - Deflater: IntelDeflater
07:24:39.618 INFO  GenomicsDBImport - Inflater: IntelInflater
07:24:39.618 INFO  GenomicsDBImport - GCS max retries/reopens: 20
07:24:39.618 INFO  GenomicsDBImport - Requester pays: disabled
07:24:39.618 INFO  GenomicsDBImport - Initializing engine
07:24:44.689 INFO  IntervalArgumentCollection - Processing 2644303 bp from intervals
07:24:44.693 INFO  GenomicsDBImport - Done initializing engine
07:24:44.988 INFO  GenomicsDBImport - Vid Map JSON file will be written to /group/jrigrp10/Bo-demography/data/interim/combined_database_bpres/0004/vidmap.json
07:24:44.988 INFO  GenomicsDBImport - Callset Map JSON file will be written to /group/jrigrp10/Bo-demography/data/interim/combined_database_bpres/0004/callset.json
07:24:44.988 INFO  GenomicsDBImport - Complete VCF Header will be written to /group/jrigrp10/Bo-demography/data/interim/combined_database_bpres/0004/vcfheader.vcf
07:24:44.988 INFO  GenomicsDBImport - Importing to array - /group/jrigrp10/Bo-demography/data/interim/combined_database_bpres/0004/genomicsdb_array
07:24:44.989 INFO  ProgressMeter - Starting traversal
07:24:44.989 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
07:24:45.173 INFO  GenomicsDBImport - Starting batch input file preload
07:26:54.394 INFO  GenomicsDBImport - Finished batch preload
07:26:54.395 INFO  GenomicsDBImport - Importing batch 1 with 50 samples
13:18:55.420 INFO  ProgressMeter -          C1:10577213            354.2                     1              0.0
13:18:55.420 INFO  GenomicsDBImport - Done importing batch 1/6
13:18:55.421 INFO  GenomicsDBImport - Starting batch input file preload
13:21:46.821 INFO  GenomicsDBImport - Finished batch preload
13:21:46.822 INFO  GenomicsDBImport - Importing batch 2 with 50 samples
19:28:12.311 INFO  ProgressMeter -          C1:10577213            723.5                     2              0.0
19:28:12.312 INFO  GenomicsDBImport - Done importing batch 2/6
19:28:12.312 INFO  GenomicsDBImport - Starting batch input file preload
19:30:16.471 INFO  GenomicsDBImport - Finished batch preload
19:30:16.473 INFO  GenomicsDBImport - Importing batch 3 with 50 samples
00:53:26.358 INFO  ProgressMeter -          C1:10577213           1048.7                     3              0.0
00:53:26.359 INFO  GenomicsDBImport - Done importing batch 3/6
00:53:26.360 INFO  GenomicsDBImport - Starting batch input file preload
00:54:29.501 INFO  GenomicsDBImport - Finished batch preload
00:54:29.502 INFO  GenomicsDBImport - Importing batch 4 with 50 samples
06:31:35.151 INFO  ProgressMeter -          C1:10577213           1386.8                     4              0.0
06:31:35.152 INFO  GenomicsDBImport - Done importing batch 4/6
06:31:35.152 INFO  GenomicsDBImport - Starting batch input file preload
06:32:34.398 INFO  GenomicsDBImport - Finished batch preload
06:32:34.404 INFO  GenomicsDBImport - Importing batch 5 with 50 samples
11:49:07.726 INFO  ProgressMeter -          C1:10577213           1704.4                     5              0.0
11:49:07.727 INFO  GenomicsDBImport - Done importing batch 5/6
11:49:07.727 INFO  GenomicsDBImport - Starting batch input file preload
11:49:48.117 INFO  GenomicsDBImport - Finished batch preload
11:49:48.117 INFO  GenomicsDBImport - Importing batch 6 with 45 samples
16:32:47.060 INFO  ProgressMeter -          C1:10577213           1988.0                     6              0.0
16:32:47.060 INFO  GenomicsDBImport - Done importing batch 6/6
16:32:47.061 INFO  ProgressMeter -          C1:10577213           1988.0                     6              0.0
16:32:47.062 INFO  ProgressMeter - Traversal complete. Processed 6 total batches in 1988.0 minutes.
16:32:47.062 INFO  GenomicsDBImport - Import of all batches to GenomicsDB completed!
16:32:47.062 INFO  GenomicsDBImport - Shutting down engine
[February 29, 2020 4:32:47 PM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1,988.13 minutes.
Runtime.totalMemory()=57266405376

And the SLURM log:

Name                : combine_gvcfs.count=0004
User                : sdturner
Partition           : med2
Nodes               : c6-74
Cores               : 21
GPUs                : 0
State               : COMPLETED
Submit              : 2020-02-28T07:24:31
Start               : 2020-02-28T07:24:31
End                 : 2020-02-29T16:32:58
Reserved walltime   : 14-00:00:00
Used walltime       : 1-09:08:27
Used CPU time       : 1-10:10:37
% User (Computation): 89.40%
% System (I/O)      : 10.60%
Mem reserved        : 80G/node
Max Mem used        : 62.43G (c6-74)
Max Disk Write      : 0.00  (c6-74)
Max Disk Read       : 307.20K (c6-74)

@ldgauthier
Contributor

The Hail team noticed this as well. I think Cotton told me that GDB appeared to be writing 3X more data than he expected.

@nalinigans how hard would this be to optimize?

@spikebike

spikebike commented Mar 5, 2020

Just to provide some concrete logs of what is happening: I'm using systemtap on an NFS server to track exactly what is going on during these runs.

For one second (the first column), for one run, for one file:

1583360232 192.168.6.75:58626 nfsd.proc4.write 290,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 546,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 817,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 918,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1020,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1194,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1235,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1499,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1669,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1808,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1838,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1879,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1909,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1939,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 1991,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2059,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2089,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2178,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2347,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2414,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2559,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2762,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2858,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 2975,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3141,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3256,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3330,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3413,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3443,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3534,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3564,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3594,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3662,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3720,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 3818,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 4256,245760 /BaseQRankSum.tdb
1583360232 192.168.6.75:58626 nfsd.proc4.write 297,249856 /BaseQRankSum.tdb

Note the following about the above I/O:

  • The position (the 2nd number) is NOT changing, so each write seeks to the same position and overwrites that block.
  • As a result the previous 34 writes are thrown away and are useless; only the 35th write makes it into the resulting file.
  • Even the last I/O of 4256 bytes only moves the position up by 4096 bytes.
  • This turns a relatively I/O-friendly sequential workload into a bunch of tiny writes with a seek for each one, which is TERRIBLE for spinning disks.

The solution I propose is simply to keep two 64KB buffers and only issue a write once the first buffer is full, with any overflow going into the second buffer. This would not only eliminate the seeks, but also reduce the IOP rate by a factor of 600-700 and turn this random workload into a mostly sequential one. I can write example code that appends ~100 bytes at a time into a buffer and then writes 64KB at a time, if that's helpful.
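
To make that concrete, here is a minimal sketch in Java of the kind of buffered writer described above (class and method names are made up for illustration; this is not GenomicsDB/TileDB code):

    import java.io.IOException;
    import java.io.OutputStream;

    // Hypothetical illustration only -- not GenomicsDB/TileDB code.
    // Small records accumulate in memory; the underlying stream only ever
    // sees full 64KB sequential writes (plus one final flush on close).
    public class ChunkedWriter implements AutoCloseable {
        private static final int CHUNK = 64 * 1024;         // 64KB write unit
        private final OutputStream out;
        private final byte[] buf = new byte[2 * CHUNK];     // two 64KB halves
        private int filled = 0;

        public ChunkedWriter(OutputStream out) { this.out = out; }

        // Append a small record (e.g. ~100 bytes of cell data).
        public void append(byte[] record) throws IOException {
            int off = 0;
            while (off < record.length) {
                int n = Math.min(record.length - off, buf.length - filled);
                System.arraycopy(record, off, buf, filled, n);
                filled += n;
                off += n;
                while (filled >= CHUNK) {                    // first half is full:
                    out.write(buf, 0, CHUNK);                // one big sequential write
                    System.arraycopy(buf, CHUNK, buf, 0, filled - CHUNK);
                    filled -= CHUNK;                         // slide the overflow down
                }
            }
        }

        @Override
        public void close() throws IOException {             // flush whatever is left
            if (filled > 0) out.write(buf, 0, filled);
            out.close();
        }
    }

With something like this, the NFS server would see one 64KB sequential write for every ~650 records instead of a seek-and-overwrite per record.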

@nalinigans
Collaborator

@mishaploid, I am assuming the 295 refers to single-sample VCFs. What do the data/processed/scattered_intervals/0004-scattered.intervals look like? Are the intervals very small? Have you tried larger intervals?

@spikebike, thanks for the systemtap output. We do have large internal buffers to help with this type of usage, but we will revisit the code to figure out the behavior you are seeing. We also have some experimental optimizations, not rolled out yet, for writing minimally to shared filesystems. Would you be able to run your tests if we create a GATK branch next week with those changes?

@ldgauthier, I think the Hail team used multi-sample VCFs as well. We have some optimizations (work in progress) for importing multi-sample VCFs that will be rolled out in the next GenomicsDB release.

@mishaploid
Author

Thanks for the quick reply @nalinigans! The 295 does refer to individual single-sample VCFs. The intervals are small, at about 2.6 Mb each (example file below). I was originally running by chromosome, but the amount of time to complete GenomicsDBImport was intractable.

Example interval file:

@HD	VN:1.5
@SQ	SN:C1	LN:52030199	M5:c411f5ed81798b28a72a816e0ac15f8e	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C2	LN:63782621	M5:25808b628a0a9601b772c7877881989e	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C3	LN:73711317	M5:7ec8c0c4ed4870f051cac2f747d175db	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C4	LN:65381821	M5:1ecb26734734c810afc8aab43f18a289	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C5	LN:54996917	M5:9a53eae4f3e499cc8d01359879e9ad9a	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C6	LN:46538988	M5:3d8bbd6bffd039864cf1f06f59212a12	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C7	LN:58257932	M5:395956824e320028a238506fa4697fc6	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C8	LN:50921300	M5:b37be0bbf7c7509d15f0941305f32db3	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
@SQ	SN:C9	LN:63239600	M5:20d51228f903e8f216a94c45361f5a09	UR:file:/home/sdturner/Bo-demography/data/external/ref/Boleracea_chromosomes.fasta
C1	10577213	13221515	+	.

@ldgauthier
Contributor

@nalinigans the particular Hail experiment I'm thinking of was importing three single-sample VCFs, although it was many versions ago (just after the fix for the absolute path requirement).

@spikebike

spikebike commented Mar 5, 2020

@nalinigans Sure, we would be happy to run any code you suggest. Maybe your internal buffers are only used for reads? We are definitely seeing not only tiny writes, but tiny writes that overwrite previous writes. Only about 1/40th of the writes are ending up in the final files.

SSDs don't particularly care about sequential vs. random writes, but our spinning disks do. The user's workload was producing 12,000 I/O operations per second for hours.

@nalinigans
Collaborator

@spikebike, I just started looking at this issue again. We are benchmarking operations with NFS and will put out an optimized library soon. But GenomicsDB does use filesystem locking to allow for simultaneous reads/writes.

  • Did you try export TILEDB_DISABLE_FILE_LOCKING=1 ?
  • Are you explicitly using the --consolidate true option with GenomicsDBImport? If so, would it be possible to set that option to false, which is also the default? (See the example invocation below.)
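
For reference, here is a rough sketch of how both suggestions could be combined with the command from the original report (only the relevant parts are shown; the placeholder on the last line stands for the rest of the original arguments, unchanged):

    export TILEDB_DISABLE_FILE_LOCKING=1
    java -Xmx60g -Xms60g -jar /share/apps/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar GenomicsDBImport \
        --consolidate false \
        <remaining arguments unchanged from the original command>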

@GATKSupportTeam
Collaborator

(related to Zendesk ticket #5153)

@mishaploid
Author

mishaploid commented May 4, 2020

Hi @nalinigans, sorry I missed your recent post. Just adding that I haven't tried export TILEDB_DISABLE_FILE_LOCKING=1, and I don't have the --consolidate flag specified, so it should be using the default.

@nalinigans
Collaborator

@mishaploid and @spikebike: 4.1.8.0 has a new option for GenomicsDBImport, --genomicsdb-shared-posixfs-optimizations, that disables file locking and minimizes writes to NFS. We are interested to know if this option helps your use case, even as we continue making performance improvements. Thanks.
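
For example, a sketch of the earlier command updated to use the new option (the 4.1.8.0 jar path is illustrative, and the placeholder stands for the rest of the original arguments, unchanged):

    java -Xmx60g -Xms60g -jar /path/to/gatk-package-4.1.8.0-local.jar GenomicsDBImport \
        --genomicsdb-shared-posixfs-optimizations true \
        <remaining arguments unchanged from the original command>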
