Empirical formula predicting disk space usage? #491
Comments
@akolesnikov on our team will look at this and reply later.
Thank you! An update: I've allocated a 500GB disk and am still seeing PAPI 10, so it would be great if I can get an understanding of this.
@SHuang-Broad, are you using a BAM file that is publicly available?
Unfortunately, it's not... Providing what I can:
All three BAMs are in the range of 15-20GB. I actually suspect that somewhere in these BAMs there are crazily high-coverage regions, and that's triggering something in DV to generate loads of temporary files.
@SHuang-Broad, yes, if coverage is that high I'd suggest subsampling the BAM file with a mapping quality of Q10 or Q20 and then trying again. I think the pipeline is getting stuck at the centromere. Do you happen to know the read N50?
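A minimal sketch of what that subsampling could look like with samtools; the Q20 mapping-quality cutoff and the 50% downsampling fraction below are illustrative values, not recommendations from this thread.

```bash
# Illustrative only: drop reads with MAPQ < 20, then optionally keep a
# random ~50% of what remains to cap coverage before running DV/PEPPER.
samtools view -b -q 20 -o sample.q20.bam sample.bam
samtools index sample.q20.bam

# Optional random downsampling: seed 42, keep ~50% of reads.
samtools view -b -s 42.5 -o sample.q20.ds50.bam sample.q20.bam
samtools index sample.q20.ds50.bam
```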
Good to know about that sub-sampling trick.
@SHuang-Broad It could be a disk limit issue, but I have a suspicion it might be memory related. Try launching the free command in the background at some interval, like 60 sec, before all the deepvariant ones:
free -s 60 > dv_mem_usage.txt
Then launch deepvariant on the same machine from another terminal. Once you get the failure, stop the logging from the command above and look to see whether memory has become exhausted. The above command can also be launched in parallel as part of the submitted command if that makes it easier. If memory turns out to be the problem, just increase the memory requirements for the job. Hope it helps,
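As a concrete (hedged) version of the above: start the monitor in the background, launch the pipeline, and kill the logger afterwards. The pipeline command below is a placeholder, not the actual invocation from this issue.

```bash
# Log memory usage every 60 s, as suggested above.
free -s 60 > dv_mem_usage.txt &
monitor_pid=$!

# Placeholder for the real PEPPER/DeepVariant command used in the workflow.
your_deepvariant_command_here

# Stop the logger once the run finishes (or fails) and inspect the log.
kill "$monitor_pid"
tail dv_mem_usage.txt
```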
Thanks, Paul! I do indeed sometimes see out-of-memory errors (e.g. for an even crazier sample with 140X coverage), but usually I get a PAPI error code 9 from that (based on experience, error code 9 indicates memory issues and error code 10 indicates disk issues; PAPI 10 is sort of a catch-all error code, so it's not very informative). Nevertheless, the procedure you described is very helpful and I'll give it a try.
Thanks!
Hi Steve, Sounds good. You can also try using the […]. Hope it helps,
An update: I've tried increasing both memory and disk, and it has worked! Given that the DV-PEPPER pipeline is a non-trivial part of the pipeline I'm building, I'll dig deeper with both @kishwarshafin's and Paul's suggestions. For the moment, when I initially prototyped the pipeline using a normal sample's ONT data, this is what I found:
[memory/CPU utilization plots from the prototype run, not reproduced here]
@SHuang-Broad Glad to hear it worked, and thank you for the nice visualization! Docker has multiple layers and DV expands its own data within them, which is something we've noticed with other folks in terms of resource requirements, and which you can sort of tell from the memory/CPU utilization profiles. You might be able to profile it more granularly, but it might be easier to start with some simple input files, and maybe a modified version of DV where you add debug information in the code, to get an idea of the points of memory/disk expansion. That way you can trace the code execution alongside the correlated flow of data processing.
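One way to get that kind of coarse profile without modifying DV is to sample memory and disk usage on an interval while the pipeline runs. This is only a sketch; the 60-second interval and the /data output directory are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: append a timestamped memory/disk sample every 60 s so spikes can
# later be correlated with pipeline stages. Run it in the background (with &)
# before launching DV, and kill it when the run ends, as in the free example above.
interval=60
outdir=/data   # assumed directory where DV writes its intermediate files
while true; do
    {
        date +%s
        free -m | awk 'NR==2 {print "mem_used_mb", $3}'
        df -BG "$outdir" | awk 'NR==2 {print "disk_used_gb", $3}'
    } >> resource_usage.txt
    sleep "$interval"
done
```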
Hi @SHuang-Broad, thanks for bringing up the issue about the logging file size.
In a quick test on a WGS BAM, both now log at roughly 1-2 min intervals, which hopefully is more reasonable. Before the release, I'll check the size of the log files as well.
@pichuan thanks for the work!
I'm closing this issue now. I don't think we have a very easy solution for the original question in the title, but hopefully this thread is still useful. |
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.2/docs/FAQ.md:
Yes
Describe the issue:
When running WDL workflows backed with PAPI, I get PAPI error 10, which indicates the disk is full.
Setup
kishwars/pepper_deepvariant:r0.4.1
kishwars/pepper_deepvariant:r0.4.1
Steps to reproduce:
Relevant part of the log file (which is over 200MB):
For one failed task, the input BAM size is 19GB, and allocated disk size is 300GB.
Does the quick start test work on your system?
Some inputs finish, while others fail using the exact same workflow (PAPI error 10), so it's unlikely to be a coding issue.
Any additional context:
We have successful runs with inputs of similar size to the ones that failed with PAPI 10, so I'm wondering whether there's an empirical formula for predicting disk space usage.
Additionally, is there a way to make DV less verbose? The log file grows to hundreds of MB, which makes debugging harder.
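This thread doesn't surface a built-in DV verbosity flag, but one hedged workaround is to compress the log stream on the fly so a verbose run doesn't also fill the disk with plain-text logs; the command name below is a placeholder.

```bash
# Placeholder command; substitute the real PEPPER/DeepVariant invocation.
# stdout and stderr are gzip-compressed as they are produced.
your_deepvariant_command_here 2>&1 | gzip -c > dv_run.log.gz

# Inspect later without decompressing to disk.
zless dv_run.log.gz
```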
Thanks!
Steve