Added resource usage section to the GermlineCNVCaller java doc. (#8064)
asmirnov239 authored Mar 28, 2023
1 parent 148dd33 commit 4dc8e2a
Showing 1 changed file with 26 additions and 0 deletions.
@@ -100,6 +100,32 @@
* https://theano-pymc.readthedocs.io/en/latest/library/config.html</a>.
* </p>
*
* <h3>Resource usage</h3>
*
* <p>Runtime and memory usage for {@link GermlineCNVCaller} can be impacted by (1) the number of input samples, (2) the
* number of intervals, (3) the highest allowed copy-number state (set using the {@code max-copy-number} argument),
* (4) the number of bias factors (set using the {@code max-bias-factors} argument), and (5) the convergence criteria.</p>
*
* <p>We recommend running {@link GermlineCNVCaller} in COHORT mode on approximately 200 samples at a time, processing
* between 5k and 12.5k intervals, with {@code max-copy-number} set to 5 across all analyses. For 200 samples and
* 5k intervals, approximately 16GB of memory should be sufficient; for the same analysis at 12.5k intervals, we
* recommend 32GB of memory. Runtimes are on the order of a few hours.</p>
*
* <p>Note that {@link GermlineCNVCaller} can be run on larger interval sets by scattering them into smaller "shards."
* The shards can subsequently be merged together by the {@link PostprocessGermlineCNVCalls} tool. In cloud
* and HPC environments, each shard can then be processed in parallel as a separate job.</p>
*
* <p>By default, {@link GermlineCNVCaller} will attempt to use all CPU cores accessible to it within the runtime
* environment. Two environment variables, <code>MKL_NUM_THREADS</code> and <code>OMP_NUM_THREADS</code>, control the
* parallelism of the underlying linear algebra libraries and can be set to limit the number of threads used.</p>
*
* <p>Runtime is also affected by how quickly the inference procedure converges. Several tool arguments set the
* convergence criteria and can be used to speed up convergence, including but not limited to
* {@code caller-update-convergence-threshold}, {@code convergence-snr-averaging-window},
* {@code convergence-snr-countdown-window}, and {@code convergence-snr-trigger-threshold}. However, changing these
* arguments from their default settings might affect the final results, so please exercise caution when
* modifying any of them.</p>
*
* <h3>Tool run modes</h3>
* <dl>
* <dt>COHORT mode:</dt>
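
The resource guidance added in this commit can be illustrated with a small sketch. It is not part of the commit or of GATK: the class `GcnvResourceSketch` and its methods `shardCount` and `cappedLauncher` are hypothetical names chosen for illustration, and the 12,500-interval shard size is only the upper end of the range recommended in the Javadoc. The sketch shows two things: how many shards a larger interval set would need at a given shard size, and how a wrapper might set `MKL_NUM_THREADS` and `OMP_NUM_THREADS` before launching the tool so that it does not take every core. The command passed to the launcher is a placeholder, not a complete invocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of GATK. */
final class GcnvResourceSketch {

    /** Number of shards needed so that no shard exceeds maxIntervalsPerShard (ceiling division). */
    static int shardCount(int totalIntervals, int maxIntervalsPerShard) {
        if (totalIntervals < 0 || maxIntervalsPerShard <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and shard size must be > 0");
        }
        return (totalIntervals + maxIntervalsPerShard - 1) / maxIntervalsPerShard;
    }

    /**
     * Builds, but does not start, a process whose environment sets both thread-count
     * variables named in the Javadoc to the same value.
     */
    static ProcessBuilder cappedLauncher(int threads, List<String> command) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ProcessBuilder builder = new ProcessBuilder(new ArrayList<>(command));
        Map<String, String> env = builder.environment();
        env.put("MKL_NUM_THREADS", Integer.toString(threads));
        env.put("OMP_NUM_THREADS", Integer.toString(threads));
        return builder;
    }

    public static void main(String[] args) {
        // Assumed example: 100,000 intervals split into shards of at most 12,500.
        System.out.println("shards: " + shardCount(100_000, 12_500));

        // Placeholder command; substitute the real GermlineCNVCaller arguments.
        ProcessBuilder launcher = cappedLauncher(4, List.of("gatk", "GermlineCNVCaller"));
        System.out.println("OMP_NUM_THREADS=" + launcher.environment().get("OMP_NUM_THREADS"));
    }
}
```

Capping the thread count this way is mainly useful when several shards share one machine, since each process would otherwise try to use all available cores.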
