Investigate and address memory leak in gCNV. #5764

Closed

samuelklee opened this issue Mar 6, 2019 · 5 comments

samuelklee commented Mar 6, 2019

There appears to be a memory leak in gCNV coming from Theano 0.9.0, possibly fixed in Theano/Theano#5832. A few possible fixes:

  1. Update Theano to the latest 1.0.4 version. I've tried this and it looks like the leak goes away. We need to confirm reproducibility of results between versions; see also #5730 (Check reproducibility in GermlineCNVCaller step).
  2. Configure Theano 0.9.0 to use MKL rather than OpenBLAS. It appears the leak is only an issue with the latter. This is a little more complicated, since I now realize that MKL is not actually fully utilized (if at all) in our conda environment. For example, we pip install numpy rather than conda install a version from the default channel that is compiled against MKL. So we'd need to change a few dependencies in the environment, which might have implications for VQSR-CNN. See also #4074 (gCNV python env fails on mac). @lucidtronix any thoughts? @jamesemery and @cmnbroad might also be interested, as this could have pretty drastic implications for the size of the python dependencies. If we go with option 1, we might be able to get rid of MKL, etc. A quick way to check which BLAS backend is actually in use is sketched below.

Not sure if the memory leak manifests the same way across all architectures. Note that I believe this is a separate issue from #5714.
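
For reference, a quick way to check which BLAS numpy and Theano are actually picking up in a given conda environment is something like the sketch below (illustrative only, not part of gCNV itself):

```python
# Illustrative check of the BLAS backend in the current environment (not part
# of gCNV). pip-installed numpy wheels typically report OpenBLAS here, while
# numpy from the conda defaults channel reports MKL.
import numpy as np

np.__config__.show()  # prints blas_opt_info / lapack_opt_info, including library names

import theano

print("Theano version:", theano.__version__)
# If blas.ldflags is empty, Theano falls back to a slower numpy-based dot,
# so this also hints at whether MKL/OpenBLAS is being linked directly.
print("blas.ldflags:", theano.config.blas.ldflags)
```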

samuelklee commented Mar 6, 2019

With Theano 1.0.4, a shard of 50 samples x 20000 intervals never exceeds ~4GB of python memory utilization, which is pretty constant over the course of the run. I think the VMs we have typically been using for 200x5000 shards are rather oversized; alternatively, we could squeeze many more intervals into each shard.

samuelklee commented Mar 6, 2019

I was able to run a 50x100000 shard on my desktop with 32GB. Note that we typically run 200x5000 shards on 30GB n1-standard-8s, so this represents roughly a factor of 5 improvement (50 x 100000 = 5M sample-interval entries vs. 200 x 5000 = 1M, in a comparable memory footprint).

Memory usage during the run (which I arbitrarily limited to a small number of iterations; this shouldn't affect memory) was as follows:

[Image: gcnv-100k-mem (memory usage over the 50x100000 run)]

The slightly higher and lower plateaus occur during the denoising + sampling and calling epochs, respectively. There are two spikes up to 30GB at the beginning of sampling of the denoised copy ratios. If we could get rid of these spikes, it would probably be possible to run the entire exome. See related issue #5754.
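
For context, a trace like the one above can be collected with a simple external sampler; a minimal sketch using psutil (assumed to be available; not necessarily how these plots were generated):

```python
# Minimal memory sampler (sketch): polls the RSS of a target PID at a fixed
# interval and writes a CSV that can be plotted afterwards. Not necessarily
# how the plots in this thread were produced.
import csv
import sys
import time

import psutil


def sample_rss(pid, out_csv, interval_s=5.0):
    proc = psutil.Process(pid)
    start = time.time()
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "rss_gb"])
        while True:
            try:
                rss_gb = proc.memory_info().rss / 1024 ** 3
            except psutil.NoSuchProcess:
                break  # target process has exited
            writer.writerow([round(time.time() - start, 1), round(rss_gb, 3)])
            f.flush()
            time.sleep(interval_s)


if __name__ == "__main__":
    sample_rss(int(sys.argv[1]), "gcnv_mem_trace.csv")
```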

An additional thing I haven't investigated yet is whether Theano precompilation affects memory usage. This run included compilation.
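
One way to separate compilation out would be to warm Theano's compile cache in a throwaway run and then point the real run at the same compiledir; a sketch, assuming the standard Theano configuration flags (the cache path is an arbitrary placeholder):

```python
# Sketch: reuse a persistent Theano compile cache so that a run can be profiled
# without on-the-fly compilation. THEANO_FLAGS must be set before theano is
# imported; the cache directory is an arbitrary placeholder.
import os

os.environ["THEANO_FLAGS"] = "base_compiledir=/path/to/persistent/theano_cache"

import theano  # picks up the flags above at import time

print("compiledir in use:", theano.config.compiledir)
```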

Not sure how inference and the required number of iterations to converge scale with shard size, but we can investigate this separately. It's possible that intermediate-sized shards converge more quickly and would require less time spent carrying memory costs.

@mwalker174 @vruano @asmirnov239 please take note.

samuelklee commented Mar 6, 2019

Forgot to mention that this is all in cohort mode, so it's conceivable that case mode could be even cheaper and that we could run more samples in each shard. Not sure if the denoising-model parameters are fixed in a way that reduces memory usage, but this should be easy enough to check.

samuelklee commented:
gCNV runs with (Theano 0.9.0 + OpenBLAS) vs. (Theano 1.0.4 + MKL + MKL numpy) are typically consistent to 12+ decimal places, oftentimes more. So I'm OK with making this change.
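
For what it's worth, the comparison amounts to something like the following (illustrative only: file names, format, and tolerance are placeholders, not the actual gCNV outputs):

```python
# Illustrative reproducibility check between two runs (e.g. Theano 0.9.0 +
# OpenBLAS vs. Theano 1.0.4 + MKL). Paths and tolerance are placeholders.
import numpy as np

a = np.loadtxt("run_theano_0.9.0/posterior_output.tsv")
b = np.loadtxt("run_theano_1.0.4/posterior_output.tsv")

# "consistent to 12+ decimal places" roughly corresponds to atol ~ 1e-12
np.testing.assert_allclose(a, b, rtol=0.0, atol=1e-12)
print("max abs difference:", np.abs(a - b).max())
```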

@lucidtronix any concerns about switching to MKL numpy?

samuelklee commented Mar 6, 2019

For reference, plots of memory usage for a 50x10000 shard:

[Image: gcnv-10k-mem (memory usage for a 50x10000 shard)]

It appears that the 0.9.0 leak only occurs during denoising + sampling.

Not sure if differences in runtime are indicative of OpenBLAS vs. MKL performance or are within the noise.
