Investigate and address memory leak in gCNV. #5764

Closed

samuelklee opened this issue Mar 6, 2019 · 5 comments

samuelklee commented Mar 6, 2019

There appears to be a memory leak in gCNV coming from Theano 0.9.0, possibly fixed in Theano/Theano#5832. A few possible fixes:

  1. Update Theano to the latest 1.0.4 version. I've tried this and it looks like the leak goes away. We need to confirm reproducibility of results between versions; see also #5730 (Check reproducibility in GermlineCNVCaller step).
  2. Configure Theano 0.9.0 to use MKL rather than OpenBLAS. It appears the leak is only an issue with the latter. This is a little more complicated, since I now realize that MKL is not actually fully utilized (if at all) in our conda environment. For example, we pip install numpy rather than conda install a version from the default channel that is compiled against MKL. So we'd need to change a few dependencies in the environment, which might have implications for VQSR-CNN. See also #4074 (gCNV python env fails on mac). @lucidtronix any thoughts? @jamesemery and @cmnbroad might also be interested, as this could have pretty drastic implications for the size of the python dependencies. If we go with option 1, we might be able to get rid of MKL, etc. A quick way to check which BLAS backend is actually in use is sketched below.

Not sure if the memory leak manifests the same way across all architectures. Note that I believe this is a separate issue from #5714.
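
For reference, a quick way to check which BLAS numpy and Theano are actually picking up in a given conda environment is something like the sketch below (illustrative only, not part of gCNV itself):

```python
# Illustrative check of the BLAS backend in the current environment (not part
# of gCNV). pip-installed numpy wheels typically report OpenBLAS here, while
# numpy from the conda defaults channel reports MKL.
import numpy as np

np.__config__.show()  # prints blas_opt_info / lapack_opt_info, including library names

import theano

print("Theano version:", theano.__version__)
# If blas.ldflags is empty, Theano falls back to a slower numpy-based dot,
# so this also hints at whether MKL/OpenBLAS is being linked directly.
print("blas.ldflags:", theano.config.blas.ldflags)
```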

samuelklee commented Mar 6, 2019

With Theano 1.0.4, a shard of 50 samples x 20000 intervals never exceeds ~4GB of python memory utilization, which is pretty constant over the course of the run. I think the VMs we have typically been using for 200x5000 shards are rather oversized; alternatively, we could squeeze many more intervals into each shard.

samuelklee commented Mar 6, 2019

I was able to run a 50x100000 shard on my desktop with 32GB. Note that we typically run 200x5000 shards on 30GB n1-standard-8s, so this represents roughly a factor of 5 improvement (50 x 100000 = 5M sample-interval entries vs. 200 x 5000 = 1M, in a comparable memory footprint).

Memory usage during the run (which I arbitrarily limited to a small number of iterations; this shouldn't affect memory) was as follows:

[Image: gcnv-100k-mem (memory usage over the 50x100000 run)]

The slightly higher and lower plateaus occur during the denoising + sampling and calling epochs, respectively. There are two spikes up to 30GB at the beginning of sampling of the denoised copy ratios. If we could get rid of these spikes, it would probably be possible to run the entire exome. See related issue #5754.
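
For context, a trace like the one above can be collected with a simple external sampler; a minimal sketch using psutil (assumed to be available; not necessarily how these plots were generated):

```python
# Minimal memory sampler (sketch): polls the RSS of a target PID at a fixed
# interval and writes a CSV that can be plotted afterwards. Not necessarily
# how the plots in this thread were produced.
import csv
import sys
import time

import psutil


def sample_rss(pid, out_csv, interval_s=5.0):
    proc = psutil.Process(pid)
    start = time.time()
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "rss_gb"])
        while True:
            try:
                rss_gb = proc.memory_info().rss / 1024 ** 3
            except psutil.NoSuchProcess:
                break  # target process has exited
            writer.writerow([round(time.time() - start, 1), round(rss_gb, 3)])
            f.flush()
            time.sleep(interval_s)


if __name__ == "__main__":
    sample_rss(int(sys.argv[1]), "gcnv_mem_trace.csv")
```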

An additional thing I haven't investigated yet is whether Theano precompilation affects memory usage. This run included compilation.
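
One way to separate compilation out would be to warm Theano's compile cache in a throwaway run and then point the real run at the same compiledir; a sketch, assuming the standard Theano configuration flags (the cache path is an arbitrary placeholder):

```python
# Sketch: reuse a persistent Theano compile cache so that a run can be profiled
# without on-the-fly compilation. THEANO_FLAGS must be set before theano is
# imported; the cache directory is an arbitrary placeholder.
import os

os.environ["THEANO_FLAGS"] = "base_compiledir=/path/to/persistent/theano_cache"

import theano  # picks up the flags above at import time

print("compiledir in use:", theano.config.compiledir)
```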

Not sure how inference and the required number of iterations to converge scale with shard size, but we can investigate this separately. It's possible that intermediate-sized shards converge more quickly and would require less time spent carrying memory costs.

@mwalker174 @vruano @asmirnov239 please take note.

samuelklee commented Mar 6, 2019

Forgot to mention that this is all in cohort mode, so it's conceivable that case mode could be even cheaper and that we could run more samples in each shard. Not sure if the denoising-model parameters are fixed in a way that reduces memory usage, but this should be easy enough to check.

samuelklee commented:
gCNV runs with (Theano 0.9.0 + OpenBLAS) vs. (Theano 1.0.4 + MKL + MKL numpy) are typically consistent to 12+ decimal places, oftentimes more. So I'm OK with making this change.
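
For what it's worth, the comparison amounts to something like the following (illustrative only: file names, format, and tolerance are placeholders, not the actual gCNV outputs):

```python
# Illustrative reproducibility check between two runs (e.g. Theano 0.9.0 +
# OpenBLAS vs. Theano 1.0.4 + MKL). Paths and tolerance are placeholders.
import numpy as np

a = np.loadtxt("run_theano_0.9.0/posterior_output.tsv")
b = np.loadtxt("run_theano_1.0.4/posterior_output.tsv")

# "consistent to 12+ decimal places" roughly corresponds to atol ~ 1e-12
np.testing.assert_allclose(a, b, rtol=0.0, atol=1e-12)
print("max abs difference:", np.abs(a - b).max())
```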

@lucidtronix any concerns about switching to MKL numpy?

samuelklee commented Mar 6, 2019

For reference, plots of memory usage for a 50x10000 shard:

[Image: gcnv-10k-mem (memory usage for a 50x10000 shard)]

It appears that the 0.9.0 leak only occurs during denoising + sampling.

Not sure if differences in runtime are indicative of OpenBLAS vs. MKL performance or are within the noise.
