Excessive memory consumption #4
I think I ran into this issue; just pasting the stack trace below in case it's useful. It ran fine for a few hours, though, so I wonder why it would break midway. Is it at risk of running out of memory for larger audio files?

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ******************************________*************************************************xxxxxxxxx
@ibab
Yeah, this makes sense. I'd happily accept a PR for this.
@jyegerlehner: Are you currently working on a fix?
I had missed the fact that the number of channels in the residual blocks can be smaller than the number of quantization steps. This greatly reduces memory consumption and leads to much better convergence of the network. Thanks a lot to @jyegerlehner for pointing this out! See discussion in #4.
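A minimal sketch of what that change means in practice, assuming TF 1.x-era APIs; the variable names (`quantization_channels`, `residual_channels`) are illustrative, not necessarily the repo's. The residual stack runs at a narrow channel width, and only 1x1 convolutions map to and from the wide one-hot quantization space:

```python
import tensorflow as tf

quantization_channels = 256  # size of the mu-law one-hot encoding
residual_channels = 32       # much narrower width inside the residual stack

# One-hot encoded audio batch: [batch, time, quantization_channels].
audio = tf.placeholder(tf.float32, [1, None, quantization_channels])

# 1x1 convolution down to the narrow residual width.
w_in = tf.Variable(tf.random_normal([1, quantization_channels, residual_channels]))
h = tf.nn.conv1d(audio, w_in, stride=1, padding='SAME')

# ... all dilated residual blocks operate at residual_channels width ...

# 1x1 projection back up to quantization_channels for the output softmax.
w_out = tf.Variable(tf.random_normal([1, residual_channels, quantization_channels]))
logits = tf.nn.conv1d(h, w_out, stride=1, padding='SAME')
```

With 256 channels everywhere, every intermediate activation is 8x larger than it needs to be, which is where the memory savings come from.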
@ibab, I just got back to work on this; I was planning to, but I see you've already done it! 👍
I changed the quantization levels to 16, running on a g2.2xlarge AWS instance. Getting an OOM exception. Stack trace:

W tensorflow/core/common_runtime/bfc_allocator.cc:270] *****************************************************************************************xxxxxxxxxxx
Have you modified the batch size?
@lemonzi No, the batch size is the default mentioned in train.py, i.e. 1.
@ansh7 Never mind then -- I saw a Tensor shape in the logs and had a hunch.
Lowering the sample rate also helps avoid memory issues.
@ibab what did you originally train this on? I'm using a GTX Titan X and running out of memory. Is anyone having luck with lower quantization levels?
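For reference, both memory knobs mentioned above (sample rate and quantization levels) are plain hyperparameters. A hypothetical excerpt of the kind of JSON parameter file this repo reads, with both values lowered from the usual defaults; the key names may differ in your checkout:

```json
{
  "sample_rate": 8000,
  "quantization_channels": 128
}
```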
There seems to be a weird issue with garbage collection when using the dilated convolutions. Theoretically, the dilated convolution should be just as fast, but the TensorFlow version is implemented by combining existing ops and I suspect it's not as efficient as the simple convolution. |
It looks like it's making a lot of assumptions about the data being 2D. We don't need this padding; maybe we should use [...]. What this line does is basically pad the height (which is 1 for audio) so that it is equal to the dilation, and this will be cropped back after the convolution to match the output padding.
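For context, this is roughly how TF 1.x composes `atrous_conv2d` out of existing ops, per the discussion in this thread. A simplified sketch only; the helper name and the exact padding arithmetic are mine, not TensorFlow's source:

```python
import tensorflow as tf

def atrous_conv2d_sketch(value, filters, rate, height, width):
    # value: [batch, height, width, channels]; for audio, height == 1.
    # space_to_batch needs both spatial dims divisible by `rate`, so the
    # height-1 audio tensor gets padded up to `rate` rows -- the wasteful
    # padding discussed above -- and cropped away again afterwards.
    pad_h = (rate - height % rate) % rate
    pad_w = (rate - width % rate) % rate
    padded = tf.space_to_batch(value, paddings=[[0, pad_h], [0, pad_w]],
                               block_size=rate)
    conv = tf.nn.conv2d(padded, filters, strides=[1, 1, 1, 1], padding='SAME')
    return tf.batch_to_space(conv, crops=[[0, pad_h], [0, pad_w]],
                             block_size=rate)
```

For a height-1 audio tensor, padding the height up to `rate` multiplies the intermediate tensor size by the dilation factor, which matches the OOM reports above.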
@polyrhythmatic
Edit: when I say "to completion", I mean for the default number of steps == 2000, which I had never been able to do before. Guesses/speculation: [...]
Just came to the same realization as @lemonzi as to why [...]. It pads the height dimension so that [...]. Actually, we can't use [...]. What we can do instead is cut away the end of the tensor so that [...].
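A sketch of that cut-away idea (the helper name is mine): before the dilated convolution, trim the time axis so its length is a multiple of the dilation, instead of letting the op pad it up:

```python
import tensorflow as tf

def crop_time_to_multiple(audio, dilation):
    # audio: [batch, time, channels]. Trim samples off the end so that
    # time % dilation == 0, avoiding the padding inside atrous_conv2d.
    time = tf.shape(audio)[1]
    cropped = (time // dilation) * dilation
    return tf.slice(audio, [0, 0, 0], [-1, cropped, -1])
```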
I've fixed the problem in 8add545. Judging from occasional garbage collection log messages, I think the issue mentioned by @jyegerlehner is also valid. It would probably make sense to cut inputs to a fixed size.
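Cutting inputs to a fixed size could be as simple as chunking each audio file before it is fed to the network; a minimal NumPy sketch, where the function name and sample count are hypothetical rather than from the repo:

```python
import numpy as np

def fixed_size_pieces(audio, sample_size=100000):
    # Yield fixed-length chunks of a 1-D audio array so that every
    # training step allocates a bounded amount of memory, regardless
    # of how long the original recording is.
    for start in range(0, len(audio) - sample_size + 1, sample_size):
        yield audio[start:start + sample_size]
```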
@ibab Ran into the same issue with a Titan X. Now trying out your latest commit. Would you have any numbers to share about the GPU used (I think you've mentioned a K40c somewhere), time taken for convergence, and maybe a comment about the quality of results you've seen?
Would it be possible to post a link to a pre-trained model we can use? And a link to some example WAV output(s)?
I'm still in the process of finding good hyperparameters, and finding the cause of the generation issue in #13. |
Fixing the convolution op seems to have fixed the issue of easily running into OOM errors, so I'm closing this issue and opening another one about cropping the samples to a fixed length.
The network currently runs into out-of-memory issues at a low number of layers. This seems to be a problem with TensorFlow's `atrous_conv2d` operation. If I set the dilation factor to `1`, which means `atrous_conv2d` simply calls `conv2d`, I can easily run with 10s of layers. It could just be the additional `batch_to_space` and `space_to_batch` operations, in which case I can write a single C++ op for `atrous_conv2d`.
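A quick way to check that fall-through, sketched against the TF 1.x API: at `rate=1` the atrous op should be numerically identical to a plain `conv2d`, so any extra memory has to come from the `space_to_batch`/`batch_to_space` path that only kicks in for `rate > 1`:

```python
import numpy as np
import tensorflow as tf

value = tf.constant(np.random.randn(1, 1, 1024, 32).astype(np.float32))
filters = tf.constant(np.random.randn(1, 2, 32, 32).astype(np.float32))

# rate=1 should fall through to a plain convolution.
a = tf.nn.atrous_conv2d(value, filters, rate=1, padding='SAME')
b = tf.nn.conv2d(value, filters, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    out_a, out_b = sess.run([a, b])
    print(np.allclose(out_a, out_b))  # expect True
```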