Excessive memory consumption #4
I think I ran into this issue; just pasting the stack trace below in case it's useful. It ran fine for a few hours, though, so I wonder why it would break midway. Is it at risk of running out of memory for larger audio files?

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ******************************________*************************************************xxxxxxxxx
@ibab
Yeah, this makes sense. I'd happily accept a PR for this.
@jyegerlehner: Are you currently working on a fix?
I had missed the fact that the number of channels in the residual blocks can be smaller than the number of quantization steps. This greatly reduces memory consumption and leads to much better convergence of the network. Thanks a lot to @jyegerlehner for pointing this out! See discussion in #4.
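A minimal sketch of what that change means in practice, assuming TF 1.x-era APIs; the variable names (`quantization_channels`, `residual_channels`) are illustrative, not necessarily the repo's. The residual stack runs at a narrow channel width, and only 1x1 convolutions map to and from the wide one-hot quantization space:

```python
import tensorflow as tf

quantization_channels = 256  # size of the mu-law one-hot encoding
residual_channels = 32       # much narrower width inside the residual stack

# One-hot encoded audio batch: [batch, time, quantization_channels].
audio = tf.placeholder(tf.float32, [1, None, quantization_channels])

# 1x1 convolution down to the narrow residual width.
w_in = tf.Variable(tf.random_normal([1, quantization_channels, residual_channels]))
h = tf.nn.conv1d(audio, w_in, stride=1, padding='SAME')

# ... all dilated residual blocks operate at residual_channels width ...

# 1x1 projection back up to quantization_channels for the output softmax.
w_out = tf.Variable(tf.random_normal([1, residual_channels, quantization_channels]))
logits = tf.nn.conv1d(h, w_out, stride=1, padding='SAME')
```

With 256 channels everywhere, every intermediate activation is 8x larger than it needs to be, which is where the memory savings come from.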
@ibab, I just got back to work on this; I was planning to, but I see you've already done it! 👍
I changed the quantization levels to 16, running on a g2.2xlarge AWS instance. Getting an OOM exception. Stack trace:

W tensorflow/core/common_runtime/bfc_allocator.cc:270] *****************************************************************************************xxxxxxxxxxx
Have you modified the batch size?
@lemonzi No, the batch size is the default mentioned in train.py, i.e. 1.
@ansh7 Never mind then -- I saw a Tensor shape in the logs and had a hunch.
Lowering the sample rate also helps avoid memory issues.
@ibab what did you originally train this on? I'm using a GTX Titan X and running out of memory. Is anyone having luck with lower quantization levels?
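For reference, both memory knobs mentioned above (sample rate and quantization levels) are plain hyperparameters. A hypothetical excerpt of the kind of JSON parameter file this repo reads, with both values lowered from the usual defaults; the key names may differ in your checkout:

```json
{
  "sample_rate": 8000,
  "quantization_channels": 128
}
```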
There seems to be a weird issue with garbage collection when using the dilated convolutions. Theoretically, the dilated convolution should be just as fast, but the TensorFlow version is implemented by combining existing ops and I suspect it's not as efficient as the simple convolution. |
It looks like it's making a lot of assumptions about the data being 2D. We don't need this padding; maybe we should use [...]. What this line does is basically pad the height (which is 1 for audio) so that it is equal to the dilation, and this will be cropped back after the convolution to match the output padding.
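For context, this is roughly how TF 1.x composes `atrous_conv2d` out of existing ops, per the discussion in this thread. A simplified sketch only; the helper name and the exact padding arithmetic are mine, not TensorFlow's source:

```python
import tensorflow as tf

def atrous_conv2d_sketch(value, filters, rate, height, width):
    # value: [batch, height, width, channels]; for audio, height == 1.
    # space_to_batch needs both spatial dims divisible by `rate`, so the
    # height-1 audio tensor gets padded up to `rate` rows -- the wasteful
    # padding discussed above -- and cropped away again afterwards.
    pad_h = (rate - height % rate) % rate
    pad_w = (rate - width % rate) % rate
    padded = tf.space_to_batch(value, paddings=[[0, pad_h], [0, pad_w]],
                               block_size=rate)
    conv = tf.nn.conv2d(padded, filters, strides=[1, 1, 1, 1], padding='SAME')
    return tf.batch_to_space(conv, crops=[[0, pad_h], [0, pad_w]],
                             block_size=rate)
```

For a height-1 audio tensor, padding the height up to `rate` multiplies the intermediate tensor size by the dilation factor, which matches the OOM reports above.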
@polyrhythmatic
Edit: when I say "to completion", I mean for the default number of steps == 2000, which I had never been able to do before. Guesses/speculation: [...]
Just came to the same realization as @lemonzi as to why [...]. It pads the height dimension so that [...]. Actually, we can't use [...]. What we can do instead is cut away the end of the tensor so that [...].
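A sketch of that cut-away idea (the helper name is mine): before the dilated convolution, trim the time axis so its length is a multiple of the dilation, instead of letting the op pad it up:

```python
import tensorflow as tf

def crop_time_to_multiple(audio, dilation):
    # audio: [batch, time, channels]. Trim samples off the end so that
    # time % dilation == 0, avoiding the padding inside atrous_conv2d.
    time = tf.shape(audio)[1]
    cropped = (time // dilation) * dilation
    return tf.slice(audio, [0, 0, 0], [-1, cropped, -1])
```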
I've fixed the problem in 8add545. Judging from occasional garbage collection log messages, I think the issue mentioned by @jyegerlehner is also valid. It would probably make sense to cut inputs to a fixed size.
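Cutting inputs to a fixed size could be as simple as chunking each audio file before it is fed to the network; a minimal NumPy sketch, where the function name and sample count are hypothetical rather than from the repo:

```python
import numpy as np

def fixed_size_pieces(audio, sample_size=100000):
    # Yield fixed-length chunks of a 1-D audio array so that every
    # training step allocates a bounded amount of memory, regardless
    # of how long the original recording is.
    for start in range(0, len(audio) - sample_size + 1, sample_size):
        yield audio[start:start + sample_size]
```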
@ibab Ran into the same issue with a Titan X. Now trying out your latest commit. Would you have any numbers to share about the GPU used (I think you've mentioned a K40c somewhere), time taken for convergence, and maybe a comment about the quality of results you've seen?
Would it be possible to post a link to a pre-trained model we can use? And a link to some example WAV output(s)?
I'm still in the process of finding good hyperparameters, and finding the cause of the generation issue in #13. |
Fixing the convolution op seems to have fixed the issue of easily running into OOM errors, so I'm closing this issue and opening another one about cropping the samples to a fixed length.
The network currently runs into out-of-memory issues at a low number of layers. This seems to be a problem with TensorFlow's `atrous_conv2d` operation. If I set the dilation factor to `1`, which means `atrous_conv2d` simply calls `conv2d`, I can easily run with 10s of layers. It could just be the additional `batch_to_space` and `space_to_batch` operations, in which case I can write a single C++ op for `atrous_conv2d`.
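A quick way to check that fall-through, sketched against the TF 1.x API: at `rate=1` the atrous op should be numerically identical to a plain `conv2d`, so any extra memory has to come from the `space_to_batch`/`batch_to_space` path that only kicks in for `rate > 1`:

```python
import numpy as np
import tensorflow as tf

value = tf.constant(np.random.randn(1, 1, 1024, 32).astype(np.float32))
filters = tf.constant(np.random.randn(1, 2, 32, 32).astype(np.float32))

# rate=1 should fall through to a plain convolution.
a = tf.nn.atrous_conv2d(value, filters, rate=1, padding='SAME')
b = tf.nn.conv2d(value, filters, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    out_a, out_b = sess.run([a, b])
    print(np.allclose(out_a, out_b))  # expect True
```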