cifar100 with resnet #113

apeterswu · 2018-05-07T07:29:59Z

Hi,

I am trying to run the ResNet-32 model on the CIFAR-100 dataset. The only change from "Deep_Residual_Learning_CIFAR-10.py" is the training data, but it fails with the following error:

Starting training...
Traceback (most recent call last):
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 319, in main
    train_err += train_fn(inputs, targets)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConv{algo='small', inplace=True, num_groups=1}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), dilation=(1, 1), conv_mode='cross', precision='float32', num_groups=1}.0, Constant{1.0}, Constant{0.0})
Toposort index: 399
Inputs types: [GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), <theano.gof.type.CDataType object at 0x7fa464893a20>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(128, 3, 32, 32), (16, 3, 3, 3), (128, 16, 32, 32), 'No shapes', (), ()]
Inputs strides: [(12288, 4096, 128, 4), (108, 36, 12, 4), (65536, 4096, 128, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fa3997c61e0>, 1.0, 0.0]
Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, InplaceGpuDimShuffle{x,0,x,x}.0), GpuContiguous(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0), GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[]<gpuarray>.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 267, in main
    prediction = lasagne.layers.get_output(network)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 352, in get_output_for
    conved = self.convolve(input, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 650, in convolve
    **extra_kwargs)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It seems something went wrong in the convolution operation. Could you please give any advice? Thanks a lot.

f0k · 2018-06-12T13:17:45Z

Does this also happen with the original CIFAR-10 code? Sometimes it helps to enforce a different cuDNN algorithm, or to let it choose automatically, using:

THEANO_FLAGS=dnn.conv.algo_fwd=guess_on_shape_change,dnn.conv.algo_bwd_data=guess_on_shape_change,dnn.conv.algo_bwd_filter=guess_on_shape_change python resnet.py

If this doesn't help, you can also disable cuDNN using THEANO_FLAGS=dnn.enabled=False.
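
If setting the flags on the command line is inconvenient, the same settings can be applied from inside the script. The following is only a minimal sketch: `build_theano_flags` is a hypothetical helper (not part of Theano or Lasagne), and it assumes the assignment runs before the first `import theano`, since Theano reads THEANO_FLAGS at import time. Note that it overwrites any THEANO_FLAGS value already set in the environment.

```python
import os

# The three cuDNN algorithm flags named in the command above.
ALGO_FLAGS = (
    "dnn.conv.algo_fwd",
    "dnn.conv.algo_bwd_data",
    "dnn.conv.algo_bwd_filter",
)


def build_theano_flags(algo="guess_on_shape_change", disable_cudnn=False):
    """Hypothetical helper: build a THEANO_FLAGS string for either suggestion.

    With disable_cudnn=True it returns the flag that turns cuDNN off;
    otherwise it sets the same algorithm choice for all three conv passes.
    """
    if disable_cudnn:
        return "dnn.enabled=False"
    return ",".join("{}={}".format(name, algo) for name in ALGO_FLAGS)


# This must run before the first `import theano` in the process,
# because Theano parses THEANO_FLAGS when it is imported.
os.environ["THEANO_FLAGS"] = build_theano_flags()
print(os.environ["THEANO_FLAGS"])

# import theano  # import only after the flags are in place
```

To try the second suggestion instead, pass `disable_cudnn=True` to the helper before importing Theano.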
