This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

crash when running python ./predict-with-pretrained-model.py #212

Closed
kaishijeng opened this issue Oct 6, 2015 · 6 comments

@kaishijeng

I have an Nvidia 960 graphics card with 4 GB of memory. When I ran python ./predict-with-pretrained-model.py, I got the following error message. Any idea why this happens?

[18:28:42] ./dmlc-core/include/dmlc/logging.h:208: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
[18:28:42] ./dmlc-core/include/dmlc/logging.h:208: [18:28:42] src/engine/./threaded_engine.h:290: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
terminate called after throwing an instance of 'dmlc::Error'
what(): [18:28:42] src/engine/./threaded_engine.h:290: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
Aborted (core dumped)

Thanks,
Kaishi

@antinucleon
Contributor

The reason is that your card doesn't have enough GPU memory, since the default numpy batch size is 128. In this case, you can run the example like this:

Change block 2 from:

# Load the pre-trained model
prefix = "Inception/Inception_BN"
num_round = 39
model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.gpu())

to

# Load the pre-trained model
prefix = "Inception/Inception_BN"
num_round = 39
tmp_model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu())
model = mx.model.FeedForward(symbol=tmp_model.symbol, ctx=mx.gpu(), numpy_batch_size=1,
                             arg_params=tmp_model.arg_params, aux_params=tmp_model.aux_params)
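
As a sanity check, a minimal batch-size-1 forward pass could look like the sketch below. The (1, 3, 224, 224) input shape is what the Inception_BN example uses, and the random array is just a stand-in for a real preprocessed image.

# Sketch: smoke-test the batch-size-1 model with a single dummy input.
# A real run would substitute a preprocessed image for the random array.
import numpy as np
import mxnet as mx

prefix = "Inception/Inception_BN"
num_round = 39
tmp_model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu())
model = mx.model.FeedForward(symbol=tmp_model.symbol, ctx=mx.gpu(), numpy_batch_size=1,
                             arg_params=tmp_model.arg_params, aux_params=tmp_model.aux_params)

batch = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")  # stand-in image batch
prob = model.predict(batch)[0]                                      # forward pass on the GPU
print("Top-1 class index:", int(np.argmax(prob)))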

@antinucleon
Contributor

BTW, I will make a change to the original load function to make changing the batch size easier.

@kaishijeng
Author

antinucleon,

   That helps prediction run through. However, the part that gets internals from the model's symbol still gives errors (see below):

python ./predict-with-pretrained-model.py
('Original Image Shape: ', (225, 400, 3))
('Top1: ', 'n03891251 park bench')
('Top5: ', ['n03891251 park bench', 'n03776460 mobile home, manufactured home', 'n02747177 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin', 'n03891332 parking meter', 'n06794110 street sign'])
[21:08:24] ./dmlc-core/include/dmlc/logging.h:208: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
[21:08:24] ./dmlc-core/include/dmlc/logging.h:208: [21:08:24] src/engine/./threaded_engine.h:290: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
terminate called after throwing an instance of 'dmlc::Error'
what(): [21:08:24] src/engine/./threaded_engine.h:290: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
Aborted (core dumped)

Thanks,
Kaishi

@antinucleon
Contributor

Again, the default batch size is 128. Before I push the fix, you can try setting numpy_batch_size to 1 in the same way for that part as well, along the lines of the sketch below.
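
For the feature-extraction step, the same workaround might look roughly like this. The "global_pool_output" layer name is an assumption (use whichever internal output your script selects), and model/batch refer to the batch-size-1 objects built above.

# Sketch: rebuild the internal-feature extractor with numpy_batch_size=1 as well.
# "global_pool_output" is a placeholder layer name; allow_extra_params lets the
# extractor ignore weights belonging to layers after the chosen output.
internals = model.symbol.get_internals()
fea_symbol = internals["global_pool_output"]
feature_extractor = mx.model.FeedForward(symbol=fea_symbol, ctx=mx.gpu(), numpy_batch_size=1,
                                         arg_params=model.arg_params, aux_params=model.aux_params,
                                         allow_extra_params=True)
features = feature_extractor.predict(batch)  # batch: a (1, 3, 224, 224) float32 array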

@kaishijeng
Author

It works now, thanks,

Kaishi

@Davidrjx

Davidrjx commented Feb 6, 2018

I ran into a similar problem when running
mxnet.nd.ones((2,3), mx.gpu()), but with the following error:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [05:55:58] /opt/incubator-mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: unknown error

Stack trace returned 9 entries:
[bt] (0) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f38edde018a]
[bt] (1) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f38edde0d28]
[bt] (2) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mshadow::SetDevice<mshadow::gpu>(int)+0xd0) [0x7f38f094b080]
[bt] (3) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> const&)+0x87) [0x7f38f0954fe7]
[bt] (4) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>&&)+0x4e) [0x7f38f095529e]
[bt] (5) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> >::_M_run()+0x4a) [0x7f38f094e97a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f39851cac80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f398aa206ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f398a75641d]


terminate called recursively
Aborted (core dumped)

Please suggest possible solutions, thanks!
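
This looks like a different issue from the workspace check above: "CUDA: unknown error" from cudaSetDevice often points at a driver/runtime problem or missing GPU access (for example inside a container) rather than at MXNet itself. A minimal sketch to narrow it down, assuming MXNet was built with CUDA support; note that in the reported case the engine aborts in a worker thread, so the try/except may not catch it, but the CPU line at least confirms the install works.

# Sketch: isolate whether a CUDA context can be created at all.
# If the CPU line prints but the GPU line raises or aborts, suspect the
# driver/runtime setup (check nvidia-smi, driver version, container GPU access).
import mxnet as mx

print(mx.nd.ones((2, 3), mx.cpu()))       # should always work
try:
    x = mx.nd.ones((2, 3), mx.gpu(0))
    x.wait_to_read()                      # force the GPU op to actually run
    print(x)
except mx.base.MXNetError as err:
    print("GPU context unavailable:", err)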
