This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

crash when running python ./predict-with-pretrained-model.py #212

Closed
kaishijeng opened this issue Oct 6, 2015 · 6 comments

@kaishijeng

I have an Nvidia 960 graphics card with 4 GB of memory. When I ran python ./predict-with-pretrained-model.py, I got the following error message. Any idea why this happens?

[18:28:42] ./dmlc-core/include/dmlc/logging.h:208: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
[18:28:42] ./dmlc-core/include/dmlc/logging.h:208: [18:28:42] src/engine/./threaded_engine.h:290: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
terminate called after throwing an instance of 'dmlc::Error'
what(): [18:28:42] src/engine/./threaded_engine.h:290: [18:28:42] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
Aborted (core dumped)

Thanks,
Kaishi

@antinucleon
Contributor

The reason is that your card doesn't have enough GPU memory, since the default numpy batch size is 128. In this case, you can run the example like this:

Change block 2 from:

# Load the pre-trained model
prefix = "Inception/Inception_BN"
num_round = 39
model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.gpu())

to

# Load the pre-trained model
prefix = "Inception/Inception_BN"
num_round = 39
tmp_model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu())
model = mx.model.FeedForward(symbol=tmp_model.symbol, ctx=mx.gpu(), numpy_batch_size=1,
                             arg_params=tmp_model.arg_params, aux_params=tmp_model.aux_params)
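
As a sanity check, a minimal batch-size-1 forward pass could look like the sketch below. The (1, 3, 224, 224) input shape is what the Inception_BN example uses, and the random array is just a stand-in for a real preprocessed image.

# Sketch: smoke-test the batch-size-1 model with a single dummy input.
# A real run would substitute a preprocessed image for the random array.
import numpy as np
import mxnet as mx

prefix = "Inception/Inception_BN"
num_round = 39
tmp_model = mx.model.FeedForward.load(prefix, num_round, ctx=mx.cpu())
model = mx.model.FeedForward(symbol=tmp_model.symbol, ctx=mx.gpu(), numpy_batch_size=1,
                             arg_params=tmp_model.arg_params, aux_params=tmp_model.aux_params)

batch = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")  # stand-in image batch
prob = model.predict(batch)[0]                                      # forward pass on the GPU
print("Top-1 class index:", int(np.argmax(prob)))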

@antinucleon
Contributor

BTW, I will make a change to the original load function to make changing the batch size easier.

@kaishijeng
Author

antinucleon,

   That helps prediction run through. However, the part that gets internals from the model's symbol still gives errors (see below):

python ./predict-with-pretrained-model.py
('Original Image Shape: ', (225, 400, 3))
('Top1: ', 'n03891251 park bench')
('Top5: ', ['n03891251 park bench', 'n03776460 mobile home, manufactured home', 'n02747177 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin', 'n03891332 parking meter', 'n06794110 street sign'])
[21:08:24] ./dmlc-core/include/dmlc/logging.h:208: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
[21:08:24] ./dmlc-core/include/dmlc/logging.h:208: [21:08:24] src/engine/./threaded_engine.h:290: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
terminate called after throwing an instance of 'dmlc::Error'
what(): [21:08:24] src/engine/./threaded_engine.h:290: [21:08:24] src/operator/./convolution-inl.h:251: Check failed: (param_.workspace) >= (scol.Size() + sdst.Size())
Minimum workspace size: 169394176
Given: 134217728
Aborted (core dumped)

Thanks,
Kaishi

@antinucleon
Contributor

Again, the default batch size is 128. Before I push the fix, you can try setting numpy_batch_size to 1 in the same way for that part as well, along the lines of the sketch below.
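
For the feature-extraction step, the same workaround might look roughly like this. The "global_pool_output" layer name is an assumption (use whichever internal output your script selects), and model/batch refer to the batch-size-1 objects built above.

# Sketch: rebuild the internal-feature extractor with numpy_batch_size=1 as well.
# "global_pool_output" is a placeholder layer name; allow_extra_params lets the
# extractor ignore weights belonging to layers after the chosen output.
internals = model.symbol.get_internals()
fea_symbol = internals["global_pool_output"]
feature_extractor = mx.model.FeedForward(symbol=fea_symbol, ctx=mx.gpu(), numpy_batch_size=1,
                                         arg_params=model.arg_params, aux_params=model.aux_params,
                                         allow_extra_params=True)
features = feature_extractor.predict(batch)  # batch: a (1, 3, 224, 224) float32 array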

@kaishijeng
Author

It works now, thanks,

Kaishi

@Davidrjx

Davidrjx commented Feb 6, 2018

I ran into a similar problem when running
mxnet.nd.ones((2,3), mx.gpu()), but with the following error:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [05:55:58] /opt/incubator-mxnet/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: unknown error

Stack trace returned 9 entries:
[bt] (0) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f38edde018a]
[bt] (1) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f38edde0d28]
[bt] (2) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mshadow::SetDevice<mshadow::gpu>(int)+0xd0) [0x7f38f094b080]
[bt] (3) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> const&)+0x87) [0x7f38f0954fe7]
[bt] (4) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>&&)+0x4e) [0x7f38f095529e]
[bt] (5) /opt/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> (std::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)> >::_M_run()+0x4a) [0x7f38f094e97a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f39851cac80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f398aa206ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f398a75641d]


terminate called recursively
Aborted (core dumped)

Please suggest possible solutions, thanks!
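
This looks like a different issue from the workspace check above: "CUDA: unknown error" from cudaSetDevice often points at a driver/runtime problem or missing GPU access (for example inside a container) rather than at MXNet itself. A minimal sketch to narrow it down, assuming MXNet was built with CUDA support; note that in the reported case the engine aborts in a worker thread, so the try/except may not catch it, but the CPU line at least confirms the install works.

# Sketch: isolate whether a CUDA context can be created at all.
# If the CPU line prints but the GPU line raises or aborts, suspect the
# driver/runtime setup (check nvidia-smi, driver version, container GPU access).
import mxnet as mx

print(mx.nd.ones((2, 3), mx.cpu()))       # should always work
try:
    x = mx.nd.ones((2, 3), mx.gpu(0))
    x.wait_to_read()                      # force the GPU op to actually run
    print(x)
except mx.base.MXNetError as err:
    print("GPU context unavailable:", err)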
