Ndarray.asnumpy() error with gluon dense under both GPU and CPU environment #10807
Modified your script as this:

```python
from mxnet.gluon import nn
import mxnet as mx

mx.Context.default_ctx = mx.Context('cpu', 0)
layer = nn.Dense(1000)
x = mx.nd.random.uniform(shape=(16, 128, 300, 300))
x.attach_grad()
layer.collect_params().initialize()
with mx.autograd.record():
    out = layer(x)
out.backward()
print(x.grad.shape)
print(x.grad)
print(x.grad.asnumpy().shape)
```

With mxnet-mkl 1.1.0 from pypi, I got this:
With mxnet-mkl 1.2.0b20180503 from pypi, I got this:
So, I am totally confused...
@sandeep-krishnamurthy could you help to add the labels NDArray, MKL? Thanks.
@roywei Thanks for the follow-up. This issue happens on the GPU platform as well, so the MKL label might limit the scope. Could you double-check? Thanks. :)
Are you sure you can reproduce it on CPU without MKLDNN?
@zheng-da As I mentioned before, all environments (GPU/CPU/MKLDNN) fail. Based on your response, can I treat this as an out-of-memory error?
I think so.
I think this is due to the use of index_t (which is uint32_t) vs. int64_t in tensor_blob. This is a legacy issue; we should use int64_t for all indexing.
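The mismatched sizes in the error message below (11520000000 vs. 2930065408) are consistent with this explanation. Under the assumption (mine, not confirmed in the thread) that the reshaped tensor is the weight gradient of the flatten=True Dense(1000) layer, its true element count truncated to a 32-bit index gives exactly the second number:

```python
# Suspected source of the numbers in the error message. Assumption: the
# affected tensor is the Dense layer's weight gradient after flattening,
# i.e. shape (units, prod of non-batch input dims).
units = 1000
in_units = 128 * 300 * 300      # trailing dims of (16, 128, 300, 300) flattened
true_size = units * in_units    # the "new" size reported in the check
wrapped = true_size % 2**32     # what a uint32 index_t would store instead

print(true_size)  # 11520000000
print(wrapped)    # 2930065408
```

Both values match the failed `Check failed: this->shape.Size() == shape.Size()` message exactly, which points to 32-bit truncation rather than memory corruption.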
Verified the PR #11742 will fix this issue. @sandeep-krishnamurthy please close this. Thanks! |
Description
I tried to use the gluon API to test MKLDNN, but an error occurs when calling asnumpy() after the network's backward propagation. I then ran the same test under GPU and native CPU environments, and the error still exists. What's more, a small input size does not trigger the error.
You can run the following minimum example to reproduce the error.
Error message:
```
print x.grad.asnumpy().shape
  File "/home/linliu/mxnet_gpu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/linliu/mxnet_gpu/incubator-mxnet/python/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: ([04:25:06] include/mxnet/././tensor_blob.h:257: Check failed: this->shape.Size() == shape.Size() (11520000000 vs. 2930065408) TBlob.get_with_shape: new and old shape do not match total elements
```
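The failed check also lines up with the index_t overflow mentioned in the comments above, and would explain why small inputs work fine. A rough sketch, under the assumption that the overflowing tensor is the flattened Dense weight (a hypothetical helper, not MXNet API):

```python
# Rough check of which input shapes would overflow a 32-bit element count,
# assuming the affected tensor is the flatten=True Dense weight with shape
# (units, product of non-batch input dims). Hypothetical illustration only.
UINT32_RANGE = 2**32

def dense_weight_elements(units, input_shape):
    flat = 1
    for dim in input_shape[1:]:  # the batch dimension is not part of the weight
        flat *= dim
    return units * flat

# The shape from this bug report overflows a 32-bit count...
print(dense_weight_elements(1000, (16, 128, 300, 300)) > UINT32_RANGE)  # True
# ...while a smaller spatial size stays in range, matching the observation
# that small inputs do not trigger the error.
print(dense_weight_elements(1000, (16, 128, 30, 30)) > UINT32_RANGE)    # False
```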