Benchmarking Systems without GPU #26

Open
jerrin92 opened this issue Jan 11, 2018 · 4 comments

Comments

@jerrin92

Hi Team,

I am trying to benchmark a system without a GPU. However, when I run the benchmark script, it looks for nvidia-smi and fails:

CalledProcessError: Command 'nvidia-smi' returned non-zero exit status 127

I get the same error with fcn5, alexnet, resnet, and lstm.

In addition, we plan to run the benchmarks on MXNet, TensorFlow, and Caffe. From the documentation, I understand that we need to copy the zip files to $HOME/data. However, we also need to use the configuration file associated with each framework for it to work. Is that correct?

@shyhuai
Collaborator

shyhuai commented Jan 12, 2018

Hi, to run without a GPU, please remove lines 116 and 117 in benchmark.py, which are used for GPU power collection.

Your understanding is correct: you need to write your own configuration file for your benchmarks. For data preparation, you also need to unzip the data files that you downloaded and put them in $HOME/data.
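As an alternative to deleting the GPU power collection lines, the call could be guarded so that the same script runs on both GPU and CPU-only hosts. Exit status 127 is the shell's "command not found" code, so the original error most likely means nvidia-smi is simply not installed. The following is a minimal sketch, assuming Python 3; `run_if_available` is a hypothetical helper and is not part of benchmark.py.

```python
import shutil
import subprocess


def run_if_available(cmd):
    """Run cmd and return its stdout, or None if it cannot be run.

    Hypothetical helper (not from dlbench): lets GPU-only steps be
    skipped on hosts where the tool is missing or fails.
    """
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cmd[0]) is None:
        return None
    try:
        return subprocess.check_output(cmd, universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        # The tool exists but failed, e.g. no driver is loaded.
        return None


gpu_info = run_if_available(["nvidia-smi"])
if gpu_info is None:
    print("nvidia-smi not usable; skipping GPU power collection")
```

With a guard like this, a CPU-only run skips the power collection step instead of raising CalledProcessError.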

@jerrin92
Author

jerrin92 commented Jan 16, 2018

Yup, the GPU-related errors are gone now. However, I get the following error message:

[bt] (9) /N/dc2/scratch/jerkatta/mxnet/python/mxnet/../../lib/libmxnet.so(MXExecutorSimpleBind+0x2069) [0x7fe9f010d609]

Traceback (most recent call last):
  File "train_cifar10.py", line 54, in <module>
    fit.fit(args, sym, data.get_rec_iter, init)
  File "/gpfs/home/j/e/jerkatta/Carbonate/benchmarking/dlbench/tools/mxnet/common/fit.py", line 187, in fit
    monitor            = monitor)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/base_module.py", line 460, in fit
    for_training=True, force_rebind=force_rebind)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/module.py", line 417, in bind
    state_names=self._state_names)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 231, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 327, in bind_exec
    shared_group))
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/N/dc2/scratch/jerkatta/mxnet/python/mxnet/symbol.py", line 1479, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (128, 3L, 32L, 32L)
softmax_label: (128,)
[12:31:58] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage

In addition, the log files currently generated are:

./mxnet-cnn-alexnet--devId0,1,2,3-c4-b256-Tue_Jan_16_12:31:50_2018-xx.log:Total time: 1.67378902435

./mxnet-cnn-alexnet--devId0-c1-b1024-Tue_Jan_16_12:31:55_2018-e1.xx.log:Total time: 1.23220992088

./mxnet-cnn-resnet--devId0,1,2,3-c4-b128-Tue_Jan_16_12:31:52_2018-xx.log:Total time: 1.2443048954

./mxnet-cnn-resnet--devId0-c1-b128-Tue_Jan_16_12:31:56_2018-xx.log:Total time: 1.14851498604

./mxnet-fc-fcn5--devId0,1,2,3-c4-b1024-Tue_Jan_16_12:31:47_2018-xx.log:Total time: 2.8816781044

./mxnet-fc-fcn5--devId0-c1-b4096-Tue_Jan_16_12:31:54_2018-xx.log:Total time: 1.44347310066

./mxnet-rnn-lstm--devId0-c1-b1024-Tue_Jan_16_12:31:58_2018-xx.log:Total time: 1.68027210236

Note that we are running this without GPUs.
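The last line of the trace above ("Compile with USE_CUDA=1 to enable GPU usage") suggests that a GPU context was requested from an MXNet build compiled without CUDA, and the devId0,1,2,3 in the log file names hints that GPU device IDs were still being passed. A minimal sketch of the usual fallback logic follows; `parse_devices` is a hypothetical name, and whether dlbench passes device IDs as a comma-separated string is an assumption.

```python
def parse_devices(dev_ids):
    """Map a device-ID string such as "0,1" to (kind, index) pairs.

    Hypothetical helper. An empty string or None selects the CPU.
    In MXNet, ("cpu", 0) would correspond to mx.cpu() and
    ("gpu", i) to mx.gpu(i).
    """
    if dev_ids is None or dev_ids.strip() == "":
        return [("cpu", 0)]
    return [("gpu", int(i)) for i in dev_ids.split(",")]


print(parse_devices(""))         # CPU-only run
print(parse_devices("0,1,2,3"))  # four GPUs
```

If the training script receives a non-empty device list on a CPU-only build, binding the executor fails as shown in the trace, so the device argument needs to be empty (or the CPU selected explicitly) for these runs.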

@FreemanX
Collaborator

You can narrow down the cause of the problem by testing MXNet on its own. Go to dlbench/tools/mxnet and run testbm.sh. You may need to modify the script and comment out the lines for the GPU tests. You can also append the -debug flag to each test line to get more information for debugging.

@jerrin92
Author

Okay, I will try that. Some of the testbm.sh scripts do not have test statements for CPUs. I am assuming that passing -cpuCount 20 instead of -gpuCount 1 would solve the problem.


3 participants