This repository has been archived by the owner on Nov 16, 2019. It is now read-only.

Getting error while running in yarn mode #81

Closed
abhaymise opened this issue Jun 10, 2016 · 3 comments

Comments

@abhaymise

16/06/10 14:32:20 INFO caffe.CaffeProcessor: my rank is 1
16/06/10 14:32:20 INFO caffe.LMDB: Batch size:64
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0610 14:32:20.571682 10978 CaffeNet.cpp:67] Check failed: d >= 0 (-1 vs. 0) cannot grab GPU device

Command used:

spark-submit --master yarn --deploy-mode cluster --num-executors ${SPARK_WORKER_INSTANCES} --files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" --class com.yahoo.ml.caffe.CaffeOnSpark ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar -train -features accuracy,loss -label label -conf lenet_memory_solver.prototxt -devices ${DEVICES} -connection ethernet -model hdfs:///mnist.model -output hdfs:///mnist_features_result

Any guidance would be appreciated.

@junshi15
Collaborator

It seems Caffe could not find GPU devices in the executor. Please make sure the program has access to as many GPUs as ${DEVICES} specifies.
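A minimal pre-flight sketch of that check, assuming bash and the NVIDIA driver's `nvidia-smi` tool on each worker node. `check_devices` and `count_gpus` are hypothetical helpers written for illustration, not part of CaffeOnSpark. The sketch compares the requested device count with the number of GPUs the node actually exposes:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CaffeOnSpark) for checking that a
# node can satisfy the -devices value passed to CaffeOnSpark.

# Succeeds (returns 0) if the requested GPU count fits on this node,
# otherwise prints a message to stderr and returns 1.
check_devices() {
  local requested=$1 available=$2
  if [ "$requested" -gt "$available" ]; then
    echo "requested $requested GPU(s) but only $available visible on this node" >&2
    return 1
  fi
  return 0
}

# Prints the number of GPUs visible on this node.
# `nvidia-smi -L` lists one "GPU n: ..." line per device; if the tool
# is missing or fails, 0 is printed.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU'
  else
    echo 0
  fi
}
```

Run `check_devices "${DEVICES:-1}" "$(count_gpus)"` on a worker node rather than on the submitting client, since in cluster mode only the GPUs on the executor hosts matter. The check only counts GPUs the node exposes; it does not detect GPUs already held by another process.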

@mriduljain
Contributor

@abhaymise If this is fixed, please comment and close.

@abhaymise
Author

Yes, the issue is fixed now. The problem was the number of devices passed as an argument: the -devices value should equal the number of GPUs in each of your machines. In my case I was using g2.2xlarge machines, which contain only one GPU each. When I ran it on the cluster, I thought this number should be increased to three, which caused the error above. After setting the $DEVICES variable to one, it ran.
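For a hypothetical three-node cluster of g2.2xlarge machines, the settings described above might look roughly like this. This sketch assumes YARN places one executor on each node; the node count is illustrative. The point is that scaling out happens through the number of executors, while DEVICES stays at the per-machine GPU count:

```shell
# Hypothetical setup: three g2.2xlarge nodes, one GPU each.
# Scale out with more executors; keep DEVICES at the per-machine GPU count.
export SPARK_WORKER_INSTANCES=3   # value passed to --num-executors
export DEVICES=1                  # GPUs per executor, not the cluster-wide total
```

With these values, the spark-submit command from the original report is unchanged; only the variables it reads differ.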

Thanks, Mridul, for your constant support and for creating such a beautiful API.
