This repository has been archived by the owner on Nov 16, 2019. It is now read-only.
Getting error while running in yarn mode #81
Comments
It seems Caffe could not find GPU devices in the executor. Please make sure the program has access to GPU devices (as many as ${DEVICES} GPUs).
@abhaymise If this is fixed, please comment and close.
Yes, the issue is fixed now. The number of devices passed as an argument was the problem: -devices should equal the number of GPUs on each of your machines. In my case I was using g2.2xlarge machines, which have only one GPU each. When I ran on the cluster I thought this number should be increased to three, which caused the above error. Once I set the $DEVICES variable to one, it ran. Thanks mridul for your constant support and for creating such a beautiful API.
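For anyone hitting the same check failure, here is a rough sketch of deriving DEVICES from the GPUs actually present on a node instead of guessing. It assumes the NVIDIA driver's nvidia-smi tool is installed, that every executor machine has the same number of GPUs, and that you run it on one of the executor machines (not just the host you submit from). The helper name count_gpus is made up for illustration; it is not part of CaffeOnSpark.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CaffeOnSpark): count the GPUs on this machine.
# Prints 0 when nvidia-smi is missing or reports no devices.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # `nvidia-smi -L` prints one line per GPU, each starting with "GPU ".
        # grep -c still prints 0 on no match; `|| true` hides its non-zero exit.
        nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
    else
        echo 0
    fi
}

# DEVICES is the per-machine GPU count, not the total across the cluster.
DEVICES="$(count_gpus)"
export DEVICES
echo "GPUs on this node: ${DEVICES}"
```

The resulting value is what goes into -devices ${DEVICES}; the number of machines is controlled separately through --num-executors.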
16/06/10 14:32:20 INFO caffe.CaffeProcessor: my rank is 1
16/06/10 14:32:20 INFO caffe.LMDB: Batch size:64
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0610 14:32:20.571682 10978 CaffeNet.cpp:67] Check failed: d >= 0 (-1 vs. 0) cannot grab GPU device
The command used is:
spark-submit --master yarn --deploy-mode cluster \
    --num-executors ${SPARK_WORKER_INSTANCES} \
    --files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -train \
    -features accuracy,loss \
    -label label \
    -conf lenet_memory_solver.prototxt \
    -devices ${DEVICES} \
    -connection ethernet \
    -model hdfs:///mnist.model \
    -output hdfs:///mnist_features_result
Please advise.