Skip to content
This repository has been archived by the owner on Nov 16, 2019. It is now read-only.

NullPointerException while training from dataframe #125

Closed
abhaymise opened this issue Jul 27, 2016 · 3 comments
Closed

NullPointerException while training from dataframe #125

abhaymise opened this issue Jul 27, 2016 · 3 comments

Comments

@abhaymise
Copy link

I am getting the following error while training from a dataframe

java.lang.NullPointerException
at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$6.apply(CaffeOnSpark.scala:178)
at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$6.apply(CaffeOnSpark.scala:172)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
at org.apache.spark.rdd.RDD$$anonfun$min$1.apply(RDD.scala:1418)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.min(RDD.scala:1417)
at com.yahoo.ml.caffe.CaffeOnSpark.train(CaffeOnSpark.scala:182)
at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:40)
at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$6.apply(CaffeOnSpark.scala:178)
at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$6.apply(CaffeOnSpark.scala:172)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

This ran when i had num-executors as 4 and cores as 7 .

Current setting is :

spark-submit --master ${MASTER_URL}
--deploy-mode client
--executor-memory 5G
--executor-cores 3
--conf spark.yarn.executor.memoryOverhead=2000
--num-executors 6
--files /home/ubuntu/caffe/CaffeOnSpark/data/hdfs_walmart/WALNET/GOOGLE/DF/walmart_dataframe_train_val.prototxt,/home/ubuntu/caffe/CaffeOnSpark/data/hdfs_walmart/WALNET/GOOGLE/DF/walmart_dataframe_quick_solver.prototxt
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}"
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
--class com.yahoo.ml.caffe.CaffeOnSpark
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar
-train
-features accuracy,loss -label label
-conf /home/ubuntu/caffe/CaffeOnSpark/data/hdfs_walmart/WALNET/GOOGLE/DF/walmart_dataframe_quick_solver.prototxt
-clusterSize 8
-devices ${DEVICES}
-connection ethernet
-model hdfs://master:9000/walmart/googlenet/DF/model/walmart.model
-output hdfs://master:9000/walmart/googlenet/DF/features/walmart_test_result

@junshi15
Copy link
Collaborator

Your probably should not specify both --num-executors and --clusterSize.
num-executors is for yarn only. In that case, clusterSize is automatically set accordingly.
https://github.com/yahoo/CaffeOnSpark/blob/master/caffe-grid/src/main/scala/com/yahoo/ml/caffe/Config.scala#L386-L401

But NullPointerExeception is likely due to incorrect settings for either (or both) of them.

@anfeng
Copy link
Contributor

anfeng commented Sep 26, 2016

Please try out with our latest code, and update this issue accordingly.

@mriduljain
Copy link
Contributor

Closing this as there is no update. Please reopen as necessary

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants