CLA is valid! |
@@ -1,40 +1,41 @@ | |||
HOME ?=/home/${USER} | |||
ifeq ($(shell which spark-submit),) | |||
SPARK_HOME ?=/home/y/share/spark | |||
SPARK_HOME=~/spark-1.6.0-bin-hadoop2.6 |
Please undo this change.
else | ||
SPARK_HOME ?=$(shell which spark-submit 2>&1 | sed 's/\/bin\/spark-submit//g') | ||
endif | ||
CAFFE_ON_SPARK ?=$(shell pwd) | ||
LD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64 | ||
LD_LIBRARY_PATH ?=/usr/local/cuda/ |
/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/
LD_LIBRARY_PATH2=${LD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64 | ||
DYLD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64 | ||
DYLD_LIBRARY_PATH2=${DYLD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64 | ||
DYLD_LIBRARY_PATH ?=/usr/local/cuda/lib |
/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/
|
||
export SPARK_VERSION=$(shell ${SPARK_HOME}/bin/spark-submit --version 2>&1 | grep version | awk '{print $$5}' | cut -d'.' -f1) | ||
ifeq (${SPARK_VERSION}, 2) | ||
export MVN_SPARK_FLAG=-Dspark2 | ||
endif | ||
|
||
build: | ||
build: |
remove extra space
cd caffe-distri; make clean; cd .. | ||
clean: | ||
pushd caffe-public; make clean; popd | ||
pushd caffe-distri; make clean; popd |
Why these changes? Let's avoid unwanted changes.
@@ -0,0 +1,8 @@ | |||
from examples.coco.retrieval_experiment import * |
Add a copyright header.
from pyspark.sql import DataFrame,SQLContext | ||
from ConversionUtil import getScalaSingleton, toPython | ||
|
||
class Conversions: |
This file belongs in a tools folder. Rename it "DFConversions".
from RegisterContext import registerContext | ||
from pyspark.sql import DataFrame,SQLContext | ||
|
||
class Vocab: |
move to a subfolder: tools/Vocab.py
@@ -0,0 +1,14 @@ | |||
net: "/Users/mridul/bigml/CaffeOnSpark/data/train_val.prototxt" |
net: "CaffeOnSpark/data/image_train_val.prototxt"
@@ -0,0 +1,396 @@ | |||
name: "CaffeNet" |
image_train_val.prototxt?
} | ||
}.collect() | ||
|
||
/* test("CocoTest") { |
Why this large chunk of test code? Should it be uncommented?
Yes. The build ran out of memory (OOM) on Travis, so I am checking whether this test is the culprit. If it is, I will restore it with a smaller dataset.
@@ -0,0 +1,396 @@ | |||
name: "CaffeNet" |
why "bvlc_reference"? We are using CoS memory layer
The reference net is the BVLC net.
@@ -0,0 +1,14 @@ | |||
net: "CaffeOnSpark/data/bvlc_reference_net.prototxt" |
why bvlc_reference?
that's the renamed solver/net file
@@ -36,6 +40,26 @@ def show_df(df, nrows=10): | |||
html += "</table>" | |||
return HTML(html) | |||
|
|||
def show_captions(df, nrows=10): |
This method is very much caption specific. Let's move it to tools/DFConversions.py or tools/Utils.py.
OK, will do shortly. I am fixing the test dataset right now, as the current DataFrame with images leads to OOM on Travis.
export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib | ||
export DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/ | ||
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH} | ||
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6 |
Let's refer to our GetStarted guides (https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone_osx) and tell users to set up CAFFE_ON_SPARK, SPARK_HOME, etc. by following them. We don't want to expose your own environment to our users.
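One way to act on this without hardcoding any personal path is to have the instructions check that the variables already exist. The following is only a sketch: `require_env` is a hypothetical helper, not something that exists in CaffeOnSpark, and the variable names are the two mentioned in the comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of CaffeOnSpark): succeed only if the named
# environment variable is set and non-empty.
require_env() {
    name="$1"
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "error: $name is not set; follow the GetStarted guide first" >&2
        return 1
    fi
    return 0
}

# Check both variables before running the later steps of the example.
for var in CAFFE_ON_SPARK SPARK_HOME; do
    require_env "$var" || echo "skipping: $var is missing"
done
```

With a check like this, the README can point at the guide for setup and never needs an `export SPARK_HOME=...` line of its own.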
Steps to run the COCO dataset for Image Captioning | ||
================================================== | ||
##### (1) Env setup | ||
export CAFFE_ON_SPARK=/Users/mridul/bigml/CaffeOnSpark |
see my comment below
Hmm, that crept in by mistake.
export PYSPARK_PYTHON=Python2.7.10/bin/python | ||
export PYTHONPATH=$PYTHONPATH:caffeonsparkpythonapi.zip:caffe_on_grid_archive/lib64:/usr/local/cuda-7.5/lib64 | ||
export LD_LIBRARY_PATH=Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/:${LD_LIBRARY_PATH} | ||
export DYLD_LIBRARY_PATH=Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/:${LD_LIBRARY_PATH} |
Why do we have two sets of definitions for LD_LIBRARY_PATH and DYLD_LIBRARY_PATH? Please consolidate.
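A possible shape for the consolidation is to build the directory list once and derive both variables from it. This is a sketch under assumptions: `join_unique_paths` and `CAFFE_LIB_PATH` are hypothetical names, and the `/path/to/CaffeOnSpark` fallback is a placeholder for whatever the user's setup provides.

```shell
#!/bin/sh
# Hypothetical helper: join the arguments with ':' and skip repeated entries.
join_unique_paths() {
    result=""
    for dir in "$@"; do
        case ":$result:" in
            *":$dir:"*) ;;                           # already present, skip
            *) result="${result:+$result:}$dir" ;;   # append with separator
        esac
    done
    printf '%s\n' "$result"
}

# Placeholder default; CAFFE_ON_SPARK should come from the user's own setup.
CAFFE_ON_SPARK="${CAFFE_ON_SPARK:-/path/to/CaffeOnSpark}"

# Single definition of the library directories (one entry repeated on purpose).
CAFFE_LIB_PATH=$(join_unique_paths \
    "${CAFFE_ON_SPARK}/caffe-public/distribute/lib" \
    "${CAFFE_ON_SPARK}/caffe-distri/distribute/lib" \
    /usr/local/cuda/lib \
    /usr/local/mkl/lib/intel64/ \
    /usr/local/cuda/lib)

# Both variables come from the same list; any existing value is kept at the end.
export LD_LIBRARY_PATH="${CAFFE_LIB_PATH}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export DYLD_LIBRARY_PATH="${CAFFE_LIB_PATH}${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
```

Keeping one list means a directory only has to be added or removed in one place, and the duplicate `/usr/local/cuda/lib` entry is dropped automatically.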
pushd ${CAFFE_ON_SPARK}/data/ | ||
ln -s ~/Python2.7.10 Python2.7.10 | ||
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip | ||
cat /tmp/coco/parquet/vocab/part* > vocab.txt |
We should not need the above command: cat /tmp/coco/parquet/vocab/part* > vocab.txt
The inference code doesn't understand split vocab files.
Please adjust the Vocab.save() method to create a single file. You could use coalesce(1, true).
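For context, the `cat` workaround being discussed merges the per-partition output files into one. The sketch below simulates that in a temporary directory; the file contents are made up for illustration and no Spark job is involved.

```shell
#!/bin/sh
# Simulate a multi-partition text output directory in a temp dir.
workdir=$(mktemp -d)
mkdir "$workdir/vocab"
printf 'a\nman\n' > "$workdir/vocab/part-00000"
printf 'riding\nhorse\n' > "$workdir/vocab/part-00001"

# The glob expands in sorted order, so words keep their partition order.
cat "$workdir"/vocab/part* > "$workdir/vocab.txt"

word_count=$(wc -l < "$workdir/vocab.txt" | tr -d ' ')
echo "merged $word_count words"

rm -rf "$workdir"
```

If the save step writes a single partition instead, the output directory holds only one part file and this merge step becomes unnecessary, which is the point of the suggestion.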
export IPYTHON_OPTS="notebook --no-browser --ip=127.0.0.1" | ||
pushd ${CAFFE_ON_SPARK}/data/ | ||
ln -s ~/Python2.7.10 Python2.7.10 | ||
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip |
Remove duplicated lines:
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
Stages 6 and 7 are independent of each other; you run either 6 or 7.
Then we should call it (6) and tell the user to perform either action.
Please squash your commits. |
1205e32 to 7d907c0
Squashed all commits, addressed all review comments, and trimmed the test case. It should pass and be ready to merge.
LD_LIBRARY_PATH2=${LD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64 | ||
DYLD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64 | ||
DYLD_LIBRARY_PATH2=${DYLD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64 | ||
DYLD_LIBRARY_PATH ?=/usr/local/cuda/lib |
/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/lib
@@ -4,6 +4,7 @@ | |||
import numpy as np | |||
from base64 import b64encode | |||
from google.protobuf import text_format | |||
import array |
please remove the changes in this file
I am not sure what you mean. We moved show_captions out of this file, so it changed.
On Friday, November 4, 2016, anfeng notifications@github.com wrote:
|
Initial Setup: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone | ||
export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/:Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/ | ||
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH} | ||
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6 |
Users should set up both CAFFE_ON_SPARK and SPARK_HOME per https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone
export PYSPARK_PYTHON=Python2.7.10/bin/python | ||
export PYTHONPATH=$PYTHONPATH:caffeonsparkpythonapi.zip:caffe_on_grid_archive/lib64:/usr/local/cuda-7.5/lib64 | ||
export IPYTHON_ROOT=~/Python2.7.10 | ||
unset SPARK_CONF_DIR |
Why this unset statement?
export IPYTHON_OPTS="notebook --no-browser --ip=127.0.0.1" | ||
pushd ${CAFFE_ON_SPARK}/data/ | ||
ln -s ~/Python2.7.10 Python2.7.10 | ||
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip |
Then we should call it (6) and tell the user to perform either action.
pushd ${CAFFE_ON_SPARK}/data/ | ||
ln -s ~/Python2.7.10 Python2.7.10 | ||
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip | ||
cat /tmp/coco/parquet/vocab/part* > vocab.txt |
Please adjust the Vocab.save() method to create a single file. You could use coalesce(1, true).
|
||
def save(vocabFilePath: String): Unit = { | ||
synchronized { | ||
rdd_word.map(word => word.getString(0)).saveAsTextFile(vocabFilePath) |
coalesce(1, true).saveAsTextFile(vocabFilePath)
pushd ${CAFFE_ON_SPARK}/data/ | ||
ln -s ~/Python2.7.10 Python2.7.10 | ||
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip | ||
cat /tmp/coco/parquet/vocab/part* > vocab.txt |
We should remove this "cat" statement once the suggested Vocab.scala change is made.
--conf spark.task.cpus=${CORES_PER_WORKER} \ | ||
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \ | ||
--conf spark.executorEnv.LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \ | ||
--conf spark.pythonargs="-model /tmp/coco/parquet/lrcn_coco.model -imagenet lstm_deploy.prototxt -lstmnet lrcn_word_to_preds.deploy.prototxt -vocab vocab.txt -input /tmp/coco/parquet/df_embedded_train2014 -output /tmp/coco/parquet/df_caption_results_train2014" examples/ImageCaption.py |
vocab.txt/part-00000
|
||
FileUtils.deleteQuietly(new File(cocoImageCaptionDF)) | ||
val df_image_caption = Conversions.Coco2ImageCaptionFile(sqlContext, cocoJson, 4) | ||
// val rdd_input_captions = inputDF2PairRDD(df_image_caption) |
remove unwanted statement
jar -xvf caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar META-INF/native/linux64/liblmdbjni.so | ||
mv META-INF/native/linux64/liblmdbjni.so ${CAFFE_ON_SPARK}/caffe-distri/distribute/lib | ||
${CAFFE_ON_SPARK}/scripts/setup-mnist.sh | ||
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH2}"; mvn ${MVN_SPARK_FLAG} -B test | ||
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH2}"; GLOG_minloglevel=1 mvn ${MVN_SPARK_FLAG} -B test | ||
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; cd ${CAFFE_ON_SPARK}/caffe-public/python/; zip -ur ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/caffeonsparkpythonapi.zip *; cd - ; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK} | ||
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}; export SPARK_HOME=${SPARK_HOME};${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh |
GLOG_minloglevel=1 ${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh
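The suggested form relies on the shell rule that a `NAME=value` prefix applies only to the command it precedes, so no separate `export` is needed. A small sketch of that behaviour, with `sh -c` standing in for PythonTest.sh (the stand-in is an assumption for illustration):

```shell
#!/bin/sh
unset GLOG_minloglevel

# The prefix assignment is placed in the environment of this one child only.
with_prefix=$(GLOG_minloglevel=1 sh -c 'echo "${GLOG_minloglevel:-unset}"')

# A later command started without the prefix does not see the variable.
after=$(sh -c 'echo "${GLOG_minloglevel:-unset}"')

echo "child saw: $with_prefix; later commands see: $after"
```

This prints `child saw: 1; later commands see: unset`. In glog, a minimum log level of 1 corresponds to warnings and above, so the prefix quiets INFO output for the test run without changing the caller's environment.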
jar -xvf caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar META-INF/native/osx64/liblmdbjni.jnilib | ||
mv META-INF/native/osx64/liblmdbjni.jnilib ${CAFFE_ON_SPARK}/caffe-distri/distribute/lib | ||
${CAFFE_ON_SPARK}/scripts/setup-mnist.sh | ||
export LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH2}"; mvn ${MVN_SPARK_FLAG} -B test | ||
export LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH2}"; GLOG_minloglevel=1 mvn ${MVN_SPARK_FLAG} -B test | ||
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; cd ${CAFFE_ON_SPARK}/caffe-public/python/; zip -ur ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/caffeonsparkpythonapi.zip *; cd -; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK} | ||
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK} | ||
export DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}; export SPARK_HOME=${SPARK_HOME};${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh |
GLOG_minloglevel=1 ${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh
@@ -1,15 +1,14 @@ | |||
Steps to run the COCO dataset for Image Captioning | |||
================================================== | |||
##### (1) Env setup | |||
Initial Setup: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone | |||
Set up both CAFFE_ON_SPARK and SPARK_HOME per https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone | |||
export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/:Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/ | |||
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH} | |||
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6 |
remove export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6
pyspark --master ${MASTER_URL} --deploy-mode client \ | ||
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \ | ||
--conf spark.executorEnv.LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \ | ||
--files "${CAFFE_ON_SPARK}/data/lstm_deploy.prototxt,${CAFFE_ON_SPARK}/data/vocab.txt,${CAFFE_ON_SPARK}/data/lrcn_word_to_preds.deploy.prototxt,${CAFFE_ON_SPARK}/data/caffe/_caffe.so,${CAFFE_ON_SPARK}/data/bvlc_reference_net.prototxt,${CAFFE_ON_SPARK}/data/bvlc_reference_solver.prototxt,${CAFFE_ON_SPARK}/data/lrcn_cos.prototxt,${CAFFE_ON_SPARK}/data/lrcn_solver.prototxt" \ | ||
--files "${CAFFE_ON_SPARK}/data/lstm_deploy.prototxt,${CAFFE_ON_SPARK}/data/vocab.txt/part-00000,${CAFFE_ON_SPARK}/data/lrcn_word_to_preds.deploy.prototxt,${CAFFE_ON_SPARK}/data/caffe/_caffe.so,${CAFFE_ON_SPARK}/data/bvlc_reference_net.prototxt,${CAFFE_ON_SPARK}/data/bvlc_reference_solver.prototxt,${CAFFE_ON_SPARK}/data/lrcn_cos.prototxt,${CAFFE_ON_SPARK}/data/lrcn_solver.prototxt" \ |
Don't you need to specify an alias for "${CAFFE_ON_SPARK}/data/vocab.txt/part-00000"? Otherwise, you will see a file named part-00000 in the home directory.
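A sketch of what such an alias could look like follows. It assumes the `path#alias` fragment syntax that Spark documents for `--files` when running on YARN; whether it applies to the cluster manager behind `${MASTER_URL}` in this README is not confirmed here. `with_alias` is a hypothetical helper and the `/path/to/CaffeOnSpark` fallback is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: append "#alias" so the distributed copy is visible
# under that name (documented for Spark on YARN; an assumption elsewhere).
with_alias() {
    printf '%s#%s\n' "$1" "$2"
}

# Placeholder default; CAFFE_ON_SPARK should come from the user's own setup.
CAFFE_ON_SPARK="${CAFFE_ON_SPARK:-/path/to/CaffeOnSpark}"

VOCAB_FILE=$(with_alias "${CAFFE_ON_SPARK}/data/vocab.txt/part-00000" vocab.txt)
FILES="${CAFFE_ON_SPARK}/data/lstm_deploy.prototxt,${VOCAB_FILE}"

# Show the argument that would be passed to pyspark or spark-submit.
echo "--files \"$FILES\""
```

With the alias in place, the job could keep referring to the file as `vocab.txt` instead of `part-00000`.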
+1 |
No description provided.