This repository has been archived by the owner on Nov 16, 2019. It is now read-only.

LSTM support in CaffeOnSpark #176

Merged: 8 commits, Nov 4, 2016

Conversation

mriduljain (Contributor)

No description provided.

@yahoocla commented Nov 2, 2016

CLA is valid!

@@ -1,40 +1,41 @@
HOME ?=/home/${USER}
ifeq ($(shell which spark-submit),)
SPARK_HOME ?=/home/y/share/spark
SPARK_HOME=~/spark-1.6.0-bin-hadoop2.6
Contributor:

pls undo this change.

else
SPARK_HOME ?=$(shell which spark-submit 2>&1 | sed 's/\/bin\/spark-submit//g')
endif
CAFFE_ON_SPARK ?=$(shell pwd)
LD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64
LD_LIBRARY_PATH ?=/usr/local/cuda/
Contributor:

/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/

LD_LIBRARY_PATH2=${LD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64
DYLD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64
DYLD_LIBRARY_PATH2=${DYLD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64
DYLD_LIBRARY_PATH ?=/usr/local/cuda/lib
Contributor:

/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/


export SPARK_VERSION=$(shell ${SPARK_HOME}/bin/spark-submit --version 2>&1 | grep version | awk '{print $$5}' | cut -d'.' -f1)
ifeq (${SPARK_VERSION}, 2)
export MVN_SPARK_FLAG=-Dspark2
endif

build:
build:
Contributor:

remove extra space

cd caffe-distri; make clean; cd ..
clean:
pushd caffe-public; make clean; popd
pushd caffe-distri; make clean; popd
Contributor:

why these changes? Let's avoid unwanted changes

@@ -0,0 +1,8 @@
from examples.coco.retrieval_experiment import *
Contributor:

copyright

from pyspark.sql import DataFrame,SQLContext
from ConversionUtil import getScalaSingleton, toPython

class Conversions:
Contributor:

This file belongs in a tools folder. Rename it "DFConversions".

from RegisterContext import registerContext
from pyspark.sql import DataFrame,SQLContext

class Vocab:
Contributor:

move to a subfolder: tools/Vocab.py

@@ -0,0 +1,14 @@
net: "/Users/mridul/bigml/CaffeOnSpark/data/train_val.prototxt"
Contributor:

net: "CaffeOnSpark/data/image_train_val.prototxt"

@@ -0,0 +1,396 @@
name: "CaffeNet"
Contributor:

image_train_val.prototxt?

}
}.collect()

/* test("CocoTest") {
Contributor:

Why this large chunk of test code? Should it be uncommented?

Contributor Author (mriduljain):

Yes... I hit OOM on Travis, so I'm doing a check; I'll revert (with a smaller dataset) if this test is the culprit.

@@ -0,0 +1,396 @@
name: "CaffeNet"
Contributor:

why "bvlc_reference"? We are using CoS memory layer

Contributor Author (mriduljain):

reference net is bvlc net

@@ -0,0 +1,14 @@
net: "CaffeOnSpark/data/bvlc_reference_net.prototxt"
Contributor:

why bvlc_reference?

Contributor Author (mriduljain):

that's the renamed solver/net file

@@ -36,6 +40,26 @@ def show_df(df, nrows=10):
html += "</table>"
return HTML(html)

def show_captions(df, nrows=10):
Contributor:

This method is very much caption specific. Let's move it to tools/DFConversions.py or tools/Utils.py.

Contributor Author (mriduljain):

ok will do shortly. Fixing the dataset for test right now, as the current dataframe with images leads to OOM on Travis

export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib
export DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6
Contributor:

Let's refer to our GetStarted guides (https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone_osx). We should inform our users to set up CAFFE_ON_SPARK and SPARK_HOME etc from our guides. Don't want to expose your own environments to our users.

Steps to run the COCO dataset for Image Captioning
==================================================
##### (1) Env setup
export CAFFE_ON_SPARK=/Users/mridul/bigml/CaffeOnSpark
Contributor:

see my comment below

Contributor Author (mriduljain):

hmm...crept in by mistake

export PYSPARK_PYTHON=Python2.7.10/bin/python
export PYTHONPATH=$PYTHONPATH:caffeonsparkpythonapi.zip:caffe_on_grid_archive/lib64:/usr/local/cuda-7.5/lib64
export LD_LIBRARY_PATH=Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/:${LD_LIBRARY_PATH}
export DYLD_LIBRARY_PATH=Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/:${LD_LIBRARY_PATH}
Contributor:

Why do we have two sets of definitions for LD_LIBRARY_PATH and DYLD_LIBRARY_PATH? Please consolidate.

pushd ${CAFFE_ON_SPARK}/data/
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
cat /tmp/coco/parquet/vocab/part* > vocab.txt
Contributor:

We should not need the above command: cat /tmp/coco/parquet/vocab/part* > vocab.txt

Contributor Author (mriduljain):

inferencing code doesn't understand split vocab files

Contributor:

Please adjust Vocab.save() method to create a single file. You could use coalesce(1, true)

export IPYTHON_OPTS="notebook --no-browser --ip=127.0.0.1"
pushd ${CAFFE_ON_SPARK}/data/
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
Contributor:

Remove duplicated lines:

ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip

Contributor Author (mriduljain):

Stages 6 and 7 are independent of each other; you run either 6 or 7.

Contributor:

Then we should call it (6) and tell the user to do either action.

@anfeng changed the title from "Lstm inference" to "LSTM support in CaffeOnSpark" on Nov 3, 2016
@junshi15 (Collaborator) commented Nov 3, 2016

Please squash your commits.

@mriduljain (Contributor Author)

Squashed all commits and accommodated all review comments, plus trimmed the test case. It should pass and be ready for merge.

LD_LIBRARY_PATH2=${LD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64
DYLD_LIBRARY_PATH ?=/home/y/lib64:/home/y/lib64/mkl/intel64
DYLD_LIBRARY_PATH2=${DYLD_LIBRARY_PATH}:${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/lib64:/lib64
DYLD_LIBRARY_PATH ?=/usr/local/cuda/lib
Contributor:

/home/y/lib64:/home/y/lib64/mkl/intel64:/usr/local/cuda/lib

@@ -4,6 +4,7 @@
import numpy as np
from base64 import b64encode
from google.protobuf import text_format
import array
Contributor:

please remove the changes in this file

@mriduljain (Contributor Author)

I am not sure what you mean. We moved show_captions out of this file, so it changed.

On Friday, November 4, 2016, anfeng notifications@github.com wrote:

@anfeng commented on this pull request.

In caffe-grid/src/main/python/com/yahoo/ml/caffe/DisplayUtils.py, #176 (review):

@@ -4,6 +4,7 @@
import numpy as np
from base64 import b64encode
from google.protobuf import text_format
+import array

please remove the changes in this file



Initial Setup: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone
export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/:Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6
Contributor:

User should set up both CAFFE_ON_SPARK and SPARK_HOME per https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone

export PYSPARK_PYTHON=Python2.7.10/bin/python
export PYTHONPATH=$PYTHONPATH:caffeonsparkpythonapi.zip:caffe_on_grid_archive/lib64:/usr/local/cuda-7.5/lib64
export IPYTHON_ROOT=~/Python2.7.10
unset SPARK_CONF_DIR
Contributor:

Why this unset statement?

export IPYTHON_OPTS="notebook --no-browser --ip=127.0.0.1"
pushd ${CAFFE_ON_SPARK}/data/
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
Contributor:

Then we should call it (6) and tell the user to do either action.

pushd ${CAFFE_ON_SPARK}/data/
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
cat /tmp/coco/parquet/vocab/part* > vocab.txt
Contributor:

Please adjust Vocab.save() method to create a single file. You could use coalesce(1, true)


def save(vocabFilePath: String): Unit = {
synchronized {
rdd_word.map(word => word.getString(0)).saveAsTextFile(vocabFilePath)
Contributor:

coalesce(1, true).saveAsTextFile(vocabFilePath)
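
A minimal sketch of what that suggested change could look like (the VocabSaveSketch wrapper and its parameters are hypothetical, not the PR's actual code), assuming rdd_word is an RDD[Row] whose first column holds the vocabulary word, as in the excerpt above:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

object VocabSaveSketch {
  // Hypothetical helper mirroring Vocab.save(): coalesce(1, true) forces the RDD
  // into a single partition, so saveAsTextFile writes exactly one part-00000 file
  // under vocabFilePath and the "cat part* > vocab.txt" step becomes unnecessary.
  def save(rdd_word: RDD[Row], vocabFilePath: String): Unit = {
    rdd_word.map(row => row.getString(0))
      .coalesce(1, true)
      .saveAsTextFile(vocabFilePath)
  }
}

A single partition means the whole vocabulary is written by one task, which is fine for a vocabulary-sized text file.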

pushd ${CAFFE_ON_SPARK}/data/
ln -s ~/Python2.7.10 Python2.7.10
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
cat /tmp/coco/parquet/vocab/part* > vocab.txt
Contributor:

We should remove this "cat" statement once the suggested Vocab.scala change is made.

--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \
--conf spark.pythonargs="-model /tmp/coco/parquet/lrcn_coco.model -imagenet lstm_deploy.prototxt -lstmnet lrcn_word_to_preds.deploy.prototxt -vocab vocab.txt -input /tmp/coco/parquet/df_embedded_train2014 -output /tmp/coco/parquet/df_caption_results_train2014" examples/ImageCaption.py
Contributor:

vocab.txt/part-00000


FileUtils.deleteQuietly(new File(cocoImageCaptionDF))
val df_image_caption = Conversions.Coco2ImageCaptionFile(sqlContext, cocoJson, 4)
// val rdd_input_captions = inputDF2PairRDD(df_image_caption)
Contributor:

remove unwanted statement

jar -xvf caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar META-INF/native/linux64/liblmdbjni.so
mv META-INF/native/linux64/liblmdbjni.so ${CAFFE_ON_SPARK}/caffe-distri/distribute/lib
${CAFFE_ON_SPARK}/scripts/setup-mnist.sh
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH2}"; mvn ${MVN_SPARK_FLAG} -B test
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH2}"; GLOG_minloglevel=1 mvn ${MVN_SPARK_FLAG} -B test
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; cd ${CAFFE_ON_SPARK}/caffe-public/python/; zip -ur ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/caffeonsparkpythonapi.zip *; cd - ; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}; export SPARK_HOME=${SPARK_HOME};${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh
Contributor:

GLOG_minloglevel=1 ${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh

jar -xvf caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar META-INF/native/osx64/liblmdbjni.jnilib
mv META-INF/native/osx64/liblmdbjni.jnilib ${CAFFE_ON_SPARK}/caffe-distri/distribute/lib
${CAFFE_ON_SPARK}/scripts/setup-mnist.sh
export LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH2}"; mvn ${MVN_SPARK_FLAG} -B test
export LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH2}"; GLOG_minloglevel=1 mvn ${MVN_SPARK_FLAG} -B test
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; cd ${CAFFE_ON_SPARK}/caffe-public/python/; zip -ur ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/caffeonsparkpythonapi.zip *; cd -; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK}
cd ${CAFFE_ON_SPARK}/caffe-grid/src/main/python/; zip -r caffeonsparkpythonapi *; mv caffeonsparkpythonapi.zip ${CAFFE_ON_SPARK}/caffe-grid/target/; cd ${CAFFE_ON_SPARK}
export DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}; export SPARK_HOME=${SPARK_HOME};${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh
Contributor:

GLOG_minloglevel=1 ${CAFFE_ON_SPARK}/caffe-grid/src/test/python/PythonTest.sh

@@ -1,15 +1,14 @@
Steps to run the COCO dataset for Image Captioning
==================================================
##### (1) Env setup
Initial Setup: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone
Set up both CAFFE_ON_SPARK and SPARK_HOME per https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_standalone
export DYLD_LIBRARY_PATH=${CAFFE_ON_SPARK}/caffe-public/distribute/lib:${CAFFE_ON_SPARK}/caffe-distri/distribute/lib:/usr/local/cuda/lib:/usr/local/mkl/lib/intel64/:Python2.7.10/lib:/usr/local/cuda/lib:caffe_on_grid_archive/lib64/mkl/intel64/
export LD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}
export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6
Contributor:

remove export SPARK_HOME=/Users/mridul/bigml/spark-1.6.0-bin-hadoop2.6

pyspark --master ${MASTER_URL} --deploy-mode client \
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}:Python2.7.10/lib" \
--files "${CAFFE_ON_SPARK}/data/lstm_deploy.prototxt,${CAFFE_ON_SPARK}/data/vocab.txt,${CAFFE_ON_SPARK}/data/lrcn_word_to_preds.deploy.prototxt,${CAFFE_ON_SPARK}/data/caffe/_caffe.so,${CAFFE_ON_SPARK}/data/bvlc_reference_net.prototxt,${CAFFE_ON_SPARK}/data/bvlc_reference_solver.prototxt,${CAFFE_ON_SPARK}/data/lrcn_cos.prototxt,${CAFFE_ON_SPARK}/data/lrcn_solver.prototxt" \
--files "${CAFFE_ON_SPARK}/data/lstm_deploy.prototxt,${CAFFE_ON_SPARK}/data/vocab.txt/part-00000,${CAFFE_ON_SPARK}/data/lrcn_word_to_preds.deploy.prototxt,${CAFFE_ON_SPARK}/data/caffe/_caffe.so,${CAFFE_ON_SPARK}/data/bvlc_reference_net.prototxt,${CAFFE_ON_SPARK}/data/bvlc_reference_solver.prototxt,${CAFFE_ON_SPARK}/data/lrcn_cos.prototxt,${CAFFE_ON_SPARK}/data/lrcn_solver.prototxt" \
Contributor:

Don't you need to specify an alias for "${CAFFE_ON_SPARK}/data/vocab.txt/part-00000"? Otherwise, you will see a file named part-00000 in home dir.

@anfeng (Contributor) commented Nov 4, 2016

+1
