This repository has been archived by the owner on Nov 16, 2019. It is now read-only.

Does CaffeOnSpark support multiple LMDB Files For Training/Testing #85

Open
stickFigure1235 opened this issue Jun 13, 2016 · 3 comments

Comments

@stickFigure1235

Does CaffeOnSpark support multiple LMDB files or is there an alternative?

An error occurs when I try to do something like the following:

layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "dummyLabel1"
  include {
    phase: TRAIN
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:train_data_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "label"
  type: "MemoryData"
  top: "label"
  top: "dummyLabel2"
  include {
    phase: TRAIN
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:train_label_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "dummyLabel1"
  include {
    phase: TEST
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:test_data_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "label"
  type: "MemoryData"
  top: "label"
  top: "dummyLabel2"
  include {
    phase: TEST
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:test_label_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}

layer {
  name: "silence_layer"
  type: "Silence"
  bottom: "dummyLabel1"
  bottom: "dummyLabel2"
}

layer {
  name: "zero"
  type: "Eltwise"
  bottom: "data"
  bottom: "label"
  top: "zero"
  eltwise_param {
    operation: SUM
    coeff: 1
    coeff: -1
  }
}

I get the following error:

F0613 18:06:55.973876 14601 memory_data_layer.cpp:112] Check failed: data_ MemoryDataLayer needs to be initalized by calling Reset

Thanks.

@junshi15
Collaborator

Multiple sources are not supported at the moment. This is a feature we plan to support in the future.
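
One possible workaround in the meantime, offered only as an untested sketch: pack the data and the label into a single LMDB record as two channels, read that record through one MemoryData layer, and split it inside the network with a Slice layer. The path `file:train_merged_lmdb/`, the blob name `merged`, and the layer name `split_data_label` are hypothetical. The sketch assumes the labels have the same 1 x 1072 shape as the data (as in the config above) and that the CaffeOnSpark LMDB source passes a two-channel record through unchanged, which has not been verified here.

```prototxt
# Hypothetical merged source: channel 0 holds the data, channel 1 holds the label.
layer {
  name: "data"
  type: "MemoryData"
  top: "merged"
  top: "dummyLabel"
  include {
    phase: TRAIN
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:train_merged_lmdb/"  # assumed path, not from the original post
    batch_size: 16
    channels: 2                        # data and label stacked on the channel axis
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
# Split along the channel axis (axis 1) into the two blobs the rest of the net expects.
layer {
  name: "split_data_label"
  type: "Slice"
  bottom: "merged"
  top: "data"
  top: "label"
  slice_param {
    axis: 1
    slice_point: 1
  }
}
```

With this layout only one dummy label blob remains, so the Silence layer would take `dummyLabel` as its single bottom, and a matching TEST-phase layer would be needed for the test set.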

@stickFigure1235
Author

Is HDF5 supported in CaffeOnSpark?

@junshi15
Collaborator

junshi15 commented Jun 14, 2016

No, our focus is on distributed file formats, such as Spark DataFrame and Hadoop SequenceFile. Going forward, better DataFrame support will be our main development effort. Single-node file formats such as LMDB and HDF5 are not well suited to a distributed environment. I have seen people use a single 1 TB LMDB file for training! That is very much the opposite of distributed storage.
