Does CaffeOnSpark support multiple LMDB Files For Training/Testing #85

stickFigure1235 · 2016-06-13T23:27:14Z

Does CaffeOnSpark support multiple LMDB files or is there an alternative?

An error occurs when I try to do something like the following:

layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "dummyLabel1"
  include {
    phase: TRAIN
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:train_data_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "label"
  type: "MemoryData"
  top: "label"
  top: "dummyLabel2"
  include {
    phase: TRAIN
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:train_label_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "dummyLabel1"
  include {
    phase: TEST
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:test_data_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}
layer {
  name: "label"
  type: "MemoryData"
  top: "label"
  top: "dummyLabel2"
  include {
    phase: TEST
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "file:test_label_lmdb/"
    batch_size: 16
    channels: 1
    height: 1
    width: 1072
    share_in_parallel: true
  }
}

layer {
  name: "silence_layer"
  type: "Silence"
  bottom: "dummyLabel1"
  bottom: "dummyLabel2"
}

layer {
  name: "zero"
  type: "Eltwise"
  bottom: "data"
  bottom: "label"
  top: "zero"
  eltwise_param {
    operation: SUM
    coeff: 1
    coeff: -1
  }
}

I get the following error:

F0613 18:06:55.973876 14601 memory_data_layer.cpp:112] Check failed: data_ MemoryDataLayer needs to be initalized by calling Reset

Thanks.

junshi15 · 2016-06-14T16:52:25Z

No, multiple sources are not supported at the moment. This is a feature we plan to support in the future.
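
One possible workaround while only a single source is available, offered as an untested sketch and not something confirmed in this thread: pack each sample's data row and label row into one two-channel record before writing the dataset, so that a single source carries both. The width of 1072 comes from the prototxt in the question; `pack_sample` and `unpack_sample` are hypothetical helper names, and whether the CaffeOnSpark LMDB source accepts such records is an assumption.

```python
# Sketch only: combine a data row and a label row into one 2-channel record.
# WIDTH matches the prototxt in the question; the helper names are hypothetical.

WIDTH = 1072


def pack_sample(data_row, label_row):
    """Return a flat list laid out as [channel 0 = data, channel 1 = label]."""
    if len(data_row) != WIDTH or len(label_row) != WIDTH:
        raise ValueError("each row must hold exactly %d values" % WIDTH)
    return list(data_row) + list(label_row)


def unpack_sample(record):
    """Split a packed record back into (data_row, label_row)."""
    if len(record) != 2 * WIDTH:
        raise ValueError("a packed record must hold %d values" % (2 * WIDTH))
    return record[:WIDTH], record[WIDTH:]


# Illustrative values only; real rows would come from the two existing datasets.
data_row = [float(i) for i in range(WIDTH)]
label_row = [float(i) * 0.5 for i in range(WIDTH)]
record = pack_sample(data_row, label_row)
```

On the network side, the single data layer would then declare `channels: 2`, and a standard Caffe `Slice` layer with `axis: 1` and `slice_point: 1` could split the blob back into `data` and `label`.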

stickFigure1235 · 2016-06-14T17:08:40Z

Is HDF5 supported in CaffeOnSpark?

junshi15 · 2016-06-14T17:15:01Z

No, our focus is on distributed file formats, such as Spark DataFrame and Hadoop SequenceFile. Going forward, better DataFrame support will be our main development effort. Single-node file formats such as LMDB and HDF5 are not well suited to a distributed environment. I have seen people use a single 1 TB LMDB file for training! That is very much the opposite of distributed storage.

heliumsun mentioned this issue Apr 6, 2017

Executor may hung when using multiple devices(GPU) #243

Open
