
How to do regression? #512

Closed
xucong-zhang opened this issue Jun 17, 2014 · 57 comments

@xucong-zhang

Hi,
I am trying to modify the mnist example to be a regression network. I just changed the loss layer from "SOFTMAX_LOSS" to "EUCLIDEAN_LOSS", and the "num_output" of the ip2 layer from 10 to 1. But I get results like this:

I0617 15:26:45.970600 10216 solver.cpp:141] Iteration 0, Testing net (#0)
I0617 15:26:47.555521 10216 solver.cpp:179] Test score #0: 1
I0617 15:26:47.555577 10216 solver.cpp:179] Test score #1: 0
I0617 15:26:51.046875 10216 solver.cpp:274] Iteration 100, lr = 0.00992565
I0617 15:26:51.047067 10216 solver.cpp:114] Iteration 100, loss = nan
I0617 15:26:54.535904 10216 solver.cpp:274] Iteration 200, lr = 0.00985258
I0617 15:26:54.536092 10216 solver.cpp:114] Iteration 200, loss = nan
I0617 15:26:58.024719 10216 solver.cpp:274] Iteration 300, lr = 0.00978075
I0617 15:26:58.024911 10216 solver.cpp:114] Iteration 300, loss = nan
I0617 15:27:01.514154 10216 solver.cpp:274] Iteration 400, lr = 0.00971013
I0617 15:27:01.514345 10216 solver.cpp:114] Iteration 400, loss = nan
I0617 15:27:05.003473 10216 solver.cpp:274] Iteration 500, lr = 0.00964069
I0617 15:27:05.003661 10216 solver.cpp:114] Iteration 500, loss = nan
I0617 15:27:05.003675 10216 solver.cpp:141] Iteration 500, Testing net (#0)
I0617 15:27:06.572185 10216 solver.cpp:179] Test score #0: 1
I0617 15:27:06.572234 10216 solver.cpp:179] Test score #1: nan
I0617 15:27:10.060245 10216 solver.cpp:274] Iteration 600, lr = 0.0095724
I0617 15:27:10.060436 10216 solver.cpp:114] Iteration 600, loss = nan

Can anyone help me with this, or give me an example of doing regression with Caffe?

Thank you very much!

Updates:
Just in case anyone wants to work on a multi-label regression problem, please refer to our project webpage:
https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/gaze-based-human-computer-interaction/appearance-based-gaze-estimation-in-the-wild/
In the "Method" part, you will find my configuration file as well as the Matlab code to convert .mat to .h5.
P.S. Thanks to #1746, that is a time saver!
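
For anyone without Matlab, a minimal Python sketch of the same .mat-to-.h5 conversion might look like this (the file name and the 'data'/'label' keys inside the .mat file are assumptions, not from the original post):

import h5py
import numpy as np
from scipy.io import loadmat

# Hypothetical .mat layout: 'data' is N x C x H x W, 'label' is N x D.
mat = loadmat('train.mat')

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=mat['data'].astype(np.float32))
    f.create_dataset('label', data=mat['label'].astype(np.float32))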

@Yangqing (Member)

As a heads-up, it does not really make sense to model classification as a regression problem: why should the distance between a "1" and a "9" be larger than that between a "2" and a "3", if we don't take the semantics of the digits into account?

That being said, your network probably suffers from a large learning rate - decreasing it would eliminate the nan error, but again you probably won't get anything useful out of mnist regression.

@xucong-zhang (Author)

Hi, thank you for the comment!
Sorry, I didn't mention that I had already replaced the mnist data with my own regression dataset, so it actually makes sense :)
I decreased the learning rate and it works!

I haven't found any regression example yet (if you know of one, please tell me!), so I will continue and update my progress :)

Thank you very much!

@xucong-zhang (Author)

Hi,
I also couldn't find a suitable accuracy layer for regression, so I just changed "accuracy_layer.cpp". I know it is very lazy, but I was afraid of getting more errors if I created my own custom layer :)
I changed the loop in the cpp file to:

for (int i = 0; i < num; ++i) {
    // Accuracy: accumulate the absolute difference between prediction
    // and label over every output dimension (xczhang: changed here!)
    for (int j = 0; j < dim; ++j) {
      Dtype diff = bottom_data[i * dim + j] - bottom_label[i * dim + j];
      accuracy += sqrt(diff * diff);  // equivalent to fabs(diff)
    }

    // The classification log-prob is no longer needed:
    // Dtype prob = max(bottom_data[i * dim + static_cast<int>(bottom_label[i])],
    //                  Dtype(kLOG_THRESHOLD));
    // logprob -= log(prob);
  }

Since I just need the accuracy and don't care about the logprob, I simply commented those lines out.
Now my regression finally works!
Thank you for pointing out the learning rate problem, @Yangqing!

@sguada (Contributor) commented Jun 20, 2014

I think using accuracy for regression is misleading. Instead of modifying Accuracy, use patch #522, which allows you to use EuclideanLoss during the test phase, so the loss is reported during Test the same way it is during Train.

Replace the accuracy layer in the test prototxt with a loss layer that has a top:

layers {
  name: 'loss'
  type: EUCLIDEAN_LOSS
  bottom: 'fc7'
  bottom: 'label'
  top: 'loss'
}

@xucong-zhang (Author)

Hi,
Thank you for the link about the Accuracy layer!
My regression code is running well!

@Yangqing (Member)

Glad it works :)

@thuanvh commented Jun 24, 2014

@XucongZhang Does your regression work with multiple labels? I am trying to do regression with caffe. In my problem, the label is not a single value but a vector of floats.

@xucong-zhang (Author)

@thuanvh Hi, glad someone mentioned that! Yes, I also use multiple labels. I used the patch at [https://github.com//pull/147]. An example of how to generate the data is in /caffe-dev/src/caffe/test/test_data/generate_sample_data.py.
You get a txt file that lists all the .h5 files, and the network will read them one by one, in case you have a lot of data.
As for loading, here is how I did it:

name: "train"
layers {
  name: "MyData"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "../train_data_list.txt"
    batch_size: 128
  }
}
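
For reference, a minimal h5py sketch of generating such a file pair, in the spirit of generate_sample_data.py (the shapes and file names here are illustrative, not from the original post):

import h5py
import numpy as np

# Illustrative shapes: 1280 single-channel 60x36 images, 3 float labels each.
data = np.random.rand(1280, 1, 60, 36).astype(np.float32)
label = np.random.rand(1280, 3).astype(np.float32)

with h5py.File('train_data.h5', 'w') as f:
    f.create_dataset('data', data=data)    # dataset names must match the layer's tops
    f.create_dataset('label', data=label)

# The source txt file lists one .h5 path per line.
with open('train_data_list.txt', 'w') as f:
    f.write('train_data.h5\n')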

@thuanvh commented Jun 25, 2014

Thanks @XucongZhang, I am trying your suggestion and will report the result later.

@thuanvh commented Jun 25, 2014

Hi @XucongZhang, I have a problem using the HDF5 files.
I have 1500 images of size 60 x 60, and each image has a label of 136 float values. I created 3 .h5 files, each containing 500 images and their labels.

Loading HDF5 files 0.h5
Successfully loaded 1 rows
Output data size: 500 1500 60 60
Top shape: 500 1500 60 60
Top shape: 500 68000 1 1

And an error at the loss layer:

euclidean_loss_layer.cpp: check failed: bottom[0]->channels() == bottom[1]->channels() (136 vs 68000).

Why is the label dimension 68000?

@xucong-zhang (Author)

Hi @thuanvh, I think you made a mistake generating the .h5 files.
In your case, with 3 files, the image data should be "number_1_width_height", i.e. 500_1_60_60, and the loaded output should be "batchsize_1_width_height". But based on your output, you generated the input data as 500_1500_60_60.
The label should be "number_labels", i.e. 500_136. But based on the output, you generated the label data as 500_68000, which would mean "batchsize_labels". I believe the 68000 comes from 500*136: you flattened the number of samples and the number of labels into one dimension. The label should be two-dimensional: number of samples x number of labels.
It also shouldn't say "Successfully loaded 1 rows"; it should report the number of samples.

For example, I have 1280 samples, each image is 36*60, the label is 3 float values, and the batch size is 128. I get this:

Loading HDF5 file/home/XXX/data.h5
Successully loaded 1280 rows
output data size: 128,1,60,36
Top shape: 128 1 60 36 (276480)
Top shape: 128 3 1 1 (384)

This is my experience and understanding, just for your reference.
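
A quick way to sanity-check the shapes before training is to open the file with h5py (dataset names assumed to be 'data' and 'label'):

import h5py

with h5py.File('data.h5', 'r') as f:
    print(f['data'].shape)   # expect (1280, 1, 60, 36): samples, channels, height, width
    print(f['label'].shape)  # expect (1280, 3): samples, label values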

@thuanvh commented Jun 26, 2014

Hi @XucongZhang
Well, your suggestion is very detailed and helpful. Thank you.
Now I have your previous problem: the loss is always NAN. I tried changing base_lr in lenet_solver.prototxt from 0.01 to several smaller values, but no magic appears.
For your problem, did you change base_lr in lenet_solver?

@thuanvh commented Jun 26, 2014

I added some log lines to the Euclidean loss layer file, and the loss value is calculated correctly.
In lenet_train.prototxt and lenet_test.prototxt, instead of using SOFTMAX_LOSS, I used EUCLIDEAN_LOSS:

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip2"
  bottom: "label"
  #top: "loss"
}
For this layer, the network does not permit me to add a top, as @sguada mentioned.

@sguada (Contributor) commented Jun 26, 2014

That functionality requires the #522 patch. So either wait until it is integrated in dev, or just apply the patch to your code.

Maybe it is a problem in the weight or bias initialization; try bias=0.1.

@sguada (Contributor) commented Jun 26, 2014

@XucongZhang if you are willing to put together a regression example and a PR, we will be happy to integrate it into Caffe.

@thuanvh commented Jun 26, 2014

I applied the patch to master. During training the loss is still nan. I added a log line to the Euclidean loss layer for logging and found that after the first test, the loss becomes nan. Here is the log I get:

I0626 15:43:11.720676 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.1575
I0626 15:43:11.895576 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2606.79 bottom[0]->num() : 50
I0626 15:43:11.895771 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.0679
I0626 15:43:12.071288 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2615.75 bottom[0]->num() : 50
I0626 15:43:12.071467 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.1575
I0626 15:43:12.246860 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2606.79 bottom[0]->num() : 50
I0626 15:43:12.247030 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.0679
I0626 15:43:12.424530 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2615.75 bottom[0]->num() : 50
I0626 15:43:12.424700 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.1575
I0626 15:43:12.598495 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2606.79 bottom[0]->num() : 50
I0626 15:43:12.598675 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.0679
I0626 15:43:12.772830 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2615.75 bottom[0]->num() : 50
I0626 15:43:12.773038 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.1575
I0626 15:43:12.773110 12519 solver.cpp:142] Test score #0: 26.1127
I0626 15:43:12.976600 12519 euclidean_loss_layer.cpp:59] Euclid dot: 2606.79 bottom[0]->num() : 50
I0626 15:43:12.976780 12519 euclidean_loss_layer.cpp:61] Euclid loss: 26.0679
I0626 15:43:13.613327 12519 euclidean_loss_layer.cpp:59] Euclid dot: 1.74088e+10 bottom[0]->num() : 50
I0626 15:43:13.613502 12519 euclidean_loss_layer.cpp:61] Euclid loss: 1.74088e+08
I0626 15:43:14.170456 12519 euclidean_loss_layer.cpp:59] Euclid dot: inf bottom[0]->num() : 50
I0626 15:43:14.170627 12519 euclidean_loss_layer.cpp:61] Euclid loss: inf
I0626 15:43:14.707566 12519 euclidean_loss_layer.cpp:59] Euclid dot: -nan bottom[0]->num() : 50
I0626 15:43:14.707723 12519 euclidean_loss_layer.cpp:61] Euclid loss: -nan
I0626 15:43:15.249769 12519 euclidean_loss_layer.cpp:59] Euclid dot: -nan bottom[0]->num() : 50
I0626 15:43:15.249995 12519 euclidean_loss_layer.cpp:61] Euclid loss: -nan
I0626 15:43:15.781705 12519 euclidean_loss_layer.cpp:59] Euclid dot: -nan bottom[0]->num() : 50
I0626 15:43:15.781893 12519 euclidean_loss_layer.cpp:61] Euclid loss: -nan

As you see, after the line
I0626 15:43:12.773110 12519 solver.cpp:142] Test score #0: 26.1127
the loss becomes nan; it's so crazy.
I tried many times, but after the first test the loss is nan.

@xucong-zhang (Author)

@thuanvh For the train part, I got actual numbers after decreasing base_lr in the solver file. And for the loss, I just changed the train loss layer like this:

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip2"
  bottom: "label"
}

And for the test net, I deleted the prob layer and changed the accuracy layer like this:

layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
}

I also modified the source code in caffe-dev/src/caffe/layers/accuracy_layer.cpp to compute the Euclidean distance instead of the class difference. Then it works.
To be honest, I don't know why the nan appears. I also get it when I increase the learning rate.

@thuanvh commented Jun 26, 2014

Why do we not use the output (top) of the loss layer in the train net, but need it in the test net?

@shelhamer (Member)

The loss is of course crucial, but it's unimportant apart from its role in the top derivative; reporting it is purely diagnostic. In training, the loss is forward computed and then backpropagation begins accordingly; there's no need for the loss to be output at the top of the network.

Look at lenet_consolidated_solver.prototxt for an example of reporting the train and validation losses.


@thuanvh commented Jun 27, 2014

Hi, starting from the LeNet Mnist example, I made small changes step by step based on your suggestions about regression, making sure the loss is not nan. Now it is a normal number and I am running the training.

I think the problem I had was the scale of the input image data. After scaling to [0, 1], the loss is not nan.
Thank you all,
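
For reference, a sketch of that scaling, applied before the data is written to HDF5 (the 8-bit image batch here is a placeholder):

import numpy as np

# Placeholder 8-bit image batch; scale to [0, 1] before writing the HDF5 file.
images = np.random.randint(0, 256, (500, 1, 60, 60), dtype=np.uint8)
data = images.astype(np.float32) / 255.0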

@bearpaw commented Sep 24, 2014

Hi @thuanvh, would you please share your method for converting a dataset into HDF5 format? For example, I have a directory containing all the images, and a txt file containing the labels for each image. How can I convert the data and the labels into an HDF5 file? Thank you.
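
A minimal sketch of one way to do this with Pillow and h5py; the labels.txt format assumed here (an image path followed by float labels on each line) and all file names are hypothetical:

import h5py
import numpy as np
from PIL import Image

# Assumed labels.txt format: each line is "path/to/image.png l1 l2 l3".
images, labels = [], []
with open('labels.txt') as f:
    for line in f:
        parts = line.split()
        img = np.asarray(Image.open(parts[0]), dtype=np.float32) / 255.0
        images.append(img.transpose(2, 0, 1))  # HWC -> CHW, assuming RGB input
        labels.append([float(v) for v in parts[1:]])

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=np.stack(images))
    f.create_dataset('label', data=np.asarray(labels, dtype=np.float32))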

@wiibrew commented Oct 12, 2014

@XucongZhang Hi, I am using caffe for image regression. I saw your comments and got the data prepared. In my case I need to set the lr to 1e-8 to make the nan disappear, but then the program does not converge. I am using images for localization; any advice on that?

@xucong-zhang (Author)

@buaawelldon Hi, from my experience, you can also normalize your labels and change the initialization of the filters to avoid nan. I suspect the learning rate is so small that you cannot learn anything.
However, in my experiment the filters are not converging either...

@wiibrew commented Oct 13, 2014

Thanks, I will try your advice.


@chocolate9624

@XucongZhang Can I ask you a question? I load the data in hdf5 format and have 390 h5 files. The question is about what the log file outputs.

@chocolate9624

It outputs " I1127 10:36:23.443383 37867 hdf5_data_layer.cpp:49] Successully loaded 128 rows
I1127 10:36:23.791043 37867 hdf5_data_layer.cpp:29] Loading HDF5 " all the time.

@xucong-zhang (Author)

Yes, it will output those operations. I know it can be pretty annoying, and you can just comment out the corresponding code, which the log already points you to: lines 29 and 49 of hdf5_data_layer.cpp. You can find the cpp file in /src/caffe/layers.
Feel free to modify the code.


@souzou commented Mar 4, 2015

Hello,
I used the Euclidean loss layer in AlexNet; I just changed the loss layers like this:
...
layers {
  name: "losstest"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "losstest"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

The training step runs correctly and creates a model file, but when I try to classify a new image with the model in Python, the result is:

F0304 12:30:04.102885 40208 layer.hpp:347] Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EUCLIDEAN_LOSS Layer takes 2 bottom blob(s) as input.
*** Check failure stack trace: ***
Abandon

Do you have any idea about the problem?
thx

@xucong-zhang (Author)

Hi,

From my point of view, first of all you don't need the Euclidean loss for the test phase; instead you need the accuracy layer, which I modified for my task.

The error means you passed just one bottom blob, but that seems not to be the case here. I am not sure about that part. Sorry.


@caterpillar77

Hi souzou! I got the same error as you; have you figured out the problem? Thank you!

@sjtujulian

@bearpaw Hello, I have the same problem as you. Did you solve it, and can you share the answer with me? Thanks a lot!

@bearpaw commented May 25, 2015

@sjtujulian Hi. I use the HDF5 layer to handle multi-label data. Please refer to the official caffe HDF5 demo.

@sjtujulian

@bearpaw Thank you very much!

@sjtujulian

@XucongZhang Why is the "num_output" of the ip2 layer 1 instead of 2? I think the .h5 file contains (x,y), so it is like N*2.

@caterpillar77

What's your problem? Can you describe it in detail?


@arasharchor

@XucongZhang Hi, Xucong! I'd like to do regression with caffe, training a network that predicts the optical flow magnitude map and optical flow direction map of a single image (not two) using a multitask EuclideanLoss. The ground truth is a grayscale optical flow magnitude map and a 2D optical flow vector field. I previously did classification with caffe, where declaring the ground truth was easy, but in this case I don't understand how to put this ground truth into an hdf5 file and the txt file it refers to.

To predict those maps, I modified the AlexNet fully connected layers to be fully convolutional and defined two EuclideanLosses at the end, but it is wrong. Could you let me know how to do regression in this case, as you have already done it? Another question is how to specify the dimensions of the output; e.g. for the optical flow vector prediction, it has two channels, x and y.

########## I don't understand how to use this layer and how to list the training data in list.txt:
name: "train"
layers {
  name: "MyData"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "../list.txt"
    batch_size: 128
  }
}
######################################## train_val.prototxt:
name: "FCN-FlowRegressionCaffeNet"
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "magnitude_labels"
  top: "vector_labels"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "data//train.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "data/test.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
......
......
......

layer {
  name: "fc7-conv"
  type: "Convolution"
  bottom: "fc6-conv"
  top: "fc7-conv"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7-conv"
  top: "fc7-conv"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7-conv"
  top: "fc7-conv"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layer {
  name: "magnitude"
  type: "Convolution"
  bottom: "fc7"
  top: "magnitude"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 1 #???
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "vector"
  type: "Convolution"
  bottom: "fc7"
  top: "vector"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 1 #####??
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracyMagnitude"
  type: "Accuracy"
  bottom: "magnitude"
  bottom: "magnitude_labels"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "accuracyVector"
  type: "Accuracy"
  bottom: "vector"
  bottom: "vector_labels"
  top: "accuracy"
  include {
    phase: TEST
  }
}

layer {
  name: "loss_magnitude"
  type: "EuclideanLoss" # instead of "SoftmaxWithLoss"
  bottom: "magnitude"
  bottom: "magnitude_labels"
  propagate_down: 1
  propagate_down: 0
  top: "loss_magnitude"
  loss_weight: 1
}
layer {
  name: "loss_vector"
  type: "EuclideanLoss" # Euclidean loss instead of "SoftmaxWithLoss" / "SmoothL1Loss"
  bottom: "vector"
  bottom: "vector_labels"
  top: "loss_vector"
  loss_weight: 1
}

########################################## the deploy.prototxt:

name: "AlexNet"
input: "data"
input_shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
....
....
....
layer {
name: "fc7-conv"
type: "Convolution"
bottom: "fc6-conv"
top: "fc7-conv"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_outpt: 4096
kernel_size: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7-conv"
top: "fc7-conv"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7-conv"
top: "fc7-conv"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "magnitude"
type: "Convolution"
bottom: "fc7"
top: "magnitude"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 1 #???
kernel_size:1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "vector"
type: "Convolution"
bottom: "fc7"
top: "vector"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 1 #####??
kernel_size:1
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "prob"
type: "Softmax" ????
bottom: "magnitude"
top: "prob"
}
layer {
name: "prob"
type: "Softmax" ????
bottom: "vector"
top: "prob"
}

######################################## the solver.prototxt:
net: "models/FCN-Regression/train.prototxt"
test_iter: 10
test_interval: 100
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "models/FCN-Regression/prediction"
solver_mode: GPU

Thanks a lot already.

@chriss2401

@smajida I create multi-label hdf5 files through the MATLAB demo provided by caffe (just look at /caffe/matlab/hdf5creation). It basically takes your data and writes it into a .h5 file in "chunks" (batch size). Just be careful: if your file is around 10GB, caffe might complain that it is too big, and you then have to divide your data into smaller .h5 files. Afterwards, your txt file will simply point to the files, e.g.

train1.h5
train2.h5

and so on. I place my .h5 files in the same folder as my prototxt files so caffe can easily find them.

By the way, you might be able to get some ideas from @XucongZhang 's train test prototxt file here:

https://www.mpi-inf.mpg.de/fileadmin/inf/d2/xucong/MPIIGaze/train_test.prototxt

I still don't exactly understand why we use two outputs at the last inner product layer (if we're regressing, shouldn't it be one?).

@arasharchor

@chriss2401 thanks a lot. Basically I have 500 .png images of dimension (800,800,3) as training data, with values 0 to 255, and the labels are grayscale png images of dimension (800,800). Based on the hdf5 demo, I changed the code for the first 4 training images and 4 labels, but store2hdf5 complains about the dimensions. Here is the error:

batch no. 1
Error using store2hdf5 (line 15)
Number of samples should be matched between data and labels

Error in demo (line 49)
curr_dat_sz=store2hdf5(filename, batchdata, batchlabs, ~created_flag, startloc, chunksz);

the data_disk size is (500,500,3,4) and the label_disk size is (500,500,4).
The inputs to store2hdf5 in the first iteration, with batch size equal to one:
size(batchdata) => 500 500 3
size(batchlabs) => 500 500

How should I solve this problem?

@chriss2401

@smajida your data should be formatted in the following way:

Training: 800 800 3 500 (rows, columns, channels, number - don't forget to permute, since caffe and matlab use different formats)

Labels: 800 800 500

That way, when the check at line 15 happens (assert(lab_dims(end)==num_samples)), both arrays report the same number (500).

@chriss2401

By the way, when testing my model, my accuracy is incredibly high. Did anyone else get this?

@arasharchor

@chriss2401 I tried it; now my data is exactly:
training: 800 800 3 500
test: 800 800 500
It turned out I should have changed the startloc as well. I changed it this way (added 1 more dimension to 'lab'):
startloc=struct('dat',[1,1,1,totalct+1], 'lab', [1,1,totalct+1]);
thanks

@chriss2401

@smajida Sorry, reading the comments at the top of the file, it says: label is a D*N matrix of labels (D labels per sample).

So I'm not sure if your ground-truth images will work in this case. But maybe #1698 will help.

@arasharchor

@chriss2401 That is interesting, because I had also read that, but surprisingly it worked - not only for my one-channel label image, but also when I concatenated the optical flow (u,v) vectors to it. Thanks for your help.

Now I have a question. You mentioned permuting because the order in caffe and matlab differs. In the order I have prepared, caffe doesn't work, does it? Could you tell me in which order I should prepare the hdf5?

As for the regression, the last layer's output was set to 1; when I set it to 2, it complained about 500000 vs 25000, which was solved by setting the output back to 1. But I have another problem before getting to the final regression training.
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "fc7-conv"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 1
    bias_term: false
    kernel_size: 64
    stride: 32
  }
}
layer { type: 'Crop' name: 'score' top: 'score'
  bottom: 'upscore' bottom: 'data' }

layer { type: 'EuclideanLoss' name: 'loss' top: 'loss'
  bottom: 'score' bottom: 'label'
  loss_param { normalize: false } }
F0310 06:55:09.047936 43399 insert_splits.cpp:35] Unknown bottom blob 'data' (layer 'conv1_1', bottom index 0)
What does "Unknown bottom blob 'data'" mean here? I am sure the HDF5 data layer is correctly configured. Here is my data layer:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "list.txt"
    batch_size: 1
  }
}
And the first conv layer:
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    engine: CAFFE
  }
}

Is this a problem with the dimension order I used to build the hdf5 files?
Thanks for your advice! 💯

@chriss2401

@smajida as you may know, when you imread an image in MATLAB the dimensions are HxWxC (height, width, channels). Caffe works with WxHxC. Therefore, before using the hdf5 functions to write your data, you should permute it in the following way:

im = permute(im,[2 1 3]);

Of course you have a fourth dimension, but you get my idea. Once you flip the width and height you should be good to go. As for the regression problem, I'm not 100 percent sure: @XucongZhang's project has an accuracy layer, but when I use it in my project (regression through an RNN) it outputs some crazy values (like 20+), so I removed it, as some other people mentioned here, and I'm looking at the results. I'm also not sure how it will work for you, because your GT size is different from your data (800 800 versus 800 800 3). Right now I'm using GT of 10x4 and data of 10x4x256x28x28 (a scalar for each image).

@arasharchor

@chriss2401 Oops, sorry, when you were replying I was editing my previous post. I managed to get the net to run, and it trained on four images; of course the loss is on the scale of e+7. I hope that by using the real dataset and pretrained models I will be able to get a much lower loss. As for the permutation, I got the gist. I can compute accuracy after training outside caffe; for now I just want to make the net start training, but it would be good to have an accuracy layer for this case. In my case I have a 500x500 image as the label; I resized the images to 500 instead of 800 to be able to use a similar semantic-labeling model for finetuning as well.

How should I change the accuracy layer to make it suitable for this case? Any idea or suggestion? I know the L2 norm, but I don't know how to access the label blobs and the blobs of the last layer before the loss.

thanks

@arasharchor

@chriss2401 just a small question: should I subtract the mean image from the images before making the hdf5 files? And in my case, would dividing the images by 255 before making the hdf5 be better?

  • I saw that on the webpage of the network I am modifying for regression, the demo does this:
    img = img.transpose((2,0,1))
    which means first channel, then W, and afterwards H, right? But you said to permute only W and H. What is the reason for this difference?
    thanks

@MohsenFayyaz89

@smajida The HDF5 layer doesn't have any preprocessing like the other data layers; you should do the preprocessing before storing your data in HDF5 format.
The reason for the permutation is the difference between MATLAB and C++ array memory layouts (column-major vs. row-major).
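
A minimal sketch of such preprocessing, done before the array is written to HDF5 (the shapes and the zero mean image are placeholders):

import numpy as np

# Placeholder 8-bit image and mean image; the real ones come from the dataset.
img = np.random.randint(0, 256, (800, 800, 3)).astype(np.float32)
mean_image = np.zeros_like(img)

img = (img - mean_image) / 255.0  # mean subtraction and scaling to [0, 1]
img = img.transpose(2, 0, 1)      # HWC -> CHW, the order Caffe expects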

@zimenglan-sysu-512

@souzou have you ever solved your problem? If so, can you share your solution? Thanks.

@EastWoodGu commented Sep 29, 2016

Hello,
I used the Euclidean loss layer in CaffeNet; I just changed the last layer to a loss layer like this:
...
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

The training step runs correctly and creates a model file, but when I use this model to classify the test images, the results differ from the label values. Could you give me some advice? Thank you very much.

@BangpengGao commented Dec 4, 2016

@XucongZhang, has the code been updated? I can't find the loop that you changed in accuracy_layer.cpp; could you tell me the line? Also, my data has 128-D features and the label is a single float (it isn't a picture), and there are about 100,000 samples. How should I set up the params? Hope you can help me, thanks!

@srv902 commented Feb 8, 2017

I have given the source inside hdf5_data_param, which contains the filenames in .h5 format. My questions are:

  1. The datasets inside the .h5 files are under the "xtrain1" key. Will it be picked up automatically, or do I have to feed it manually? (See the note below.)

  2. Since the task is regression, how should I specify the training output, which is an image? I believe this needs to be done in the txt file under source.
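
For reference: Caffe's HDF5 data layer looks up datasets by the names of its top blobs ("data" and "label" in the examples above), so a dataset stored under a different key such as "xtrain1" has to be renamed to match. A minimal h5py sketch (the file name is hypothetical):

import h5py

# Rename the dataset so its key matches the layer's top blob name.
with h5py.File('train1.h5', 'r+') as f:
    f['data'] = f['xtrain1']  # create a hard link under the new name
    del f['xtrain1']          # remove the old key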

@arasharchor commented Feb 8, 2017

@MohsenFayyaz89 Thanks. It has been a while; I have since migrated to Python layers and am not using HDF5 layers any more.

@TT-AIR commented May 12, 2017

@XucongZhang hello, I use caffe to estimate depth from a single image, which is a regression problem. However, when I use the trained model to predict, the model outputs the same values for any input. Hope you can help me, thanks!

@sreevasu

Hi All,

I am new to FCN. How can I use both SoftmaxWithLoss and SmoothL1Loss layers in the FCN code? Please, can anyone help me with this?

Thank you
