
Train with VGG Dataset #103

Closed
bamos opened this issue Mar 7, 2016 · 24 comments

@bamos (Collaborator) commented Mar 7, 2016

/cc @melgor

Using only triplets with the current OpenFace code doesn't
get us LFW accuracy over 90%, which surprises me since
the VGG dataset is much larger than the CASIA/FaceScrub dataset
that gives us ~93% LFW accuracy.

I'm going to try making the training procedure closer to the VGG paper
by adding a classification loss.
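
For illustration, here is a minimal PyTorch-style sketch of one way to combine a triplet loss with a classification loss. This is an assumption-laden sketch, not OpenFace's actual Torch code: the linear stand-in `net`, the classifier head, and the `lam` weighting are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an embedding network like nn4; the real model
# is a deep CNN, this linear one just keeps the sketch runnable.
embed_dim, n_classes = 128, 2622          # 2622 = VGG-Face identity count
net = nn.Sequential(nn.Flatten(), nn.Linear(96 * 96 * 3, embed_dim))
classifier = nn.Linear(embed_dim, n_classes)  # classification head

triplet_loss = nn.TripletMarginLoss(margin=0.2)
ce_loss = nn.CrossEntropyLoss()

def combined_loss(anchor, positive, negative, labels, lam=1.0):
    """Triplet loss on the embeddings plus softmax cross-entropy on the
    anchors; `lam` is an assumed weighting, not a value from the paper."""
    ea, ep, en = net(anchor), net(positive), net(negative)
    return triplet_loss(ea, ep, en) + lam * ce_loss(classifier(ea), labels)

# Toy usage with random data:
a, p, n = (torch.randn(8, 3, 96, 96) for _ in range(3))
labels = torch.randint(0, n_classes, (8,))
loss = combined_loss(a, p, n, labels)
```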

(Attached plots: train-loss.pdf, lfw-accuracy.pdf)

@stevenluzheng

I have the same issues as you:

My procedure for pre-processing the training data:
crop face by bounding box --> align face and scale to 256x256 --> rank data (throwing away 5%, 10%, 20%, 30%, or 50% of it) --> randomly mirror and downsample

I trained my net with triplet loss / softmax starting from a pre-trained model (which achieves 97.5% on LFW and 91% on YouTube Faces).

Here is my test performance (LFW accuracy vs. fraction of data thrown away):

| Thrown away | 0% | 5% | 10% | 20% | 30% | 50% |
| --- | --- | --- | --- | --- | --- | --- |
| Triplet-loss training | 95.5% | 95.5% | 94.5% | 94.5% | 94.5% | 93.5% |
| Softmax training (GoogLeNet) | 97.5% | 97.0% | 98% | 96.5% | 96.5% | 95.5% |

How do you pre-process your training data? Do you clean your data?

@melgor (Contributor) commented Mar 7, 2016

@bamos It is pretty strange that the average triplet loss goes down while LFW accuracy does not change. This looks like over-fitting to the training data, but that should not be possible (I assume you use one of the nn4 models). It looks like our algorithm is not perfect or has some bugs. I have a bad feeling about it: whenever I try to fine-tune a model using triplet loss, I cannot get better accuracy than with pure softmax loss.

@stevenluzheng You used the VGG dataset, right? Your results are pretty much the same as in my tests: softmax wins. That is not consistent with the FaceNet or Baidu papers. I need to revise the algorithm again based on those papers.

@bamos (Collaborator, Author) commented Mar 7, 2016

Hi @stevenluzheng - I'm preprocessing the data with an affine transformation that rotates, crops, and resizes faces to 96x96 around the landmarks. I'm also not doing any data cleaning; do you recommend any?
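
As a concrete illustration of that alignment step, here is a minimal sketch using OpenFace's Python `AlignDlib` helper, as in the project's demo scripts; the model and image paths are placeholders.

```python
import cv2
import openface

# The dlib landmark model path is a placeholder; the model ships with
# OpenFace's model-download script.
align = openface.AlignDlib("models/dlib/shape_predictor_68_face_landmarks.dat")

bgr = cv2.imread("raw/face.jpg")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

# Detect the largest face, then affine-align on the outer eyes and nose,
# producing a 96x96 crop.
bb = align.getLargestFaceBoundingBox(rgb)
aligned = align.align(96, rgb, bb,
                      landmarkIndices=openface.AlignDlib.OUTER_EYES_AND_NOSE)
```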

Hi @melgor - maybe I'll see better results with the original nn4 network instead of my modified one with fewer parameters; trying that now. The VGG network doesn't seem difficult to re-implement, so I might also try their architecture.

-Brandon.

@stevenluzheng

@melgor Hi Melgor, how do you clean your data? I think the VGG dataset is quite noisy, so I am trying to combine the CASIA and VGG datasets together. I don't think our algorithm is incorrect: since I can improve from 93% LFW to 97.5% LFW, the algorithm can't be very wrong.

@melgor (Contributor) commented Mar 7, 2016

@stevenluzheng I am not cleaning the data either.
So you can improve to 97.5% using triplet loss, right? Could you say more about it? I mean:

  1. Input data dimension?
  2. Which model definition (nn4 or VGG)?
  3. How long did you train your model?
  4. What do you mean by softmax? That you learn the model for classification, right?
  5. Did you fine-tune your model with triplet loss, or learn from scratch?

Edit: one more note, how many images have you downloaded from the VGG dataset? I am still downloading it, but I can already say that about 10% of the links are dead. Do you see the same?

@bamos (Collaborator, Author) commented Mar 7, 2016

@melgor and @stevenluzheng - when you fine-tune with triplet loss after classification, do you restrict the updates to just the last layer or do you update the entire network?

@melgor (Contributor) commented Mar 7, 2016

I was restricting updates to just the last layer; all other layers had LR = 0 and weight decay = 0.
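
In PyTorch-style pseudocode (OpenFace itself is Torch7/Lua, so this is only an analogy), that "LR = 0 everywhere except the last layer" setup corresponds to freezing all parameters but the final layer; the tiny `net` here is a hypothetical stand-in for a pre-trained embedding network.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained embedding network.
net = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))

# Freeze everything, then unfreeze only the last layer.
for p in net.parameters():
    p.requires_grad = False
for p in net[-1].parameters():
    p.requires_grad = True

# The optimizer only sees the trainable (last-layer) parameters.
optimizer = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad],
    lr=1e-3, weight_decay=0.0)
```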

@stevenluzheng

Q: Input data dimension?
A: 256x256x3.

Q: Which model definition (nn4 or VGG)?
A: It is a private model; it looks like VGG but is not the same one. If you are interested I can send it to you, but I won't paste it here.

Q: How long did you train your model?
A: First I spent 2-3 days training my classification model; then, starting from that model, I spent 12 hours generating triplets and 1 day training.

Q: What do you mean by softmax? That you learn the model for classification, right?
A: Yes, I use a softmax layer to classify, but I don't use the classifier directly; I always extract features and compare the cosine or L2 distance between them (see the sketch below).

Q: Did you fine-tune your model with triplet loss, or learn from scratch?
A: It is fine-tuning; I have never achieved a good result from scratch.

Q: How many images have you downloaded from the VGG dataset?
A: I downloaded 2M, but I removed duplicates and used a ranking program to score similarity and drop the pictures with the lowest scores. Unfortunately, some low-similarity pictures are good semi-hard samples.
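
A minimal sketch of that verify-by-feature-distance idea (the 128-D vectors and the threshold are placeholders; real features would come from the softmax-trained net's penultimate layer):

```python
import numpy as np

def l2_distance(f1, f2):
    return float(np.linalg.norm(f1 - f2))

def cosine_similarity(f1, f2):
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

# Toy verification: accept a pair as the same person if the feature
# distance falls under a threshold tuned on a validation set.
f1, f2 = np.random.rand(128), np.random.rand(128)
same_person = l2_distance(f1, f2) < 0.99   # placeholder threshold
```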

@stevenluzheng

@bamos

> @stevenluzheng - when you fine-tune with triplet loss after classification, do you restrict the updates to just the last layer or do you update the entire network?

I think it is the entire network; why do you think only the last layer?

@stevenluzheng

@bamos and @melgor I think the reason we cannot achieve 99% is data depth and breadth: the VGG data provides depth but not breadth, while CASIA provides breadth but not depth. I did an interesting experiment that might support this opinion.

I used the CASIA and VGG data to train a VGG-16 net (classification):

CASIA: 92.5% LFW
VGG: 94.3% LFW

But when I mix the VGG and CASIA datasets (simply VGG + CASIA, without any duplicate removal or error checking), I get 14000 classes and 2.5M images, and the LFW performance is 95.9%.

BR

@melgor (Contributor) commented Mar 9, 2016

@stevenluzheng
Thanks for your comments, they are really useful.
I want to clarify the following things:

> I use 12 hours to generate triplets and 1 day to train.

Did you use your own code or OpenFace? In OpenFace there is no separate "triplet generation" stage before learning.

> Yes, I use softmax layer to classify, but I don't use classify directly, I always extract features and compare cosine or l2 distance for features.

Could you clarify that? I do not understand "I don't use the classifier directly". How is it connected with extracting features? Did you take the distance between samples and then apply a softmax that forces the distance to be '1' (positive pairs) or '0' (negative pairs), similar to https://github.com/eladhoffer/TripletNet?

I am now going to work with 96x96 images. I will do some experiments with nn4 and maybe ResNet.

@stevenluzheng

@melgor

  1. All my code is based on Caffe, and triplet generation is an application that generates the triplet training set (a file list marked with anchor/negative/positive) using the FaceNet semi-hard formula (see the sketch after this list).
  2. Sorry, you can forget that point; it is not related to our methodology. It is only how we extract features for pictures. I didn't want to confuse you.
  3. One question for Ludwiczuk: do you use Caffe or Torch for the triplet loss? I have read many of your posts on the Google forum, so I think you have a triplet-loss implementation in Caffe?
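
As a rough illustration of the FaceNet semi-hard rule mentioned in point 1 (a generic NumPy sketch under the usual definition, not stevenluzheng's actual Caffe application): a triplet (a, p, n) is semi-hard when d(a,p) < d(a,n) < d(a,p) + alpha.

```python
import numpy as np

def semi_hard_triplets(embeddings, labels, alpha=0.2):
    """Return (anchor, positive, negative) index triplets satisfying the
    FaceNet semi-hard rule, with d the squared Euclidean distance between
    L2-normalized embeddings."""
    d2 = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=2)
    triplets = []
    for a in range(len(labels)):
        for p in np.where(labels == labels[a])[0]:
            if p == a:
                continue
            # Negatives farther away than the positive, but inside the margin.
            mask = (labels != labels[a]) & (d2[a] > d2[a, p]) & (d2[a] < d2[a, p] + alpha)
            negatives = np.where(mask)[0]
            if len(negatives):
                triplets.append((a, int(p), int(np.random.choice(negatives))))
    return triplets

# Toy usage: 32 normalized embeddings over 4 identities.
emb = np.random.randn(32, 128)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(len(semi_hard_triplets(emb, np.random.randint(0, 4, 32))))
```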

BR

@bamos (Collaborator, Author) commented Mar 9, 2016

On fine-tuning:

> @stevenluzheng - I think it is entire network, why do you think only last layer?

I've seen both done; for example, paragraph 5 of section 4.4 of the VGG paper says they fine-tune only the last layer for triplet networks.

@melgor (Contributor) commented Mar 9, 2016

@stevenluzheng
I use Torch, mainly because of the limitations of Caffe and because I wanted to test another framework. I know there is a triplet loss in Caffe, but as you said, you have to generate the triplets offline.
OpenFace generates triplets online, which speeds up training and is consistent with the FaceNet paper. Oxford-Face, on the other hand, uses offline generation.

I do not know which idea is better. I suppose online, because we choose semi-hard examples based on the current network rather than the network that was used as the base model. Additionally, based on this benchmark (https://github.com/soumith/convnet-benchmarks/issues/90), Torch is faster. So the main things Torch brings are speed and better semi-hard examples.
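
To make the online/offline contrast concrete, here is a toy sketch reusing the hypothetical `semi_hard_triplets` helper from earlier in the thread; the `embed` function is a stand-in for a forward pass through the network.

```python
import numpy as np

rng = np.random.default_rng(0)
images = np.zeros((64, 96, 96, 3))        # one toy batch
labels = rng.integers(0, 8, size=64)

def embed(images, step):
    """Stand-in for running the network at training step `step`; a real
    implementation would use the current model weights."""
    return rng.normal(size=(len(images), 128))

# Offline (Oxford/VGG style): triplets fixed by the base model's embeddings.
offline_triplets = semi_hard_triplets(embed(images, step=0), labels)

# Online (FaceNet/OpenFace style): embeddings recomputed each step, so the
# semi-hard set follows the network as it trains.
for step in range(1, 4):
    online_triplets = semi_hard_triplets(embed(images, step), labels)
    # ...compute the triplet loss on online_triplets and update the network
```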

Maybe you could send me the model definition and we could compare Torch vs Caffe, e.g. I could train VGG-16 on CASIA and then fine-tune using OpenFace.
You can paste the model using https://gist.github.com/.

@melgor (Contributor) commented Mar 18, 2016

Any progress in using VGG? I have the same problem as you, @bamos: the net does not converge. I think it is because of the noise in the data (if you look at an example folder, something like 30% is noise). I also tried learning the classification task, with no luck. Now I will do the same with the database without the noise (they annotate each image as noise or not).
On the other hand, in the paper they claim to achieve better results with this noisy database.

@bamos (Collaborator, Author) commented Mar 18, 2016

Hi @melgor - the training's going better with the improvements from 6fd4ac8. Here are the in-progress results from training nn4 overnight on their full dataset with all triplets. The highest LFW accuracy is 0.9308.

(Attached plots: lfw-accuracy.pdf, train-loss.pdf)

@shimen commented Apr 10, 2016

Hi @bamos, are there any updates on training with VGG?

@bamos (Collaborator, Author) commented Apr 10, 2016

Hi @shimen - I haven't tried many experiments, but I haven't seen great accuracy improvements using VGG over CASIA+FaceScrub with the current OpenFace models.

-Brandon.

@shimen commented Apr 11, 2016

Hi @bamos, I will try to train a VGG network from scratch and will post updates with any results.
Have you seen my question about the "Japanese and Chinese people verification test"?
https://groups.google.com/forum/#!topic/cmu-openface/eBqce8hm-bo
Could you please comment on that issue?

-Ilya

@myme5261314

@stevenluzheng Could you send me a copy of your private model structure? I'm really interested in which key difference got you to 97% on LFW while @bamos and @melgor haven't reached it: the input dimension or the model structure?

@shimen commented Aug 25, 2016

Hi @bamos. I trained the nn4.small2.v1 network architecture with CASIA+FaceScrub+VGG, after cleaning the noisy VGG data beforehand.
I had 11523 identities with 1189783 face images.
I use a K80 for training, so I set the parameters to these values:
-peoplePerBatch = 30
-imagesPerPerson = 30

Here are the results:
[training plot]

I ran 3 tests and also added the AUC to the plots.
LFW: [accuracy plot]
30 persons: [accuracy plot]
my Facebook profile: [accuracy plot]

Here are the results at iteration 130 compared to the published nn4.small2.v1 model (for the published model I used manual alignment for the images where alignment failed, which explains the small difference from your numbers):

| Test | nn4.small2.v1 published (avg ACC) | 130 iter (avg ACC) | 130 iter (avg AUC) |
| --- | --- | --- | --- |
| LFW | 0.9303 ± 0.0138 | 0.9527 ± 0.0110 | 0.9852 ± 0.0063 |
| 30 persons | 0.8262 ± 0.0130 | 0.8750 ± 0.0156 | 0.9469 ± 0.0089 |
| my Facebook profile | 0.8915 ± 0.0101 | 0.9397 ± 0.0111 | 0.9858 ± 0.0034 |

I tried different settings for the parameters, and what made the biggest difference for me was increasing the imagesPerPerson parameter.
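
A hedged note on why imagesPerPerson matters (my own arithmetic, assuming OpenFace-style sampling where each batch draws peoplePerBatch identities with up to imagesPerPerson images each, and every identity fills its quota): the number of candidate anchor-positive pairs per batch grows quadratically in imagesPerPerson.

```python
# Upper bound on anchor-positive pairs per batch under the sampling
# assumption above (every identity contributes imagesPerPerson images).
peoplePerBatch, imagesPerPerson = 30, 30
pairs = peoplePerBatch * imagesPerPerson * (imagesPerPerson - 1) // 2
print(pairs)  # 13050 candidate pairs in a 900-image batch
```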

@AlexLocust

@shimen thank you for your post, and especially for sharing the parameter values you used!
I want to reproduce your results. Could you please clarify a few things:

  1. Did you train with triplet loss from scratch, or start from a pre-trained classification model?
  2. How did you clean the VGG database? Based on the info provided with the dataset, or with some other technique?
  3. How did you preprocess the images?

@khryang commented Mar 14, 2017

Hi @bamos and @stevenluzheng,
thank you for the information about the VGG face paper.
I'm trying to implement VGG face, but I don't know exactly how to apply the triplet loss at the last fully-connected layer.

Could you share part of the triplet-loss code?

Attached is code I got from someone's blog; it only covers the triplet loss. Could you give me any help?

triplet loss.zip

stale bot commented Nov 18, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 18, 2017
@stale stale bot closed this as completed Nov 25, 2017