How to use the pretrained JSTL+DGD model for person re-identification? #14

Open
benstaf opened this issue Nov 8, 2016 · 6 comments

benstaf commented Nov 8, 2016

I don't understand how to do person re-identification with the pretrained JSTL+DGD model found here: https://drive.google.com/open?id=0B67_d0rLRTQYZnB5ZUZpdTlxM0k

I have two problems, one related to the input and one related to the output:

  1. In person re-identification, we input two different pictures and ask the model whether they depict the same person. But in the file 'jstl_dgd_deploy_inference.prototxt', the input data has shape (1,3,144,56) and not, for example, (2,3,144,56).

  2. In the file 'jstl_dgd_deploy_inference.prototxt', I don't see the output layer. I expected a binary softmax that outputs '1' if the two photos show the same person and '0' if the persons are different.

Moreover, when loading the caffemodel weights, I receive the warnings:

I1108 08:48:31.324759 1525 net.cpp:752] Ignoring source layer relu7
I1108 08:48:31.324795 1525 net.cpp:752] Ignoring source layer drop7
I1108 08:48:31.324802 1525 net.cpp:752] Ignoring source layer fc8_jstl

This suggests that something is missing in the prototxt file.

Cysu (Owner) commented Nov 8, 2016

Our model does not directly produce a binary verification result for a pair of people. At test time, we first go through all the images and extract their features using our net. Then we compute the pairwise Euclidean distances between query and gallery people. Finally, for each query, we rank the gallery samples by their distances.
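The test procedure described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's evaluation code: `rank_gallery` is a hypothetical helper, the feature arrays are assumed to have already been extracted by the network (one row per image), and the toy 4-dimensional vectors are made up purely to show the shapes.

```python
import numpy as np

def rank_gallery(query_feats, gallery_feats):
    """Return (distances, ranking) for every query against the gallery.

    query_feats:   (num_query, dim) array, one feature vector per query image
    gallery_feats: (num_gallery, dim) array, one feature vector per gallery image
    """
    # Pairwise differences via broadcasting: (num_query, num_gallery, dim)
    diff = query_feats[:, None, :] - gallery_feats[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # For each query, gallery indices sorted from closest to farthest
    ranking = np.argsort(dists, axis=1)
    return dists, ranking

# Toy features (made-up numbers, not real network outputs)
query = np.array([[0.0, 0.0, 1.0, 0.0]])
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])
dists, ranking = rank_gallery(query, gallery)
print(ranking[0])  # gallery indices for the first query, best match first
```

Because the network is only a feature extractor here, it is run once per image, which is consistent with a deploy prototxt that takes a single image as input.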

If you just wish to do the verification, you can choose a distance threshold that balances the true positive rate and false positive rate.
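One way to pick such a threshold is to sweep candidate values over a set of labelled pairs and look at the resulting rates. The sketch below is only an illustration under assumptions: `rates_at_threshold` is a hypothetical helper, the distances and labels are invented, and it assumes the labelled set contains at least one same-person pair and one different-person pair.

```python
def rates_at_threshold(distances, same_person, threshold):
    """Treat a pair as 'same person' when its distance is below the threshold.

    Returns (true positive rate, false positive rate).
    """
    tp = fp = pos = neg = 0
    for dist, same in zip(distances, same_person):
        predicted_same = dist < threshold
        if same:
            pos += 1
            tp += predicted_same
        else:
            neg += 1
            fp += predicted_same
    return tp / pos, fp / neg

# Hypothetical labelled pairs (made-up values for illustration only)
distances = [0.8, 1.1, 1.9, 2.4, 2.6, 3.0]
same_person = [True, True, True, False, True, False]

for threshold in (1.0, 2.0, 2.5, 3.5):
    tpr, fpr = rates_at_threshold(distances, same_person, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A low threshold rejects most impostors but also misses true matches, and a high one does the opposite, so the choice depends on which error matters more in the application.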

benstaf (Author) commented Nov 12, 2016

I tried to follow your suggestions, but my results are not convincing. I ran some experiments on the PRID dataset.

In the multi-shot case, I chose 2 pictures each of persons 4 and 9 from each of cameras A and B (8 pictures in total).

We should get a large distance between pictures of different persons and a small distance between pictures of the same person, but this is not the case. Why?

Some results are here (for example, a4_1.png is picture number 1 of person 4 from camera A):

distance between a4_1.png and a9_1.png: 6.65493
distance between a4_1.png and a4_34.png: 6.5565
distance between a4_1.png and a9_28.png: 4.84618
distance between a4_1.png and b4_1.png: 7.06474
distance between a4_1.png and b9_1.png: 8.09637
distance between a4_1.png and b4_34.png: 5.71222
distance between a4_1.png and b9_28.png: 5.91796
distance between b9_1.png and a4_34.png: 9.21853
distance between b9_1.png and a9_28.png: 7.02944
distance between b9_1.png and b4_1.png: 4.23969
distance between b9_1.png and b9_28.png: 5.4921
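To put a number on this, the listed distances can be split into same-person and different-person pairs. The short script below is a sketch that assumes the file-name convention described above (camera letter, person number, underscore, picture number); `person_id` is a helper introduced only for this illustration.

```python
import re

# Distances copied from the list above: (image 1, image 2, distance)
results = [
    ("a4_1.png", "a9_1.png", 6.65493),
    ("a4_1.png", "a4_34.png", 6.5565),
    ("a4_1.png", "a9_28.png", 4.84618),
    ("a4_1.png", "b4_1.png", 7.06474),
    ("a4_1.png", "b9_1.png", 8.09637),
    ("a4_1.png", "b4_34.png", 5.71222),
    ("a4_1.png", "b9_28.png", 5.91796),
    ("b9_1.png", "a4_34.png", 9.21853),
    ("b9_1.png", "a9_28.png", 7.02944),
    ("b9_1.png", "b4_1.png", 4.23969),
    ("b9_1.png", "b9_28.png", 5.4921),
]

def person_id(filename):
    """Extract the person number, e.g. 'a4_34.png' -> '4'."""
    return re.match(r"[ab](\d+)_\d+\.png$", filename).group(1)

same = [d for p, q, d in results if person_id(p) == person_id(q)]
diff = [d for p, q, d in results if person_id(p) != person_id(q)]

mean_same = sum(same) / len(same)
mean_diff = sum(diff) / len(diff)
print("same person:      n=%d mean=%.3f" % (len(same), mean_same))
print("different person: n=%d mean=%.3f" % (len(diff), mean_diff))
```

With these eleven values the two means come out at about 6.37 and 6.50, so the two groups are barely separated.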

Cropped images are here (cropped to shape (56,144) for input in the neural network):
https://drive.google.com/drive/folders/0B86WKpvkt66BeVp4UGgxUlhzZG8?usp=sharing

Code (additional code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing):

```python
# jstl_inference.py is the TensorFlow version of jstl_dgd_deploy_inference.prototxt,
# made with Caffe-Tensorflow: https://github.com/ethereon/caffe-tensorflow
# See code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
from jstl_inference import JSTL

import tensorflow as tf
from scipy.misc import imread
import numpy as np
from PIL import Image, ImageOps

# Preparation of the feature extractor
x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
# jstl_inference.npy is the NumPy version of jstl_dgd_inference.caffemodel,
# obtained with Caffe-Tensorflow: https://github.com/ethereon/caffe-tensorflow
# File here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
net.load('jstl_inference.npy', sess)

# Output of the fc7 layer
person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")


def extract_vector(image_data):
    img = imread(image_data)

    img = Image.fromarray(img)
    # Resize (maintaining the aspect ratio) and crop the input image
    img = ImageOps.fit(img, size=(56, 144), method=Image.ANTIALIAS)

    img = np.asarray(img)
    img = np.reshape(img, (1, 144, 56, 3))

    feed = {x: img}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]


def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))


# Results
distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```

Cysu (Owner) commented Nov 13, 2016

I guess there might be some mismatch between the image preprocessing methods we used.

When training the model, we used OpenCV to read the images and subtracted the mean pixel values. The input to the CNN should be a 1x3x144x56 image whose color channels are in BGR order and are demeaned by [102, 102, 101].
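As a rough illustration of that input format, the sketch below turns an image that is already 144 pixels high and 56 pixels wide, in BGR order, into a 1x3x144x56 array. `to_caffe_input` is a hypothetical helper and not code from this repository; how the training images were resized or cropped to 144x56 is not stated here, so that step is left out. The dummy array stands in for the result of `cv2.imread` followed by a resize.

```python
import numpy as np

# Per-channel mean in BGR order, as quoted in the comment above
MEAN_BGR = np.array([102.0, 102.0, 101.0], dtype=np.float32)

def to_caffe_input(bgr_image):
    """Convert a 144x56 BGR image of shape (H, W, C) to a 1x3x144x56 float32 array."""
    if bgr_image.shape != (144, 56, 3):
        raise ValueError("expected a 144x56x3 image, got %r" % (bgr_image.shape,))
    # Convert to float first so the subtraction is done in floating point;
    # the mean broadcasts over the last (channel) axis
    centered = bgr_image.astype(np.float32) - MEAN_BGR
    chw = centered.transpose(2, 0, 1)   # (H, W, C) -> (C, H, W)
    return chw[np.newaxis, ...]         # add the batch axis

# Stand-in for an image loaded in BGR order and resized to 56x144 (width x height)
dummy = np.full((144, 56, 3), 110, dtype=np.uint8)
blob = to_caffe_input(dummy)
print(blob.shape)
```

The TensorFlow placeholder used elsewhere in this thread has shape (1, 144, 56, 3), so for that converted graph the transpose would be skipped and only the channel order and mean subtraction would apply.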

Thanks for providing the script. I will verify this after the CVPR deadline.

benstaf (Author) commented Nov 16, 2016

I revised my image pre-processing, but the results did not improve. My results are:

distance between a4_1.png and a9_1.png: 6.59645
distance between a4_1.png and a4_34.png: 7.80466
distance between a4_1.png and a9_28.png: 6.67408
distance between a4_1.png and b4_1.png: 11.086
distance between a4_1.png and b9_1.png: 10.6859
distance between a4_1.png and b4_34.png: 12.731
distance between a4_1.png and b9_28.png: 13.6327
distance between b9_1.png and a4_34.png: 9.13998
distance between b9_1.png and a9_28.png: 12.1658
distance between b9_1.png and b4_1.png: 5.44103
distance between b9_1.png and b9_28.png: 7.77282

My code is:

```python
from jstl_inference import JSTL  # the output Python script of caffe2tensorflow
import tensorflow as tf
import numpy as np
import cv2

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
net.load('jstl_inference.npy', sess)

person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")


def preprocess(image):
    img = cv2.imread(image)  # OpenCV loads images in BGR order
    shape = img.shape
    # Resize to a height of 144 while keeping the aspect ratio
    ratio = float(144) / float(shape[0])
    dim = (int(shape[1] * ratio), 144)
    resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)

    # Crop on both sides to a width of 56. Integer division keeps the slice
    # indices valid, and the fixed width also covers odd margins.
    margin = dim[0] - 56
    left = margin // 2
    cropped = resized[:, left:left + 56]

    cv2.imwrite('cropped_' + image, cropped)

    # Subtract the mean pixel values, i.e. demean by [102, 102, 101]
    centered_array = cropped - np.array([102, 102, 101])

    return centered_array


def extract_vector(image):
    centered_array = preprocess(image)

    input_array = np.reshape(centered_array, (1, 144, 56, 3))

    feed = {x: input_array}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]


def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))


# Results:
distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```

kaidic commented Dec 27, 2016

I've encountered the same problem. It seems that the feature-layer outputs I get from TensorFlow and from Caffe are different.
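A simple way to quantify such a mismatch is to feed the same preprocessed image through both frameworks and compare the two feature vectors directly. The sketch below assumes the two vectors have already been obtained; `compare_features` is a hypothetical helper, and the numbers are invented stand-ins rather than real network outputs.

```python
import numpy as np

def compare_features(feat_a, feat_b):
    """Return (max absolute difference, cosine similarity) of two feature vectors."""
    a = np.asarray(feat_a, dtype=np.float64).ravel()
    b = np.asarray(feat_b, dtype=np.float64).ravel()
    max_abs_diff = np.abs(a - b).max()
    cosine = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max_abs_diff, cosine

# Made-up stand-ins for the features from the two frameworks
caffe_feat = [0.5, -1.0, 2.0]
tf_feat = [0.5, -1.0, 2.5]

max_abs_diff, cosine = compare_features(caffe_feat, tf_feat)
print("max abs diff: %.4f, cosine similarity: %.4f" % (max_abs_diff, cosine))
```

If the conversion and preprocessing match, the maximum difference should be close to floating-point noise; a large difference points to a discrepancy in either the converted weights or the input pipeline.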

@soulslicer

Which prototxt in the code was used to train the model for jstl_dgd_inference.caffemodel? I can't seem to find it.
