How to use the pretrained JSTL+DGD model for person re-identification? #14

benstaf · 2016-11-08T09:41:16Z

I don't understand how to do person re-identification with the pretrained JSTL+DGD model found here: https://drive.google.com/open?id=0B67_d0rLRTQYZnB5ZUZpdTlxM0k

I have two problems, one related to the input and one related to the output:

In person re-identification, we input two different pictures, and we ask the model if they depict the same person or not.

But here, in the file 'jstl_dgd_deploy_inference.prototxt', the input data is (1,3,144,56) and not, for example, (2,3,144,56).

In the file 'jstl_dgd_deploy_inference.prototxt', I don't see the output layer. I expected a binary softmax that outputs '1' if the two photos show the same person and '0' if they show different persons.

Moreover, when loading the caffemodel weights, I receive the warnings:

I1108 08:48:31.324759 1525 net.cpp:752] Ignoring source layer relu7
I1108 08:48:31.324795 1525 net.cpp:752] Ignoring source layer drop7
I1108 08:48:31.324802 1525 net.cpp:752] Ignoring source layer fc8_jstl

This suggests that something is missing in the prototxt file.

Cysu · 2016-11-08T10:53:51Z

Our model does not directly produce a binary verification result for a pair of people. At test time, we first go through all the images and extract their features using our net. Then we compute the pairwise Euclidean distances between query and gallery people. Finally, for each query, we rank the gallery samples by their distances.

If you just wish to do the verification, you can choose a distance threshold that balances the true positive rate and false positive rate.
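A minimal sketch of that test procedure, assuming the features have already been extracted into NumPy arrays with one row per image. The function names, the toy 2-D features, and the threshold value are illustrative assumptions, not part of the repository:

```python
import numpy as np

def pairwise_euclidean(query, gallery):
    """Return a (num_query, num_gallery) matrix of Euclidean distances."""
    diff = query[:, None, :] - gallery[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def rank_gallery(dist):
    """For each query row, return gallery indices sorted nearest first."""
    return np.argsort(dist, axis=1)

def same_person(dist, threshold):
    """Verification: declare a match when the distance is below the threshold."""
    return dist < threshold

# Toy 2-D features (assumption: real features would be the 256-D net outputs).
query = np.array([[0.0, 0.0], [3.0, 4.0]])
gallery = np.array([[3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])

dist = pairwise_euclidean(query, gallery)
ranking = rank_gallery(dist)
# The threshold is a hypothetical value; in practice it would be tuned on
# held-out pairs to balance true positives against false positives.
matches = same_person(dist, threshold=2.0)

print(dist)
print(ranking)
print(matches)
```

Ranking needs no threshold at all, which is why it is the default evaluation; the threshold only matters when a yes/no answer is required.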

benstaf · 2016-11-12T23:23:14Z

I tried to follow your suggestions, but my results are not convincing. I ran some experiments with the PRID dataset.

In the multi-shot case, I chose 2 pictures each of persons 4 and 9 from each of cameras A and B (8 pictures in total).

We should get a large distance between pictures of different persons, and a small distance between pictures of the same person, but this is not the case. Why?

Some results are here (for example, a4_1.png is picture number 1 of person 4 by camera A):

distance between a4_1.png and a9_1.png: 6.65493
distance between a4_1.png and a4_34.png: 6.5565
distance between a4_1.png and a9_28.png: 4.84618
distance between a4_1.png and b4_1.png: 7.06474
distance between a4_1.png and b9_1.png: 8.09637
distance between a4_1.png and b4_34.png: 5.71222
distance between a4_1.png and b9_28.png: 5.91796
distance between b9_1.png and a4_34.png: 9.21853
distance between b9_1.png and a9_28.png: 7.02944
distance between b9_1.png and b4_1.png: 4.23969
distance between b9_1.png and b9_28.png: 5.4921
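To make the problem concrete, here is a small sketch that ranks the gallery for the query a4_1.png using the distances listed above. The `person_id` helper is a hypothetical convenience that relies on the file naming used here (camera letter, person number, underscore, frame number):

```python
# Distances from the query a4_1.png, copied from the list above.
reported = {
    "a9_1.png": 6.65493,
    "a4_34.png": 6.5565,
    "a9_28.png": 4.84618,
    "b4_1.png": 7.06474,
    "b9_1.png": 8.09637,
    "b4_34.png": 5.71222,
    "b9_28.png": 5.91796,
}

def person_id(filename):
    """'a4_34.png' -> '4': drop the camera letter, keep the digits before '_'."""
    return filename[1:].split("_")[0]

query = "a4_1.png"

# Sort gallery images by ascending distance (nearest first).
ranked = sorted(reported.items(), key=lambda item: item[1])

for rank, (name, dist) in enumerate(ranked, start=1):
    label = "same" if person_id(name) == person_id(query) else "different"
    print(rank, name, dist, label)

# Rank of the first gallery image that shows the same person as the query.
first_same_rank = next(
    rank
    for rank, (name, _) in enumerate(ranked, start=1)
    if person_id(name) == person_id(query)
)
print("first correct match at rank", first_same_rank)
```

The nearest image is of person 9, so the top-ranked match for this query is wrong.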

Cropped images are here (cropped to shape (56,144) for input in the neural network):
https://drive.google.com/drive/folders/0B86WKpvkt66BeVp4UGgxUlhzZG8?usp=sharing

Code (additional code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing):

`# jstl_inference.py is the TensorFlow version of jstl_dgd_deploy_inference.prototxt,
# made with Caffe-Tensorflow: https://github.com/ethereon/caffe-tensorflow
# See code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
from jstl_inference import JSTL

import tensorflow as tf
from scipy.misc import imread
import numpy as np
from PIL import Image, ImageOps

# Preparation of the feature extractor

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
# jstl_inference.npy is the NumPy version of jstl_dgd_inference.caffemodel,
# obtained with Caffe-Tensorflow: https://github.com/ethereon/caffe-tensorflow
# File here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
net.load('jstl_inference.npy', sess)

# Output of the fc7 layer
person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")

def extract_vector(image_data):
    img = imread(image_data)

    img = Image.fromarray(img)
    # Resize (maintaining the aspect ratio) and crop the input image
    img = ImageOps.fit(img, size=(56, 144), method=Image.ANTIALIAS)

    img = np.asarray(img)
    img = np.reshape(img, (1, 144, 56, 3))

    feed = {x: img}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

# Results

distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')`

Cysu · 2016-11-13T03:09:14Z

I guess there might be some mismatch between the image preprocessing methods we used.

When training the model, we used OpenCV to read the images and subtracted the mean pixel values. The input to the CNN should be a 1x3x144x56 image whose color channels are in BGR order and are demeaned by [102, 102, 101].
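A rough sketch of that input layout, using only NumPy so it does not depend on how the image was read. It assumes the image is already resized to 144x56 and that the mean [102, 102, 101] is listed in the same BGR order as the channels; the function names are illustrative, not from the repository:

```python
import numpy as np

# Assumption: the mean is given per channel in BGR order.
MEAN_BGR = np.array([102.0, 102.0, 101.0], dtype=np.float32)

def rgb_to_bgr(rgb_image):
    """Reverse the channel axis of an (H, W, 3) image, e.g. one read with PIL."""
    return rgb_image[:, :, ::-1]

def to_caffe_input(bgr_image):
    """Convert a 144x56 BGR image of shape (H, W, C) to a 1x3x144x56 float32 blob."""
    if bgr_image.shape != (144, 56, 3):
        raise ValueError("expected shape (144, 56, 3), got %s" % (bgr_image.shape,))
    # Cast before subtracting so uint8 values cannot wrap around.
    centered = bgr_image.astype(np.float32) - MEAN_BGR
    # (H, W, C) -> (C, H, W), then add the batch dimension.
    return centered.transpose(2, 0, 1)[np.newaxis, ...]

# Dummy image standing in for a real crop (assumption: every pixel is 102).
dummy = np.full((144, 56, 3), 102, dtype=np.uint8)
blob = to_caffe_input(dummy)
print(blob.shape, blob.dtype)
```

The TensorFlow port in this thread feeds a 1x144x56x3 tensor, so there the transpose would be skipped while the BGR order and mean subtraction still apply.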

Thanks for providing the script. I will verify this after the CVPR deadline.

benstaf · 2016-11-16T08:22:41Z

I revised my image preprocessing, but the results did not improve. I now get:

distance between a4_1.png and a9_1.png: 6.59645
distance between a4_1.png and a4_34.png: 7.80466
distance between a4_1.png and a9_28.png: 6.67408
distance between a4_1.png and b4_1.png: 11.086
distance between a4_1.png and b9_1.png: 10.6859
distance between a4_1.png and b4_34.png: 12.731
distance between a4_1.png and b9_28.png: 13.6327
distance between b9_1.png and a4_34.png: 9.13998
distance between b9_1.png and a9_28.png: 12.1658
distance between b9_1.png and b4_1.png: 5.44103
distance between b9_1.png and b9_28.png: 7.77282

My code is:

`# jstl_inference.py is the output Python script of caffe2tensorflow
from jstl_inference import JSTL
import tensorflow as tf

import numpy as np

import cv2

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
net.load('jstl_inference.npy', sess)

person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")

def preprocess(image):
    # cv2.imread returns the image in BGR channel order
    img = cv2.imread(image)
    shape = img.shape
    # Scale to a height of 144 while keeping the aspect ratio
    ratio = float(144) / float(shape[0])
    dim = (int(shape[1] * ratio), 144)
    resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)

    # Crop on both sides to a width of 56. Integer division keeps the slice
    # index an int, and a fixed-width slice also handles an odd margin.
    margin = dim[0] - 56
    left = margin // 2
    cropped = resized[:, left:left + 56]

    cv2.imwrite('cropped_' + image, cropped)

    # Subtract the mean pixel values: demean by [102, 102, 101]
    centered_array = cropped - np.array([102, 102, 101])

    return centered_array

def extract_vector(image):
    centered_array = preprocess(image)

    input_array = np.reshape(centered_array, (1, 144, 56, 3))

    feed = {x: input_array}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

# Results:

distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')`

kaidic · 2016-12-27T03:03:34Z

I've encountered the same problem. It seems that the feature layer outputs I got from using tensorflow and caffe are different.

soulslicer · 2018-06-30T17:39:30Z

Which prototxt in the code was used to train the model for jstl_dgd_inference.caffemodel? I can't seem to find it.
