This repository contains our code for the paper 'Large-scale Training Data Search for Object Re-identification' (CVPR 2023).
Related material: Paper, Video, Zhihu
As shown in the figure above, we present a search and pruning (SnP) solution to the training data search problem in object re-ID. The source data pool is one order of magnitude larger than existing re-ID training sets in both the number of images and the number of identities. When the target is AlicePerson, SnP produces a training set 80% smaller than the source pool while achieving similar or even higher re-ID accuracy. The searched training set is also superior to existing individual training sets such as Market-1501, Duke, and MSMT.
- Sklearn
- Scipy 1.2.1
- PyTorch 1.7.0 + torchvision 0.8.1
Please prepare the following datasets for person re-ID: DukeMTMC-reID, Market-1501, MSMT17, CUHK03, RAiD, PersonX, UnrealPerson, RandPerson, PKU-Reid, VIPeR, AlicePerson (target data in VisDA20).
You may need to sign up to get access to some of these datasets. Please store these datasets in a directory structure like this:
~
└───reid_data
    └───duke_reid
    │   │ bounding_box_train
    │   │ ...
    │
    └───market
    │   │ bounding_box_train
    │   │ ...
    │
    └───MSMT
    │   │ MSMT_bounding_box_train
    │   │ ...
    │
    └───cuhk03_release
    │   │ cuhk-03.mat
    │   │ ...
    │
    └───alice-person
    │   │ bounding_box_train
    │   │ ...
    │
    └───RAiD_Dataset-master
    │   │ bounding_box_train
    │   │ ...
    │
    └───unreal
    │   │ UnrealPerson-data
    │   │ ...
    │
    └───randperson_subset
    │   │ randperson_subset
    │   │ ...
    │
    └───PKU-Reid
    │   │ PKUv1a_128x48
    │   │ ...
    │
    └───i-LIDS-VID
    │   │ images
    │   │ ...
    │
    └───VIPeR
    │   │ images
    │   │ ...
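Before launching a search, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_entries` and the `PERSON_LAYOUT` mapping are hypothetical, and the mapping simply copies the first entry shown under each dataset folder in the tree above.

```python
from pathlib import Path

# First entry listed under each person re-ID dataset folder in the tree above.
# This mapping is an illustrative assumption, not something the repository defines.
PERSON_LAYOUT = {
    "duke_reid": "bounding_box_train",
    "market": "bounding_box_train",
    "MSMT": "MSMT_bounding_box_train",
    "cuhk03_release": "cuhk-03.mat",
    "alice-person": "bounding_box_train",
    "RAiD_Dataset-master": "bounding_box_train",
    "unreal": "UnrealPerson-data",
    "randperson_subset": "randperson_subset",
    "PKU-Reid": "PKUv1a_128x48",
    "i-LIDS-VID": "images",
    "VIPeR": "images",
}


def missing_entries(root, layout=PERSON_LAYOUT):
    """Return the 'dataset/entry' paths from `layout` that do not exist under `root`."""
    root = Path(root).expanduser()
    return sorted(
        f"{dataset}/{entry}"
        for dataset, entry in layout.items()
        if not (root / dataset / entry).exists()
    )


if __name__ == "__main__":
    # Report anything that still needs to be downloaded or moved into place.
    for path in missing_entries("~/reid_data"):
        print("missing:", path)
```

An empty result means every listed entry was found; it does not verify the contents of the folders.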
Please prepare the following datasets for vehicle re-ID: VeRi, CityFlow-reID, VehicleID, VeRi-wild, VehicleX, Stanford Cars, PKU-vd1 and PKU-vd2. AliceVehicle will be made publicly available by our team shortly.
Please store these datasets in a directory structure like this:
~
└───reid_data
    └───VeRi
    │   │ bounding_box_train
    │   │ ...
    │
    └───AIC19-reid
    │   │ bounding_box_train
    │   │ ...
    │
    └───VehicleID_V1.0
    │   │ image
    │   │ ...
    │
    └───vehicleX_random_attributes
    │   │ ...
    │
    └───veri-wild
    │   │ VeRI-Wild
    │   │ ...
    │
    └───stanford_cars
    │   │ cars_train
    │   │ ...
    │
    └───compcars
    │   │ CompCars
    │   │ ...
    │
    └───PKU-VD
    │   │ VD1
    │   │ VD2
    │   │ ...
The SnP framework is shown in the animation above. To run this process with Market as the target, we can search for a training set with 2860 IDs using the command below:
python trainingset_search_person.py --target 'market' \
--result_dir 'results/sample_data_market/' --n_num_id 2860 \
--ID_sampling_method SnP --img_sampling_method 'FPS' --img_sampling_ratio 0.5 \
--output_data '/data/reid_data/market/SnP_2860IDs_0.5Imgs_0610'
When VeRi is used as the target, the command is:
python trainingset_search_vehicle.py --target 'veri' \
--result_dir './results/sample_data_veri/' --n_num_id 3118 \
--ID_sampling_method SnP --img_sampling_method 'FPS' --img_sampling_ratio 0.5 \
--output_data '/data/data/VeRi/SnP_3118IDs_0.5Imgs_0610'
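Both commands pass `--img_sampling_method 'FPS'` with `--img_sampling_ratio 0.5`. Assuming FPS stands for farthest point sampling (this README does not spell it out), the idea can be sketched in plain Python as below. The function name, the use of Euclidean distance, and the toy 2-D features are illustrative assumptions; the repository's scripts may implement the step differently, for example on deep image features.

```python
import math


def farthest_point_sampling(features, ratio, start=0):
    """Return indices of int(ratio * n) points (at least one) spread out in feature space."""
    n = len(features)
    if n == 0:
        return []
    k = min(n, max(1, int(ratio * n)))
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(features[start], f) for f in features]
    min_dist[start] = -1.0  # mark as taken so it is never picked again
    while len(selected) < k:
        # Greedily pick the point farthest from everything selected so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, f in enumerate(features):
            d = math.dist(features[nxt], f)
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy example: two tight pairs of points; a ratio of 0.5 keeps one point from each pair.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
print(farthest_point_sampling(points, ratio=0.5))  # → [0, 3]
```

Compared with uniform random sampling, this greedy selection tends to drop near-duplicate images first, which is one plausible reason to prefer it when halving the images per identity.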
If you find this code useful, please kindly cite:
@article{yao2023large,
title={Large-scale Training Data Search for Object Re-identification},
author={Yao, Yue and Lei, Huan and Gedeon, Tom and Zheng, Liang},
journal={arXiv preprint arXiv:2303.16186},
year={2023}
}
If you have any questions, feel free to contact yue.yao@anu.edu.au.