Triplet Image Dataset #1042
Hi, I believe … Thoughts?
That sounds like a good idea! We will, however, need to come up with some sort of strategy to select valid triplets. The other problem is that we will have to go through the dataset up front to build them. Here is roughly what I did:

```python
import numpy as np

# class names and a map from class name to that class's image paths
classes = ['class_1', 'class_2']  # ..., 'class_m'
class_imgs = {
    'class_1': ['/path/1', '/path/2'],  # ...
    'class_2': ['/path/3', '/path/4'],
}

num_triplets = 10000
samples = []
for _ in range(num_triplets):
    # anchor/positive class c_1 and a different negative class c_2
    c_1, c_2 = np.random.choice(classes, size=2, replace=False)
    img_1, img_2 = np.random.choice(class_imgs[c_1], size=2, replace=False)
    img_3 = np.random.choice(class_imgs[c_2])
    samples.append((img_1, img_2, img_3))
```

This approach solved my problem, but may not be suitable for the generic case that you proposed. What do you think?
@dakshjotwani I'd probably want the dataset triplets to be deterministic, at least for torchvision. Randomly sampling the triplets on a given run effectively gives a different dataset every time.
Another option would be to make it an …
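For what it's worth, one way to make the sampled triplets deterministic is to drive generation from a fixed seed. This is only a sketch (the function name and seeding scheme are my own, not from this thread):

```python
import numpy as np

def make_triplets(class_imgs, num_triplets, seed=0):
    """Generate a reproducible list of (anchor, positive, negative)
    paths: the same seed always yields the same triplets."""
    rng = np.random.default_rng(seed)
    classes = sorted(class_imgs)
    triplets = []
    for _ in range(num_triplets):
        # anchor/positive class and a distinct negative class
        c_pos, c_neg = rng.choice(classes, size=2, replace=False)
        anchor, positive = rng.choice(class_imgs[c_pos], size=2, replace=False)
        negative = rng.choice(class_imgs[c_neg])
        triplets.append((anchor, positive, negative))
    return triplets
```

Generating the list once in a dataset's `__init__` with a fixed seed would let two runs see the same triplets, addressing the determinism concern.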
@fmassa In my current implementation, I'm not sure if it's practical to go through all possible triplets …
@dakshjotwani that's what I mentioned with regard to it not being deterministic: two different runs will have different values, and are indeed different datasets. About the second point: … this might just be a better fit for the …
Alright, I agree that this dataset should be an … Something like:

```python
format = {
    'class_0': {
        'num_samples': 3,
        'replacement': False,
    },
    'class_1': {
        'num_samples': 2,
        'replacement': True,
    },
    # ...
}
```
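As an illustration of how such a per-class spec could be consumed (a sketch with hypothetical names, not an actual proposal from this thread), drawing one generic n-tuple might look like:

```python
import numpy as np

def sample_tuple(spec, class_imgs, rng=None):
    """Draw one n-tuple according to a per-class spec of the form
    {'class_0': {'num_samples': 3, 'replacement': False}, ...}."""
    rng = rng or np.random.default_rng()
    out = []
    for cls, opts in spec.items():
        picks = rng.choice(class_imgs[cls],
                           size=opts['num_samples'],
                           replace=opts['replacement'])
        out.extend(picks)
    return tuple(out)
```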
@dakshjotwani I think that it might make sense to start with 3 … I think it might be good to specify a …

Also, ccing @ssnl, as he was the one who originally designed …
Sounds good. I look forward to what @ssnl has to say about our …
This is a bit of an interesting use of …
Use cases for generic tuple datasets likely exist, but regarding the motivation for the original task: I've never used an n-tuple dataset (n**k is not pleasant) with a siamese or triplet loss fn. I've always used hard or semi-hard mining with a more typical dataset that returns one element at a time, plus a custom sampler to make sure each batch has the desired number of examples per identity.

I think most methods for identity recognition, face or otherwise, since the FaceNet era have moved to online hard or semi-hard example mining for the triplets and pairs. Or they don't use them at all if they're using one of the various softmax + geometric losses like arcface, cosface, ring loss, etc.

Edit: This is a great blog post on the topic: https://omoindrot.github.io/triplet-loss ... Also, I forgot my paper history: FaceNet was actually the paper that introduced online selection and recommended not building a dataset that did offline triplet selection.
@rwightman yes, I agree that hard-negative mining is something that we should add support for. But performing hard-negative mining on a large dataset on-the-fly is very expensive computation-wise. This means that we could instead re-create the dataset once every few epochs or so, by updating the similarity matrix that is provided to it. This would still roughly match the approach in #1061 (but some changes would need to be made).
@fmassa basing hard-negative mining on features calculated every few epochs is still 'offline'. By online I mean you select the pairings that you sum into your loss after you run the examples through the network on each batch and have a set of embeddings. It's less expensive than offline hard example mining. This is often called 'batch hard', or 'batch semi-hard' if you don't just pick the hardest (which can sometimes collapse if you have too many really hard examples). This paper compares all the methods: https://arxiv.org/abs/1703.07737 ... it compares offline triplet selection, triplet with OHM (the offline hard mining you just mentioned), batch-hard, and batch-all.

At some point in the training you do end up with no or very few hard examples if you're using a hard margin (not so with a soft margin), so the number of examples in each batch that you sum into the loss decreases, but that's often an indicator of when to stop training.

I wouldn't recommend trying to fit the batch-hard/batch-all cases into the other n-tuple scenarios. The dataset for online batch schemes remains pretty normal, like a typical classification setup; only the sampler, which builds batches with K examples of P classes each, and the loss function, which computes pairwise distances and builds the tuples for each batch, differ. Tensorflow has an example loss fn: https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/contrib/losses/python/metric_learning/metric_loss_ops.py#L160
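To make the 'batch hard' idea concrete, here is a minimal NumPy sketch of the within-batch selection (my own illustration, not the TF implementation linked above): for each anchor, take its farthest positive and closest negative inside the batch.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss: each anchor is paired with its
    hardest positive and hardest negative within the batch."""
    # pairwise Euclidean distances, shape (B, B)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]   # same-class mask
    eye = np.eye(len(labels), dtype=bool)

    # hardest positive: farthest same-class example (excluding self)
    pos = np.where(same & ~eye, dist, -np.inf).max(axis=1)
    # hardest negative: closest different-class example
    neg = np.where(~same, dist, np.inf).min(axis=1)

    return np.maximum(pos - neg + margin, 0.0).mean()
```

Note that the dataset stays a plain classification-style dataset; all triplet construction happens here, on the embeddings of the current batch.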
Wouldn't this be something that would be dealt with …
@dakshjotwani yeah, you need a non-trivial loss fn like the TF one I linked above. But note that it only takes a list of embeddings and a list of labels; it does not build any triplets.

I went through this exercise myself some time ago: I was about to set out building a triplet dataset, then realized they are not needed, and spent the time working on the loss fn/mining strategies and sampler instead. So that's why I'm bringing this up: do you actually want a triplet dataset, or do you want a triplet training scheme that works?
Interesting. I want to make sure I'm understanding this correctly, so I'm listing some potential contributions we can make:
@rwightman Is this the approach you want to take? Please correct me if I made any errors.

@fmassa What do you think? Do you think this is a better approach to the problem? If we decide to change our approach, which modules would I have to work with? (Since it wouldn't be a dataset anymore.)

In the meantime, I shall read more on this and get back with a gist as a proof of concept by tomorrow. It will probably help me understand the online mining approach better.
@dakshjotwani The sampler there assumes the dataset has a map with a specific name in it; there would be better ways of making this interface more generic. Batch size needs to be set to p*k.

In the loss I included a pretty standard hard-negative selection and my own hacky sampling selection. I wanted to revisit my sampling approach someday but haven't had time. The idea was to sample hard negatives (and positives) more frequently than the easy ones, but not to always choose the hard ones, as that can cause collapse in datasets where there are lots of very hard examples or noise in the labels. I'm sure someone smarter than I am has come up with a better approach for that, but my hack did work well on a dataset where the hard-negative approach was causing the loss to collapse to the margin.

Overall approach: I'd probably spend more time with all of these ideas, implement several different methods, and see what works and what doesn't on a real dataset before committing to one and finishing a PR.
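For readers following along, the P×K batching idea can be sketched roughly like this (a toy generator with made-up names, not the gist's actual sampler): each batch holds k examples from each of p classes.

```python
import numpy as np
from collections import defaultdict

def pk_batches(labels, p, k, rng=None):
    """Yield batches of dataset indices containing exactly p classes
    with k examples each, the layout a batch-hard loss expects."""
    rng = rng or np.random.default_rng()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # only classes with at least k examples can fill a slot
    classes = [c for c, idxs in by_class.items() if len(idxs) >= k]
    for _ in range(len(labels) // (p * k)):
        chosen = rng.choice(len(classes), size=p, replace=False)
        batch = []
        for ci in chosen:
            batch.extend(rng.choice(by_class[classes[ci]], size=k,
                                    replace=False))
        yield batch
```

In PyTorch terms this logic would live in a batch sampler, with the DataLoader batch size fixed at p*k.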
Thanks for the gist! It'll surely help me understand this approach better. The …

I shall work with your approach on the same dataset and margin (margin=0.2) and get back to you with results.
@dakshjotwani I've asked around, and it seems that the online within-batch approach for triplet loss might be better. So I'll hold off on merging the …
@rwightman This approach did work better than my previous approach. On VGGFace2's test set, using …

I attempted to refine your …

@fmassa I do believe, however, that there is still a need for some sort of …

I think it would be quite nice to have some of these features (online triplet mining, PKSampler, TupleDataset) to make metric learning more convenient using PyTorch/torchvision. What do you think?
@dakshjotwani very interesting results! Thanks @rwightman for the heads up!
The dataset that you mention can be built on-the-fly, without materializing anything, right? Just compute on-the-fly the label for the tuple, and compute the index of …
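The on-the-fly construction, for the pair case at least, could be as simple as this index arithmetic (a sketch of the idea as I read it; the names are mine):

```python
def pair_from_index(idx, labels):
    """Map a flat index in [0, n*n) to a pair (i, j) plus a
    same-class flag, with no pair list materialized up front."""
    n = len(labels)
    i, j = divmod(idx, n)
    return i, j, int(labels[i] == labels[j])
```

A `__getitem__` would then just load images i and j, and `__len__` would report n*n.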
I agree, and I think I'd start with the triplet mining and PKSampler. I think this deserves a whole new folder in … What about the following: could you send a PR with a simple training/evaluation script that trains on …
Once those are there and merged, we will need to find a better place for adding those losses and samplers in core torchvision. Thoughts?
@fmassa Yes it can. Currently, instead of doing that, I'm using …

Edit: I have made …
I could, but this would be quite complicated. Unlike the other datasets, the VGGFace2 download cannot be automated using … Do you think we can work with something simpler, like …

I feel like the …

Here's what I think the PR should include:
Does this sound good? |
@dakshjotwani awesome, glad you got it working. I've seen fixed triplets used for validation, as you said. I think at least one of the canonical datasets has validation triplets predefined by some common impl, but I forget which. Another common validation technique you'll see when the task is retrieval or re-id is calculation of mAP and CMC metrics on a separate or holdout set of index + query images. Picking two PyTorch-based repos: https://github.com/Cysu/open-reid/tree/master/reid and https://github.com/layumi/Person_reID_baseline_pytorch have this setup.

For your training, if you're trying to squeeze out a few more %: triplet loss is often sensitive to h-params, so trying different optimizers and LRs can have a big impact. The size of your embedding can make a difference. Also try with and without L2-normalizing your embeddings before feeding them into the loss, and with/without clamping (relu) the output to positive. Try with soft margin on and off, or different margins if the hard margin is working better.
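The L2 normalization mentioned above is a one-liner in NumPy (shown here only to make the suggestion concrete):

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    """Scale each embedding to unit length so the loss compares
    directions rather than magnitudes."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)
```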
Interesting. I'm actually working on another problem which is quite similar to the problems solved in the repos you mentioned. Thanks for the links! I'll delve deeper into them right away. Currently my validation approach is to compute the distances between all tuples (…
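One simple form of this distance-threshold validation (a sketch; the helper name and interface are my own) is the accuracy of the rule "same identity iff distance < threshold" over precomputed pair distances:

```python
import numpy as np

def verification_accuracy(distances, same, threshold):
    """Fraction of pairs correctly classified by thresholding the
    embedding distance: predict 'same' when distance < threshold."""
    pred = distances < threshold
    return float((pred == same).mean())
```

Sweeping the threshold on a held-out split then gives the operating point to report.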
I'll keep that in mind and keep you posted! Thanks! |
Closing this for now. We can reopen it if we want to implement …
Hi,
I am currently working with semi-supervised learning to implement face recognition. I needed a dataset that could load triplets (three images, where the first two belong to the same class and the last belongs to a different class; the FaceNet paper has more details) for training. Currently, I have implemented this dataset for my own use. I was wondering if this is a feature that would be nice to have as part of the library itself.
Thanks,
Daksh