Description
I've been doing some experiments with your batch hard triplet loss function and different architectures/datasets. On MARS I manage to reproduce the results from your paper (the network converges), but with many other datasets I get stuck at a loss of ~0.6931, which is softplus(0). Looking at the embeddings, it seems the network starts to yield the same embedding for all classes.
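To illustrate why a collapsed embedding pins the loss at exactly that value, here is a minimal NumPy sketch of the batch-hard loss with the soft-margin (softplus) formulation; this is my own simplified re-implementation for demonstration, not the repo's TensorFlow code:

```python
import numpy as np

def batch_hard_softplus_loss(embeddings, labels):
    """Per-anchor: softplus(hardest_positive_dist - hardest_negative_dist),
    averaged over the batch. Simplified NumPy sketch for illustration."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    same = labels[:, None] == labels[None, :]
    losses = []
    for a in range(len(labels)):
        pos = dists[a][same[a]]            # distances to same-class samples
        neg = dists[a][~same[a]]           # distances to other-class samples
        margin = pos.max() - neg.min()     # batch-hard margin term
        losses.append(np.log1p(np.exp(margin)))  # softplus(margin)
    return np.mean(losses)

# If the network maps every input to the same embedding, the hardest-positive
# and hardest-negative distances are both 0, so every anchor contributes
# softplus(0) = ln(2) ~= 0.6931 -- exactly the observed plateau.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])
collapsed = np.tile(rng.normal(size=128), (6, 1))   # identical embeddings
print(batch_hard_softplus_loss(collapsed, labels))  # ~0.6931
```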
It's worth noting that a center-loss formulation works quite well for generating usable embeddings on these datasets; I've tried MS-Celeb-1M (after cleaning it up) and CASIA-WebFace.
My interpretation of these results is that the batch hard triplet loss function is very sensitive to mislabeled data, and it may get stuck in a local minimum if the dataset contains mislabeled images. I've tried some hyperparameter tuning (e.g., changing the learning rate and the optimizer), but I haven't managed to escape the local minimum.
Have you seen similar results in your work when experimenting with different datasets?