
implement more state-of-the-art loss for multi-label classification #52

Open
NickleDave opened this issue Mar 23, 2020 · 2 comments

@NickleDave (Owner)

Trying to figure out how to 'task optimize' a network for object recognition when multiple objects can be in an image (which, let's face it, is always the case).

Seems like it does not make sense to just have a softmax and train with cross-entropy, because this enforces exactly one winning prediction per image. We can argue about whether the brain does this, but for our model we don't want to disadvantage it and then have it look like its lower accuracy correlates with Visual Search Difficulty scores just because it can never be perfect. Better to train it the "best" way and still see an effect of difficulty.

It also doesn't make sense to have the output be "present / absent", because then we would be training it to output "present" only for labels that are in the training set (ignoring a class of object / target that might actually be in the image but that we just haven't labeled).

Hence it seems like we still want multi-label classification.

Can't tell what the SoTA is, though, for training out-of-the-box CNNs for multi-label image classification; there are just a bunch of fancy methods, without anyone showing directly how bad a vanilla CNN is.

BinaryCrossEntropy seems to be standard (a minimal sketch of that baseline follows this paragraph).
Seems like WARP is one method that papers proposing fancier methods point to. The key idea is to learn a ranking via a sampling strategy, so that we efficiently learn to rank positive samples higher than negatives (a rough sketch is further below, after the spotlight links):
https://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/view/2926/3666
This paper applied a WARP-like loss to CNNs:
https://arxiv.org/pdf/1312.4894.pdf
An alternative is a pairwise loss:
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Improving_Pairwise_Ranking_CVPR_2017_paper.pdf
but this looks more involved: it's not trained end-to-end and requires a separate classifier.
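
For concreteness, here's roughly what the BCE baseline mentioned above looks like: treat each class as an independent "present / absent" sigmoid and use `BCEWithLogitsLoss` over multi-hot targets. This is just a minimal sketch; the backbone, number of classes, and batch here are placeholders, not our actual model or data.

```python
import torch
import torch.nn as nn
from torchvision import models

# minimal multi-label baseline: one independent sigmoid per class + binary cross-entropy
# (all numbers are placeholders, not our actual model / dataset)
num_classes = 20  # e.g. Pascal VOC has 20 object categories

net = models.resnet18(pretrained=False)
net.fc = nn.Linear(net.fc.in_features, num_classes)  # raw logits, no softmax

criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, one binary loss per class
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# fake batch: images + multi-hot label vectors (1.0 = that object class is present)
images = torch.randn(8, 3, 224, 224)
targets = torch.zeros(8, num_classes)
targets[:, 0] = 1.0  # pretend class 0 is present in every image

logits = net(images)               # (batch, num_classes)
loss = criterion(logits, targets)  # targets are floats in [0, 1]
loss.backward()
optimizer.step()

# at test time: per-class probabilities; threshold (or rank) them to decide "present"
probs = torch.sigmoid(logits)
```

The appeal is that this never forces a single winner per image, which is exactly the softmax problem above.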

Some potentially useful implementations of different losses for recommender systems here:
https://github.com/maciejkula/spotlight/blob/master/spotlight/losses.py
They call WARP "adaptive hinge loss" but cite the paper
https://github.com/maciejkula/spotlight/blob/75f4c8c55090771b52b88ef1a00f75bb39f9f2a9/spotlight/losses.py#L127
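
My rough reading of the WARP / "adaptive hinge" idea, translated to the multi-label image case (this is a paraphrase for illustration, not the spotlight code; `warp_loss` and everything in it are made-up names): for each positive label, sample negative labels until one scores within the margin of the positive, then apply a hinge penalty weighted by an estimate of how low the positive is ranked.

```python
import torch

def rank_weight(rank):
    # L(k) = 1 + 1/2 + ... + 1/k, as in WSABIE/WARP: penalize harder when a positive
    # label is ranked below many negatives
    return torch.sum(1.0 / torch.arange(1, rank + 1, dtype=torch.float))

def warp_loss(logits, targets, max_trials=50):
    """WARP-style loss for one batch.

    logits:  (batch, n_classes) raw scores
    targets: (batch, n_classes) multi-hot {0, 1}
    """
    batch_size, n_classes = logits.shape
    total = logits.new_zeros(())
    for i in range(batch_size):
        pos_idx = targets[i].nonzero(as_tuple=True)[0]
        neg_idx = (targets[i] == 0).nonzero(as_tuple=True)[0]
        if len(pos_idx) == 0 or len(neg_idx) == 0:
            continue
        for p in pos_idx:
            s_pos = logits[i, p]
            # sample negatives until one violates the margin (scores within 1 of the positive)
            for trial in range(1, max_trials + 1):
                n = neg_idx[torch.randint(len(neg_idx), (1,))]
                s_neg = logits[i, n].squeeze()
                margin = 1.0 - s_pos + s_neg
                if margin > 0:
                    # crude rank estimate: the fewer trials needed, the lower the positive is ranked
                    est_rank = max(1, len(neg_idx) // trial)
                    total = total + rank_weight(est_rank) * margin
                    break
    return total / batch_size
```

Usage would be `loss = warp_loss(net(images), targets)` followed by `loss.backward()`. The Python loops are obviously slow; if I remember right, the spotlight "adaptive hinge" instead samples a fixed handful of negatives per positive and takes the hardest one, which is a vectorized approximation of the same idea.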

@NickleDave (Owner, Author)

Here's a PyTorch-specific implementation of WARP for neural nets:
https://medium.com/@gabrieltseng/intro-to-warp-loss-automatic-differentiation-and-pytorch-b6aa5083187a

@NickleDave (Owner, Author)

In terms of the paper, it seems valid and useful to point out that it's not known what it means to "task optimize" when the task involves "detecting a target among multiple objects", i.e., visual search, and so we should investigate multiple ways of "task optimizing".

At least one paper finds that training with just single labels using cross-entropy can be surprisingly competitive on Pascal VOC, which is specifically the dataset we are using, see #41
this is the paper: https://arxiv.org/pdf/1612.03663.pdf
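
If we want that single-label baseline as a comparison point, a simple (hedged) reading of it is: at each training step, randomly pick one of the image's positive labels and train with plain softmax cross-entropy on just that label. Minimal sketch below; I'm not claiming this is exactly the recipe in the paper, and it assumes every image has at least one labeled object.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def sample_single_targets(targets):
    """(batch, n_classes) multi-hot -> (batch,) one randomly chosen positive class index per image."""
    chosen = []
    for row in targets:
        pos = row.nonzero(as_tuple=True)[0]          # assumes at least one positive label per image
        chosen.append(pos[torch.randint(len(pos), (1,))])
    return torch.cat(chosen)

# toy example with made-up numbers
logits = torch.randn(4, 20, requires_grad=True)      # (batch, n_classes) from some network
targets = torch.zeros(4, 20)
targets[:, :3] = 1.0                                 # pretend classes 0-2 are present in every image

loss = criterion(logits, sample_single_targets(targets))
loss.backward()
```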

At least one paper has extended this to deep nets; they develop "smooth loss functions for deep top-k classification": https://arxiv.org/pdf/1802.07595.pdf
and they have a PyTorch implementation: https://github.com/oval-group/smooth-topk
