trying to figure out how to 'task optimize' a network for object recognition when multiple objects can be in an image (which, let's face it, is always the case)
seems like it does not make sense to just have a softmax and train with cross-entropy, because that enforces exactly one winning prediction per image. One can argue about whether the brain does this, but for our model we don't want to build in a handicap and then have its lower accuracy correlate with Visual Search Difficulty scores simply because it can never be perfect on multi-object images. Better to train the "best" way and still see an effect of difficulty
also doesn't make sense to have the output be "present / absent", because we would then be training it to output "present" only for labels that are in the training set, ignoring a class of object / target that might actually be in the image but that we just haven't labeled
Hence seems like we still want multi-label classification
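To make the softmax vs. sigmoid contrast concrete, a toy sketch in PyTorch (my own example, assuming a 4-class problem with two objects present):

```python
import torch

# Toy logits for one image that contains two objects (classes 0 and 2).
logits = torch.tensor([3.0, -1.0, 3.0, -2.0])

# Softmax makes the classes compete for a single probability mass:
# the two present classes split it, so neither can approach 1.
print(torch.softmax(logits, dim=0))
# ~[0.49, 0.01, 0.49, 0.00]

# Independent sigmoids let both present classes score high at once,
# which is the behavior multi-label classification needs.
print(torch.sigmoid(logits))
# ~[0.95, 0.27, 0.95, 0.12]
```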
Can't tell what the SoTA is, though, for training out-of-the-box CNNs for multi-label image classification--just a bunch of fancy methods, without anyone showing directly how bad a vanilla CNN is.
In terms of the paper, seems valid and useful to point out that it's not known what it means to "task optimize" when the task involves "detecting a target among multiple objects", i.e., visual search, and so we should investigate multiple types of "task optimizing"
at least one paper finds that training with just single labels using cross-entropy can be surprisingly competitive on Pascal VOC, which is specifically the dataset we are using; see #41
this is the paper: https://arxiv.org/pdf/1612.03663.pdf
Binary cross-entropy (one independent sigmoid output per class) seems to be standard.
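e.g., a minimal multi-label training sketch with a vanilla CNN (assuming 20 Pascal VOC classes; the resnet18 backbone and dummy batch are just placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 20  # Pascal VOC

# Vanilla CNN with a multi-label head: one logit per class, no softmax.
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# BCEWithLogitsLoss applies a sigmoid to each logit internally and
# treats every label as an independent present/absent decision.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)                     # dummy batch
targets = torch.randint(0, 2, (8, NUM_CLASSES)).float()  # multi-hot labels

loss = criterion(model(images), targets)
loss.backward()
```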
Seems like WARP is one method that papers proposing fancier methods point to. The key idea is to learn a ranking using a sampling strategy, so we efficiently learn to rank positive labels higher than negatives:
https://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/view/2926/3666
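Here's my reading of the WARP procedure as a sketch for a single image and positive label (paraphrased from the paper, not their code; the rank weight is the harmonic sum L(k) they use):

```python
import torch

def warp_loss(scores, pos_idx, num_classes, max_trials=None):
    """WARP loss for one image and one positive label (sketch).

    Samples negative labels until one violates the margin; the number
    of samples needed gives a cheap estimate of the positive's rank,
    which weights the hinge so badly-ranked positives are pushed harder.
    """
    max_trials = max_trials or num_classes - 1
    s_pos = scores[pos_idx]
    for trials in range(1, max_trials + 1):
        neg_idx = int(torch.randint(num_classes, (1,)))
        if neg_idx == pos_idx:
            continue
        margin = 1.0 - s_pos + scores[neg_idx]
        if margin > 0:  # found a violating negative
            est_rank = (num_classes - 1) // trials
            weight = sum(1.0 / k for k in range(1, est_rank + 1))  # L(rank)
            return weight * margin
    return scores.new_zeros(())  # no violator found: zero loss
```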
This paper applied a WARP-like loss to CNNs:
https://arxiv.org/pdf/1312.4894.pdf
An alternative is a pairwise loss:
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Improving_Pairwise_Ranking_CVPR_2017_paper.pdf
but this looks more involved: it's not trained end-to-end and it uses a separate classifier
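For reference, the basic pairwise idea itself is simple; a hinge-based sketch (my own simplification, not the smoothed loss from that paper):

```python
import torch

def pairwise_rank_loss(scores, labels, margin=1.0):
    """Hinge loss over all (positive, negative) label pairs for one image.

    scores: (C,) per-class scores; labels: (C,) multi-hot 0/1 tensor.
    Penalizes any absent label scored within `margin` of a present one.
    """
    pos = scores[labels.bool()]   # scores of present labels
    neg = scores[~labels.bool()]  # scores of absent labels
    # Want every positive to beat every negative by at least `margin`.
    diffs = margin - (pos.unsqueeze(1) - neg.unsqueeze(0))
    return torch.clamp(diffs, min=0).mean()
```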
Some potentially useful implementations of different losses for recommender systems here:
https://github.com/maciejkula/spotlight/blob/master/spotlight/losses.py
They call WARP "adaptive hinge loss" but cite the paper:
https://github.com/maciejkula/spotlight/blob/75f4c8c55090771b52b88ef1a00f75bb39f9f2a9/spotlight/losses.py#L127
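The gist, as I read their code (a paraphrased sketch, not a copy): score a handful of sampled negatives, keep only the hardest one, and apply an ordinary hinge against it, which approximates WARP's rank weighting without counting sampling trials:

```python
import torch

def adaptive_hinge_loss(positive_scores, negative_scores, margin=1.0):
    """Spotlight-style adaptive hinge (paraphrased sketch).

    positive_scores: (batch,) scores for observed positives.
    negative_scores: (num_samples, batch) scores for sampled negatives.
    """
    hardest_negative, _ = negative_scores.max(dim=0)  # hardest per example
    return torch.clamp(margin - positive_scores + hardest_negative, min=0).mean()
```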