trying to figure out how to 'task optimize' a network for object recognition when multiple objects can be in an image (which, let's face it, is always the case)
seems like it does not make sense to just have a softmax and train with cross-entropy, because that enforces exactly one winning prediction per image. One can argue about whether the brain does this, but for our model we don't want to build in a handicap and then have its lower accuracy correlate with Visual Search Difficulty scores simply because it can never be perfect on multi-object images. Better to train the "best" way and still see an effect of difficulty
also doesn't make sense to have the output be "present / absent", because we would then be training it to output "present" only for labels that are in the training set, ignoring a class of object / target that might actually be in the image but that we just haven't labeled
Hence seems like we still want multi-label classification
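To make the softmax vs. sigmoid contrast concrete, a toy sketch in PyTorch (my own example, assuming a 4-class problem with two objects present):

```python
import torch

# Toy logits for one image that contains two objects (classes 0 and 2).
logits = torch.tensor([3.0, -1.0, 3.0, -2.0])

# Softmax makes the classes compete for a single probability mass:
# the two present classes split it, so neither can approach 1.
print(torch.softmax(logits, dim=0))
# ~[0.49, 0.01, 0.49, 0.00]

# Independent sigmoids let both present classes score high at once,
# which is the behavior multi-label classification needs.
print(torch.sigmoid(logits))
# ~[0.95, 0.27, 0.95, 0.12]
```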
Can't tell what the SoTA is, though, for training out-of-the-box CNNs for multi-label image classification--just a bunch of fancy methods, without anyone showing directly how bad a vanilla CNN is.
In terms of the paper, seems valid and useful to point out that it's not known what it means to "task optimize" when the task involves "detecting a target among multiple objects", i.e., visual search, and so we should investigate multiple types of "task optimizing"
at least one paper finds that training with just single labels using cross-entropy can be surprisingly competitive on Pascal VOC, which is specifically the dataset we are using; see #41
this is the paper: https://arxiv.org/pdf/1612.03663.pdf
Binary cross-entropy (one independent sigmoid output per class) seems to be standard.
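e.g., a minimal multi-label training sketch with a vanilla CNN (assuming 20 Pascal VOC classes; the resnet18 backbone and dummy batch are just placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 20  # Pascal VOC

# Vanilla CNN with a multi-label head: one logit per class, no softmax.
model = models.resnet18(pretrained=False)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# BCEWithLogitsLoss applies a sigmoid to each logit internally and
# treats every label as an independent present/absent decision.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)                     # dummy batch
targets = torch.randint(0, 2, (8, NUM_CLASSES)).float()  # multi-hot labels

loss = criterion(model(images), targets)
loss.backward()
```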
Seems like WARP is one method that papers proposing fancier methods point to. The key idea is to learn a ranking using a sampling strategy, so we efficiently learn to rank positive labels higher than negatives:
https://www.aaai.org/ocs/index.php/IJCAI/IJCAI11/paper/view/2926/3666
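Here's my reading of the WARP procedure as a sketch for a single image and positive label (paraphrased from the paper, not their code; the rank weight is the harmonic sum L(k) they use):

```python
import torch

def warp_loss(scores, pos_idx, num_classes, max_trials=None):
    """WARP loss for one image and one positive label (sketch).

    Samples negative labels until one violates the margin; the number
    of samples needed gives a cheap estimate of the positive's rank,
    which weights the hinge so badly-ranked positives are pushed harder.
    """
    max_trials = max_trials or num_classes - 1
    s_pos = scores[pos_idx]
    for trials in range(1, max_trials + 1):
        neg_idx = int(torch.randint(num_classes, (1,)))
        if neg_idx == pos_idx:
            continue
        margin = 1.0 - s_pos + scores[neg_idx]
        if margin > 0:  # found a violating negative
            est_rank = (num_classes - 1) // trials
            weight = sum(1.0 / k for k in range(1, est_rank + 1))  # L(rank)
            return weight * margin
    return scores.new_zeros(())  # no violator found: zero loss
```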
This paper applied a WARP-like loss to CNNs:
https://arxiv.org/pdf/1312.4894.pdf
An alternative is a pairwise loss:
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Improving_Pairwise_Ranking_CVPR_2017_paper.pdf
but this looks more involved: it's not trained end-to-end and it uses a separate classifier
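For reference, the basic pairwise idea itself is simple; a hinge-based sketch (my own simplification, not the smoothed loss from that paper):

```python
import torch

def pairwise_rank_loss(scores, labels, margin=1.0):
    """Hinge loss over all (positive, negative) label pairs for one image.

    scores: (C,) per-class scores; labels: (C,) multi-hot 0/1 tensor.
    Penalizes any absent label scored within `margin` of a present one.
    """
    pos = scores[labels.bool()]   # scores of present labels
    neg = scores[~labels.bool()]  # scores of absent labels
    # Want every positive to beat every negative by at least `margin`.
    diffs = margin - (pos.unsqueeze(1) - neg.unsqueeze(0))
    return torch.clamp(diffs, min=0).mean()
```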
Some potentially useful implementations of different losses for recommender systems here:
https://github.com/maciejkula/spotlight/blob/master/spotlight/losses.py
They call WARP "adaptive hinge loss" but cite the paper:
https://github.com/maciejkula/spotlight/blob/75f4c8c55090771b52b88ef1a00f75bb39f9f2a9/spotlight/losses.py#L127
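The gist, as I read their code (a paraphrased sketch, not a copy): score a handful of sampled negatives, keep only the hardest one, and apply an ordinary hinge against it, which approximates WARP's rank weighting without counting sampling trials:

```python
import torch

def adaptive_hinge_loss(positive_scores, negative_scores, margin=1.0):
    """Spotlight-style adaptive hinge (paraphrased sketch).

    positive_scores: (batch,) scores for observed positives.
    negative_scores: (num_samples, batch) scores for sampled negatives.
    """
    hardest_negative, _ = negative_scores.max(dim=0)  # hardest per example
    return torch.clamp(margin - positive_scores + hardest_negative, min=0).mean()
```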