I noticed, while reading the source code and using it to evaluate models, that some ground truths are missed when there are no detections, which results in a higher Average Precision than warranted.
I have no estimate of how much this affects scoring on real predictions (although I suspect it inflates hard-to-detect, rare classes), but a targeted submission could exploit it by making a few high-confidence predictions for a class and then never predicting that class on other images, artificially boosting AP, precision, recall, and therefore mAP.
It happens in 2 cases (a numeric sketch of the effect follows the list):

- whenever a detector doesn't predict anything on a tile;
- whenever a detector fails to predict a class that is present on the image.
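To make the inflation concrete, here is a minimal, self-contained sketch; the per-image counts and all variable names are hypothetical, and the arithmetic is a simplified per-class recall rather than the repository's full AP computation:

```python
# Hypothetical scenario: a class has 10 ground-truth boxes over 5 images,
# and a submission predicts only 2 boxes, both correct, on a single image.
gt_per_image = [2, 2, 2, 2, 2]    # ground-truth boxes per image
det_per_image = [2, 0, 0, 0, 0]   # submitted detections per image
tp = 2                            # both submitted boxes match a ground truth

# Correct accounting: every ground truth counts, detections or not.
recall_correct = tp / sum(gt_per_image)                        # 2/10 = 0.2

# Buggy accounting: images with no detections contribute no ground truths.
counted_gt = sum(g for g, d in zip(gt_per_image, det_per_image) if d > 0)
recall_buggy = tp / counted_gt                                 # 2/2 = 1.0

print(recall_correct, recall_buggy)  # 0.2 vs. 1.0 -> recall and AP inflate
```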
I don't know whether this behaviour is intentional, but I wanted to share my concerns over a potential exploit for artificially high-scoring submissions.
ClementMaliet changed the title "Some False Negative are missed when no detections" → "Some False Negative are missed when there no detections" on Aug 7, 2018
Hi,
The issue comes from the fact that the `Matching` object can be provided with an empty detection list for a class (which happens in both cases). The `greedy_match()` method returns 2 empty lists in this case (https://github.com/DIUx-xView/baseline/blob/master/scoring/matching.py#L93-L96), and the number of ground truths for each class is then not incremented, because it relies on the output of `greedy_match()` to do so (https://github.com/DIUx-xView/baseline/blob/master/scoring/score.py#L239-L244). This leads to ignoring False Negatives, and thus increasing recall (and boosting AP in the process), for every class present in the ground truth of an image where this happens.
Changing `greedy_match()` to always return `gt_rects_matched`, or changing https://github.com/DIUx-xView/baseline/blob/master/scoring/score.py#L244 from `num_gt_per_cls[i] += len(gt_matched)` to `num_gt_per_cls[i] += len(gt_rects)`, might fix the issue, from my understanding (the second option is sketched below).
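Here is a minimal, runnable sketch of that second fix, using toy stand-ins for `greedy_match()` and the counting loop; the early return mirrors the behaviour described above, while the data, the dictionary layout, and the simplified matching are hypothetical, not the repository's actual code:

```python
def greedy_match(gt_rects, det_rects):
    """Toy stand-in for scoring/matching.py's greedy_match().
    Mirrors the reported early return: with an empty detection list it
    returns two empty lists, so the caller sees zero ground truths."""
    if len(det_rects) == 0:
        return [], []                      # the early return the issue points at
    gt_matched = [False] * len(gt_rects)   # one flag per ground-truth box
    det_matched = [False] * len(det_rects)
    for k in range(min(len(gt_rects), len(det_rects))):
        gt_matched[k] = det_matched[k] = True   # toy matching, in order
    return det_matched, gt_matched

# Hypothetical per-class data: class 0 has ground truths but no detections.
gt_rects_by_cls = {0: ["gt_a", "gt_b"], 1: ["gt_c"]}
det_rects_by_cls = {0: [], 1: ["det_x"]}

num_gt_per_cls = {c: 0 for c in gt_rects_by_cls}
for c in gt_rects_by_cls:
    det_matched, gt_matched = greedy_match(gt_rects_by_cls[c], det_rects_by_cls[c])
    # Buggy counting (as at score.py#L244): class 0 contributes zero GTs.
    # num_gt_per_cls[c] += len(gt_matched)
    # Suggested fix: count every ground truth regardless of detections.
    num_gt_per_cls[c] += len(gt_rects_by_cls[c])

print(num_gt_per_cls)  # {0: 2, 1: 1} with the fix; {0: 0, 1: 1} without
```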