Description
There are several issues with the mean_iou code here:
The most important is that it actually computes recall (sensitivity) instead of IoU. The root cause appears to be that `mask` is computed only from `label`, but the same `mask` is then applied to both `pred_label` and `label` (lines 144-149):
mask = label != ignore_index
mask = np.not_equal(label, ignore_index)
pred_label = pred_label[mask]
label = np.array(label)[mask]
intersect = pred_label[pred_label == label]
Because both `pred_label` and `label` are masked with pixels from `label` only, the result of the computation in that function is the ratio of the intersection to the label (recall), instead of the ratio of the intersection to the union of prediction and label (IoU).
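To make the distinction between the two ratios concrete, here is a minimal sketch on toy data. The arrays `label` and `pred` are made up for illustration and the snippet does not call the library; it only shows how far apart intersection/label and intersection/union can be for the same prediction.

```python
import numpy as np

# Hypothetical binary masks: 1 = lesion, 0 = background.
label = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 1, 1, 1, 0, 0])

# Pixels that are positive in both prediction and label.
intersection = int(np.sum((pred == 1) & (label == 1)))
# Pixels that are positive in the label.
label_area = int(np.sum(label == 1))
# Pixels that are positive in either prediction or label.
union = int(np.sum((pred == 1) | (label == 1)))

recall = intersection / label_area  # intersection over label
iou = intersection / union          # intersection over union

print(f"recall={recall:.3f} iou={iou:.3f}")
```

With these arrays the intersection is 2 pixels, the label covers 3 and the union covers 6, so recall is 2/3 while IoU is 1/3. The false positives in `pred` lower IoU but leave recall untouched.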
It's a subtle error that is hard to discover because both IoU and recall have values between 0 and 1, and both behave similarly in training.
The problem is that recall is never lower than IoU, and is usually higher, which leads to an overestimate of model performance. The unfortunate side effect is that I've wasted a lot of time training a SegFormer model based on wrong assumptions.
I only discovered this because I wrote my own metric functions, starting from TP / TN / FP / FN, and from those four values I computed Sørensen-Dice (a.k.a. F1 score), precision, recall, and (on a whim) IoU. This is my code (it's not optimized and the function docstrings are wrong, but it works):
https://gist.github.com/FlorinAndrei/da9ab770b16bfc671075d04a030f548b
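For readers who don't want to open the gist, the general approach can be sketched as follows. This is not the gist's code: the function name `binary_metrics` and its `positive` parameter are hypothetical, and it assumes a single foreground class with no ignored pixels.

```python
import numpy as np


def binary_metrics(pred, label, positive=1):
    """Compute precision, recall, Dice and IoU from confusion counts.

    Hypothetical helper for a single foreground class; it does not
    handle an ignore index or multiple classes.
    """
    pred_pos = np.asarray(pred) == positive
    label_pos = np.asarray(label) == positive

    tp = int(np.sum(pred_pos & label_pos))    # predicted positive, is positive
    fp = int(np.sum(pred_pos & ~label_pos))   # predicted positive, is negative
    fn = int(np.sum(~pred_pos & label_pos))   # predicted negative, is positive
    tn = int(np.sum(~pred_pos & ~label_pos))  # predicted negative, is negative

    def safe_div(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den > 0 else float("nan")

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
    }


metrics = binary_metrics(pred=[1, 1, 0, 1, 0], label=[1, 0, 1, 1, 0])
print(metrics)
```

Deriving everything from the same four counts makes the relationship between the metrics explicit: recall divides TP by TP + FN, while IoU also includes FP in the denominator, so the two coincide only when there are no false positives.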
I was very confused initially when my IoU was different from `evaluate/metrics/mean_iou`. But then I noticed my recall was the same as "IoU" from `evaluate/metrics/mean_iou`. I've checked my code in a few different ways and I believe it is correct.
Here's a visual sample:
`eval/iou_lesion` is the result from `evaluate/metrics/mean_iou`. `eval/loss` is just the evaluation loss. The rest are computed by my code. `eval/niou_lesion` is IoU computed by my code. Notice how the library code produces identical results to the recall value from my code.
My code has only been tested with SegFormer, and only for datasets with a single class, plus background, where the label pixels are 1 and the background is 0. I have not tested it for multiclass segmentation. I have not tested `reduce_labels = True`.