There are two issues in the validation step of ValidateModel in yolo/tools/solver.py (Lines 47-62), which contains the following code:
def validation_step(self, batch, batch_idx):
    batch_size, images, targets, rev_tensor, img_paths = batch
    H, W = images.shape[2:]
    predicts = self.post_process(self.ema(images), image_size=[W, H])
    batch_metrics = self.metric(
        [to_metrics_format(predict) for predict in predicts], [to_metrics_format(target) for target in targets]
    )
    self.log_dict(
        {
            "map": batch_metrics["map"],
            "map_50": batch_metrics["map_50"],
        },
        batch_size=batch_size,
    )
    return predicts
1. The call to self.metric() (a torchmetrics.detection.MeanAveragePrecision instance) at every validation step computes the results not only for the batch provided in that step, but for everything currently in its accumulator (i.e. every example seen in all previous steps). As a result, validation becomes progressively slower with each step; with more than a couple hundred validation images in the dataset, validation takes extremely long.
2. The collate function of the data loader pads the "targets" tensor with dummy all-zero boxes for images that have fewer ground-truth boxes than the image with the most boxes, so that all labels can be batched into a single tensor. These dummy rows are never filtered out and are still present when the validation metrics are calculated. This lowers the reported metric values, because torchmetrics counts the dummy rows as missed ground-truth boxes (see the sketch below).
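The padding behaviour can be reproduced with a small standalone sketch. This is not the repository's actual collate function, and the (class, x1, y1, x2, y2) row layout is only an assumption for illustration; the point is that all-zero rows end up in the batched targets and that a per-image target.sum(1) > 0 mask removes them:

import torch

# Hypothetical per-image labels, each row assumed to be (class, x1, y1, x2, y2)
per_image_targets = [
    torch.tensor([[0.0, 10.0, 10.0, 50.0, 50.0]]),                               # 1 real box
    torch.tensor([[1.0, 20.0, 20.0, 80.0, 80.0], [2.0, 5.0, 5.0, 30.0, 30.0]]),  # 2 real boxes
]

# Zero-padded batching, mimicking what a collate_fn has to do to stack ragged labels
max_boxes = max(t.shape[0] for t in per_image_targets)
batched = torch.zeros(len(per_image_targets), max_boxes, 5)
for i, t in enumerate(per_image_targets):
    batched[i, : t.shape[0]] = t  # remaining rows stay all-zero (dummy boxes)

# The filter used in the proposed fix drops exactly those all-zero padding rows
for target in batched:
    real_only = target[target.sum(1) > 0]
    print(real_only.shape)  # torch.Size([1, 5]) then torch.Size([2, 5])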
Proposed (and tested) solution
Change the above code snippet in ValidateModel to:
def validation_step(self, batch, batch_idx):
    batch_size, images, targets, rev_tensor, img_paths = batch
    H, W = images.shape[2:]
    predicts = self.post_process(self.ema(images), image_size=[W, H])
    self.metric.update(
        [to_metrics_format(predict) for predict in predicts],
        [to_metrics_format(target[target.sum(1) > 0]) for target in targets],
    )
    return predicts
1. Change the self.metric() call to self.metric.update(), which only adds the new values to the accumulator and does not compute the results yet; that computation happens once per epoch in on_validation_epoch_end (see the sketch after this list).
2. Drop the logging of per-batch metric values (which were incorrect anyway, as they were cumulative values).
3. Filter out the dummy boxes from the per-image "target" tensors before adding them to the metric's accumulator.
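For context, here is a hedged sketch of what the matching epoch-end hook looks like; the actual on_validation_epoch_end in yolo/tools/solver.py may differ, and the logged keys are assumed to mirror the ones removed from validation_step:

def on_validation_epoch_end(self):
    # Compute mAP once over everything accumulated via self.metric.update()
    epoch_metrics = self.metric.compute()
    self.log_dict(
        {
            "map": epoch_metrics["map"],
            "map_50": epoch_metrics["map_50"],
        }
    )
    # Clear the accumulator so the next validation epoch starts fresh
    self.metric.reset()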
Changed the self.metric() call to self.metric.update().
Removed the logging of per-batch metric values.
Filtered out the dummy boxes from the per-image "target" tensors before appending them to the metric's accumulator.
Fixes WongKinYiu#133