🐛 Bug
I was using pytorch_lightning.Trainer in ddp mode with 8 GPUs on an HPC. There are 5k images in my validation set, and I find that all the CPU cores are fully loaded (100%) for an extremely long time (~2 hours) at the validation_epoch_end() stage.
To Reproduce
Run in ddp mode with number of devices > 2. Observed timings:
ddp with 1 GPU, 16/32 CPU cores fully loaded, takes ~6 mins.
ddp with 2 GPUs, 32/32 CPU cores fully loaded, takes ~6 mins.
ddp with >2 GPUs, 32/32 CPU cores fully loaded, takes >1 hour.
Expected behavior
The same computation takes place in N separate sub-processes simultaneously. If the computations ran on GPU, they should finish in roughly the same time. But the current implementation moves all tensors to CPU, and the CPUs become overloaded as N increases.
Environment
OS (e.g., Linux): Linux
Python: 3.8
PyTorch: 1.8.1 and 1.10.2
pytorch-lightning: 1.5.10
torchmetrics: 0.7.2
Temp solution
My temp solution is to re-enable the mAP computation on GPU, although the GPU version is slower than the CPU version (#677).
I commented out the following lines and moved the other relevant variables to the correct device.
```python
def compute(self) -> dict:
    # move everything to CPU, as we are faster here
    # self.detection_boxes = [box.cpu() for box in self.detection_boxes]
    # self.detection_labels = [label.cpu() for label in self.detection_labels]
    # self.detection_scores = [score.cpu() for score in self.detection_scores]
    # self.groundtruth_boxes = [box.cpu() for box in self.groundtruth_boxes]
    # self.groundtruth_labels = [label.cpu() for label in self.groundtruth_labels]
```
Should we make compute_on_cpu optional? And do you have any suggestions regarding TorchMetrics' API design?
I'm also keeping an eye on the work on improving mAP performance on both CPU and GPU (#742).