Hello, first of all thanks for sharing your codebase!
We've been testing it for a while and it's working well for us.
But unfortunately we've noticed that turning on distributed training degrades the performance significantly on our setup.
Running fully supervised on the S3DIS dataset with spvcnn as the model, we get ~62% validation mIoU.
With the same hyper-parameters and distributed_training enabled on 4 GPUs, training is much faster, but we only get ~50%.
Tweaking some hyper-parameters and increasing the number of training epochs, the best we got was ~56% (with batch size 2 and lr 0.005).
Now we're wondering: did you use distributed training yourself and notice similar performance drops?
Or are there maybe some other parameters that need to be adjusted when using distributed training?
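To make the kind of adjustment we have in mind concrete: with data-parallel training each GPU processes its own mini-batch, so the effective batch size grows with the number of GPUs, and the base learning rate is often scaled linearly to compensate. Below is a minimal sketch assuming a plain PyTorch DDP setup; the `build_optimizer` helper, the SGD choice, and the momentum value are purely illustrative and not taken from this repository.

```python
# Minimal sketch (not this repo's training script): linear learning-rate
# scaling when moving from single-GPU to multi-GPU data-parallel training.
import torch
import torch.distributed as dist

def build_optimizer(model, base_lr=0.005, per_gpu_batch_size=2):
    # With DDP, each of the N processes sees `per_gpu_batch_size` samples per
    # step, so the effective batch size is N * per_gpu_batch_size. A common
    # heuristic is to scale the learning rate linearly with that factor.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    scaled_lr = base_lr * world_size
    return torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)
```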
Thanks in advance!