Problem:
In [536](https://github.com//issues/536), we added Horovod functionality to Merlin Models, along with features that automate the process on the Merlin Models side. However, the current feature is not fully user-friendly, and there are still open questions about how a user can run multi-GPU data parallel training.
Goal:
Improve the user experience for multi-GPU data parallel training
Test multi-GPU data parallel training: Is the AUC preserved? How does throughput scale up?
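One way to make the "AUC?" check concrete is to compute the AUC of a single-GPU baseline and of the multi-GPU run on the same evaluation set, then assert that the multi-GPU result is not worse by more than a tolerance. The sketch below is a minimal, dependency-free illustration; `roc_auc`, `auc_matches`, and the `0.005` tolerance are hypothetical choices, not part of Merlin Models.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties in scores count as 0.5).

    O(n_pos * n_neg): fine for a small sanity check, not for large eval sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_matches(single_gpu_auc, multi_gpu_auc, tolerance=0.005):
    """True if the multi-GPU AUC is at most `tolerance` below the baseline.

    Assumption: the 0.005 default is an illustrative threshold, not a project standard.
    """
    return multi_gpu_auc >= single_gpu_auc - tolerance
```

In practice the scores would come from evaluating both trained models on the same held-out data; for large evaluation sets a library implementation (e.g. a Keras or scikit-learn AUC metric) is the better choice.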
Constraints:
I am not sure whether the issue with an unequal number of batches per worker in the data loader is solved ([BUG] Data parallel training freezes due to different number of batches, dataloader#75):
-- If the solution is to generate the data correctly, how does that work?
-- How do we ensure this with NVTabular?
-- What about users who do NOT use NVTabular?
The unit test is written so that each worker iterates over the FULL dataset per epoch. That is incorrect: if we have 1M data points and 2 GPUs, each GPU should iterate over only 500k data points. I wrote the example so that NVTabular produces distinct files per worker. Is that the proposed workflow for users?
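Both constraints above (each worker reading only its share of the data, and all workers running the same number of steps so collective ops do not hang) can be sketched in plain Python. This assumes a contiguous, row-based split; in a Horovod job, `rank` and `world_size` would come from `hvd.rank()` and `hvd.size()`. The helper names are hypothetical, and truncating every worker to the minimum batch count is one common mitigation, not necessarily the fix chosen in dataloader#75. When shards are whole files of different sizes (for example, one NVTabular output file per worker), the batch counts can differ by more than one, and the minimum would typically be agreed on through a collective such as an allreduce (not shown).

```python
import math


def shard_bounds(num_rows, rank, world_size):
    """Return the [start, stop) row range that worker `rank` should read.

    Assumption: rows are split into contiguous, near-equal shards; the first
    `num_rows % world_size` workers each get one extra row.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(num_rows, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def batches_per_worker(num_rows, world_size, batch_size):
    """Batches each worker would produce from its shard (last batch may be partial)."""
    counts = []
    for rank in range(world_size):
        start, stop = shard_bounds(num_rows, rank, world_size)
        counts.append(math.ceil((stop - start) / batch_size))
    return counts


def common_num_steps(counts):
    """Steps every worker can run while keeping collective ops in lockstep.

    Truncating to the minimum drops a few trailing batches on some workers
    instead of leaving the others blocked, waiting for an allreduce that never comes.
    """
    return min(counts)
```

With 1M rows and 2 workers, `shard_bounds` gives each worker 500k rows, matching the expectation above rather than the full dataset per worker.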
Starting Point:
Analyze the scaling factor when using multiple GPUs: if we go from 1x GPU -> 2x GPUs -> 4x GPUs -> 8x GPUs, how much higher is the throughput?
Provide performance metrics (accuracy, AUC, etc.) to show that multi-GPU training has no negative effect on model performance
Provide guidance on how to set the global batch size, the batch size per GPU, and the learning rate when scaling
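The scaling questions above can be framed with two small helpers: one that derives the global batch size and learning rate from the per-GPU batch size, and one that turns measured throughput (samples/sec) into speedup and efficiency relative to 1 GPU. This is a sketch under stated assumptions: it uses the linear scaling rule, a widely used heuristic for large-batch SGD that is often paired with a learning-rate warmup, but it has not been validated for Merlin Models (adaptive optimizers such as Adam sometimes behave better with smaller, e.g. square-root, scaling). The function names are hypothetical.

```python
def scaled_hyperparams(per_gpu_batch_size, num_gpus, base_lr, base_num_gpus=1):
    """Global batch size and learning rate under the linear scaling rule.

    Assumption: the per-GPU batch size is held fixed, so the global batch grows
    with the number of GPUs, and the learning rate grows by the same factor
    relative to the run it was tuned on (`base_num_gpus`).
    """
    global_batch_size = per_gpu_batch_size * num_gpus
    lr = base_lr * num_gpus / base_num_gpus
    return global_batch_size, lr


def scaling_report(throughput_by_gpus):
    """Map num_gpus -> (speedup, efficiency), relative to the 1-GPU throughput.

    `throughput_by_gpus` must contain a measurement for 1 GPU. Efficiency is
    speedup / num_gpus, so 1.0 means perfect linear scaling.
    """
    baseline = throughput_by_gpus[1]
    report = {}
    for n, tput in sorted(throughput_by_gpus.items()):
        speedup = tput / baseline
        report[n] = (speedup, speedup / n)
    return report
```

Feeding `scaling_report` the measured throughput for 1, 2, 4, and 8 GPUs answers the "how much higher is the throughput?" question directly, and running each configuration with `scaled_hyperparams` makes the batch size and learning rate choices explicit when comparing AUC across runs.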