The jobs were extended to enable multi-GPU usage for the torch backend (see #444 and #445). The `horovod_num_processes` variable name is now incorrect. This change needs to be done carefully, since it is a potentially hash-breaking change.

Analogous to `distributed_launch_command`, rename `horovod_num_processes` to `distributed_num_processes`? @albertz @Judyxujj @JackTemaki comments?

OK, we could maybe define a custom `_sis_hash` for `ReturnnTrainingJob` which uses exactly the old name (`horovod_num_processes`), and then in `__init__` we can do any handling we want for kwargs, supporting both the new name and the old name. I think something like this would work.
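A minimal sketch of that idea, not using the real sisyphus/i6_core API: the constructor accepts both kwarg names, while the hash is always computed from a dict keyed by the old name so existing setups keep their hashes. The hashing here is a stand-in (`hashlib` over a sorted kwargs dict), not sisyphus' actual mechanism.

```python
import hashlib


class ReturnnTrainingJob:
    """Toy stand-in for the real job, illustrating a hash-preserving rename."""

    def __init__(self, *, distributed_num_processes=None,
                 horovod_num_processes=None):
        # Accept both the new and the deprecated kwarg name.
        if horovod_num_processes is not None:
            assert distributed_num_processes is None, \
                "pass only one of horovod_num_processes / distributed_num_processes"
            distributed_num_processes = horovod_num_processes
        self.distributed_num_processes = distributed_num_processes

    def _sis_hash(self):
        # The hash always uses the *old* name, so jobs created with the
        # new kwarg still hash identically to existing ones.
        d = {"horovod_num_processes": self.distributed_num_processes}
        return hashlib.sha256(repr(sorted(d.items())).encode()).hexdigest()
```

With this, `ReturnnTrainingJob(horovod_num_processes=4)` and `ReturnnTrainingJob(distributed_num_processes=4)` produce the same hash, which is the point of the workaround.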