It doesn't seem relevant to keep a define_model() function on top of the already existing define_model_architecture(). What define_model() does on top of define_model_architecture() is:
(1) convert the model to a DataParallel model if more than one GPU is used (which is only relevant for training and can be done directly in the main() function);
(2) push the model to the device, which can (and should?) be done directly in the main() function for both training and inference (without really adding any lines of code);
(3) read weights from a checkpoint file if one is provided. However, this creates overhead at inference, since the checkpoint must already be read once in the main() function to override the default params.
Related to #455:
With the "softcode download directory for checkpoints that are urls" feature, loading weights from checkpoint in this function with its current usage would mean:
- Reading the checkpoint in main(), and optionally downloading the weights to checkpoint_dir if it is a URL.
- If the checkpoint was a URL, read_checkpoint() would need to return not only the checkpoint dictionary containing params and weights, but also the new path to the downloaded local checkpoint (not the URL) for further use by define_model().
- define_model() would then take this updated path to read the checkpoint into memory a second time, then perform the "model.load_state_dict" operation from those weights.
Since all of this would require reading the provided checkpoint twice rather than once, it wouldn't be optimal in my opinion (and would also require more lines of code).
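To illustrate the single-read alternative, here is a minimal sketch, not the project's actual code. It assumes the checkpoint dictionary stores params under a "params" key and weights under a "model" key (both key names are assumptions). TinyModel is a hypothetical stand-in for whatever define_model_architecture() returns, and pickle stands in for torch.load() so the sketch runs without PyTorch.

```python
import pickle


def read_checkpoint(path):
    """Read the checkpoint file once and return its dictionary.

    Stand-in for the project's read_checkpoint(); a real PyTorch
    checkpoint would be read with torch.load() rather than pickle.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


class TinyModel:
    """Hypothetical stand-in for the model built by define_model_architecture()."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.weights = None

    def load_state_dict(self, state_dict):
        # Mimics the name of the PyTorch method; here it only stores a copy.
        self.weights = dict(state_dict)


def main(checkpoint_path, default_params):
    # Single read: the same dictionary serves both purposes below.
    checkpoint = read_checkpoint(checkpoint_path)

    # 1. Override default params with the ones stored in the checkpoint
    #    ("params" key is an assumption).
    params = {**default_params, **checkpoint["params"]}

    # 2. Build the architecture, then load weights from the dictionary that
    #    is already in memory ("model" key is an assumption).
    model = TinyModel(params["num_classes"])
    model.load_state_dict(checkpoint["model"])

    # DataParallel wrapping and the move to the device would follow here,
    # directly in main().
    return model, params
```

With this layout, read_checkpoint() would not need to return the local path of a downloaded checkpoint, since nothing downstream has to reopen the file.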
See define_model().