Notes on PyTorch practices in experiments
PyTorch 1.0.0 or PyTorch 0.4.0
-
Single PC with multiple GPUs using DataParallel (DataParallel may hang on PyTorch 0.4.0 with Tesla V100/K80 GPUs due to an NCCL issue).
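A minimal sketch of the DataParallel setup, assuming multiple CUDA devices are visible; the toy nn.Linear model stands in for the real network.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # toy stand-in for the real model
    if torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across the visible GPUs
        # and gathers the outputs on the default device.
        model = nn.DataParallel(model)
    model = model.cuda()

    x = torch.randn(8, 10).cuda()
    out = model(x)  # the batch of 8 is scattered across the GPUs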
-
Load a model trained on multiple GPUs when torch.save({"model": model.state_dict()}, "xxx") was used to save the DataParallel instance: the saved keys are prefixed with "module.", so build a new OrderedDict with that prefix removed before calling load_state_dict.
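A sketch of loading such a checkpoint into a plain (non-DataParallel) model by stripping the "module." prefix; "checkpoint.pth" and the toy model are placeholders.

    from collections import OrderedDict
    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # same architecture as used during training

    # Checkpoint saved with torch.save({"model": model.state_dict()}, ...)
    # while the model was still wrapped in DataParallel.
    state_dict = torch.load("checkpoint.pth", map_location="cpu")["model"]

    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        # keys look like "module.weight"; drop the "module." prefix
        name = k[len("module."):] if k.startswith("module.") else k
        new_state_dict[name] = v

    model.load_state_dict(new_state_dict)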
-
Save a model trained on multiple GPUs so it can later be loaded without multiple GPUs: save the state_dict of the underlying model (model.module) rather than the DataParallel wrapper.
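A sketch of saving so the checkpoint loads anywhere without key renaming; model.module is the model wrapped inside DataParallel, and "checkpoint.pth" is a placeholder path.

    import torch
    import torch.nn as nn

    model = nn.DataParallel(nn.Linear(10, 2)).cuda()
    # ... training ...

    # Saving model.module.state_dict() keeps the keys free of the
    # "module." prefix, so a plain single-GPU/CPU model can load it directly.
    torch.save({"model": model.module.state_dict()}, "checkpoint.pth")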
-
Alternatively, load a checkpoint saved from the DataParallel instance by wrapping the model in DataParallel again before calling load_state_dict, so the "module." keys match.
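A sketch of this alternative: rebuild the DataParallel wrapper first so the saved "module." keys line up; "checkpoint.pth" is a placeholder for a checkpoint saved from the wrapped model.

    import torch
    import torch.nn as nn

    model = nn.DataParallel(nn.Linear(10, 2)).cuda()

    # Keys in the checkpoint are "module.weight", "module.bias", ...,
    # which now match the wrapped model's state_dict keys.
    state_dict = torch.load("checkpoint.pth", map_location="cpu")["model"]
    model.load_state_dict(state_dict)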