Conversation

@tjruwase (Contributor)

Save ZeRO3 (partitioned) fp16 weights. This is a first step to using ZeRO3 weights outside DeepSpeed, #872.
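For context, a minimal sketch of the idea, assuming param.ds_tensor holds each rank's local, flattened fp16 partition; the helper name and file layout are illustrative, not DeepSpeed's actual checkpoint format:

import torch
import torch.distributed as dist

def save_zero3_fp16_partitions(module, ckpt_dir):
    # Each rank saves only the shards it owns; nothing is gathered.
    rank = dist.get_rank()
    shard = {name: p.ds_tensor for name, p in module.named_parameters()}
    torch.save(shard, f"{ckpt_dir}/fp16_partition_rank_{rank}.pt")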

@stas00 (Collaborator) commented Mar 19, 2021

That still leaves the partitions separate, so this is great if a user wants to load each partition separately, but it doesn't work when a user needs the model weights consolidated.
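For illustration only (not something this PR provides), a rough sketch of consolidating the fp16 weights on rank 0, assuming deepspeed.zero.GatheredParameters can be used to temporarily all-gather each ZeRO-3 parameter:

import torch
import deepspeed

def consolidated_fp16_state_dict(module):
    # Gather each partitioned parameter in turn and copy its full data on rank 0.
    state_dict = {}
    for name, param in module.named_parameters():
        with deepspeed.zero.GatheredParameters(param):
            if torch.distributed.get_rank() == 0:
                state_dict[name] = param.data.cpu().clone()
    return state_dict  # only meaningful on rank 0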

Also, I don't think this PR should do this by default, as it adds overhead that most users won't need, so it should be configurable.

And, as suggested elsewhere, the model_states.pt file with fake weights probably shouldn't even be saved; it just confuses users who try to load it, and loading is guaranteed to fail.

def save_partitioned_weights(self, state_dict):
    # Replace each entry in state_dict with this rank's local (flattened)
    # ZeRO-3 partition of the parameter.
    for name, param in self.module.named_parameters():
        if name in state_dict.keys():
            state_dict[name] = param.ds_tensor
@stas00 (Collaborator) commented Mar 24, 2021

Found an issue here: param.ds_tensor in this place appears to be a flattened buffer, so the state_dict ends up being populated with 1-D vectors.

@stas00 (Collaborator)

But we can't shape it back to the original since we only have a part of the tensor, so doing something like narrow(0, 0, param.ds_numel).view(param.ds_shape) from _allgather_param() won't work, and the shape has no meaning here anyway.
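A toy example of why the reshape only works on the gathered buffer (sizes made up: a 4x4 parameter split across two ranks):

import torch

full = torch.arange(16, dtype=torch.float16)   # the parameter, flattened
ds_shape, ds_numel = (4, 4), 16

shard = full[:8]                               # roughly what param.ds_tensor holds on rank 0
# shard.view(ds_shape) would fail: 8 elements cannot be viewed as 4x4

gathered = torch.cat([full[:8], full[8:]])     # what _allgather_param() reconstructs
restored = gathered.narrow(0, 0, ds_numel).view(ds_shape)  # valid only on the full buffer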

So this line of logic is useful when each GPU loads param.ds_tensor directly, as coded in the rest of this PR.
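That per-rank loading path could look roughly like this (a hypothetical counterpart to save_partitioned_weights, not the PR's actual code):

def load_partitioned_weights(module, flattened_params_state_dict):
    for name, param in module.named_parameters():
        if name in flattened_params_state_dict:
            # Copy this rank's flattened shard back into the local partition.
            param.ds_tensor.copy_(flattened_params_state_dict[name])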

I just tried to use it to get the partitioned fp16 weights, but now I understand this is not possible using this approach.

Bottom line: there is no problem here; I just needed to understand that what's being saved is not a real state_dict but something like a flattened_params_state_dict.

All is good!

@tjruwase (Contributor, Author)

Made redundant by #892 and #893.

@tjruwase tjruwase closed this Mar 26, 2021