
Commit 4a851f3

deepcharm, loadams, and hwchen2017 committed
Avoid graph break by removing redundant requires_grad attr change (#7158)
This PR is a continuation of the efforts to improve DeepSpeed performance when using PyTorch compile.

Dynamo breaks the graph on `flat_tensor.requires_grad = False` because the assignment:

* Is a side-effecting operation on tensor metadata
* Occurs in a context where Dynamo expects static tensor properties for tracing

The assignment is redundant and can be safely removed because:

* The `_allgather_params()` function is already decorated with `@torch.no_grad()`, which ensures the desired property
* `flat_tensor` is created with `torch.empty()`, which sets `requires_grad=False` by default

---------

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
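A minimal sketch of the reasoning above (the function name and sizes are hypothetical, not DeepSpeed's actual `_allgather_params`): inside a `@torch.no_grad()` function, a tensor created with `torch.empty()` already has `requires_grad=False`, so an explicit assignment would be a no-op.

    import torch

    @torch.no_grad()
    def allgather_sketch(partition_size: int, num_partitions: int) -> torch.Tensor:
        tensor_size = partition_size * num_partitions
        # torch.empty() creates tensors with requires_grad=False by default,
        # and @torch.no_grad() disables gradient tracking in any case, so
        # `flat_tensor.requires_grad = False` would add nothing here.
        flat_tensor = torch.empty(tensor_size, dtype=torch.float32)
        assert flat_tensor.requires_grad is False
        return flat_tensor

    print(allgather_sketch(4, 2).requires_grad)  # False, with no explicit attribute change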
1 parent 78ec025 commit 4a851f3

File tree

1 file changed: 0 additions, 1 deletion

deepspeed/runtime/zero/partition_parameters.py

Lines changed: 0 additions & 1 deletion
@@ -1899,7 +1899,6 @@ def _allgather_params(self, param_list, hierarchy=0):
 
         tensor_size = partition_size * self.num_partitions
         flat_tensor = torch.empty(tensor_size, dtype=param_list[0].ds_tensor.dtype, device=self.local_device)
-        flat_tensor.requires_grad = False
         partitions = []
         for i in range(self.num_partitions):
             start = partition_size * i
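One hedged way to check that this pattern no longer forces a graph break (a hypothetical test, not part of this PR, assuming a recent PyTorch 2.x) is to compile a small function following the same pattern with `fullgraph=True`, which raises an error instead of silently breaking the graph:

    import torch

    @torch.no_grad()
    def build_flat_tensor(partition_size: int, num_partitions: int) -> torch.Tensor:
        # Mirrors the pattern in _allgather_params after this change: no
        # requires_grad assignment, so no tensor-metadata side effect to trace.
        return torch.empty(partition_size * num_partitions, dtype=torch.float32)

    # fullgraph=True asks Dynamo to capture a single graph and to error out
    # rather than fall back when it would otherwise break the graph.
    compiled = torch.compile(build_flat_tensor, fullgraph=True)
    print(compiled(4, 2).shape)  # torch.Size([8])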

0 commit comments
