Commit 4a851f3
Avoid graph break by removing redundant requires_grad attr change (#7158)
This PR continues the ongoing effort to improve DeepSpeed performance when using `torch.compile`.
Dynamo breaks the graph at `flat_tensor.requires_grad = False` because the assignment (see the sketch after this list):
* Is a side-effecting mutation of tensor metadata
* Occurs in a context where Dynamo expects static tensor properties during tracing
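
A minimal standalone sketch of the failure mode (a hypothetical function, not DeepSpeed code; the mutation line mirrors the one this PR removes):

```python
import torch

def gather_flat(p):
    # torch.empty() already yields a tensor with requires_grad=False.
    flat_tensor = torch.empty(p.numel())
    # Side-effecting mutation of tensor metadata: in the PyTorch versions this
    # PR targets, Dynamo cannot trace through it and breaks the graph here.
    flat_tensor.requires_grad = False
    flat_tensor.copy_(p.detach().flatten())
    return flat_tensor

# fullgraph=True turns a silent graph break into a hard error, which makes
# breaks like this one easy to spot during development (the exact behavior
# varies across PyTorch versions).
compiled = torch.compile(gather_flat, fullgraph=True)
compiled(torch.randn(4, 4))
```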
The `flat_tensor.requires_grad = False` assignment is redundant and can be safely removed because (both points are verified in the sketch below):
* The `_allgather_params()` function is already decorated with `@torch.no_grad()`, which guarantees the desired property
* `flat_tensor` is created with `torch.empty()`, which sets `requires_grad=False` by default
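
Both properties are easy to check in plain PyTorch; a minimal sketch (names are illustrative):

```python
import torch

# Claim 2: torch.empty() defaults to requires_grad=False.
assert torch.empty(4).requires_grad is False

# Claim 1: results computed under @torch.no_grad() never require grad,
# even when they are derived from tensors that do.
@torch.no_grad()
def build_buffer(src):
    return src * 2.0

out = build_buffer(torch.randn(4, requires_grad=True))
assert out.requires_grad is False
```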
---------
Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>

Parent commit: 78ec025
1 file changed: 0 additions, 1 deletion

The diff removes a single line (original line 1902): `flat_tensor.requires_grad = False`. The surrounding lines (1899–1901, 1903–1905) are unchanged context.