Simplify and fix match_fw_and_bw_saved_for_bw_proxies implementation #1754

IvanYashchuk · 2025-02-07T12:37:42Z

Base PR: #1756. Keeping this PR in draft mode to prevent merges into the previous PR in the stack.

This change is needed to unblock @jjsjann123 for #1732.

The previous implementation uses both tensors and non-tensors saved for backward when constructing old_saved_for_backward_fw, but the mirrored object constructed from the backward trace ignores non-tensors. This mismatch leads to the error that @jjsjann123 observed while working on #1732.
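The mismatch can be illustrated with a minimal sketch. It does not use Thunder's code: `Proxy`, `match_saved`, and the `tensors_only_on_bw` flag are hypothetical names introduced only for illustration, and positional pairing of the two lists is an assumption about how the matching works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a traced value; not Thunder's proxy class."""

    name: str
    is_tensor: bool


def match_saved(fw_saved, bw_saved, *, tensors_only_on_bw):
    """Pair forward-saved values with their backward counterparts by position."""
    if tensors_only_on_bw:
        # Mimics the described bug: non-tensors are dropped on one side only.
        bw_saved = [p for p in bw_saved if p.is_tensor]
    if len(fw_saved) != len(bw_saved):
        raise AssertionError(
            f"forward saved {len(fw_saved)} values, backward expects {len(bw_saved)}"
        )
    return {bw.name: fw for fw, bw in zip(fw_saved, bw_saved)}


# One non-tensor (for example a Python int) is saved between two tensors.
fw = [Proxy("t0", True), Proxy("i1", False), Proxy("t2", True)]
bw = [Proxy("bw_t0", True), Proxy("bw_i1", False), Proxy("bw_t2", True)]

# Treating both sides the same way gives a consistent mapping.
mapping = match_saved(fw, bw, tensors_only_on_bw=False)
```

Calling `match_saved(fw, bw, tensors_only_on_bw=True)` on the same inputs raises `AssertionError`, because the forward list still contains the non-tensor while the filtered backward list does not.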

cc @mruberry @lantiga @ali-alshaar7

IvanYashchuk · 2025-02-07T13:01:59Z

Failures to fix:

FAILED thunder/tests/test_inplace_copy.py::test_prim_inplace_copy_bwd_nvfuser_cuda_thunder.dtypes.bfloat16 - AssertionError
FAILED thunder/tests/test_inplace_copy.py::test_prim_inplace_copy_bwd_nvfuser_cuda_thunder.dtypes.float16 - AssertionError
FAILED thunder/tests/test_torch_compile_executor.py::test_litgpt_fabric_for_callable - AssertionError
FAILED thunder/tests/test_torch_compile_executor.py::test_torch_compile_cat_rope_single_fusion - AssertionError
FAILED thunder/tests/test_transforms.py::test_disable_params_and_buffer_check - AssertionError
FAILED thunder/tests/test_jit_general.py::test_tom_overrides_proxy[cuda] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-llama1-like] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[True-cuda-bf16] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[True-cuda-f16] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[False-cuda-f16] - AssertionError
FAILED thunder/tests/test_dynamo.py::test_ThunderCompilerGraphBenchmarking_LlamaMLPBenchmark - torch._dynamo.exc.BackendCompilerFailed: backend='?' raised:
AssertionError: 

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[False-cuda-bf16] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-gpt-neox-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-long-context-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-llama2-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-codellama2-like] - AssertionError
= 16 failed, 2162 passed, 228 skipped, 22 xfailed, 7 xpassed, 51635 warnings in 426.08s (0:07:06) =


IvanYashchuk · 2025-02-07T15:46:21Z

One more test to fix:

FAILED thunder/tests/distributed/test_fsdp.py::FSDPTest::test_rematerialize_all_gather - RuntimeError: Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 726, in run_test
    getattr(self, test_name)()
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 599, in wrapper
    fn()
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 3120, in wrapper
    method(*args, **kwargs)
  File "/__w/3/s/thunder/tests/distributed/test_fsdp.py", line 136, in test_rematerialize_all_gather
    self.assertTrue(all(t in result_saved_for_bwd for t in sharded_param_names))
  File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true
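The bare `assertTrue(all(...))` above only reports `False is not true`. A small helper along the following lines could show which names are missing while debugging. This is a sketch under stated assumptions: `missing_names` is a hypothetical helper, and the example name lists are made up rather than taken from the test.

```python
def missing_names(expected, saved):
    """Return the expected names that are absent from the saved names, in order."""
    saved_set = set(saved)
    return [name for name in expected if name not in saved_set]


# Hypothetical values; the real lists come from the test's traces.
sharded_param_names = ["t_net1_weight", "t_net2_weight"]
result_saved_for_bwd = ["t_net1_weight", "x"]

missing = missing_names(sharded_param_names, result_saved_for_bwd)
print(f"sharded params not saved for backward: {missing}")
```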

stale · 2025-04-16T05:38:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

IvanYashchuk added the tracing architecture label Feb 7, 2025

IvanYashchuk requested review from mruberry, lantiga and t-vi as code owners February 7, 2025 12:37

IvanYashchuk mentioned this pull request Feb 7, 2025

Update vjp implementation to attach residuals to all outputs #1755

Draft

IvanYashchuk mentioned this pull request Feb 7, 2025

use transform for execution to get torch_compile executable #1500

Merged

IvanYashchuk added a commit that referenced this pull request Feb 7, 2025

Revert updates to saved_tensors info on fw_extrace

45f2c25

Needed to fix asserts in tests of #1754. Revert changes from 24d6a8d.

IvanYashchuk mentioned this pull request Feb 7, 2025

Revert updates to saved_tensors info on fw_extrace, no need to remove duplicate variables #1756

Open

Simplify and fix match_fw_and_bw_saved_for_bw_proxies implementation

3c4b0b3

IvanYashchuk force-pushed the ivan-1732-1 branch from 698dfc7 to 3c4b0b3 Compare February 7, 2025 13:27

IvanYashchuk changed the base branch from main to ivan-1732-0 February 7, 2025 13:28

IvanYashchuk marked this pull request as draft February 7, 2025 13:28

IvanYashchuk added 2 commits February 10, 2025 12:59

Merge remote-tracking branch 'upstream/ivan-1732-0' into ivan-1732-1

c8ffc28

Fix test_rematerialize_all_gather

43b6015

crcrpar mentioned this pull request Mar 14, 2025

Trace Transform for Tensor Wrapper Subclasses #1883

Closed

IvanYashchuk mentioned this pull request Mar 18, 2025

Simplify match_fw_and_bw_saved_for_bw_proxies #1885

Merged

4 tasks

stale bot added the won't fix label Apr 16, 2025

t-vi removed the won't fix label Apr 30, 2025
