Simplify and fix match_fw_and_bw_saved_for_bw_proxies implementation #1754
Draft
IvanYashchuk wants to merge 3 commits into ivan-1732-0 from ivan-1732-1
Conversation
Failures to fix:
FAILED thunder/tests/test_inplace_copy.py::test_prim_inplace_copy_bwd_nvfuser_cuda_thunder.dtypes.bfloat16 - AssertionError
FAILED thunder/tests/test_inplace_copy.py::test_prim_inplace_copy_bwd_nvfuser_cuda_thunder.dtypes.float16 - AssertionError
FAILED thunder/tests/test_torch_compile_executor.py::test_litgpt_fabric_for_callable - AssertionError
FAILED thunder/tests/test_torch_compile_executor.py::test_torch_compile_cat_rope_single_fusion - AssertionError
FAILED thunder/tests/test_transforms.py::test_disable_params_and_buffer_check - AssertionError
FAILED thunder/tests/test_jit_general.py::test_tom_overrides_proxy[cuda] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-llama1-like] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[True-cuda-bf16] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[True-cuda-f16] - AssertionError
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[False-cuda-f16] - AssertionError
FAILED thunder/tests/test_dynamo.py::test_ThunderCompilerGraphBenchmarking_LlamaMLPBenchmark - torch._dynamo.exc.BackendCompilerFailed: backend='?' raised:
AssertionError:
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
FAILED thunder/tests/test_sdpaex_executor.py::test_sdpa_attn_mask[False-cuda-bf16] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-gpt-neox-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-long-context-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-llama2-like] - AssertionError
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cuda-codellama2-like] - AssertionError
= 16 failed, 2162 passed, 228 skipped, 22 xfailed, 7 xpassed, 51635 warnings in 426.08s (0:07:06) =
Force-pushed from 698dfc7 to 3c4b0b3
One more test to fix:
FAILED thunder/tests/distributed/test_fsdp.py::FSDPTest::test_rematerialize_all_gather - RuntimeError: Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 726, in run_test
getattr(self, test_name)()
File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 599, in wrapper
fn()
File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 3120, in wrapper
method(*args, **kwargs)
File "/__w/3/s/thunder/tests/distributed/test_fsdp.py", line 136, in test_rematerialize_all_gather
self.assertTrue(all(t in result_saved_for_bwd for t in sharded_param_names))
File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true
Base PR: #1756; this PR is kept in draft mode to prevent merges into the previous PR in the stack.
This change is needed to unblock @jjsjann123 for #1732.
The previous implementation used both tensors and non-tensors saved for backward when constructing old_saved_for_backward_fw, but the mirrored object built from the backward trace ignores non-tensors. This mismatch leads to the error @jjsjann123 observed while working on #1732.

cc @mruberry @lantiga @ali-alshaar7