[functorch] test: try using reference_inputs in vmap tests #91355
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91355
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Failures as of commit 21da1aa:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base c99a2a4:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/functorch/test_vmap.py (Outdated)

    sample_inputs_itr = op.sample_inputs(device, dtype, requires_grad=False)
    sample_inputs_op = {
        # Take too long
        "special.chebyshev_polynomial_t",
These ops already have a skip for taking a long time with reference inputs. E.g.:

pytorch/torch/testing/_internal/opinfo/definitions/special.py, lines 374 to 388 in 39d49db:
    BinaryUfuncInfo(
        "special.chebyshev_polynomial_t",
        dtypes=all_types_and(torch.bool),
        promotes_int_to_float=True,
        skips=(
            DecorateInfo(unittest.skip("Skipped!"), "TestCudaFuserOpInfo"),
            DecorateInfo(unittest.skip("Skipped!"), "TestNNCOpInfo"),
            DecorateInfo(
                unittest.skip("testing takes an unreasonably long time, #79528"),
                "TestCommon",
                "test_compare_cpu",
            ),
        ),
        supports_one_python_scalar=True,
        supports_autograd=False,
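For comparison, a hedged sketch of what an equivalent skip targeting the vmap tests might look like through the same DecorateInfo machinery (the test class and test name here are assumptions, not code from the PR):

    DecorateInfo(
        unittest.skip("takes an unreasonably long time with reference inputs"),
        "TestVmapOperatorsOpInfo",
        "test_vmap_exhaustive",
    ),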
Will take care of the ASAN failure after the review.
    xfail('__rsub__'),
    # RuntimeError: Batching rule not implemented for aten::moveaxis.int;
    # the fallback path doesn't work on out= or view ops.
    xfail('movedim'),
    # RuntimeError: NYI: querying is_contiguous inside of vmap for
    # memory_format other than torch.contiguous_format
    xfail('contiguous'),
    # RuntimeError: NYI: Tensor.clone(memory_format) inside vmap is only supported
    # with memory_format torch.preserve_format or torch.contiguous_format (got ChannelsLast)
    xfail('clone'),
    # RuntimeError: When vmap-ing torch.nn.functional.one_hot,
    # please provide an explicit positive num_classes argument.
    xfail('nn.functional.one_hot'),
Normally I'd feel bad about adding these xfails, but we do have manual tests for contiguous, clone, one_hot, and sub in the codebase; and movedim is tested just by virtue of being part of the vmap implementation.
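A minimal sketch of the one_hot limitation cited in the hunk above; the error text is quoted from the xfail comment, and passing num_classes explicitly is the workaround the message asks for:

    import torch
    from functorch import vmap

    x = torch.tensor([[0, 2, 1], [1, 0, 2]])

    # vmap(torch.nn.functional.one_hot)(x) raises:
    # RuntimeError: When vmap-ing torch.nn.functional.one_hot,
    # please provide an explicit positive num_classes argument.
    out = vmap(lambda row: torch.nn.functional.one_hot(row, num_classes=3))(x)
    print(out.shape)  # torch.Size([2, 3, 3])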
    # AssertionError
    # Mismatched elements: 18 / 20 (90.0%)
    # Greatest absolute difference: 14.031710147857666 at index (0, 5) (up to 0.0001 allowed)
    # Greatest relative difference: 2.9177700113052603 at index (0, 3) (up to 0.0001 allowed)
    xfail('narrow_copy', device_type='cpu'),
Can you file an issue for the silent correctness problem? Also, do you know which of the following is the actual problem?
- the non-contiguous test is failing?
- the batching rule is bogus?
- narrow_copy has inconsistent semantics on CPU/CUDA?
Sure, will file an issue.
- I don't think the non-contiguous sample is the issue, as we haven't added non-contiguous testing to the vmap tests.
- The batching rule for narrow_copy seems innocuous and doesn't have any special handling for CPU vs. CUDA, so the operator itself may have an issue.
Batching rule ref: pytorch/aten/src/ATen/functorch/BatchRulesViews.cpp, lines 506 to 515 in 3120054:
std::tuple<Tensor, optional<int64_t>> narrow_copy_batch_rule( | |
const Tensor &self, optional<int64_t> self_bdim, int64_t dim, c10::SymInt start, c10::SymInt length) | |
{ | |
TORCH_INTERNAL_ASSERT(self_bdim.has_value()); | |
auto self_ = moveBatchDimToFront(self, self_bdim); | |
auto logical_rank = rankWithoutBatchDim(self, self_bdim); | |
dim = maybe_wrap_dim(dim, logical_rank) + 1; | |
auto result = self_.narrow_copy_symint(dim, start, length); | |
return std::make_tuple(result, 0); |
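In Python terms, the rule above moves the batch dim to the front and shifts the logical dim by one; a quick equivalence check against a per-example loop (a sketch, not from the PR):

    import torch

    x = torch.randn(3, 4, 5)          # dim 0 plays the batch dim; logical shape (4, 5)
    batched = x.narrow_copy(1, 0, 2)  # logical dim 0 -> physical dim 0 + 1
    looped = torch.stack([xi.narrow_copy(0, 0, 2) for xi in x])
    assert torch.equal(batched, looped)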
If the operator is the problem: if we can come up with a repro that doesn't involve vmap and shows that, on the same input (on CPU/CUDA, with the same strides), it produces different outputs, that would be great. One idea for "getting rid of the vmap" is to use make_fx to trace out what's happening.
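A minimal sketch of that idea, assuming make_fx composes with vmap as it does for other functorch transforms (the repro function f is hypothetical):

    import torch
    from functorch import vmap
    from torch.fx.experimental.proxy_tensor import make_fx

    def f(t):
        return torch.narrow_copy(t, 0, 0, 2)

    x = torch.randn(4, 5)

    # Trace the batched computation; the resulting graph records the plain
    # aten ops that actually run under vmap.
    gm = make_fx(vmap(f))(x)
    print(gm.graph)

    # Replay the traced graph without vmap on both devices and compare.
    out_cpu = gm(x)
    if torch.cuda.is_available():
        out_cuda = gm(x.cuda())
        torch.testing.assert_close(out_cpu, out_cuda.cpu())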
Sure, thanks! I've assigned the issue to myself and will have a look soon.
More info here: #91690
LGTM. We should try to dig into whether some of the failures are important, and file issues for them if so.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: the following mandatory check(s) failed (Rule …). Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert
❌ 🤖 pytorchbot command failed:
Try …
@pytorchbot revert -m"Broke trunk" -c landrace
@pytorchbot successfully started a revert job. Check the current status here.
@kshitij12345 your PR has been successfully reverted.
…91355)" This reverts commit a51090d. Reverted #91355 on behalf of https://github.com/kshitij12345 due to Broke trunk
Force-pushed from eed545d to 4b2e3b5.
@pytorchbot merge -f"JIT failure looks unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Ref pytorch/functorch#1090

Timings:

test_vmap_exhaustive
- After PR
- Before PR

test_op_has_batch_rule
- After PR
- Before PR