
Test rand in a fusion with zero tensor input #1932

Merged: 150 commits merged into devel on Aug 27, 2022
Conversation

zasdfgbnm (Collaborator) commented Aug 26, 2022:

This PR is stacked on top of @jjsjann123's PR #1914, which should be merged first.

Looks like Jie's PR fixed a bug:

```
unknown file: Failure
C++ exception with description "device is not coherent for fusion inputs
Exception raised from getKernelRuntimeFor at /home/gaoxiang/nvfuser4/torch/csrc/jit/codegen/cuda/kernel_cache.cpp:220 (most recent call first):
frame #0: <unknown function> + 0x87810 (0x7ff25c0f8810 in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #1: <unknown function> + 0x877a0 (0x7ff25c0f87a0 in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #2: <unknown function> + 0x876a0 (0x7ff25c0f86a0 in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #3: <unknown function> + 0x89908 (0x7ff25c0fa908 in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #4: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x65 (0x7ff25c0f8f25 in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #5: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x9c (0x7ff25c0f6b6c in /home/gaoxiang/nvfuser4/build/lib/libc10.so)
frame #6: torch::jit::fuser::cuda::FusionExecutorCache::getKernelRuntimeFor(c10::ArrayRef<c10::IValue> const&, unsigned long) + 0xe8 (0x7ff2722794d8 in /home/gaoxiang/nvfuser4/build/lib/libtorch_cuda.so)
frame #7: torch::jit::fuser::cuda::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&) + 0x54f (0x7ff272278f7f in /home/gaoxiang/nvfuser4/build/lib/libtorch_cuda.so)
frame #8: torch::jit::NVFuserTest_FusionRNGValidateWithCURand_CUDA_Test::TestBody() + 0x33d (0x55e103570c8d in ./build/bin/test_jit)
frame #9: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x7b (0x55e1035cb49b in ./build/bin/test_jit)
frame #10: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x7d (0x55e1035b41cd in ./build/bin/test_jit)
frame #11: testing::Test::Run() + 0xc3 (0x55e10358c673 in ./build/bin/test_jit)
frame #12: testing::TestInfo::Run() + 0x106 (0x55e10358d506 in ./build/bin/test_jit)
frame #13: testing::TestSuite::Run() + 0x111 (0x55e10358ddb1 in ./build/bin/test_jit)
frame #14: testing::internal::UnitTestImpl::RunAllTests() + 0x45b (0x55e10359ff7b in ./build/bin/test_jit)
frame #15: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x7b (0x55e1035cdf0b in ./build/bin/test_jit)
frame #16: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x83 (0x55e1035b6973 in ./build/bin/test_jit)
frame #17: testing::UnitTest::Run() + 0xd5 (0x55e10359fab5 in ./build/bin/test_jit)
frame #18: <unknown function> + 0x3bf741 (0x55e102d5f741 in ./build/bin/test_jit)
frame #19: main + 0x226 (0x55e102d5f6e6 in ./build/bin/test_jit)
frame #20: <unknown function> + 0x232d0 (0x7ff25bc762d0 in /usr/lib/libc.so.6)
frame #21: __libc_start_main + 0x8a (0x7ff25bc7638a in /usr/lib/libc.so.6)
frame #22: _start + 0x25 (0x55e102d5f315 in ./build/bin/test_jit)
" thrown in the test body.
[  FAILED  ] NVFuserTest.FusionRNGValidateWithCURand_CUDA (4 ms)
```

TODO: add ways to discard/skip memory allocation, and a proper check to safeguard that.

KernelArgumentHolder

TODO: expand this to KernelPrecomputedIntegers
```
@@ -112,24 +112,21 @@ TEST_F(NVFuserTest, FusionRNGValidateWithCURand_CUDA) {
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);

TensorView* tv0 = makeSymbolicTensor(1, aten_to_data_type(dtype));
```
zasdfgbnm (Collaborator, Author) commented:
Modified this test to use the nullary `rand`.
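The idea behind "nullary `rand`" can be illustrated with a rough Python analogue (a hedged sketch in plain TorchScript, not the actual nvFuser C++ test in this PR): the random op takes zero tensor inputs, so the graph must produce the output shape from constants rather than from an input tensor's metadata.

```python
import torch

# Illustrative sketch only: a scripted function with zero tensor inputs,
# analogous to a fusion whose only op is a nullary rand. The shape (4, 8)
# is baked into the graph instead of being derived from an input tensor.
@torch.jit.script
def nullary_rand() -> torch.Tensor:
    # torch.rand here has no tensor operands.
    return torch.rand(4, 8)

out = nullary_rand()
assert out.shape == (4, 8)
assert bool((out >= 0).all()) and bool((out < 1).all())
```

Because there is no input tensor to key the cache on, shape and device information must come from the op itself, which is the situation the test above exercises.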

@zasdfgbnm zasdfgbnm changed the title Fix rand in arith.h Test rand in a fusion with zero tensor input Aug 26, 2022
@zasdfgbnm zasdfgbnm marked this pull request as ready for review August 26, 2022 05:14
Base automatically changed from async_compilation to devel August 26, 2022 05:46
csarofeen (Owner) left a comment:
LGTM

@zasdfgbnm zasdfgbnm merged commit 1d0c267 into devel Aug 27, 2022
@zasdfgbnm zasdfgbnm deleted the standalone-rand branch August 27, 2022 16:05
jjsjann123 added a commit that referenced this pull request Oct 2, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean, arange
ii. fixed misaligned address for transpose scheduler
iii. refactored the separation of the compilation API from the execution API to prepare for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)
Pull Request resolved: pytorch#84626
Approved by: https://github.com/malfet