nvfuser_bench is broken #1829

Closed
zasdfgbnm opened this issue Jul 15, 2022 · 0 comments · Fixed by #1839

Comments

zasdfgbnm (Collaborator) commented Jul 15, 2022

🐛 Describe the bug

./build/bin/nvfuser_bench --benchmark_filter=.*Transpose_Random_fp16_Inner_2D_01_Axis.*
terminate called after throwing an instance of 'c10::Error'
  what():  !mismatch INTERNAL ASSERT FAILED at "/home/gaoxiang/nvfuser/torch/csrc/jit/codegen/cuda/executor_utils.cpp":354, please report a bug to PyTorch. Found one or more invalid arguments: Argument element type is Half, but the parameter is float
Argument element type is Half, but the parameter is float

Exception raised from validateKernelInputs at /home/gaoxiang/nvfuser/torch/csrc/jit/codegen/cuda/executor_utils.cpp:354 (most recent call first):
frame #0: <unknown function> + 0x85b50 (0x7f3dd037ab50 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #1: <unknown function> + 0x85ae0 (0x7f3dd037aae0 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #2: <unknown function> + 0x859e0 (0x7f3dd037a9e0 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #3: <unknown function> + 0x87c48 (0x7f3dd037cc48 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #4: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x65 (0x7f3dd037b265 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #5: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x7f3dd0378c0a in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #6: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x5d (0x7f3dd0378e7d in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #7: <unknown function> + 0x6604a15 (0x7f3df0426a15 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #8: torch::jit::fuser::cuda::FusionExecutor::runFusion(c10::ArrayRef<c10::IValue> const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, torch::jit::fuser::cuda::LaunchParams const&, c10::optional<unsigned long> const&) + 0x1385 (0x7f3df03967b5 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x6772350 (0x7f3df0594350 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #10: torch::jit::fuser::cuda::FusionKernelRuntime::runKernelWithInput(c10::ArrayRef<c10::IValue> const&, unsigned long, torch::jit::fuser::cuda::SegmentedGroup*) + 0x7a8 (0x7f3df0584af8 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #11: torch::jit::fuser::cuda::FusionKernelRuntime::runWithInput(c10::ArrayRef<c10::IValue> const&, unsigned long) + 0x747 (0x7f3df0581ca7 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #12: torch::jit::fuser::cuda::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&) + 0x5a3 (0x7f3df0580db3 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #13: runBenchmarkIterations(benchmark::State&, torch::jit::fuser::cuda::FusionExecutorCache*, std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x68 (0x55623e2b1c88 in ./build/bin/nvfuser_bench)
frame #14: <unknown function> + 0x15e080 (0x55623e299080 in ./build/bin/nvfuser_bench)
frame #15: NF_Transpose_Random_fp16_Inner_2D_01_Axis___GRAPH_NF_Transpose_Random_fp16_Inner_2D_01_Axis_Benchmark::BenchmarkCase(benchmark::State&) + 0x7d (0x55623e2975ed in ./build/bin/nvfuser_bench)
frame #16: <unknown function> + 0xdbb60 (0x55623e216b60 in ./build/bin/nvfuser_bench)
frame #17: benchmark::internal::BenchmarkInstance::Run(unsigned long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const + 0x82 (0x55623e32b6c2 in ./build/bin/nvfuser_bench)
frame #18: <unknown function> + 0x1cbda9 (0x55623e306da9 in ./build/bin/nvfuser_bench)
frame #19: benchmark::internal::BenchmarkRunner::DoNIterations() + 0x351 (0x55623e306851 in ./build/bin/nvfuser_bench)
frame #20: benchmark::internal::BenchmarkRunner::DoOneRepetition() + 0xd8 (0x55623e307528 in ./build/bin/nvfuser_bench)
frame #21: <unknown function> + 0x17ea7c (0x55623e2b9a7c in ./build/bin/nvfuser_bench)
frame #22: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x5a6 (0x55623e2b8ca6 in ./build/bin/nvfuser_bench)
frame #23: benchmark::RunSpecifiedBenchmarks() + 0x39 (0x55623e2b86a9 in ./build/bin/nvfuser_bench)
frame #24: main + 0x5a (0x55623e2b714a in ./build/bin/nvfuser_bench)
frame #25: <unknown function> + 0x29290 (0x7f3db6de6290 in /usr/lib/libc.so.6)
frame #26: __libc_start_main + 0x8a (0x7f3db6de634a in /usr/lib/libc.so.6)
frame #27: _start + 0x25 (0x55623e213f75 in ./build/bin/nvfuser_bench)
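
The assertion message suggests that `validateKernelInputs` compares the element type of each runtime argument with the type of the corresponding kernel parameter and reports every mismatch. The following is a minimal Python sketch of that kind of check, not nvFuser's actual implementation: `KernelParam` and `find_dtype_mismatches` are hypothetical names introduced only for illustration, and dtypes are modeled as plain strings.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class KernelParam:
    """Hypothetical stand-in for a compiled kernel parameter."""
    name: str
    dtype: str  # element type the kernel was compiled for, e.g. "float"


def find_dtype_mismatches(params: Sequence[KernelParam],
                          arg_dtypes: Sequence[str]) -> List[str]:
    """Collect one message per argument whose element type differs
    from the type of the parameter in the same position."""
    if len(params) != len(arg_dtypes):
        raise ValueError("argument count does not match parameter count")
    return [
        f"Argument element type is {arg}, but the parameter is {param.dtype}"
        for param, arg in zip(params, arg_dtypes)
        if arg != param.dtype
    ]


# Assumed scenario: a kernel declared with a float input, called with a Half tensor.
params = [KernelParam("T0", "float")]
mismatches = find_dtype_mismatches(params, ["Half"])
for message in mismatches:
    print(message)
```

If the real check works along these lines, the failure would mean the benchmark passes fp16 inputs to a fusion whose input was defined as float. The issue itself does not state the root cause.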

Versions

devel
