🐛 Describe the bug
./build/bin/nvfuser_bench --benchmark_filter=.*Transpose_Random_fp16_Inner_2D_01_Axis.*
terminate called after throwing an instance of 'c10::Error'
  what():  !mismatch INTERNAL ASSERT FAILED at "/home/gaoxiang/nvfuser/torch/csrc/jit/codegen/cuda/executor_utils.cpp":354, please report a bug to PyTorch. Found one or more invalid arguments:
Argument element type is Half, but the parameter is float
Argument element type is Half, but the parameter is float
Exception raised from validateKernelInputs at /home/gaoxiang/nvfuser/torch/csrc/jit/codegen/cuda/executor_utils.cpp:354 (most recent call first):
frame #0: <unknown function> + 0x85b50 (0x7f3dd037ab50 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #1: <unknown function> + 0x85ae0 (0x7f3dd037aae0 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #2: <unknown function> + 0x859e0 (0x7f3dd037a9e0 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #3: <unknown function> + 0x87c48 (0x7f3dd037cc48 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #4: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x65 (0x7f3dd037b265 in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #5: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x7f3dd0378c0a in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #6: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x5d (0x7f3dd0378e7d in /home/gaoxiang/nvfuser/build/lib/libc10.so)
frame #7: <unknown function> + 0x6604a15 (0x7f3df0426a15 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #8: torch::jit::fuser::cuda::FusionExecutor::runFusion(c10::ArrayRef<c10::IValue> const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, torch::jit::fuser::cuda::LaunchParams const&, c10::optional<unsigned long> const&) + 0x1385 (0x7f3df03967b5 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x6772350 (0x7f3df0594350 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #10: torch::jit::fuser::cuda::FusionKernelRuntime::runKernelWithInput(c10::ArrayRef<c10::IValue> const&, unsigned long, torch::jit::fuser::cuda::SegmentedGroup*) + 0x7a8 (0x7f3df0584af8 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #11: torch::jit::fuser::cuda::FusionKernelRuntime::runWithInput(c10::ArrayRef<c10::IValue> const&, unsigned long) + 0x747 (0x7f3df0581ca7 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #12: torch::jit::fuser::cuda::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&) + 0x5a3 (0x7f3df0580db3 in /home/gaoxiang/nvfuser/build/lib/libtorch_cuda.so)
frame #13: runBenchmarkIterations(benchmark::State&, torch::jit::fuser::cuda::FusionExecutorCache*, std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x68 (0x55623e2b1c88 in ./build/bin/nvfuser_bench)
frame #14: <unknown function> + 0x15e080 (0x55623e299080 in ./build/bin/nvfuser_bench)
frame #15: NF_Transpose_Random_fp16_Inner_2D_01_Axis___GRAPH_NF_Transpose_Random_fp16_Inner_2D_01_Axis_Benchmark::BenchmarkCase(benchmark::State&) + 0x7d (0x55623e2975ed in ./build/bin/nvfuser_bench)
frame #16: <unknown function> + 0xdbb60 (0x55623e216b60 in ./build/bin/nvfuser_bench)
frame #17: benchmark::internal::BenchmarkInstance::Run(unsigned long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const + 0x82 (0x55623e32b6c2 in ./build/bin/nvfuser_bench)
frame #18: <unknown function> + 0x1cbda9 (0x55623e306da9 in ./build/bin/nvfuser_bench)
frame #19: benchmark::internal::BenchmarkRunner::DoNIterations() + 0x351 (0x55623e306851 in ./build/bin/nvfuser_bench)
frame #20: benchmark::internal::BenchmarkRunner::DoOneRepetition() + 0xd8 (0x55623e307528 in ./build/bin/nvfuser_bench)
frame #21: <unknown function> + 0x17ea7c (0x55623e2b9a7c in ./build/bin/nvfuser_bench)
frame #22: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x5a6 (0x55623e2b8ca6 in ./build/bin/nvfuser_bench)
frame #23: benchmark::RunSpecifiedBenchmarks() + 0x39 (0x55623e2b86a9 in ./build/bin/nvfuser_bench)
frame #24: main + 0x5a (0x55623e2b714a in ./build/bin/nvfuser_bench)
frame #25: <unknown function> + 0x29290 (0x7f3db6de6290 in /usr/lib/libc.so.6)
frame #26: __libc_start_main + 0x8a (0x7f3db6de634a in /usr/lib/libc.so.6)
frame #27: _start + 0x25 (0x55623e213f75 in ./build/bin/nvfuser_bench)
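For readers unfamiliar with this assertion, the message reads as a per-argument comparison between the element type of each runtime tensor and the type the compiled kernel parameter was declared with. The sketch below only illustrates that kind of check. It is not nvfuser's `validateKernelInputs`; `DType`, `typeName`, and `describeMismatches` are hypothetical names introduced here, and the real check presumably covers more than element type.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a scalar type tag (not c10::ScalarType).
enum class DType { Float, Half };

// Spelling chosen to mirror the wording of the error message in this report.
std::string typeName(DType t) {
  return t == DType::Half ? "Half" : "float";
}

// Returns one line per argument whose element type differs from the
// type of the corresponding kernel parameter; empty if everything matches.
std::string describeMismatches(const std::vector<DType>& args,
                               const std::vector<DType>& params) {
  // Assumption of this sketch: one argument per parameter.
  assert(args.size() == params.size());
  std::ostringstream msg;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] != params[i]) {
      msg << "Argument element type is " << typeName(args[i])
          << ", but the parameter is " << typeName(params[i]) << "\n";
    }
  }
  return msg.str();
}
```

Calling `describeMismatches` with two `Half` arguments against two `Float` parameters yields the same two lines seen in the trace, which would be consistent with the benchmark passing fp16 tensors to a fusion whose inputs were defined as float. That reading is an inference from the message, not something confirmed in this report.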
Versions
devel