My environment:
python 3.8.8
cuda 11.1
pytorch 1.7.1/1.8.1/1.9 all failed

forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
  function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
  function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
  argument types are: (int64_t, int64_t, int64_t, int)

forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<...>, Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<...>, ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cutlass::conv::Conv2dProblemSize]" matches the argument list
  argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})

2 errors detected in the compilation of "forward_fp32.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
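For reference, here is a minimal, self-contained sketch of why the first error fires. The names in it (Coord4D, n, h, w, c) are hypothetical stand-ins for cutlass::Tensor4DCoord and the call site at forward_fp32.cu(212); it assumes only the two-constructor overload set quoted in the error. With arguments of mixed width, the Index (int) overload matches the last argument better while the LongIndex (int64_t) overload matches the first three better, so overload resolution is ambiguous; casting all four arguments to one width resolves it.

```cpp
#include <cstdint>

// Hypothetical stand-in for cutlass::Tensor4DCoord with the same two
// constructors the compiler lists as candidates.
struct Coord4D {
    using Index = int;
    using LongIndex = int64_t;
    Coord4D(Index n, Index h, Index w, Index c) {}
    Coord4D(LongIndex n, LongIndex h, LongIndex w, LongIndex c) {}
};

int main() {
    int64_t n = 1, h = 2, w = 3;
    int c = 4;
    // Coord4D bad(n, h, w, c);  // ambiguous: (int64_t, int64_t, int64_t, int)
    //                           // matches neither overload better than the other
    Coord4D good(n, h, w, static_cast<int64_t>(c));  // all four now LongIndex
    (void)good;
    return 0;
}
```

In the real code the fix would presumably be the analogous cast at the argument list shown in the error.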
The same error occurred on PyTorch 1.10 with CUDA 11.3/11.0 and cuDNN 8.4.1/8.2.0. We also received an error from CUTLASS:
cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from "__half" to "float" exists
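As a hedged guess at the cause: PyTorch's C++/CUDA extension builder typically passes -D__CUDA_NO_HALF_OPERATORS__ and -D__CUDA_NO_HALF_CONVERSIONS__ to nvcc, which strips __half's implicit operator float(), so any line in cutlass/fast_math.h that relies on that implicit conversion stops compiling. A minimal sketch (half_demo is a hypothetical name; only cuda_fp16.h intrinsics are assumed):

```cpp
#include <cuda_fp16.h>

__device__ float half_demo(__half h) {
    // float f = h;  // fails under -D__CUDA_NO_HALF_CONVERSIONS__:
    //               // "no suitable conversion function from '__half' to 'float'"
    float f = __half2float(h);  // explicit intrinsic compiles either way
    return f;
}
```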
@ewrfcas We solved this problem by downgrading Python to 3.7, and it finally works.
Could you please share the environment you used for the install? E.g. OS version, GCC version, and whether C++14 was used.
@sleeplessai Python 3.7.1 still does not work. Which exact 3.7.x version did you use?