
errors in installing cutlass #4

Closed
miracleagi opened this issue Jul 12, 2022 · 3 comments


@miracleagi

Step 3 of the cutlass guide does not resolve the errors below. Is there any way to install cutlass correctly?
FAILED: /home/xx/cutlass/examples/19_large_depthwise_conv2d_torch_extension/build/temp.linux-x86_64-3.8/backward_data_fp32.o
/usr/local/cuda-10.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/xx/cutlass/examples/19_large_depthwise_conv2d_torch_extension/build/temp.linux-x86_64-3.8/backward_data_fp32.o.d -I. -I/home/xx/cutlass/include -I/home/xx/cutlass/tools/library/include -I/home/xx/cutlass/tools/util/include -I/home/xx/cutlass/examples/common -I/root/miniconda3/lib/python3.8/site-packages/torch/include -I/root/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/lib/python3.8/site-packages/torch/include/TH -I/root/miniconda3/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/root/miniconda3/include/python3.8 -c -c /home/xx/cutlass/examples/19_large_depthwise_conv2d_torch_extension/backward_data_fp32.cu -o /home/xx/cutlass/examples/19_large_depthwise_conv2d_torch_extension/build/temp.linux-x86_64-3.8/backward_data_fp32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -g -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_depthwise_conv2d_implicit_gemm_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
/home/zhengchuanchuan/cutlass/examples/19_large_depthwise_conv2d_torch_extension/backward_data_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
argument types are: (int64_t, int64_t, int64_t, int)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

@Shiweiliuiiiiiii (Collaborator)

Can you try this to see if it helps?

@miracleagi (Author)

> Can you try this to see if it helps?

I tried that, but there is still one error.
Error info:
backward_data_fp16.cu(215): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 32>, ElementSrc, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<32, 64>, 256, cutlass::layout::PitchLinearShape<4, 8>, 8>, 1, 0>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<64, 32>, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, 0, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<32, 64>, 256, cutlass::layout::PitchLinearShape<4, 8>, 8>, 16>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorDgradPrecomp<cutlass::MatrixShape<32, 128>, ElementFilter, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<128, 32>, 256, cutlass::layout::PitchLinearShape<8, 4>, 8>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<32, 128>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, 0, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<128, 32>, 256, cutlass::layout::PitchLinearShape<8, 4>, 8>, 16>, ElementAccumulator, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaVoltaTensorOp<WarpShape, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, ElementAccumulator, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<16, 16, 4>, 32, ElementSrc, cutlass::layout::RowMajor, ElementFilter, cutlass::layout::RowMajor, 
ElementAccumulator, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, __nv_bool>, cutlass::MatrixShape<0, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<ElementSrc, ElementSrc, 8, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<ElementFilter, ElementFilter, 16, cutlass::FloatRoundStyle::round_to_nearest>, _nv_bool>, Epilogue=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaVoltaTensorOp<WarpShape, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, ElementAccumulator, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<16, 16, 4>, 32, ElementSrc, cutlass::layout::RowMajor, ElementFilter, cutlass::layout::RowMajor, ElementAccumulator, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 4, 4, 2, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 2, 1, 1, 2>, 256, 1, 16>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorVoltaTensorOp<WarpShape, cutlass::gemm::GemmShape<32, 32, 4>, ElementAccumulator, cutlass::layout::RowMajor>, cutlass::epilogue::warp::TileIteratorVoltaTensorOp<WarpShape, cutlass::gemm::GemmShape<32, 32, 4>, ElementAccumulator, cutlass::layout::RowMajor>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 4, 4, 2, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 2, 1, 1, 2>, 256, 1, 16>::CompactedThreadMap, ElementAccumulator, 4>, 
cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 2>, false>, ThreadblockSwizzle=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kDgrad, ConvProblemSize=cutlass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})

python: 3.7.13

@Shiweiliuiiiiiii (Collaborator)

Hi, please see my reply here and check whether you can install it successfully.
