=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_256x128_64x3_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=4 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.239811 ms
Memory: 266.222 GiB/s
Math: 241901 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_128x256_64x3_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=2 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.22529 ms
Memory: 283.38 GiB/s
Math: 257492 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_256x64_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=64 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.196229 ms
Memory: 325.348 GiB/s
Math: 295626 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_64x256_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.212132 ms
Memory: 300.958 GiB/s
Math: 273464 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_32x256_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=32 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.287785 ms
Memory: 221.842 GiB/s
Math: 201575 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_128x128_64x5_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=5 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.180972 ms
Memory: 352.778 GiB/s
Math: 320550 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_64x128_64x6_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=6 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.255652 ms
Memory: 249.725 GiB/s
Math: 226912 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_256x128_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=128 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=3 --warps_m=4 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.189368 ms
Memory: 337.135 GiB/s
Math: 306336 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_128x256_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=256 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=3 --warps_m=2 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.225495 ms
Memory: 283.123 GiB/s
Math: 257258 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_256x64_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=64 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.233912 ms
Memory: 272.935 GiB/s
Math: 248000 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_64x256_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=256 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.248617 ms
Memory: 256.792 GiB/s
Math: 233332 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_256x32_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=32 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.261468 ms
Memory: 244.17 GiB/s
Math: 221864 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_32x256_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=32 --cta_n=256 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.279398 ms
Memory: 228.501 GiB/s
Math: 207626 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_128x128_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=128 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=4 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.223918 ms
Memory: 285.117 GiB/s
Math: 259070 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_64x128_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=128 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.235264 ms
Memory: 271.367 GiB/s
Math: 246576 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_i16832gemm_s4_s8_128x32_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s32:column --D=s32:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=32 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 68550656 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 846
Runtime: 0.296858 ms
Memory: 215.062 GiB/s
Math: 195415 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_256x128_64x3_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=4 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.180244 ms
Memory: 134.772 GiB/s
Math: 321843 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_128x256_64x3_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=2 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.227871 ms
Memory: 106.604 GiB/s
Math: 254576 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_256x64_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=64 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.189522 ms
Memory: 128.175 GiB/s
Math: 306088 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_64x256_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.224686 ms
Memory: 108.115 GiB/s
Math: 258184 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_32x256_64x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=32 --cta_n=256 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.287273 ms
Memory: 84.5607 GiB/s
Math: 201935 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_128x128_64x5_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=5 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.177213 ms
Memory: 137.078 GiB/s
Math: 327347 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_64x128_64x6_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=6 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.253317 ms
Memory: 95.8956 GiB/s
Math: 229003 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_256x128_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=128 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=3 --warps_m=4 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.173087 ms
Memory: 140.346 GiB/s
Math: 335152 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_128x256_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=256 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=3 --warps_m=2 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.223416 ms
Memory: 108.73 GiB/s
Math: 259651 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_256x64_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=64 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.227451 ms
Memory: 106.801 GiB/s
Math: 255046 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_64x256_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=256 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.254802 ms
Memory: 95.3368 GiB/s
Math: 227668 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_256x32_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=256 --cta_n=32 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.271237 ms
Memory: 89.56 GiB/s
Math: 213873 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_32x256_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=32 --cta_n=256 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=1 --warps_n=4 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.280412 ms
Memory: 86.6296 GiB/s
Math: 206875 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_128x128_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=128 --cta_k=128 --cluster_m=1 \
           --cluster_n=1 --cluster_k=1 --stages=4 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 \
           --min_cc=80 --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.221327 ms
Memory: 109.756 GiB/s
Math: 262102 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_64x128_128x3_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=64 --cta_n=128 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=3 --warps_m=2 --warps_n=2 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.232192 ms
Memory: 104.62 GiB/s
Math: 249838 GFLOP/s

=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass_tensorop_s8_i16832gemm_s4_s8_128x32_128x4_tn_align32
Status: Success
Verification: ON
Disposition: Incorrect
reference_device: Incorrect
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=3456 --n=4096 --k=2048 --A=s4:row --B=s8:column --C=s8:column --D=s8:column \
           --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
           --swizzle_size=1 --op_class=tensorop --accum=s32 --cta_m=128 --cta_n=32 --cta_k=128 --cluster_m=1 --cluster_n=1 \
           --cluster_k=1 --stages=4 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=32 --min_cc=80 \
           --max_cc=1024
Bytes: 26083328 bytes
FLOPs: 58010370048 flops
FLOPs/Byte: 2224
Runtime: 0.303647 ms
Memory: 80.0008 GiB/s
Math: 191046 GFLOP/s

=============================
CSV Results:

Problem,Provider,OperationKind,Operation,Disposition,Status,gemm_kind,m,n,k,A,B,C,D,alpha,beta,split_k_mode,split_k_slices,batch_count,raster_order,swizzle_size,op_class,accum,cta_m,cta_n,cta_k,cluster_m,cluster_n,cluster_k,stages,warps_m,warps_n,warps_k,inst_m,inst_n,inst_k,min_cc,max_cc,Bytes,Flops,Flops/Byte,Runtime,GB/s,GFLOPs
1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_256x128_64x3_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,128,64,1,1,1,3,4,2,1,16,8,32,80,1024,68550656,58010370048,846,0.239811,266.222,241901
1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_128x256_64x3_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,256,64,1,1,1,3,2,4,1,16,8,32,80,1024,68550656,58010370048,846,0.22529,283.38,257492 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_256x64_64x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,64,64,1,1,1,4,4,1,1,16,8,32,80,1024,68550656,58010370048,846,0.196229,325.348,295626 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_64x256_64x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,256,64,1,1,1,4,1,4,1,16,8,32,80,1024,68550656,58010370048,846,0.212132,300.958,273464 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_32x256_64x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,32,256,64,1,1,1,4,1,4,1,16,8,32,80,1024,68550656,58010370048,846,0.287785,221.842,201575 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_128x128_64x5_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,128,64,1,1,1,5,2,2,1,16,8,32,80,1024,68550656,58010370048,846,0.180972,352.778,320550 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_64x128_64x6_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,128,64,1,1,1,6,2,2,1,16,8,32,80,1024,68550656,58010370048,846,0.255652,249.725,226912 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_256x128_128x3_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,128,128,1,1,1,3,4,2,1,16,8,32,80,1024,68550656,58010370048,846,0.189368,337.135,306336 
1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_128x256_128x3_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,256,128,1,1,1,3,2,4,1,16,8,32,80,1024,68550656,58010370048,846,0.225495,283.123,257258 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_256x64_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,64,128,1,1,1,4,4,1,1,16,8,32,80,1024,68550656,58010370048,846,0.233912,272.935,248000 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_64x256_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,256,128,1,1,1,4,1,4,1,16,8,32,80,1024,68550656,58010370048,846,0.248617,256.792,233332 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_256x32_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,32,128,1,1,1,4,4,1,1,16,8,32,80,1024,68550656,58010370048,846,0.261468,244.17,221864 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_32x256_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,32,256,128,1,1,1,4,1,4,1,16,8,32,80,1024,68550656,58010370048,846,0.279398,228.501,207626 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_128x128_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,128,128,1,1,1,4,2,2,1,16,8,32,80,1024,68550656,58010370048,846,0.223918,285.117,259070 1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_64x128_128x3_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,128,128,1,1,1,3,2,2,1,16,8,32,80,1024,68550656,58010370048,846,0.235264,271.367,246576 
1,CUTLASS,gemm,cutlass_tensorop_i16832gemm_s4_s8_128x32_128x4_tn_align32,passed,success,universal,3456,4096,2048,s4:row,s8:column,s32:column,s32:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,32,128,1,1,1,4,4,1,1,16,8,32,80,1024,68550656,58010370048,846,0.296858,215.062,195415 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_256x128_64x3_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,128,64,1,1,1,3,4,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.180244,134.772,321843 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_128x256_64x3_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,256,64,1,1,1,3,2,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.227871,106.604,254576 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_256x64_64x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,64,64,1,1,1,4,4,1,1,16,8,32,80,1024,26083328,58010370048,2224,0.189522,128.175,306088 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_64x256_64x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,256,64,1,1,1,4,1,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.224686,108.115,258184 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_32x256_64x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,32,256,64,1,1,1,4,1,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.287273,84.5607,201935 1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_128x128_64x5_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,128,64,1,1,1,5,2,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.177213,137.078,327347 
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_64x128_64x6_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,128,64,1,1,1,6,2,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.253317,95.8956,229003
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_256x128_128x3_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,128,128,1,1,1,3,4,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.173087,140.346,335152
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_128x256_128x3_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,256,128,1,1,1,3,2,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.223416,108.73,259651
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_256x64_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,64,128,1,1,1,4,4,1,1,16,8,32,80,1024,26083328,58010370048,2224,0.227451,106.801,255046
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_64x256_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,256,128,1,1,1,4,1,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.254802,95.3368,227668
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_256x32_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,256,32,128,1,1,1,4,4,1,1,16,8,32,80,1024,26083328,58010370048,2224,0.271237,89.56,213873
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_32x256_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,32,256,128,1,1,1,4,1,4,1,16,8,32,80,1024,26083328,58010370048,2224,0.280412,86.6296,206875
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_128x128_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,128,128,1,1,1,4,2,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.221327,109.756,262102
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_64x128_128x3_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,64,128,128,1,1,1,3,2,2,1,16,8,32,80,1024,26083328,58010370048,2224,0.232192,104.62,249838
1,CUTLASS,gemm,cutlass_tensorop_s8_i16832gemm_s4_s8_128x32_128x4_tn_align32,incorrect,success,universal,3456,4096,2048,s4:row,s8:column,s8:column,s8:column,1,0,serial,1,1,heuristic,1,tensorop,s32,128,32,128,1,1,1,4,4,1,1,16,8,32,80,1024,26083328,58010370048,2224,0.303647,80.0008,191046