Generic OpenCL kernels are broken #1960

nwnk · 2024-06-13T18:08:31Z

The build documentation claims that generic OpenCL kernels are always available. I wanted to verify that they worked, and the straightforward way to do that seemed to be this:

commit 221691fab2a936267c0cf352e9b9b64ebf813973 (HEAD)
Author: Adam Jackson <ajax@redhat.com>
Date:   Mon Jun 10 22:03:30 2024 -0400

    cmake: Allow building no gen-specific OpenCL kernels

diff --git a/cmake/configuring_primitive_list.cmake b/cmake/configuring_primitive_list.cmake
index 3524f17107..75333fd1e6 100644
--- a/cmake/configuring_primitive_list.cmake
+++ b/cmake/configuring_primitive_list.cmake
@@ -55,6 +55,8 @@ message(STATUS "Enabled primitive CPU ISA: ${DNNL_ENABLE_PRIMITIVE_CPU_ISA}")
 
 if (DNNL_ENABLE_PRIMITIVE_GPU_ISA STREQUAL "ALL")
     set(BUILD_PRIMITIVE_GPU_ISA_ALL TRUE)
+elseif (DNNL_ENABLE_PRIMITIVE_GPU_ISA STREQUAL "NONE")
+    #
 else()
     foreach(isa ${DNNL_ENABLE_PRIMITIVE_GPU_ISA})
         string(TOUPPER ${isa} uisa)

And that builds! And it works more than it doesn't! Using the Intel oneAPI 2024.1 DPC++ compiler, I built 3c0e1f1635c81ae9074f2deeff9977a2a8ef149d with the above patch and the SYCL CPU and GPU backends. (I am not using the OpenCL driver from the oneAPI release. I am using Fedora 40's build of the Intel Compute Runtime, intel-compute-runtime-24.09.28717.17-1.fc40.x86_64. I don't expect that matters much here, but I can try with a different version if it helps.)
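For reference, the two configurations compared in this report can be sketched roughly as below. This is a minimal sketch, not the exact command line used: the helper name `configure_onednn`, the `DRY_RUN` switch, and the choice of `icx`/`icpx` as compiler drivers are assumptions for illustration, and `NONE` is only accepted once the patch above is applied.

```shell
#!/bin/sh
# Sketch: configure oneDNN with SYCL CPU and GPU runtimes and a chosen
# value for DNNL_ENABLE_PRIMITIVE_GPU_ISA.
# Assumptions: icx/icpx from oneAPI are on PATH; the helper name and the
# DRY_RUN switch are hypothetical and exist only for this example.
configure_onednn() {
    src_dir=$1      # path to the oneDNN checkout
    build_dir=$2    # out-of-tree build directory
    gpu_isa=$3      # ALL (default behaviour) or NONE (needs the patch above)
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and
    # non-empty, so the command is printed instead of executed.
    ${DRY_RUN:+echo} env CC=icx CXX=icpx \
        cmake -S "$src_dir" -B "$build_dir" \
        -DDNNL_CPU_RUNTIME=SYCL \
        -DDNNL_GPU_RUNTIME=SYCL \
        -DDNNL_ENABLE_PRIMITIVE_GPU_ISA="$gpu_isa"
}
```

Calling `configure_onednn . build-all ALL` and `configure_onednn . build-none NONE` would then give two build trees whose ctest results can be compared side by side.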

With the normal build, ctest says:

99% tests passed, 6 tests failed out of 453
        
Total Test time (real) = 6392.06 sec
        
The following tests FAILED:
        406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
        408 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
        410 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
        416 - test_benchdnn_modeC_graph_ci_gpu (Failed)
        432 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
        450 - test_benchdnn_modeC_sum_ci_gpu (Failed)

Then, I rebuilt with DNNL_ENABLE_PRIMITIVE_GPU_ISA set to NONE, and ctest said:

78% tests passed, 99 tests failed out of 453

Total Test time (real) = 4957.57 sec

The following tests FAILED:
	  4 - gpu-cnn-inference-f32-cpp (Failed)
	  6 - gpu-cnn-inference-int8-cpp (Failed)
	  8 - gpu-cnn-training-bf16-cpp (Failed)
	 10 - gpu-cnn-training-f32-cpp (Failed)
	 15 - gpu-graph-sycl-getting-started-cpp (Failed)
	 16 - cpu-graph-sycl-single-op-partition-cpp (Failed)
	 17 - gpu-graph-sycl-single-op-partition-cpp (Failed)
	 19 - gpu-matmul-perf-cpp (Failed)
	 21 - gpu-memory-format-propagation-cpp (Failed)
	 23 - gpu-performance-profiling-cpp (Failed)
	 33 - gpu-primitives-convolution-cpp (Failed)
	 39 - gpu-primitives-inner-product-cpp (Failed)
	 43 - gpu-primitives-lbr-gru-cpp (Failed)
	 47 - gpu-primitives-lstm-cpp (SEGFAULT)
	 49 - gpu-primitives-matmul-cpp (Failed)
	 61 - gpu-primitives-shuffle-cpp (Failed)
	 65 - gpu-primitives-sum-cpp (Failed)
	 67 - gpu-primitives-vanilla-rnn-cpp (Failed)
	 69 - gpu-rnn-training-f32-cpp (Failed)
	 75 - gpu-tutorials-matmul-inference-int8-matmul-cpp (Failed)
	 84 - test_binary_gpu (Failed)
	 86 - test_binary_buffer_gpu (Failed)
	 88 - test_concat_gpu (Failed)
	 90 - test_concat_buffer_gpu (Failed)
	 92 - test_concurrency_gpu (Failed)
	 94 - test_concurrency_buffer_gpu (Failed)
	 96 - test_convolution_backward_data_f32_gpu (Failed)
	 98 - test_convolution_backward_data_f32_buffer_gpu (Failed)
	100 - test_convolution_backward_weights_f32_gpu (Failed)
	102 - test_convolution_backward_weights_f32_buffer_gpu (Failed)
	104 - test_convolution_eltwise_forward_f32_gpu (Failed)
	106 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
	108 - test_convolution_eltwise_forward_x8s8f32s32_gpu (Failed)
	110 - test_convolution_eltwise_forward_x8s8f32s32_buffer_gpu (Failed)
	112 - test_convolution_forward_f32_gpu (Failed)
	114 - test_convolution_forward_f32_buffer_gpu (Failed)
	123 - test_cross_engine_reorder_buffer (Failed)
	125 - test_deconvolution_gpu (Failed)
	127 - test_deconvolution_buffer_gpu (Failed)
	177 - test_inner_product_backward_data_gpu (Failed)
	179 - test_inner_product_backward_data_buffer_gpu (Failed)
	181 - test_inner_product_backward_weights_gpu (Failed)
	183 - test_inner_product_backward_weights_buffer_gpu (Failed)
	185 - test_inner_product_forward_gpu (Failed)
	187 - test_inner_product_forward_buffer_gpu (Failed)
	197 - test_matmul_gpu (Failed)
	199 - test_matmul_buffer_gpu (Failed)
	201 - test_persistent_cache_api_gpu (Failed)
	203 - test_persistent_cache_api_buffer_gpu (Failed)
	209 - test_pooling_forward_gpu (Failed)
	211 - test_pooling_forward_buffer_gpu (Failed)
	217 - test_primitive_cache_mt_gpu (Failed)
	219 - test_primitive_cache_mt_buffer_gpu (Failed)
	225 - test_reorder_gpu (Failed)
	227 - test_reorder_buffer_gpu (Failed)
	237 - test_shuffle_gpu (Failed)
	239 - test_shuffle_buffer_gpu (Failed)
	245 - test_sum_gpu (Failed)
	247 - test_sum_buffer_gpu (Failed)
	298 - test_api (Failed)
	299 - test_api_buffer (Failed)
	304 - test_api_sycl (Failed)
	317 - test_graph_c_api_compile_usm_gpu (Failed)
	319 - test_graph_c_api_compile_parametrized_usm_gpu (Failed)
	321 - test_graph_cpp_api_compile_usm_gpu (Failed)
	323 - test_graph_cpp_api_partition_usm_gpu (Failed)
	325 - test_graph_cpp_api_compiled_partition_sycl_usm_gpu (Failed)
	353 - test_graph_unit_dnnl_batch_norm_usm_gpu (Failed)
	355 - test_graph_unit_dnnl_binary_op_usm_gpu (Failed)
	357 - test_graph_unit_dnnl_bmm_usm_gpu (Failed)
	359 - test_graph_unit_dnnl_compiled_partition_usm_gpu (Failed)
	361 - test_graph_unit_dnnl_concat_usm_gpu (Failed)
	363 - test_graph_unit_dnnl_conv_usm_gpu (Failed)
	365 - test_graph_unit_dnnl_convtranspose_usm_gpu (Failed)
	367 - test_graph_unit_dnnl_dequantize_usm_gpu (Failed)
	369 - test_graph_unit_dnnl_eltwise_usm_gpu (Failed)
	373 - test_graph_unit_dnnl_large_partition_usm_gpu (Failed)
	377 - test_graph_unit_dnnl_matmul_usm_gpu (Failed)
	381 - test_graph_unit_dnnl_pool_usm_gpu (Failed)
	385 - test_graph_unit_dnnl_quantize_usm_gpu (Failed)
	387 - test_graph_unit_dnnl_reduce_usm_gpu (Failed)
	389 - test_graph_unit_dnnl_reorder_usm_gpu (Failed)
	393 - test_graph_unit_dnnl_softmax_usm_gpu (Failed)
	406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
	408 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
	410 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
	412 - test_benchdnn_modeC_eltwise_ci_gpu (Failed)
	416 - test_benchdnn_modeC_graph_ci_gpu (Subprocess aborted)
	418 - test_benchdnn_modeC_ip_ci_gpu (Failed)
	424 - test_benchdnn_modeC_matmul_ci_gpu (Failed)
	426 - test_benchdnn_modeC_pool_ci_gpu (Failed)
	432 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
	437 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
	438 - test_benchdnn_modeC_lstm_ci_gpu (SEGFAULT)
	439 - test_benchdnn_modeC_rnn_ci_gpu (SEGFAULT)
	444 - test_benchdnn_modeC_self_ci_gpu (Failed)
	446 - test_benchdnn_modeC_shuffle_ci_gpu (Failed)
	448 - test_benchdnn_modeC_softmax_ci_gpu (Failed)
	450 - test_benchdnn_modeC_sum_ci_gpu (Failed)

So 93 new failures. 107 GPU tests did pass, though, so it seems like this should work. This is on a gen9 GPU, specifically:

% lspci -vnn -s 0:2
00:02.0 Display controller [0380]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc5] (rev 05)

Since GEN9 is the lowest ISA specifically supported, this suggests that some of the generic OpenCL kernels are broken.
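The set of new failures can be extracted mechanically rather than by eye. The following is a minimal sketch, assuming each run's failures are saved one per line as `<number>:<test name>`, which is the layout I understand ctest to use for `Testing/Temporary/LastTestsFailed.log`; the function name `new_failures` is hypothetical.

```shell
#!/bin/sh
# Print the tests that fail in the second run but not in the first.
# Assumption: each input file holds one "<number>:<test name>" entry per line.
new_failures() {
    baseline=$1     # failures from the normal build
    candidate=$2    # failures from the NONE build
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # Drop the numeric prefix: test numbers can differ between configurations.
    cut -d: -f2- "$baseline"  | sort -u > "$tmp_a"
    cut -d: -f2- "$candidate" | sort -u > "$tmp_b"
    # comm -13 keeps only the lines unique to the second sorted input.
    comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Piping the result through `wc -l` gives a count that can be checked against the number of new failures quoted above.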

nwnk · 2024-06-13T21:19:42Z

For additional data, with the OMP and OCL backends, the same baseline tests fail without the NONE setting; with it set, the OCL backend seems to be in better shape than SYCL:

76% tests passed, 78 tests failed out of 322

Total Test time (real) = 5526.59 sec

The following tests FAILED:
	  7 - test_binary_gpu (Failed)
	  8 - test_binary_buffer_gpu (Failed)
	 10 - test_concat_gpu (Failed)
	 11 - test_concat_buffer_gpu (Failed)
	 13 - test_concurrency_gpu (Failed)
	 14 - test_concurrency_buffer_gpu (Failed)
	 16 - test_convolution_backward_data_f32_gpu (Failed)
	 17 - test_convolution_backward_data_f32_buffer_gpu (Failed)
	 19 - test_convolution_backward_weights_f32_gpu (Failed)
	 20 - test_convolution_backward_weights_f32_buffer_gpu (Failed)
	 22 - test_convolution_eltwise_forward_f32_gpu (Failed)
	 23 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
	 25 - test_convolution_eltwise_forward_x8s8f32s32_gpu (Failed)
	 26 - test_convolution_eltwise_forward_x8s8f32s32_buffer_gpu (Failed)
	 28 - test_convolution_forward_f32_gpu (Failed)
	 29 - test_convolution_forward_f32_buffer_gpu (Failed)
	 36 - test_cross_engine_reorder (Failed)
	 37 - test_cross_engine_reorder_buffer (Failed)
	 39 - test_deconvolution_gpu (Failed)
	 40 - test_deconvolution_buffer_gpu (Failed)
	 78 - test_inner_product_backward_data_gpu (Failed)
	 79 - test_inner_product_backward_data_buffer_gpu (Failed)
	 81 - test_inner_product_backward_weights_gpu (Failed)
	 82 - test_inner_product_backward_weights_buffer_gpu (Failed)
	 84 - test_inner_product_forward_gpu (Failed)
	 85 - test_inner_product_forward_buffer_gpu (Failed)
	 93 - test_matmul_gpu (Failed)
	 94 - test_matmul_buffer_gpu (Failed)
	 96 - test_persistent_cache_api_gpu (Failed)
	 97 - test_persistent_cache_api_buffer_gpu (Failed)
	102 - test_pooling_forward_gpu (Failed)
	103 - test_pooling_forward_buffer_gpu (Failed)
	108 - test_primitive_cache_mt_gpu (Subprocess aborted)
	109 - test_primitive_cache_mt_buffer_gpu (Subprocess aborted)
	114 - test_reorder_gpu (Failed)
	115 - test_reorder_buffer_gpu (Failed)
	123 - test_shuffle_gpu (Failed)
	124 - test_shuffle_buffer_gpu (Failed)
	129 - test_sum_gpu (Failed)
	130 - test_sum_buffer_gpu (Failed)
	170 - test_api (Failed)
	188 - test_graph_c_api_compile_usm_gpu (Failed)
	190 - test_graph_c_api_compile_parametrized_usm_gpu (Failed)
	192 - test_graph_cpp_api_compile_usm_gpu (Failed)
	194 - test_graph_cpp_api_partition_usm_gpu (Failed)
	196 - test_graph_cpp_api_compiled_partition_ocl_gpu (Failed)
	221 - test_graph_unit_dnnl_batch_norm_usm_gpu (Failed)
	223 - test_graph_unit_dnnl_binary_op_usm_gpu (Failed)
	225 - test_graph_unit_dnnl_bmm_usm_gpu (Failed)
	227 - test_graph_unit_dnnl_compiled_partition_usm_gpu (Failed)
	229 - test_graph_unit_dnnl_concat_usm_gpu (Failed)
	231 - test_graph_unit_dnnl_conv_usm_gpu (Failed)
	233 - test_graph_unit_dnnl_convtranspose_usm_gpu (Failed)
	235 - test_graph_unit_dnnl_dequantize_usm_gpu (Failed)
	237 - test_graph_unit_dnnl_eltwise_usm_gpu (Failed)
	241 - test_graph_unit_dnnl_large_partition_usm_gpu (Failed)
	245 - test_graph_unit_dnnl_matmul_usm_gpu (Failed)
	249 - test_graph_unit_dnnl_pool_usm_gpu (Failed)
	253 - test_graph_unit_dnnl_quantize_usm_gpu (Failed)
	255 - test_graph_unit_dnnl_reduce_usm_gpu (Failed)
	257 - test_graph_unit_dnnl_reorder_usm_gpu (Failed)
	261 - test_graph_unit_dnnl_softmax_usm_gpu (Failed)
	274 - test_benchdnn_modeC_concat_ci_gpu (Failed)
	276 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
	278 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
	280 - test_benchdnn_modeC_eltwise_ci_gpu (Failed)
	284 - test_benchdnn_modeC_graph_ci_gpu (Subprocess aborted)
	286 - test_benchdnn_modeC_ip_ci_gpu (Failed)
	292 - test_benchdnn_modeC_matmul_ci_gpu (Failed)
	294 - test_benchdnn_modeC_pool_ci_gpu (Failed)
	300 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
	305 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
	306 - test_benchdnn_modeC_lstm_ci_gpu (SEGFAULT)
	307 - test_benchdnn_modeC_rnn_ci_gpu (SEGFAULT)
	312 - test_benchdnn_modeC_self_ci_gpu (Failed)
	314 - test_benchdnn_modeC_shuffle_ci_gpu (Failed)
	316 - test_benchdnn_modeC_softmax_ci_gpu (Failed)
	318 - test_benchdnn_modeC_sum_ci_gpu (Failed)

88 GPU tests passed, so again, more working than not, but still not really working.

vpirogov · 2024-06-21T22:57:27Z

Support for Intel(R) UHD Graphics 630 was discontinued, and the last driver update was published at the end of 2022. oneDNN dropped support for GEN9 in the v3.4 release. It looks like we neglected to drop GEN9 from the ISA list, though.

Trying your patch on a newer architecture (Xe-HPC), I see 'could not create a primitive' errors for some tests. It looks like an empty ISA list causes issues with platform detection and/or kernel dispatching. If you want to make DNNL_ENABLE_PRIMITIVE_GPU_ISA=NONE work, additional implementation changes would likely be needed.
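One way to narrow down where primitive creation fails is to rerun a single failing test with oneDNN's verbose logging enabled. This is only a sketch: the helper name `rerun_verbose` and the `DRY_RUN` switch are hypothetical, it assumes the command is run from the build directory, and it assumes the `all` level of `ONEDNN_VERBOSE` is available in the version being built.

```shell
#!/bin/sh
# Sketch: rerun one ctest entry with oneDNN verbose output turned on.
# Assumptions: run from the build directory; ONEDNN_VERBOSE=all is
# supported by this oneDNN version; helper name and DRY_RUN are hypothetical.
rerun_verbose() {
    test_regex=$1   # ctest -R pattern, e.g. '^test_sum_gpu$'
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} env ONEDNN_VERBOSE=all \
        ctest -R "$test_regex" --output-on-failure
}
```

For example, `rerun_verbose '^test_sum_gpu$'` reruns just that test and shows its output, including any verbose lines the library emits around the failing primitive.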

densamoilov · 2024-07-03T07:01:29Z

@nwnk,

> The build documentation claims that generic OpenCL kernels are always available.

The documentation doesn't claim that. It says that the ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob controls the implementations based on just-in-time kernel generation, and that the OpenCL-based kernels and implementations are always available. It doesn't imply that the OpenCL kernels are generic, even though some of them may be.

If there is a need to introduce generic OpenCL kernels, then I believe the best way to do that would be to introduce a generic GPU vendor (ONEDNN_GPU_VENDOR=GENERIC). We have a plan to do that for the SYCL GPU runtime.

The ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob should be used to control implementations within a particular vendor if there is such a need.

vpirogov · 2024-07-09T21:05:21Z

It's also important to note that there are no "generic OpenCL kernels" in oneDNN. The OpenCL kernels rely on Intel vendor extensions. We are currently working on a SYCL-based cross-platform implementation as part of the UXL Foundation initiative.

nwnk added the "sighting" label (Suspicious library behavior. Should be promoted to a bug when confirmed) · Jun 13, 2024

shu1chen added the "platform:gpu-intel" label (Codeowner: @oneapi-src/onednn-gpu-intel) · Jun 14, 2024

vpirogov self-assigned this · Jun 21, 2024

vpirogov added the "enhancement" (A feature or an optimization request) and "help wanted" labels and removed the "sighting" label · Jun 21, 2024

vpirogov assigned nwnk and unassigned vpirogov · Jul 16, 2024
