
Enable cuda feature in onnxruntime package #5

Merged: 21 commits, merged Aug 8, 2023


traversaro
Contributor

No description provided.

@traversaro
Contributor Author

As expected, the build is failing with:

-- [OVERLAY] Loading triplet configuration from: D:\a\idjl-vision-dependencies-vcpkg\idjl-vision-dependencies-vcpkg\triplets\x64-windows-mt-v142.cmake
-- Executing NVCC-NOTFOUND --version resulted in error: 1
CMake Error at ports/cuda/vcpkg_find_cuda.cmake:69 (message):
  Could not find CUDA.  Before continuing, please download and install CUDA
  (v10.1.0 or higher) from:

      https://developer.nvidia.com/cuda-downloads

Call Stack (most recent call first):
  ports/cuda/portfile.cmake:9 (vcpkg_find_cuda)
  scripts/ports.cmake:147 (include)


error: building cuda:x64-windows-mt-v142 failed with: BUILD_FAILED
Elapsed time to handle cuda:x64-windows-mt-v142: 62.6 ms
Please ensure you're using the latest port files with `git pull` and `vcpkg update`.

@traversaro
Contributor Author

Cool, now the compilation works, but compiling in Debug mode alone takes ~3 hours, so the job runs out of time and is canceled at the ~6 hour limit:

2023-08-01T22:08:01.7249374Z -- Configuring x64-windows-mt-v142
2023-08-01T22:09:23.8631216Z -- Building x64-windows-mt-v142-dbg
2023-08-02T01:06:44.0517276Z -- Building x64-windows-mt-v142-rel
2023-08-02T01:21:48.2475636Z ##[error]The operation was canceled.
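To make the numbers explicit, here is a small sketch (not part of the repository; the helper names are hypothetical) that computes how long each stage in the excerpt above took, using the GitHub Actions timestamps. It assumes every line starts with a timestamp followed by a space, and it truncates the 7-digit fractional seconds to the 6 digits that Python's `%f` accepts.

```python
from datetime import datetime, timedelta

# Timestamped lines copied from the CI log excerpt above.
LOG = """\
2023-08-01T22:08:01.7249374Z -- Configuring x64-windows-mt-v142
2023-08-01T22:09:23.8631216Z -- Building x64-windows-mt-v142-dbg
2023-08-02T01:06:44.0517276Z -- Building x64-windows-mt-v142-rel
2023-08-02T01:21:48.2475636Z ##[error]The operation was canceled.
"""


def parse_timestamp(stamp: str) -> datetime:
    """Parse a timestamp such as 2023-08-01T22:08:01.7249374Z.

    The log uses 7 fractional digits, but %f accepts at most 6,
    so the fraction is truncated to microseconds before parsing.
    """
    date_time, fraction = stamp.rstrip("Z").split(".")
    return datetime.strptime(f"{date_time}.{fraction[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def stage_durations(log: str) -> list[tuple[str, timedelta]]:
    """Return (stage message, time until the next timestamped line) pairs."""
    entries = []
    for line in log.strip().splitlines():
        stamp, message = line.split(" ", 1)
        entries.append((parse_timestamp(stamp), message))
    return [
        (message, next_time - time)
        for (time, message), (next_time, _) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for message, duration in stage_durations(LOG):
        print(f"{duration}  {message}")
```

On the excerpt above this gives about 1 min 22 s for configuring, about 2 h 57 min for the Debug build, and about 15 min of Release build before the job was canceled. Any time the job spent before the configure step is not visible in the excerpt.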

@traversaro traversaro changed the title Enable dnn-cuda feature in opencv package Enable cuda feature in onnxruntime package Aug 2, 2023
@traversaro
Contributor Author

Let's pivot the PR to compiling onnxruntime with CUDA support instead, to check whether that takes less time.

@traversaro
Contributor Author

As suggested in https://answers.opencv.org/question/5090/why-opencv-building-is-so-slow-with-cuda/, we could probably modify the opencv port to compile only for the specific CUDA architectures we are interested in (determined by checking our development systems and the deployment systems).
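A minimal sketch of that idea, assuming we collect the compute capability of each GPU on the development and deployment machines (on recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should print it). The GPU lists below are hypothetical placeholders. The helper merges them and formats the result both as OpenCV's `CUDA_ARCH_BIN` value (e.g. `7.5;8.6`) and as CMake's `CMAKE_CUDA_ARCHITECTURES` value (e.g. `75;86`). How the value is then passed to the port, for instance through the `OPTIONS` of `vcpkg_cmake_configure`, is an assumption and is not shown.

```python
# Hypothetical compute capabilities of the GPUs on our machines;
# replace them with the values of the actual dev and deploy systems.
DEV_SYSTEMS = ["8.6", "7.5"]
DEPLOY_SYSTEMS = ["8.6", "6.1"]


def cuda_arch_values(capabilities: list[str]) -> tuple[str, str]:
    """Return (CUDA_ARCH_BIN, CMAKE_CUDA_ARCHITECTURES) strings.

    Duplicates are removed and capabilities are sorted numerically,
    so that e.g. "10.0" comes after "8.6".
    """
    unique = sorted(
        set(capabilities),
        key=lambda cc: tuple(int(part) for part in cc.split(".")),
    )
    arch_bin = ";".join(unique)  # OpenCV style, e.g. "7.5;8.6"
    cmake_archs = ";".join(cc.replace(".", "") for cc in unique)  # CMake style, e.g. "75;86"
    return arch_bin, cmake_archs


if __name__ == "__main__":
    arch_bin, cmake_archs = cuda_arch_values(DEV_SYSTEMS + DEPLOY_SYSTEMS)
    print(f"-DCUDA_ARCH_BIN={arch_bin}")
    print(f"-DCMAKE_CUDA_ARCHITECTURES={cmake_archs}")
```

Each listed architecture is one more set of device code that nvcc has to generate for every kernel, which is why trimming the list shortens CUDA builds. When passing these values on a shell command line, quote them, since `;` is a command separator in most shells.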

@traversaro
Contributor Author

Build is now failing:

Building onnxruntime[core,cuda]:x64-windows-mt-v142...
-- [OVERLAY] Loading triplet configuration from: D:\a\idjl-vision-dependencies-vcpkg\idjl-vision-dependencies-vcpkg\triplets\x64-windows-mt-v142.cmake
-- Installing port from location: D:\a\idjl-vision-dependencies-vcpkg\idjl-vision-dependencies-vcpkg\./ports\onnxruntime
-- Downloading https://github.com/microsoft/onnxruntime/archive/v1.15.1.tar.gz -> microsoft-onnxruntime-v1.15.1.tar.gz...
-- Extracting source C:/ivdv/vcpkg/downloads/microsoft-onnxruntime-v1.15.1.tar.gz
-- Applying patch 1.14.1-0004-abseil-no-string-view.patch
-- Applying patch 1.15.1-0001-cmake-dependencies.patch
-- Using source at C:/ivdv/vcpkg/buildtrees/onnxruntime/src/v1.15.1-6f03679407.clean
-- Using Python3: C:/ivdv/vcpkg/downloads/tools/python/python-3.10.7-x64/python.exe
-- Found external ninja('1.10.2').
-- Configuring x64-windows-mt-v142
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:112 (message):
    Command failed: "C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/Common7/IDE/CommonExtensions/Microsoft/CMake/Ninja/ninja.exe" -v
    Working Directory: C:/ivdv/vcpkg/buildtrees/onnxruntime/x64-windows-mt-v142-rel/vcpkg-parallel-configure
    Error code: 1
    See logs for more information:
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-dbg-CMakeCache.txt.log
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-rel-CMakeCache.txt.log
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-out.log

Call Stack (most recent call first):
  installed/x64-windows/share/vcpkg-cmake/vcpkg_cmake_configure.cmake:252 (vcpkg_execute_required_process)
  D:/a/idjl-vision-dependencies-vcpkg/idjl-vision-dependencies-vcpkg/ports/onnxruntime/portfile.cmake:69 (vcpkg_cmake_configure)
  scripts/ports.cmake:147 (include)


error: building onnxruntime:x64-windows-mt-v142 failed with: BUILD_FAILED
Elapsed time to handle onnxruntime:x64-windows-mt-v142: 58 s
Please ensure you're using the latest port files with `git pull` and `vcpkg update`.
Then check for known issues at:
    https://github.com/microsoft/vcpkg/issues?q=is%3Aissue+is%3Aopen+in%3Atitle+onnxruntime
You can submit a new issue at:
    https://github.com/microsoft/vcpkg/issues/new?title=[onnxruntime]+Build+error&body=Copy+issue+body+from+C%3A%2Fivdv%2Fvcpkg%2Finstalled%2Fvcpkg%2Fissue_body.md

We need to either debug this on a local machine or upload the logs listed above.
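If we go the upload route, the vcpkg error above already lists the relevant log files after the `See logs for more information:` line. A hypothetical helper like the one below could extract those paths from the console output, so that a later CI step (not shown here, and an assumption) can upload them as artifacts.

```python
# Excerpt of the vcpkg console output shown above.
VCPKG_OUTPUT = r"""
    Error code: 1
    See logs for more information:
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-dbg-CMakeCache.txt.log
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-rel-CMakeCache.txt.log
      C:\ivdv\vcpkg\buildtrees\onnxruntime\config-x64-windows-mt-v142-out.log

Call Stack (most recent call first):
"""


def logs_to_upload(output: str) -> list[str]:
    """Collect the paths listed under each 'See logs for more information:' line.

    A block of paths ends at the first blank line after the header.
    """
    paths = []
    collecting = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "See logs for more information:":
            collecting = True
        elif collecting and not stripped:
            collecting = False
        elif collecting:
            paths.append(stripped)
    return paths


if __name__ == "__main__":
    for path in logs_to_upload(VCPKG_OUTPUT):
        print(path)
```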

@traversaro traversaro self-assigned this Aug 7, 2023
@traversaro
Contributor Author

traversaro commented Aug 7, 2023

The error is:

-- CMAKE_CUDA_COMPILER_VERSION: 12.2.128
-- Enable flash attention for CUDA EP
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.33.0.windows.2") 
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
CMake Warning at CMakeLists.txt:1475 (message):
  MPI and NCCL disabled on Win build.


CMake Warning (dev) at onnxruntime_mlas.cmake:587:
  Syntax Warning in cmake code at column 107

  Argument not separated from preceding token by whitespace.
Call Stack (most recent call first):
  CMakeLists.txt:1609 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Configuring done (47.3s)
CMake Error at onnxruntime_common.cmake:118 (target_link_libraries):
  Target "onnxruntime_common" links to:

    Microsoft.GSL::GSL

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  CMakeLists.txt:1609 (include)


CMake Error at onnxruntime.cmake:217 (target_link_libraries):
  Target "onnxruntime" links to:

    Microsoft.GSL::GSL

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  CMakeLists.txt:1609 (include)


-- Generating done (1.1s)
CMake Warning:
  Manually-specified variables were not used by the project:

    _VCPKG_ROOT_DIR


CMake Generate step failed.  Build files cannot be regenerated correctly.
ninja: build stopped: subcommand failed.

This probably happens because of this snippet: https://github.com/microsoft/onnxruntime/blob/v1.15.1/cmake/external/onnxruntime_external_deps.cmake#L283-L297, where, when CUDA is enabled, only the GSL downloaded by the onnxruntime build itself is supported, since this patch is applied to it: https://github.com/microsoft/onnxruntime/blob/3649376f09d238394cf0c22de14db3f4e8c11310/cmake/patches/gsl/1064.patch#L4.

The patch was contributed upstream in microsoft/GSL#1064, but unfortunately no GSL release has been made since that PR was merged. So we also need to prepare a different version of the GSL port with that patch included.

@traversaro
Contributor Author

Locally I am encountering microsoft/onnxruntime#16942, but this should not be a problem in CI, where we are using CUDA 12.1.0.

@traversaro
Contributor Author

Now it fails with:

FAILED: CMakeFiles/onnxruntime_providers_cuda.dir/fd4cff21d46b32929d031faf3970a23b/onnxruntime/core/providers/cuda/cuda_execution_provider.cc.obj 
C:\PROGRA~2\MIB055~1\2019\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe   /TP -DCPUINFO_SUPPORTED_PLATFORM=1 -DDEBUG_NODE_INPUTS_OUTPUTS -DDISABLE_ABSEIL -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN_MPL2_ONLY -DEIGEN_STRONG_INLINE=inline -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DENABLE_CUDA_PROFILING -DMICROSOFT_INTERNAL -DNOGDI -DNOMINMAX -DNTDDI_VERSION=0x0A000000 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_WINDOWS -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -DWIN32_LEAN_AND_MEAN -DWINAPI_FAMILY=100 -DWINVER=0x0A00 -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -D_USE_MATH_DEFINES -D_WIN32_WINNT=0x0A00 -Donnxruntime_providers_cuda_EXPORTS -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\include\onnxruntime -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\include\onnxruntime\core\session -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\x64-windows-mt-v142-dbg -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\x64-windows-mt-v142-dbg\_deps\cutlass-src\include -IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\x64-windows-mt-v142-dbg\_deps\cutlass-src\examples -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\extras\CUPTI\include -external:IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\installed\x64-windows-mt-v142\include -external:W0 /nologo /DWIN32 /D_WINDOWS /W3 /utf-8 /GR /EHsc /MP  /EHsc /wd26812 /Qspectre /MP -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED /D_DEBUG /MTd /Z7 /Ob0 /Od /RTC1  -std:c++17 -MTd -Zi /W3 /GR "/external:IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/include" 
/external:IC:\src\idjl-vision-dependencies-vcpkg\vcpkg\installed\x64-windows-mt-v142\include /external:Ionnxruntime_external_lib_include_dirs-NOTFOUND /utf-8 /sdl /experimental:external /external:W0 /external:templates- /external:IC:/src/idjl-vision-dependencies-vcpkg/vcpkg/buildtrees/onnxruntime/src/v1.15.1-8f270b0407.clean/cmake /external:IC:/src/idjl-vision-dependencies-vcpkg/vcpkg/buildtrees/onnxruntime/x64-windows-mt-v142-dbg /wd4251 /wd4201 /wd5054 /w15038 -WX /YuC:/src/idjl-vision-dependencies-vcpkg/vcpkg/buildtrees/onnxruntime/x64-windows-mt-v142-dbg/CMakeFiles/onnxruntime_providers_cuda.dir/cmake_pch.hxx /FpC:/src/idjl-vision-dependencies-vcpkg/vcpkg/buildtrees/onnxruntime/x64-windows-mt-v142-dbg/CMakeFiles/onnxruntime_providers_cuda.dir/./cmake_pch.cxx.pch /FIC:/src/idjl-vision-dependencies-vcpkg/vcpkg/buildtrees/onnxruntime/x64-windows-mt-v142-dbg/CMakeFiles/onnxruntime_providers_cuda.dir/cmake_pch.hxx /showIncludes /FoCMakeFiles\onnxruntime_providers_cuda.dir\fd4cff21d46b32929d031faf3970a23b\onnxruntime\core\providers\cuda\cuda_execution_provider.cc.obj /FdCMakeFiles\onnxruntime_providers_cuda.dir\ /FS -c C:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\core\providers\cuda\cuda_execution_provider.cc
cl : Command line warning D9025 : overriding '/Z7' with '/Zi'
C:\src\idjl-vision-dependencies-vcpkg\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\core\providers\cuda\cupti_manager.h(9): fatal error C1083: Cannot open include file: 'cupti.h': No such file or directory
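One detail in the compile command above: the CUPTI include directory is passed as `-I\extras\CUPTI\include`, without any CUDA toolkit prefix, while the main CUDA include directory is `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include`. This suggests (an inference from the log, not verified) that the variable holding the CUDA toolkit root was empty when that path was composed, which would explain why `cupti.h` is not found. The hypothetical helper below flags `-I` directories that start with a path separator but have no drive letter, the typical symptom of such an empty prefix on Windows.

```python
import ntpath  # Windows path rules, independent of the host OS
import re

# Matches -I"quoted path" or -Iunquoted_path at the start of a token.
INCLUDE_FLAG = re.compile(r'(?<!\S)-I(?:"([^"]*)"|(\S+))')


def prefixless_include_dirs(command: str) -> list[str]:
    """Return -I directories that are rooted (start with \\ or /) but have no drive."""
    suspicious = []
    for match in INCLUDE_FLAG.finditer(command):
        path = match.group(1) if match.group(1) is not None else match.group(2)
        drive, _ = ntpath.splitdrive(path)
        if not drive and path.startswith(("\\", "/")):
            suspicious.append(path)
    return suspicious


# Shortened version of the cl.exe command from the log above.
COMMAND = (
    r'cl.exe /TP -IC:\src\onnxruntime\include '
    r'-I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" '
    r'-I\extras\CUPTI\include -external:W0 -c cuda_execution_provider.cc'
)

if __name__ == "__main__":
    print(prefixless_include_dirs(COMMAND))
```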

@traversaro
Contributor Author

New error:

C:\ivdv\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\contrib_ops/cuda/bert/cutlass_fmha/fmha_launch_template.h(13): fatal error C1083: Cannot open include file: '41_fused_multi_head_attention/kernel_forward.h': No such file or directory
fmha_sm50.cu
C:\ivdv\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\contrib_ops/cuda/bert/cutlass_fmha/fmha_launch_template.h(13): fatal error C1083: Cannot open include file: '41_fused_multi_head_attention/kernel_forward.h': No such file or directory

@traversaro
Contributor Author

The problem is that the cutlass dependency is not properly handled, so the corresponding headers are not found. Apparently, it is a dependency required by attention/transformer-related code (see https://github.com/microsoft/onnxruntime/blob/063e9054b8056037c6c2af8de7acd0b66dadbac9/cmake/onnxruntime_providers.cmake#L533; note that they also include a header from one of the cutlass examples, so in general it would be difficult to handle that via find_package). As at the moment we are not interested in transformers/LLMs for this specific package, we can just disable the onnxruntime_USE_FLASH_ATTENTION option.
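Once the option is turned off (presumably by passing `-Donnxruntime_USE_FLASH_ATTENTION=OFF` to the configure step of the port), the logs earlier in this thread give two ways to check that it took effect: the configure output printed `-- Enable flash attention for CUDA EP`, and the compile command defined `-DUSE_FLASH_ATTENTION=1`. The sketch below is a hypothetical helper that scans a log for those two markers, assuming they only appear when the feature is enabled.

```python
# Markers copied from the configure output and compile command earlier in this thread.
FLASH_ATTENTION_MARKERS = {
    "configure message": "-- Enable flash attention for CUDA EP",
    "compile definition": "-DUSE_FLASH_ATTENTION=1",
}


def flash_attention_markers(log: str) -> list[str]:
    """Return the names of the flash attention markers found in a build log."""
    return [name for name, marker in FLASH_ATTENTION_MARKERS.items() if marker in log]


if __name__ == "__main__":
    enabled_log = "-- CMAKE_CUDA_COMPILER_VERSION: 12.2.128\n-- Enable flash attention for CUDA EP\n"
    disabled_log = "-- CMAKE_CUDA_COMPILER_VERSION: 12.2.128\n"
    print(flash_attention_markers(enabled_log))   # ['configure message']
    print(flash_attention_markers(disabled_log))  # []
```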

@traversaro
Contributor Author

However, just for reference: the changes in the patches seem to have been upstreamed in NVIDIA/cutlass@1eef5c3 and other commits, so using cutlass 3.2 via find_package would probably be a good option.

@traversaro
Contributor Author

traversaro commented Aug 8, 2023

The new error is:

C:\ivdv\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\contrib_ops\cuda\bert\attention.cc(167): error C3861: 'ORT_UNUSED_VARIABLE': identifier not found
C:\ivdv\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\contrib_ops\cuda\bert\attention.cc(59): note: while compiling class template member function 'onnxruntime::common::Status onnxruntime::contrib::cuda::Attention<float>::ComputeInternal(onnxruntime::OpKernelContext *) const'
C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\type_traits(325): note: see reference to class template instantiation 'onnxruntime::contrib::cuda::Attention<float>' being compiled
C:\ivdv\vcpkg\buildtrees\onnxruntime\src\v1.15.1-8f270b0407.clean\onnxruntime\contrib_ops\cuda\bert\attention.cc(37): note: see reference to class template instantiation 'std::is_convertible<onnxruntime::contrib::cuda::Attention<float> *,_Ty *>' being compiled
        with
        [
            _Ty=onnxruntime::OpKernel
        ]

This error is microsoft/onnxruntime#16000, which I had already encountered in conda-forge/onnxruntime-feedstock#63 (comment). I can just bring the same patch here as well.

@traversaro
Contributor Author

CI succeeded in ~3 h 22 min. Let's merge; we can then also try to build the static triplet in a new PR.

@traversaro traversaro merged commit 31c3796 into main Aug 8, 2023