[CI][Windows] Workaround for error in Findzstd.cmake #17283
base: main
This is a workaround for an upstream LLVM issue [0], which appears to be caused by the `CMAKE_INSTALL_LIBDIR` variable being used before it is defined. While there is an LLVM PR to fix this [1], as of 2024-08-19 it has not yet been merged into LLVM.

This change is intended to resolve the following error, which occurs during the CI build of TVM on Windows.

```
The system cannot find the file specified.
CMake Error at C:/Miniconda/envs/tvm-build/conda-bld/tvm-package_1723747883202/_h_env/Library/lib/cmake/llvm/Findzstd.cmake:39 (string):
  string sub-command REGEX, mode REPLACE: regex "$" matched an empty string.
Call Stack (most recent call first):
  C:/Miniconda/envs/tvm-build/conda-bld/tvm-package_1723747883202/_h_env/Library/lib/cmake/llvm/LLVMConfig.cmake:277 (find_package)
  cmake/utils/FindLLVM.cmake:47 (find_package)
  cmake/modules/LLVM.cmake:31 (find_llvm)
  CMakeLists.txt:565 (include)
```

[0] llvm/llvm-project#83802
[1] llvm/llvm-project#83807
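The error message hints at why an undefined variable is fatal here. Assuming the `Findzstd.cmake` regex is built by placing `${CMAKE_INSTALL_LIBDIR}` in front of a `$` anchor (an assumption inferred from the `regex "$"` in the log, not checked against the LLVM source), an empty variable leaves only `"$"`. That pattern matches the empty string at the end of any path, and CMake's `string(REGEX REPLACE)` rejects such a match. The Python sketch below mimics that behaviour; `swap_libdir_for_bindir` is a hypothetical helper, not LLVM or TVM code.

```python
import re


def swap_libdir_for_bindir(dirname: str, libdir: str, bindir: str) -> str:
    """Hypothetical helper: replace a trailing `libdir` component with `bindir`.

    Mimics the idea of string(REGEX REPLACE "${libdir}$" "${bindir}" ...)
    in CMake; it is not a copy of LLVM's Findzstd.cmake.
    """
    # An empty `libdir` (an undefined CMake variable) leaves only the "$" anchor.
    pattern = re.escape(libdir) + "$"
    match = re.search(pattern, dirname)
    if match is not None and match.group(0) == "":
        # CMake refuses to replace a zero-length match; raise to mimic that error.
        raise ValueError(f'regex "{pattern}" matched an empty string')
    # A function replacement keeps any backslashes in `bindir` literal.
    return re.sub(pattern, lambda _m: bindir, dirname)


# With the variable defined, only a trailing "lib" is rewritten.
print(swap_libdir_for_bindir("C:/env/Library/lib", "lib", "bin"))  # prints C:/env/Library/bin

# With the variable undefined, the pattern degenerates to "$".
try:
    swap_libdir_for_bindir("C:/env/Library/lib", "", "bin")
except ValueError as err:
    print(err)  # prints regex "$" matched an empty string
```

CMake's standard `GNUInstallDirs` module defines `CMAKE_INSTALL_LIBDIR` and `CMAKE_INSTALL_BINDIR`, so having those variables set before LLVM's config is loaded would presumably avoid the empty pattern.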
And confirming that the error in the Windows build in CI is resolved, as the CI has passed the location where …
Even though the CI build was able to complete, the unit test …
If pytest captures the output, segfaults in a unit test prevent any output from being printed.
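One way to keep diagnostics visible when a test crashes is sketched below, assuming the test can call a little setup code first. It uses the standard-library `faulthandler` module to dump Python tracebacks on a fatal error such as a segfault, and flushes each debug message immediately. The helper names are hypothetical, not part of TVM's test suite.

```python
import faulthandler
import sys


def enable_crash_diagnostics() -> None:
    # If the interpreter hits a fatal error such as a segfault, write the
    # Python traceback of every thread to sys.stderr before the process dies.
    faulthandler.enable(file=sys.stderr, all_threads=True)


def debug_print(*args: object) -> None:
    # Flush right away so the message is not left sitting in a buffer
    # if the process crashes immediately afterwards.
    print(*args, file=sys.stderr, flush=True)


if __name__ == "__main__":
    enable_crash_diagnostics()
    debug_print("about to run the float16 conversion under test")
```

To see that output at all, the test would also need to run with capturing disabled, e.g. `python -m pytest -s path/to/test_file.py` (the path is a placeholder). `-s` is shorthand for `--capture=no`.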
They may be related to some of the functions defined in https://github.com/apache/tvm/blob/main/src/runtime/builtin_fp16.cc#L46, although I am not sure which one.
Hmm. I'm seeing local … However, I don't see any calls to these local functions in the LLVM IR. It looks like the generated LLVM IR instead uses …
I think I've tracked down the problem. Writing down the steps to record it, and to collect all the links in one spot.
To test whether there's an incompatibility between the ABI expected by LLVM and the ABI that we provide, I've added more debug statements and commented out the call to …
I see. We can try to move forward and be compatible with later LLVM versions, if that is something we can do.
Well, it was a theory, but it doesn't seem to have panned out. Even with the custom conversions in … For now, I'm out of ideas. This may need to be debugged by somebody with access to a Windows development environment.
With a few more debug print statements, the error appears unrelated to the use of … If this is the case, it may be related to this issue in the GitHub runners for Windows. From what I can tell, the Windows image shipped with an MSVC compiler newer than its MSVC runtime, causing incompatibilities between the generated code and the runtime. The issue has a workaround in some cases, but several users have reported that the workaround is very fragile, and depends on (1) DLL load order, (2) whether any other program provided an older version of the MSVC runtime, and (3) whether the …
The f32-to-f32 test case passed, so the problem does not affect all generated code. Trying an f16-to-f16 conversion to see whether the mere presence of f16 arguments is the problem.
And after a couple more test cases, it's back to looking like the conversion functions between float16 and float32 are the issue, since the Windows CI can pass when running either … That said, it doesn't explain why the print statements from inside the TIR aren't showing up in the output, even when …
Just FYI, we can ssh to the GitHub Actions runner.
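With suspicion back on the float16/float32 conversions, it can help to have expected bit patterns computed independently of TVM and LLVM. A minimal sketch, assuming the test cases compare raw 16-bit patterns: Python's `struct` module supports IEEE 754 binary16 through its `e` format code. The helper names are hypothetical, not TVM functions.

```python
import struct


def f16_bits_to_float(bits: int) -> float:
    # Reinterpret a 16-bit pattern as IEEE 754 binary16 and widen it.
    # Every binary16 value is exactly representable in float32 and float64.
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_f16_bits(value: float) -> int:
    # Round to the nearest binary16 value and return its raw bit pattern.
    # struct raises OverflowError for finite values too large for binary16.
    return struct.unpack("<H", struct.pack("<e", value))[0]


# Reference values a conversion test could compare against.
print(hex(float_to_f16_bits(1.0)))  # 0x3c00
print(f16_bits_to_float(0x3C00))  # 1.0
print(f16_bits_to_float(0x7BFF))  # 65504.0 (largest finite binary16)
```

Because every finite binary16 value survives the widening exactly, a round trip through these helpers should reproduce every finite input pattern, which gives a simple self-check for a generated conversion routine.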