[1.8] norm of a Float16 vector crashes on Win64 (Windows 7) #45736

Closed
ViralBShah opened this issue Jun 18, 2022 · 17 comments · Fixed by JuliaSparse/SparseArrays.jl#143
Labels: compiler:codegen, float16, regression, system:windows
Milestone: 1.8

@ViralBShah
Member

ViralBShah commented Jun 18, 2022

This crashes on Win64 for me on 1.8-rc1, and is what has been causing JuliaSparse/SparseArrays.jl#147. I can't quite understand why it is non-deterministic on CI, but it fails reliably for me in my Win64 Windows 7 VM.

using LinearAlgebra
norm(rand(Float16, 5))

It doesn't fail when run through the debugger, which makes me believe the issue is in codegen. The error log is below; the missing half-float conversion symbols (__gnu_f2h_ieee and __gnu_h2f_ieee) look like the culprit.
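For context, the two symbols named in the error log, __gnu_h2f_ieee and __gnu_f2h_ieee, are the conventional runtime-library helper names for half-to-single and single-to-half conversion. The sketch below only illustrates those two conversions at the Julia level; it is not taken from the Julia sources, and whether a given build emits calls to the helpers depends on the CPU and the LLVM configuration, which is an assumption here.

```julia
# Illustrative sketch: the two conversions that the missing helper symbols
# would perform, expressed with ordinary Julia constructors.
x = Float16(1.5)

# Widening (half -> single) is exact: every Float16 value is also a Float32 value.
wide = Float32(x)

# Narrowing (single -> half) rounds to the nearest representable Float16.
narrow = Float16(0.1f0)

# Raw bit pattern of 1.0 in IEEE half precision, printed in hexadecimal.
println(string(reinterpret(UInt16, Float16(1.0)), base = 16))
println(wide)
println(narrow)
```

A reduction such as norm over a Float16 vector performs these conversions internally on hardware without native half-precision arithmetic, which is consistent with the failure appearing in mapreduce_impl.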

cc @Wimmerer @DilumAluthge

@ViralBShah ViralBShah added the system:windows Affects only Windows label Jun 18, 2022
@ViralBShah ViralBShah added this to the 1.8 milestone Jun 18, 2022
@ViralBShah
Member Author

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Viral>C:\Users\Viral\Desktop\julia-1.8.0-rc1\bin\julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> x = rand(Float16,5)
5-element Vector{Float16}:
 0.974
 0.526
 0.5166
 0.5273
 0.3027

julia> using LinearAlgebra

julia> norm(x)
JIT session error: Symbols not found: [ __gnu_f2h_ieee, __gnu_h2f_ieee ]
Failure value returned from cantFail wrapped call
Failed to materialize symbols: { (JuliaOJIT, { jfptr_mapreduce_impl_108, julia_mapreduce_impl_107 }) }
UNREACHABLE executed at /cygdrive/c/buildbot/worker/package_win64/build/usr/include/llvm/Support/Error.h:782!

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

signal (22): SIGABRT
in expression starting at REPL[3]:1
crt_sig_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:93
raise at C:\Windows\system32\msvcrt.dll (unknown line)
abort at C:\Windows\system32\msvcrt.dll (unknown line)
.text$_ZN4llvm25llvm_unreachable_internalEPKcS1_j at C:\Users\Viral\Desktop\julia-1.8.0-rc1\bin\libLLVM-13jl.dll (unknown line)
cantFail<llvm::JITEvaluatedSymbol> at /cygdrive/c/buildbot/worker/package_win64/build/usr/include/llvm/Support\Error.h:782 [inlined]
addModule at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:945
jl_add_to_ee at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:1238
jl_add_to_ee at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:1282
jl_add_to_ee at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:1267
jl_add_to_ee at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:1267
jl_add_to_ee at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:1304 [inlined]
_jl_compile_codeinst at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:149
jl_generate_fptr_impl at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:327
jl_compile_method_internal at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2072
jl_compile_method_internal at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2019 [inlined]
_jl_invoke at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2350 [inlined]
ijl_apply_generic at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2540
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:126
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:215
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:166 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:594
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:750
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:906
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:151
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:247
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:232
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:369
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:355
jfptr_run_repl_64032.clone_1 at C:\Users\Viral\Desktop\julia-1.8.0-rc1\lib\julia\sys.dll (unknown line)
#966 at .\client.jl:419
jfptr_YY.966_40867.clone_1 at C:\Users\Viral\Desktop\julia-1.8.0-rc1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:729 [inlined]
invokelatest at .\essentials.jl:726 [inlined]
run_main_repl at .\client.jl:404
exec_options at .\client.jl:318
_start at .\client.jl:522
jfptr__start_26007.clone_1 at C:\Users\Viral\Desktop\julia-1.8.0-rc1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:567
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:711
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\Windows\system32\kernel32.dll (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
Allocations: 803011 (Pool: 802628; Big: 383); GC: 1

C:\Users\Viral>

@ViralBShah ViralBShah added the compiler:codegen Generation of LLVM IR and native code label Jun 18, 2022
@rayegun
Member

rayegun commented Jun 18, 2022

I only have beta3 right now, and that does not fail on my machine. Let me try rc1.

Edit: rc1 also does not fail for me on a Windows 10 laptop.

@ViralBShah
Member Author

ViralBShah commented Jun 18, 2022

BTW I am on Windows 7. Come to think of it, Windows 7 might just be too old to have these half-float functions in its runtime (or this could be an LLVM issue); they have only made an appearance in the last few years.

@staticfloat Are we running Windows 7 by any chance on the buildbots?

@ViralBShah ViralBShah changed the title from "norm of a Float16 vector crashes on Win64" to "norm of a Float16 vector crashes on Win64 (Windows 7)" Jun 18, 2022
rayegun added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Jun 18, 2022
@giordano
Contributor

Can you try on master? There has been a lot of movement around those functions lately.

rayegun added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Jun 18, 2022
* Temporarily remove tests for non 64-bit eltypes

Temporarily resolves: JuliaLang/julia#45736

* Update umfpack.jl

* Update umfpack.jl
@ViralBShah
Member Author

Doesn't crash on the nightly!

@ViralBShah
Member Author

One of the things I realized while chasing this is that in CI for packages we always use windows-latest, ubuntu-latest, etc., whereas the buildbots use conservative versions. I think it would be useful for all packages to run their tests on conservative versions of the OSes rather than the latest ones, which might help catch issues like this.

@DilumAluthge @staticfloat Would you guys agree with this broad assessment? Perhaps we can update PkgTemplates.jl to reflect this?

@DilumAluthge
Member

DilumAluthge commented Jun 24, 2022

For most packages in the Julia ecosystem, I think it's perfectly fine to use $OS-latest, so I would not recommend changing PkgTemplates.

But for stdlibs, and packages that were formerly stdlibs, I think it does make sense to use an older OS image.

@ViralBShah
Member Author

ViralBShah commented Jun 24, 2022

Why not just use conservative OS versions on everything? It is just better all around, isn't it? What's the right way to pick a conservative OS in GA (instead of -latest)?

At least in the stdlibs and the widely used packages, I think conservative is better.

@giordano
Contributor

giordano commented Jun 24, 2022

One issue is that CI providers will remove support for older OS versions at some point, and the entire ecosystem will then have to catch up with those changes each time. It's rather annoying, especially for users who don't closely follow what CI providers are changing.

@DilumAluthge
Member

DilumAluthge commented Jun 24, 2022

> Why not just use conservative OS versions on everything? It is just better all around, isn't it? What's the right way to pick a conservative OS in GA (instead of -latest)?

Yeah, so actually those two questions are related in my mind. The reason that I'm reluctant to recommend this for all packages in the Julia ecosystem is that now package authors will need to edit their CI workflow files more frequently.

For example, to use an older Windows, replace windows-latest with windows-2016.

But the problem is that windows-2016 will not be available forever. Eventually, GitHub will remove that from the list of GitHub-hosted runners. And then every single package author needs to change windows-2016 to windows-2019, which will then be the oldest Windows available.

The current list is here: https://github.com/actions/virtual-environments#available-environments
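The substitution described above could look like the following workflow fragment. This is a sketch only: the workflow and job names, the Julia version list, and the specific runner label (windows-2019) are assumptions for illustration. The labels GitHub offers change over time, so check the list of hosted runners before copying it.

```yaml
# Hypothetical CI workflow fragment: pin an explicit, older runner image
# alongside the windows-latest alias instead of relying on the alias alone.
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # Assumed labels; replace with whatever the runner list currently offers.
        os: [windows-2019, windows-latest]
        version: ['1.6', '1']
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-runtest@v1
```

Keeping windows-latest in the matrix next to the pinned image means that only the pinned entry needs editing when GitHub retires an image.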

@ViralBShah
Member Author

Yes, good point. And I believe these images are not old enough to trigger this issue anyway. Trying older OSes in JuliaSparse/SparseArrays.jl#164

@ViralBShah
Member Author

Would be nice if this list of OSes could somehow be pulled from a central place.

@DilumAluthge
Member

Perhaps we could send a feature request to GitHub to add aliases of the form windows-oldest or windows-earliest. That would take care of the ecosystem churn issue.

They already have aliases (windows-latest is an alias), so there is precedent.

@vchuravy
Member

I think the actual issue is fixed?

@ViralBShah
Member Author

No, the issue remains. We are just working to avoid triggering it in SparseArrays CI.

@ViralBShah ViralBShah reopened this Jun 25, 2022
@ViralBShah
Member Author

ViralBShah commented Jun 25, 2022

Doesn't crash on master, 1.7.3 or 1.6.6 - but does so on 1.8-rc1. This is clearly a regression on 1.8.
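One way to repeat this comparison is to run the same small script under each Julia binary and compare the output. The script below is a sketch rather than the procedure actually used in this thread; it only wraps the reproduction from the top of the issue with a line recording the environment.

```julia
using LinearAlgebra

# Record the environment so results from different Julia builds can be compared.
println("Julia ", VERSION, " on ", Sys.KERNEL, " (", Sys.WORD_SIZE, "-bit)")

x = rand(Float16, 5)

# On an affected build the process aborts inside this call;
# on an unaffected build the norm is printed.
result = norm(x)
println(result)
```

Because the failure is an abort rather than a Julia exception, a try/catch around the call would not help; the exit status of the julia process is the signal to look at.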

@DilumAluthge DilumAluthge changed the title from "norm of a Float16 vector crashes on Win64 (Windows 7)" to "[1.8] norm of a Float16 vector crashes on Win64 (Windows 7)" Jun 25, 2022
@DilumAluthge DilumAluthge added the regression Regression in behavior compared to a previous version label Jun 25, 2022
@vchuravy
Member

Fixed by #45627 for rc2
