Healpix.jl tests hang on Julia v1.9 #46736
Comments
We're seeing similar hangs in MPI.jl: for example, all jobs on nightly in https://github.com/JuliaParallel/MPI.jl/actions/runs/3201031150 are timing out. But it's hard to say whether they're related, since it's not clear what the problem is exactly. CC: @simonbyrne |
I'm not sure what can be done here. Does anyone have something that can be reproduced? |
Healpix tests used to reproducibly time out on GitHub Actions, but now CI isn't running anymore on nightly because it was useless, so I don't know what's going on now. |
At least MPI seems to pass tests on nightly, so maybe this got fixed? |
Yeah, MPI had a different problem, #47472, addressed by JuliaParallel/MPI.jl#680. |
I just ran Healpix tests locally on
and they hang at the
(it exceeds the scrollbar in iTerm2). Looks fairly reproducible to me. Edit: actually when I got the stacktrace I think I was on something like 7b10d5f, on latest master |
Oh, I thought this was a memory related hang, not something different. |
They're also hanging in pkgeval: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2022-12/20/Healpix.primary.log |
playing a bit in lldb, it seems we are stuck in a call to Any idea @vtjnash ? |
Reproducer is:
using Healpix
Healpix.readPolarizedMapFromFITS(pkgdir(Healpix, "test", "pol_float_map.fits"), 1, Float32) |
Does an assert build of Julia give anything helpful? |
I believe my build already has them on :) |
Just a note to suggest that the peek profiler might be helpful to diagnose the issue here |
MWE:
julia> x = Tuple{typeof(Base.convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}, 1} where T)
Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}, 1} where T)
julia> y = Tuple{typeof(Base.convert), Type{AbstractArray{Union{Nothing, T}, 1}}, AbstractArray{var"#s95", 1} where var"#s95"} where T
Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}, 1}}, AbstractVector} where T
julia> typeintersect(x, y)
Started hanging on 1.7, so this specifically doesn't seem like a regression. |
Slightly shorter:
x = Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}} where T)
y = Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}}}, AbstractArray} where T
typeintersect(x, y) |
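Since the stack frames quoted in this thread show the hang inside `intersect` in `subtype.c`, a timeout implemented as a task in the same session may never get a chance to run. A minimal sketch of one way to probe the MWE without blocking your own REPL is to evaluate it in a child Julia process and kill that process after a deadline. The helper name `finishes_within` and the 30-second limit are assumptions made for illustration; they are not part of Julia or Healpix.jl.

```julia
# Hypothetical helper (not part of Julia or Healpix.jl): evaluate `code` in a
# fresh Julia process and report whether it exits within `seconds`.
function finishes_within(code::AbstractString, seconds::Real)
    # Base.julia_cmd() reproduces the command line of the running Julia binary.
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    # wait=false returns a Process handle immediately instead of blocking.
    proc = run(pipeline(cmd; stdout=devnull, stderr=devnull); wait=false)
    # Poll until the child has exited or the deadline has passed.
    status = timedwait(() -> process_exited(proc), Float64(seconds))
    status === :ok && return true
    # The child may be stuck in C code, so terminate it forcefully.
    kill(proc, Base.SIGKILL)
    return false
end

# The shorter MWE from this thread, passed to the child process as a string.
const MWE = """
x = Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}} where T)
y = Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}}}, AbstractArray} where T
typeintersect(x, y)
"""

# Only probe the MWE when this file is run as a script.
if abspath(PROGRAM_FILE) == @__FILE__
    if finishes_within(MWE, 30)
        println("typeintersect returned within 30 s")
    else
        println("typeintersect was still running after 30 s")
    end
end
```

Running the same script under different Julia versions gives a quick way to see which ones are affected, at the cost of one process start-up per probe.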
The reproducer
using Healpix
Healpix.readPolarizedMapFromFITS(pkgdir(Healpix, "test", "pol_float_map.fits"), 1, Float32)
works fine in v1.8.3 though. |
cc @JeffBezanson; although fixing the typeintersect hang is probably a better solution. |
inside lldb it seems the hang comes from calling intersect with the following types, which seem to be kind of broken |
A suspicious call to me seems to be
frame #1759: 0x00000001015a64b4 libjulia-internal.1.10.dylib`intersect(x=0x000000010b96d770, y=0x000000010b96d710, e=0x000000016eef5ab0, param=0) at subtype.c:0:13 [opt]
2989 // (as opposed to being outside any type constructor, or comparing variable bounds).
2990 // this is used to record the positions where type variables occur for the
2991 // diagonal rule (record_var_occurrence).
-> 2992 static jl_value_t *intersect(jl_value_t *x, jl_value_t *y, jl_stenv_t *e, int param)
2993 {
2994 if (x == y) return y;
2995 if (jl_is_typevar(x)) {
Printing the two arguments gives:
(lldb) p *(jl_datatype_t*)x
(jl_datatype_t) $12 = {
name = 0x000000012d946590
super = 0x000000010b96d9f0
parameters = NULL
types = 0x000000012d9463b3
instance = 0x000000010b96d9f0
layout = 0x000000010b96da10
hash = 0
hasfreetypevars = '\0'
isconcretetype = '\0'
isdispatchtuple = '\0'
isbitstype = '\0'
zeroinit = '\0'
has_concrete_subtype = '\0'
cached_by_hash = '\0'
isprimitivetype = '\0'
}
(lldb) p *(jl_datatype_t*)y
(jl_datatype_t) $13 = {
name = 0x000000012d946590
super = 0x000000010b96d550
parameters = NULL
types = 0x000000012d9486d3
instance = 0x0000000000000002
layout = 0x000000010b96d710
hash = 682452752
hasfreetypevars = '\x01'
isconcretetype = '\0'
isdispatchtuple = '\0'
isbitstype = '\0'
zeroinit = '\0'
has_concrete_subtype = '\0'
cached_by_hash = '\0'
isprimitivetype = '\0'
}
They have different supertypes, but both supertypes are the same struct.
(lldb) p *(*(jl_datatype_t*)y).layout
(const jl_datatype_layout_t) $20 = {
size = 764700048
nfields = 1
npointers = 194434384
first_ptr = 1
alignment = 0
haspadding = 0
fielddesc_type = 0
padding = 0
}
(lldb) p *(*(jl_datatype_t*)x).layout
(const jl_datatype_layout_t) $21 = {
size = 764700048
nfields = 1
npointers = 194435568
first_ptr = 1
alignment = 0
haspadding = 0
fielddesc_type = 0
padding = 0
} |
When we perform re-`intersection_unionall`, the `Union` bounds might be generated from `simple_join` and thus not be identical to the src `Union`. This commit adds a fast-path to skip the following `intersect_all`.
In the last few weeks, I have experienced several timeouts when running CI tests using GitHub Actions on Julia nightly. These always happen on Windows, Linux, and Mac OS X when
Pkg.test()
is invoked. Here are two examples of jobs taken from different repositories (Healpix.jl and Stripeline.jl).
Unfortunately, I cannot reproduce the failure locally: I downloaded Julia nightly on my laptop and re-ran these tests, but they completed fine (just as they do on Julia 1.6 and 1.8).