Healpix.jl tests hang on Julia v1.9 #46736

Closed
ziotom78 opened this issue Sep 13, 2022 · 20 comments · Fixed by #48029
Labels
types and dispatch (Types, subtyping and method dispatch)
Comments

@ziotom78
Contributor

In recent weeks, I have experienced several timeouts when running CI tests with GitHub Actions on Julia nightly. They always happen on Windows, Linux, and Mac OS X when Pkg.test() is invoked. Here are two examples of jobs taken from different repositories (Healpix.jl and Stripeline.jl).

Unfortunately, I cannot reproduce the failure locally: I downloaded Julia nightly on my laptop and re-ran these tests, but they completed fine (just like Julia 1.6 and 1.8).

@giordano
Contributor

giordano commented Oct 7, 2022

We're seeing similar hangs in MPI.jl, for example all jobs on nightly in https://github.com/JuliaParallel/MPI.jl/actions/runs/3201031150 are timing out. But hard to say whether it's related, not clear what the problem is exactly. CC: @simonbyrne

@KristofferC
Member

I'm not sure what can be done here. Does anyone have something that can be reproduced?

@giordano
Contributor

Healpix tests used to reproducibly time out on GitHub Actions, but now CI isn't running anymore on nightly because it was useless, so I don't know what's going on now.

@gbaraldi
Member

At least MPI seems to pass tests on nightly, so maybe this got fixed?

@giordano
Contributor

Yeah, MPI had a different problem, #47472, addressed by JuliaParallel/MPI.jl#680.

@giordano
Contributor

giordano commented Dec 21, 2022

I just ran Healpix tests locally on

Julia Version 1.10.0-DEV.170
Commit 077314063b* (2022-12-21 18:55 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

and they hang at the Map I/O test set. Ctrl+C-ing it shows an endless stacktrace

[...]
intersect_unionall_ at /Users/mose/repo/julia/src/subtype.c:2645
intersect_unionall at /Users/mose/repo/julia/src/subtype.c:2690
intersect at /Users/mose/repo/julia/src/subtype.c:0
intersect_all at /Users/mose/repo/julia/src/subtype.c:3298
intersect_aside at /Users/mose/repo/julia/src/subtype.c:2147
intersect_var at /Users/mose/repo/julia/src/subtype.c:0
intersect at /Users/mose/repo/julia/src/subtype.c:0
intersect at /Users/mose/repo/julia/src/subtype.c:0
intersect_all at /Users/mose/repo/julia/src/subtype.c:3298
intersect_aside at /Users/mose/repo/julia/src/subtype.c:2147
intersect_var at /Users/mose/repo/julia/src/subtype.c:0
[...]

(it exceeds the scrollbar in iTerm2). Looks fairly reproducible to me.

Edit: actually, when I got the stacktrace I think I was on something like 7b10d5f. On latest master, Ctrl+C-ing doesn't show any stacktrace, which is fairly useless, but tests still hang.

@gbaraldi
Member

Oh, I thought this was a memory related hang, not something different.

@giordano
Contributor

@gbaraldi
Member

gbaraldi commented Dec 21, 2022

Playing a bit in lldb, it seems we are stuck in a call to jl_type_intersection(t2, t1), where t1 = Tuple{typeof(Base.convert), Type{AbstractArray{Union{Nothing, T}, 1}}, AbstractArray{var"#s95", 1} where var"#s95"} where T and t2 = Tuple{typeof(Base.convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}, 1} where T)

Any idea @vtjnash ?

@giordano
Contributor

giordano commented Dec 21, 2022

Reproducer is

using Healpix
Healpix.readPolarizedMapFromFITS(pkgdir(Healpix, "test", "pol_float_map.fits"), 1, Float32)

@DilumAluthge
Member

Does an assert build of Julia give anything helpful?

@gbaraldi
Member

I believe my build already has them on :)

@IanButterworth
Member

Just a note to suggest that the peek profiler might be helpful to diagnose the issue here

@maleadt
Member

maleadt commented Dec 22, 2022

MWE:

julia> x = Tuple{typeof(Base.convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}, 1} where T)
Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}, 1} where T)

julia> y = Tuple{typeof(Base.convert), Type{AbstractArray{Union{Nothing, T}, 1}}, AbstractArray{var"#s95", 1} where var"#s95"} where T
Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}, 1}}, AbstractVector} where T

julia> typeintersect(x, y)

Started hanging on 1.7, so this specifically doesn't seem like a regression.

@maleadt
Member

maleadt commented Dec 22, 2022

Slightly shorter:

x = Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}} where T)
y = Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}}}, AbstractArray} where T
typeintersect(x, y)
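
Because a hang like this freezes the session that triggers it, one option is to probe the MWE from a child process with a timeout. The following is only a minimal sketch: `finishes_within` is a hypothetical helper, not part of Julia or Healpix.jl, and the timeout value is arbitrary.

```julia
# Hypothetical helper (not from the thread): run `code` in a fresh Julia
# process and report whether that process exits within `timeout` seconds.
function finishes_within(code::AbstractString; timeout::Real=30)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    proc = run(pipeline(cmd; stdout=devnull, stderr=devnull); wait=false)
    status = timedwait(() -> process_exited(proc), Float64(timeout))
    if status !== :ok
        # The child may be stuck inside C code, so force-kill it.
        kill(proc, Base.SIGKILL)
        return false
    end
    return true
end

# The shorter MWE from this comment, passed to the child as a string.
mwe = """
x = Tuple{typeof(convert), Type{T}, T} where T<:(AbstractArray{Union{Nothing, T}} where T)
y = Tuple{typeof(convert), Type{AbstractArray{Union{Nothing, T}}}, AbstractArray} where T
typeintersect(x, y)
"""

# The result depends on the Julia build, so the call is left commented out:
# finishes_within(mwe; timeout=20)
```

The helper only reports whether the child exited in time, not whether it succeeded, which is enough to tell a hang from a normal return when comparing Julia versions.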

@giordano
Contributor

> Started hanging on 1.7, so this specifically doesn't seem like a regression.

The Healpix.jl reproducer

using Healpix
Healpix.readPolarizedMapFromFITS(pkgdir(Healpix, "test", "pol_float_map.fits"), 1, Float32)

works fine in v1.8.3 though.

@maleadt
Member

maleadt commented Dec 22, 2022

7ad0e3deae79934e75c7d2d886ea744f7ee1bef6 is the first bad commit
commit 7ad0e3deae79934e75c7d2d886ea744f7ee1bef6
Author: Jeff Bezanson <jeff.bezanson@gmail.com>
Date:   Tue Apr 26 18:25:40 2022 -0400

    lower `new()` to reference the called object instead of re-creating it with apply_type (#44664)

    addresses #36384

 src/julia-syntax.scm | 90 +++++++++++++++++++++++++++++++---------------------
 test/syntax.jl       |  8 +++++
 2 files changed, 62 insertions(+), 36 deletions(-)

cc @JeffBezanson; although fixing the typeintersect hang is probably a better solution.

@gbaraldi
Member

Inside lldb, it seems the hang comes from calling intersect with the following types, which seem to be kind of broken:
x = Union{Nothing, T} where T<:Union{Nothing, T} and y = Union{Nothing, T<:Union{Nothing, T}}

@gbaraldi
Member

gbaraldi commented Dec 22, 2022

A suspicious call to me seems to be

frame #1759: 0x00000001015a64b4 libjulia-internal.1.10.dylib`intersect(x=0x000000010b96d770, y=0x000000010b96d710, e=0x000000016eef5ab0, param=0) at subtype.c:0:13 [opt]
   2989	// (as opposed to being outside any type constructor, or comparing variable bounds).
   2990	// this is used to record the positions where type variables occur for the
   2991	// diagonal rule (record_var_occurrence).
-> 2992	static jl_value_t *intersect(jl_value_t *x, jl_value_t *y, jl_stenv_t *e, int param)
   2993	{
   2994	    if (x == y) return y;
   2995	    if (jl_is_typevar(x)) {

Where doing jl_(x) gives Union{Nothing, T<:Union{Nothing, T}} and jl_(y) gives Union{Nothing, T<:Union{Nothing, T}}. But they aren't the same value, it seems:

(lldb) p *(jl_datatype_t*)x
(jl_datatype_t) $12 = {
  name = 0x000000012d946590
  super = 0x000000010b96d9f0
  parameters = NULL
  types = 0x000000012d9463b3
  instance = 0x000000010b96d9f0
  layout = 0x000000010b96da10
  hash = 0
  hasfreetypevars = '\0'
  isconcretetype = '\0'
  isdispatchtuple = '\0'
  isbitstype = '\0'
  zeroinit = '\0'
  has_concrete_subtype = '\0'
  cached_by_hash = '\0'
  isprimitivetype = '\0'
}
(lldb) p *(jl_datatype_t*)y
(jl_datatype_t) $13 = {
  name = 0x000000012d946590
  super = 0x000000010b96d550
  parameters = NULL
  types = 0x000000012d9486d3
  instance = 0x0000000000000002
  layout = 0x000000010b96d710
  hash = 682452752
  hasfreetypevars = '\x01'
  isconcretetype = '\0'
  isdispatchtuple = '\0'
  isbitstype = '\0'
  zeroinit = '\0'
  has_concrete_subtype = '\0'
  cached_by_hash = '\0'
  isprimitivetype = '\0'
}

They have different super types, but both supertypes are the same struct.
Their layouts are slightly different

(lldb) p *(*(jl_datatype_t*)y).layout
(const jl_datatype_layout_t) $20 = {
  size = 764700048
  nfields = 1
  npointers = 194434384
  first_ptr = 1
  alignment = 0
  haspadding = 0
  fielddesc_type = 0
  padding = 0
}
(lldb) p *(*(jl_datatype_t*)x).layout
(const jl_datatype_layout_t) $21 = {
  size = 764700048
  nfields = 1
  npointers = 194435568
  first_ptr = 1
  alignment = 0
  haspadding = 0
  fielddesc_type = 0
  padding = 0
}

@giordano added the "types and dispatch" label on Dec 22, 2022
@giordano changed the title from "Julia nightly timeouts when running tests on GitHub Actions" to "Healpix.jl tests hang on Julia v1.9" on Dec 22, 2022
@vtjnash
Member

vtjnash commented Dec 22, 2022

T<:Union{Nothing, T} is probably malformed in that a type should not be a subtype of itself. Sounds like a similar issue to one we have encountered elsewhere (e.g. #47877 (comment) and #47868)
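
To make that concern concrete, a bound is suspicious when the variable occurs inside its own bound. The sketch below walks a type expression to check for this. `mentions` and `self_bounded` are hypothetical helpers, not part of Base, and the check compares variables by identity (`===`), since two distinct variables can both print as `T`.

```julia
# Hypothetical helpers (not part of Base): does type expression `t` mention `tv`?
mentions(t, tv::TypeVar) = false  # fallback, e.g. non-type parameters like `1` in Array{T,1}
mentions(t::TypeVar, tv::TypeVar) = t === tv
mentions(t::Union, tv::TypeVar) = mentions(t.a, tv) || mentions(t.b, tv)
mentions(t::UnionAll, tv::TypeVar) =
    mentions(t.var.lb, tv) || mentions(t.var.ub, tv) || mentions(t.body, tv)
mentions(t::DataType, tv::TypeVar) = any(p -> mentions(p, tv), t.parameters)

# A variable is self-bounded if it occurs in its own lower or upper bound.
self_bounded(tv::TypeVar) = mentions(tv.lb, tv) || mentions(tv.ub, tv)

T = TypeVar(:T, Union{}, Union{Nothing, Int})  # well-formed: T<:Union{Nothing, Int}
S = TypeVar(:S)                                # unbounded variable

self_bounded(T)                  # bounds of T do not mention T
mentions(Union{Nothing, S}, S)   # S occurs in this Union
```

Variables built through the ordinary `TypeVar` constructor, as here, cannot refer to themselves, so a self-bounded variable would have to come from inside the intersection machinery, as the lldb output suggests.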

N5N3 added a commit to N5N3/julia that referenced this issue Dec 28, 2022
When we perform re-intersection_unionall, the `Union` bounds might be generated from `simple_join` thus not identical to the src `Union`.
N5N3 added a commit to N5N3/julia that referenced this issue Dec 30, 2022
When we perform re-intersection_unionall, the `Union` bounds might be generated from `simple_join` thus not identical to the src `Union`.
N5N3 added a commit to N5N3/julia that referenced this issue Dec 31, 2022
When we perform re-intersection_unionall, the `Union` bounds might be generated from `simple_join` thus not identical to the src `Union`.
N5N3 added a commit to N5N3/julia that referenced this issue Jan 3, 2023
When we perform re-`intersection_unionall`, the `Union` bounds might be generated from `simple_join` and thus not identical to the src `Union`.
This commit adds a fast-path to skip the following `intersect_all`.