Allocating ridiculous amounts of memory hangs in GC #51242

Closed
maleadt opened this issue Sep 8, 2023 · 5 comments · Fixed by #51247
Labels
GC Garbage collector regression Regression in behavior compared to a previous version

@maleadt
Member

maleadt commented Sep 8, 2023

julia> Base.format_bytes(17314512175606968824)
"15378.376 PiB"

julia> ccall(:jl_gc_counted_malloc, Ptr{Cvoid}, (Csize_t,), 17314512175606968824)
# hangs

Interrupting the process shows that it seems to be stuck doing GC, which doesn't make sense here. Maybe an issue with the GC counters?

[1534036] signal (15): Terminated
in expression starting at none:0
gc_mark_obj8 at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:1880
gc_mark_outrefs at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:2660 [inlined]
gc_mark_loop_serial_ at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:2716
gc_mark_loop_serial at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:2739
gc_mark_loop at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:2852 [inlined]
_jl_gc_collect at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:3178
ijl_gc_collect at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:3478
maybe_collect at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:941 [inlined]
jl_gc_pool_alloc_inner at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:1320
jl_gc_pool_alloc_noinline at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:1377 [inlined]
jl_gc_alloc_ at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/julia_internal.h:466 [inlined]
jl_gc_alloc at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/gc.c:3530
_new_array_ at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/array.c:134 [inlined]
_new_array at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_1d at /cache/build/default-amdci5-3/julialang/julia-release-1-dot-10/src/array.c:436

There also seems to be a difference between doing this from the REPL, and via --eval:

❯ ./julia -e '@show ccall(:jl_gc_counted_malloc, Ptr{Cvoid}, (Csize_t,), 17314512175606968824)'
ccall(:jl_gc_counted_malloc, Ptr{Cvoid}, (Csize_t,), 17314512175606968824) = Ptr{Nothing} @0x0000000000000000

Bisected to 8cfb350 on the backports branch, so #50682 is probably the culprit.

@maleadt maleadt added regression Regression in behavior compared to a previous version GC Garbage collector labels Sep 8, 2023
@maleadt maleadt added this to the 1.10 milestone Sep 8, 2023
@gbaraldi
Member

gbaraldi commented Sep 8, 2023

I believe the issue is that we don't check whether the allocation actually succeeded before increasing the counters, so the GC thinks it has a massive amount of memory allocated and runs very often. Though I'm not sure what we can do here: a null check might not be good enough because the OS might give us a pointer and then segfault later.
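
A minimal sketch of the ordering problem described above. It is not the actual code in Julia's gc.c, nor the change made in #51247. The names `counted_bytes`, `alloc_fn`, `counted_alloc_unchecked`, `counted_alloc_checked`, and `always_fail` are hypothetical and exist only for illustration. The allocator is passed in as a parameter so that a failing allocation can be simulated without asking the system for memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GC's "bytes allocated" counter. */
static uint64_t counted_bytes = 0;

/* Allocator signature, so a failing allocator can be swapped in. */
typedef void *(*alloc_fn)(size_t);

/* Problematic ordering: the counter grows even when the allocation fails. */
static void *counted_alloc_unchecked(alloc_fn alloc, size_t sz)
{
    assert(alloc != NULL);
    counted_bytes += sz;
    return alloc(sz);
}

/* Checked ordering: only successful allocations are counted. */
static void *counted_alloc_checked(alloc_fn alloc, size_t sz)
{
    assert(alloc != NULL);
    void *p = alloc(sz);
    if (p != NULL)
        counted_bytes += sz;
    return p;
}

/* Test double that behaves like an allocator refusing every request. */
static void *always_fail(size_t sz)
{
    (void)sz;
    return NULL;
}
```

Calling `counted_alloc_unchecked(always_fail, n)` returns NULL but still adds `n` to `counted_bytes`, which is the kind of inflated count that could make a collector believe it must run constantly. `counted_alloc_checked(always_fail, n)` leaves the counter untouched. As the sentence above notes, a NULL check only covers allocations that fail up front.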

@maleadt
Member Author

maleadt commented Sep 8, 2023

a null check might not be good enough because the OS might give us a pointer and then segfault later

Could you elaborate? We should at least handle the NULL case.

@oscardssmith
Member

The problem is that the OS might not actually give us the pages until we touch them, so it's possible for the malloc to succeed but for us to get a segfault once we actually try to fill it with data.

@maleadt
Member Author

maleadt commented Sep 8, 2023

I know how memory overcommit works, but I haven't seen it segfault, only OOM kills.

In any case, this doesn't matter, as malloc actually returns NULL here. We used to handle this fine and no longer do, which breaks GMP.
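
As a rough illustration of why an allocator can refuse this request outright instead of overcommitting: the requested size is larger than `PTRDIFF_MAX` on a 64-bit target, and it is also far beyond any usable address space. The sketch below only does the arithmetic. The helper names `exceeds_ptrdiff_max` and `bytes_to_pib` and the macro `REPORTED_REQUEST` are hypothetical, and whether a given malloc implementation rejects sizes above `PTRDIFF_MAX` is an assumption about that implementation, not something this thread establishes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The request size from the report, in bytes. */
#define REPORTED_REQUEST UINT64_C(17314512175606968824)

/* True when a request is larger than the largest pointer difference,
 * i.e. larger than any single object the C implementation can address. */
static bool exceeds_ptrdiff_max(uint64_t nbytes)
{
    return nbytes > (uint64_t)PTRDIFF_MAX;
}

/* Convert a byte count to pebibytes (1 PiB = 2^50 bytes). */
static double bytes_to_pib(uint64_t nbytes)
{
    return (double)nbytes / 1125899906842624.0; /* 2^50 */
}
```

With these helpers, `exceeds_ptrdiff_max(REPORTED_REQUEST)` is true, and `bytes_to_pib(REPORTED_REQUEST)` is roughly 15378.376, matching the `Base.format_bytes` output at the top of the report.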

@gbaraldi
Member

gbaraldi commented Sep 8, 2023

I guess it won't segfault, but it will fail. In any case, I have a PR that at least checks for NULL.
