
gc_managed_realloc_ should be incrementing gc_num.freed if it shrinks the memory buffer on reallocation #52923

Closed · d-netto opened this issue on Jan 16, 2024 · 1 comment
Labels: GC Garbage collector

d-netto (Member) commented on Jan 16, 2024

In release-1.10, gc_managed_realloc_ is defined as:

1.10:
static void *gc_managed_realloc_(jl_ptls_t ptls, void *d, size_t sz, size_t oldsz,
                                 int isaligned, jl_value_t *owner, int8_t can_collect)
{
    if (can_collect)
        maybe_collect(ptls);
    int is_old_marked = jl_astaggedvalue(owner)->bits.gc == GC_OLD_MARKED;
    size_t allocsz = LLT_ALIGN(sz, JL_CACHE_BYTE_ALIGNMENT);
    if (allocsz < sz)  // overflow in adding offs, size was "negative"
        jl_throw(jl_memory_exception);

    int last_errno = errno;
#ifdef _OS_WINDOWS_
    DWORD last_error = GetLastError();
#endif
    void *b;
    if (isaligned)
        b = realloc_cache_align(d, allocsz, oldsz);
    else
        b = realloc(d, allocsz);
    if (b == NULL)
        jl_throw(jl_memory_exception);
#ifdef _OS_WINDOWS_
    SetLastError(last_error);
#endif
    errno = last_errno;
    // gc_managed_realloc_ is currently used exclusively for resizing array buffers.
    if (is_old_marked) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        inc_live_bytes(allocsz - oldsz);
    }
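    // NOTE: when allocsz < oldsz (a shrinking realloc), neither of the
    // branches below executes, so the freed bytes are never recorded.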
    else if (!(allocsz < oldsz))
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));
    jl_atomic_store_relaxed(&ptls->gc_num.realloc,
        jl_atomic_load_relaxed(&ptls->gc_num.realloc) + 1);
    if (allocsz > oldsz) {
        maybe_record_alloc_to_profile((jl_value_t*)b, allocsz - oldsz, (jl_datatype_t*)jl_buff_tag);
    }
    return b;
}

as opposed to the implementation from 1.9:

1.9:
static void *gc_managed_realloc_(jl_ptls_t ptls, void *d, size_t sz, size_t oldsz,
                                 int isaligned, jl_value_t *owner, int8_t can_collect)
{
    if (can_collect)
        maybe_collect(ptls);

    size_t allocsz = LLT_ALIGN(sz, JL_CACHE_BYTE_ALIGNMENT);
    if (allocsz < sz)  // overflow in adding offs, size was "negative"
        jl_throw(jl_memory_exception);

    if (jl_astaggedvalue(owner)->bits.gc == GC_OLD_MARKED) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        live_bytes += allocsz - oldsz;
    }
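    // shrinking realloc: the difference is credited to gc_num.freed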
    else if (allocsz < oldsz)
        jl_atomic_store_relaxed(&ptls->gc_num.freed,
            jl_atomic_load_relaxed(&ptls->gc_num.freed) + (oldsz - allocsz));
    else
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));
    jl_atomic_store_relaxed(&ptls->gc_num.realloc,
        jl_atomic_load_relaxed(&ptls->gc_num.realloc) + 1);

    int last_errno = errno;
#ifdef _OS_WINDOWS_
    DWORD last_error = GetLastError();
#endif
    void *b;
    if (isaligned)
        b = realloc_cache_align(d, allocsz, oldsz);
    else
        b = realloc(d, allocsz);
    if (b == NULL)
        jl_throw(jl_memory_exception);
#ifdef _OS_WINDOWS_
    SetLastError(last_error);
#endif
    errno = last_errno;
    maybe_record_alloc_to_profile((jl_value_t*)b, sz, jl_gc_unknown_type_tag);
    return b;
}

For some reason, after #50144 we stopped incrementing the number of freed bytes when a realloc shrinks the memory buffer: the 1.10 condition else if (!(allocsz < oldsz)) skips the accounting entirely on a shrink, whereas 1.9 credited the difference (oldsz - allocsz) to gc_num.freed.

In particular, this can skew the GC heuristics, since they use this counter to compute live_bytes as a proxy for the heap size.

It could be, for instance, one of the causes behind a pathological behavior we're seeing in one of our workloads in 1.10, where live_bytes increases monotonically despite RSS being stable.
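
For reference, a minimal sketch of what restoring the 1.9 shrink accounting in the 1.10 function could look like (this just mirrors the two snippets above; the actual fix landed in the commit referenced below and may differ):

    // gc_managed_realloc_ is currently used exclusively for resizing array buffers.
    if (is_old_marked) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        inc_live_bytes(allocsz - oldsz);
    }
    else if (allocsz < oldsz)
        // shrink: credit the difference to gc_num.freed, as 1.9 did
        jl_atomic_store_relaxed(&ptls->gc_num.freed,
            jl_atomic_load_relaxed(&ptls->gc_num.freed) + (oldsz - allocsz));
    else
        // grow (or equal size): account the extra bytes as allocated
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));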

CC: @gbaraldi

@d-netto d-netto added the GC Garbage collector label Jan 16, 2024
gbaraldi (Member) commented on Jan 16, 2024

I think this got missed when we ported the old behavior back into 1.10. 1.11 and forward don't use live bytes, they use heap size, so the freed amount doesn't really matter there. But in 1.10 it matters.
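
To make the drift concrete, here is a small, self-contained model of the accounting (the names counters_t and step_live_bytes are invented for illustration; this is not the actual gc.c code):

#include <stdint.h>
#include <stdio.h>

/* Toy model of the 1.10 accounting; all identifiers here are made up
 * for illustration and are not Julia's actual gc.c names. */
typedef struct {
    int64_t allocd; /* bytes recorded as allocated since the last collection */
    int64_t freed;  /* bytes recorded as freed since the last collection */
} counters_t;

/* Fold the per-cycle counters into the running live-bytes estimate,
 * then reset them, roughly mimicking what happens at each collection. */
static int64_t step_live_bytes(int64_t live_bytes, counters_t *c)
{
    live_bytes += c->allocd - c->freed;
    c->allocd = 0;
    c->freed = 0;
    return live_bytes;
}

int main(void)
{
    int64_t live = 0;
    counters_t c = {0, 0};
    /* Each cycle grows a buffer by 1 MiB and then shrinks it back, so the
     * real heap stays flat. With the 1.10 bug, the shrink never increments
     * `freed`, and the estimate climbs by 1 MiB per cycle. */
    for (int i = 0; i < 5; i++) {
        c.allocd += 1 << 20; /* growing realloc: allocsz - oldsz */
        /* buggy path: the matching shrink adds nothing to c.freed;
         * with the 1.9 accounting we would do: c.freed += 1 << 20; */
        live = step_live_bytes(live, &c);
        printf("cycle %d: live_bytes estimate = %lld\n", i + 1, (long long)live);
    }
    return 0;
}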

oscardssmith pushed a commit that referenced this issue Jan 16, 2024
I think when we backed off the new heuristics for 1.10 this got missed
in the meanwhile.
Should fix #52923
@d-netto d-netto closed this as completed Jan 17, 2024
Drvi pushed a commit to RelationalAI/julia that referenced this issue Jun 7, 2024
I think when we backed off the new heuristics for 1.10 this got missed
in the meanwhile.
Should fix JuliaLang#52923