
gc_managed_realloc_ should be incrementing gc_num.freed if it shrinks the memory buffer on reallocation #52923

Closed · d-netto opened this issue on Jan 16, 2024 · 1 comment
Labels: GC Garbage collector

d-netto (Member) commented on Jan 16, 2024

In release-1.10, gc_managed_realloc_ is defined as:

1.10:
static void *gc_managed_realloc_(jl_ptls_t ptls, void *d, size_t sz, size_t oldsz,
                                 int isaligned, jl_value_t *owner, int8_t can_collect)
{
    if (can_collect)
        maybe_collect(ptls);
    int is_old_marked = jl_astaggedvalue(owner)->bits.gc == GC_OLD_MARKED;
    size_t allocsz = LLT_ALIGN(sz, JL_CACHE_BYTE_ALIGNMENT);
    if (allocsz < sz)  // overflow in adding offs, size was "negative"
        jl_throw(jl_memory_exception);

    int last_errno = errno;
#ifdef _OS_WINDOWS_
    DWORD last_error = GetLastError();
#endif
    void *b;
    if (isaligned)
        b = realloc_cache_align(d, allocsz, oldsz);
    else
        b = realloc(d, allocsz);
    if (b == NULL)
        jl_throw(jl_memory_exception);
#ifdef _OS_WINDOWS_
    SetLastError(last_error);
#endif
    errno = last_errno;
    // gc_managed_realloc_ is currently used exclusively for resizing array buffers.
    if (is_old_marked) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        inc_live_bytes(allocsz - oldsz);
    }
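    // NOTE: when allocsz < oldsz (a shrinking realloc), neither of the
    // branches below executes, so the freed bytes are never recorded.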
    else if (!(allocsz < oldsz))
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));
    jl_atomic_store_relaxed(&ptls->gc_num.realloc,
        jl_atomic_load_relaxed(&ptls->gc_num.realloc) + 1);
    if (allocsz > oldsz) {
        maybe_record_alloc_to_profile((jl_value_t*)b, allocsz - oldsz, (jl_datatype_t*)jl_buff_tag);
    }
    return b;
}

as opposed to the implementation from 1.9:

1.9:
static void *gc_managed_realloc_(jl_ptls_t ptls, void *d, size_t sz, size_t oldsz,
                                 int isaligned, jl_value_t *owner, int8_t can_collect)
{
    if (can_collect)
        maybe_collect(ptls);

    size_t allocsz = LLT_ALIGN(sz, JL_CACHE_BYTE_ALIGNMENT);
    if (allocsz < sz)  // overflow in adding offs, size was "negative"
        jl_throw(jl_memory_exception);

    if (jl_astaggedvalue(owner)->bits.gc == GC_OLD_MARKED) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        live_bytes += allocsz - oldsz;
    }
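    // shrinking realloc: the difference is credited to gc_num.freed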
    else if (allocsz < oldsz)
        jl_atomic_store_relaxed(&ptls->gc_num.freed,
            jl_atomic_load_relaxed(&ptls->gc_num.freed) + (oldsz - allocsz));
    else
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));
    jl_atomic_store_relaxed(&ptls->gc_num.realloc,
        jl_atomic_load_relaxed(&ptls->gc_num.realloc) + 1);

    int last_errno = errno;
#ifdef _OS_WINDOWS_
    DWORD last_error = GetLastError();
#endif
    void *b;
    if (isaligned)
        b = realloc_cache_align(d, allocsz, oldsz);
    else
        b = realloc(d, allocsz);
    if (b == NULL)
        jl_throw(jl_memory_exception);
#ifdef _OS_WINDOWS_
    SetLastError(last_error);
#endif
    errno = last_errno;
    maybe_record_alloc_to_profile((jl_value_t*)b, sz, jl_gc_unknown_type_tag);
    return b;
}

For some reason, after #50144 we stopped incrementing the number of freed bytes when a realloc shrinks the memory buffer: the 1.10 condition else if (!(allocsz < oldsz)) skips the accounting entirely on a shrink, whereas 1.9 credited the difference (oldsz - allocsz) to gc_num.freed.

In particular, this can skew the GC heuristics, since they use this counter to compute live_bytes as a proxy for the heap size.

It could be, for instance, one of the causes behind a pathological behavior we're seeing in one of our workloads in 1.10, where live_bytes increases monotonically despite RSS being stable.
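
For reference, a minimal sketch of what restoring the 1.9 shrink accounting in the 1.10 function could look like (this just mirrors the two snippets above; the actual fix landed in the commit referenced below and may differ):

    // gc_managed_realloc_ is currently used exclusively for resizing array buffers.
    if (is_old_marked) {
        ptls->gc_cache.perm_scanned_bytes += allocsz - oldsz;
        inc_live_bytes(allocsz - oldsz);
    }
    else if (allocsz < oldsz)
        // shrink: credit the difference to gc_num.freed, as 1.9 did
        jl_atomic_store_relaxed(&ptls->gc_num.freed,
            jl_atomic_load_relaxed(&ptls->gc_num.freed) + (oldsz - allocsz));
    else
        // grow (or equal size): account the extra bytes as allocated
        jl_atomic_store_relaxed(&ptls->gc_num.allocd,
            jl_atomic_load_relaxed(&ptls->gc_num.allocd) + (allocsz - oldsz));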

CC: @gbaraldi

@d-netto d-netto added the GC Garbage collector label Jan 16, 2024
gbaraldi (Member) commented on Jan 16, 2024

I think this got missed when we ported the old behavior back into 1.10. 1.11 and forward don't use live bytes, they use heap size, so the freed amount doesn't really matter there. But in 1.10 it matters.
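
To make the drift concrete, here is a small, self-contained model of the accounting (the names counters_t and step_live_bytes are invented for illustration; this is not the actual gc.c code):

#include <stdint.h>
#include <stdio.h>

/* Toy model of the 1.10 accounting; all identifiers here are made up
 * for illustration and are not Julia's actual gc.c names. */
typedef struct {
    int64_t allocd; /* bytes recorded as allocated since the last collection */
    int64_t freed;  /* bytes recorded as freed since the last collection */
} counters_t;

/* Fold the per-cycle counters into the running live-bytes estimate,
 * then reset them, roughly mimicking what happens at each collection. */
static int64_t step_live_bytes(int64_t live_bytes, counters_t *c)
{
    live_bytes += c->allocd - c->freed;
    c->allocd = 0;
    c->freed = 0;
    return live_bytes;
}

int main(void)
{
    int64_t live = 0;
    counters_t c = {0, 0};
    /* Each cycle grows a buffer by 1 MiB and then shrinks it back, so the
     * real heap stays flat. With the 1.10 bug, the shrink never increments
     * `freed`, and the estimate climbs by 1 MiB per cycle. */
    for (int i = 0; i < 5; i++) {
        c.allocd += 1 << 20; /* growing realloc: allocsz - oldsz */
        /* buggy path: the matching shrink adds nothing to c.freed;
         * with the 1.9 accounting we would do: c.freed += 1 << 20; */
        live = step_live_bytes(live, &c);
        printf("cycle %d: live_bytes estimate = %lld\n", i + 1, (long long)live);
    }
    return 0;
}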

oscardssmith pushed a commit that referenced this issue Jan 16, 2024
I think when we backed off the new heuristics for 1.10 this got missed
in the meanwhile.
Should fix #52923
@d-netto d-netto closed this as completed Jan 17, 2024
Drvi pushed a commit to RelationalAI/julia that referenced this issue Jun 7, 2024
I think when we backed off the new heuristics for 1.10 this got missed
in the meanwhile.
Should fix JuliaLang#52923