
Overcounting of memory in first/last #15923

@ashdnazg

Description


Describe the bug

When computing first/last over a column of lists, the first/last
accumulators store the selected scalar value as is, and that value still
points to the list inside the original input buffer.
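The following minimal Python model makes the mechanism concrete. It assumes a list value is a view (offset, length) into a shared backing buffer. `Buffer`, `ListView` and `FirstAccumulator` are hypothetical names for illustration, not the real accumulator API. A weak reference shows that the full input buffer stays alive after the caller drops it, because the stored value still points into it.

```python
import weakref
from dataclasses import dataclass


class Buffer:
    """Hypothetical stand-in for the backing buffer of an input array."""

    def __init__(self, values):
        self.values = values


@dataclass
class ListView:
    """Hypothetical list value: a window (offset, length) into a shared Buffer."""

    buffer: Buffer
    offset: int
    length: int

    def values(self):
        return self.buffer.values[self.offset:self.offset + self.length]


class FirstAccumulator:
    """Hypothetical 'first' accumulator that stores the value as is (no copy)."""

    def __init__(self):
        self.first = None

    def update(self, value):
        if self.first is None:
            self.first = value  # keeps a reference to value.buffer


input_buffer = Buffer(list(range(100_000)))
buffer_ref = weakref.ref(input_buffer)

acc = FirstAccumulator()
acc.update(ListView(input_buffer, offset=10, length=3))

# Drop the caller's handle on the input; the stored view still references it.
del input_buffer
print(buffer_ref() is not None)  # True: the whole 100,000-element buffer survives
print(acc.first.values())        # [10, 11, 12]
```

Only three elements are needed, but the accumulator pins all 100,000.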

This results in two issues:

  1. We prevent the input arrays from being deallocated, even though they
    may be significantly larger than the single value we want to hold.

  2. During aggregation with groups, many accumulators receive slices of the
    same input buffer, so all of their held values point into that buffer.
    Then, when computing the total size of all accumulators, we count the
    buffer multiple times, since each accumulator considers it part of its
    own allocation.
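The double counting in point 2 can be reproduced with a similar hypothetical model. Here `accumulator_size` (an assumed name) reports the size of whatever buffer an accumulator's value points into, which is all a single accumulator can see. Summing it over 1,000 groups that all slice one 32,000-byte buffer reports 1,000 times the real footprint. Copying each value into its own tightly sized buffer (`compact`, an assumed mitigation rather than a confirmed fix) makes the reported total match the data actually retained.

```python
from dataclasses import dataclass


class Buffer:
    """Hypothetical stand-in for the backing buffer of an input array."""

    def __init__(self, values):
        self.values = values

    def nbytes(self):
        return 8 * len(self.values)  # assume 8-byte (int64) elements


@dataclass
class ListView:
    """Hypothetical list value: a window (offset, length) into a shared Buffer."""

    buffer: Buffer
    offset: int
    length: int

    def compact(self):
        """Copy only the referenced elements into a new, tightly sized buffer."""
        owned = self.buffer.values[self.offset:self.offset + self.length]
        return ListView(Buffer(owned), 0, self.length)


def accumulator_size(view):
    # Each accumulator reports the whole buffer its value points into,
    # because it cannot tell that other accumulators share that buffer.
    return view.buffer.nbytes()


num_groups = 1_000
list_len = 4
shared = Buffer(list(range(num_groups * list_len)))

# One slice of the same input batch per group, as in grouped aggregation.
views = [ListView(shared, g * list_len, list_len) for g in range(num_groups)]

naive_total = sum(accumulator_size(v) for v in views)
actual_total = shared.nbytes()
compacted_total = sum(accumulator_size(v.compact()) for v in views)

print(naive_total)      # 32000000: the 32,000-byte buffer counted 1,000 times
print(actual_total)     # 32000
print(compacted_total)  # 32000
```

Deduplicating by buffer identity would also fix the count, but it would still keep the whole input alive, so on its own it addresses only point 2.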

While issue 1 can be tolerated, issue 2 can easily cause an out-of-memory (OOM) error even for a small input with a few thousand groups.
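The following worked example shows the scale of the overcount, using hypothetical sizes rather than measurements from this report. If $G$ groups each hold a value sliced from the same input buffer of size $B$, the reported total is $G \cdot B$, while the memory actually retained is about $B$. For an assumed 1 MiB buffer and 4,000 groups:

```math
R_{\text{reported}} = \sum_{g=1}^{G} B = G \cdot B = 4000 \times 1\,\text{MiB} = 4000\,\text{MiB} \approx 3.9\,\text{GiB},
\qquad R_{\text{actual}} \approx B = 1\,\text{MiB}
```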

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
