High CPU during dynamic filter bound computation: min_batch/max_batch #17486

@LiaCastaneda

Describe the bug

Hello! We are very interested in the dynamic filtering optimization for hash joins, so I brought in the dynamic filtering work (DataDog#43) to test it in a production environment.

The API works as expected (we do receive dynamic filters). However, because bounds are computed for every query that uses a hash join, the change pushed all our resources to maximum CPU. Profiling showed that almost 2/3 of overall CPU usage was spent in min_batch and max_batch, comparing lists. Specifically, the hot path was min_batch -> min_max_batch_generic -> ScalarValue::partial_cmp for lists (via partial_cmp_list).

Comparing list values during bounds computation appears to be expensive. Is this expected?
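
To make the suspected cost pattern concrete, here is a minimal sketch in plain Rust. It is not DataFusion's code: `Value`, `has_cheap_ordering`, and `compute_bounds` are hypothetical stand-ins, and it assumes that list comparison is lexicographic and element-wise. Under that assumption, each min/max step over a list column walks the lists instead of doing a single primitive comparison, and one possible mitigation would be to skip bounds for such types.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a scalar value (NOT DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

impl PartialOrd for Value {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            // Primitive case: a single comparison.
            (Value::Int(a), Value::Int(b)) => a.partial_cmp(b),
            // List case (assumed lexicographic): walks both lists element by
            // element, so the cost grows with list length and nesting depth.
            (Value::List(a), Value::List(b)) => {
                for (x, y) in a.iter().zip(b.iter()) {
                    match x.partial_cmp(y)? {
                        Ordering::Equal => continue,
                        non_eq => return Some(non_eq),
                    }
                }
                // All shared elements are equal: the shorter list sorts first.
                Some(a.len().cmp(&b.len()))
            }
            // Mixed types are not comparable in this sketch.
            _ => None,
        }
    }
}

/// Hypothetical guard: only primitive values count as cheap to order.
fn has_cheap_ordering(v: &Value) -> bool {
    matches!(v, Value::Int(_))
}

/// Hypothetical bounds computation: returns `(min, max)` for the column, or
/// `None` when the column is empty or its type is too expensive to compare,
/// so a caller could skip building a dynamic filter for it.
fn compute_bounds(column: &[Value]) -> Option<(Value, Value)> {
    let first = column.first()?;
    if !has_cheap_ordering(first) {
        return None;
    }
    let (mut min, mut max) = (first, first);
    for v in &column[1..] {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    Some((min.clone(), max.clone()))
}
```

With this guard, a column of `Value::Int` still yields bounds, while a column of `Value::List` returns `None` without performing any list comparison. Whether skipping nested types is the right trade-off for DataFusion is an open question here, not a claim about how it behaves.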

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug (Something isn't working)