Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(query): refactor hash join spill #15746

Merged
merged 51 commits into from
Aug 20, 2024

Conversation

Dousir9
Copy link
Member

@Dousir9 Dousir9 commented Jun 6, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Code readability and maintainability

  1. New step transition mechanism: this PR divides hash join probe/build step into three types: SyncStep, AsyncStep, and Finish, ensuring that step transition occurs in the event and this PR redesigns step transition to make the step transition logic clearer and improve code readability and maintainability.
  2. Unify ProbeSpillHandler and BuildSpillHandler to HashJoinSpiller to improve code maintainability.
  3. Unify the spill logic of cross join and other types.
  4. Introduce HashTableType to reduce code.

Bug Fixed

  1. Fix a corner case of left-related join when spill happens.
  2. The left-anti join should perform probe if there is no partition on the build side.
  3. Fix out of range for slice panic when spill happens.

Tests

  • Add TPC-H spill test
  • Add TPC-DS spill test

Performance (Databend Cloud XSMALL)

TPC-H SF100 Q18: 164s → 137s

Test script:

set join_spilling_buffer_threshold_per_proc_mb = 512;
set join_spilling_bytes_threshold_per_proc = 1;
set join_spilling_memory_ratio = 1;
select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
        o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 300
    )
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate;

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe): code maintainability

This change is Reviewable

@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Jun 6, 2024
@Dousir9 Dousir9 added the ci-cloud Build docker image for cloud test label Jun 7, 2024
Copy link
Contributor

github-actions bot commented Jun 7, 2024

Docker Image for PR

  • tag: pr-15746-9107967-1717765135

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 Dousir9 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Jun 11, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15746-61aa91b-1718103773

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 Dousir9 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Aug 16, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15746-d5f0732-1723784071

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 Dousir9 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Aug 19, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15746-550a9dc-1724066510

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 Dousir9 removed the ci-cloud Build docker image for cloud test label Aug 19, 2024
@Dousir9 Dousir9 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Aug 19, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15746-7335d28-1724069360

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Dousir9 Dousir9 requested review from xudong963 and zhang2014 August 19, 2024 13:50
@Dousir9 Dousir9 marked this pull request as ready for review August 19, 2024 13:50
Copy link
Member

@zhang2014 zhang2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dousir9 Dousir9 added this pull request to the merge queue Aug 20, 2024
Merged via the queue into databendlabs:main with commit 7dc8a1a Aug 20, 2024
69 checks passed
@Dousir9 Dousir9 deleted the refactor_hash_join_spill branch August 20, 2024 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-refactor this PR changes the code base without new features or bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants