Improve aggregate performance by special casing single group keys #6969

alamb · 2023-07-14T20:43:50Z

Is your feature request related to a problem or challenge?

This is another observation to make aggregation queries go faster

After #6904 the speed of grouping is much faster as much of the per-aggregate state overhead is removed.

Now, for queries with a single primitive type (e.g. Int64) a significant portion of their execution time is the converting the grouping key into the the arrow Row format: https://docs.rs/arrow-row/43.0.0/arrow_row/index.html to compare

An example of such a query is to find the number of interactions with some user id:

SELECT count(*)
FROM t
GROUP BY user_id

Describe the solution you'd like

As @Dandandan suggested on #6800 (comment)

we should also consider avoiding the conversion to the row-format as this likely will be one of the more expensive things now.

So that would mean that for group by with a single column, we could use the (native) representation of that column to store the group values rather than using the Arrow Row format.

This is the same approach taken today by Sort / SortPreservingMerge. It is implemented via a generic FieldCursor:

https://github.com/apache/arrow-datafusion/blob/d316702722e6c301fdb23a9698f7ec415ef548e9/datafusion/core/src/physical_plan/sorts/cursor.rs#L180-L191

Describe alternatives you've considered

@yahoNanJing has also been exploring a fixed width row cursor implementation that is optimized for comparison rather than ordering: apache/arrow-rs#4524

Additional context

There is some discussion in the ASF slack channel as well: https://the-asf.slack.com/archives/C01QUFS30TD/p1689301661826909

The text was updated successfully, but these errors were encountered:

alamb · 2023-07-17T12:57:42Z

@yahoNanJing reports that with apache/arrow-rs#4523 / apache/arrow-rs#4524 and yahoNanJing@a18ac07 they see a significant performance improvement in TPHC queries

Thus, I think this idea holds significant promise

alamb · 2023-07-17T22:02:21Z

Just to be clear, what I was imagining for the group storage is not to change the contents of the RawTable (it will continue to contain group_indexes).

But instead of storing group_values using the arrow Row format

 stores "group       stores group values, 
    indexes"          in arrow_row format 
                                          
 ┌─────────────┐      ┌────────────┐      
 │   ┌─────┐   │      │ ┌────────┐ │      
 │   │  5  │   │ ┌────┼▶│  "A"   │ │      
 │   ├─────┤   │ │    │ ├────────┤ │      
 │   │  9  │   │ │    │ │  "Z"   │ │      
 │   └─────┘   │ │    │ └────────┘ │      
 │     ...     │ │    │            │      
 │   ┌─────┐   │ │    │    ...     │      
 │   │  1  │───┼─┘    │            │      
 │   ├─────┤   │      │            │      
 │   │ 13  │───┼─┐    │ ┌────────┐ │      
 │   └─────┘   │ └────┼▶│  "Q"   │ │      
 └─────────────┘      │ └────────┘ │      
                      │            │      
                      └────────────┘      
                                          
                                          
                                          
       map            group_values        
  (Hash Table)

We would instead store the group values using a native type like Vec<T> like this

 stores "group               stored in a      
    indexes"                native Vec<T>     
                                              
 ┌─────────────┐            ┌──────────┐      
 │   ┌─────┐   │            │ ┌──────┐ │      
 │   │  5  │   │    ┌───────┼▶│  1   │ │      
 │   ├─────┤   │    │       │ ├──────┤ │      
 │   │  9  │   │    │       │ │  3   │ │      
 │   └─────┘   │    │       │ └──────┘ │      
 │     ...     │    │       │          │      
 │   ┌─────┐   │    │       │    ...   │      
 │   │  1  │───┼────┘       │          │      
 │   ├─────┤   │            │          │      
 │   │ 13  │───┼────┐       │ ┌──────┐ │      
 │   └─────┘   │    └───────┼▶│  5   │ │      
 └─────────────┘            │ └──────┘ │      
                            │          │      
                            └──────────┘      
                                              
                                              
                            group_values      
       map                                    
  (Hash Table)

Handling null values would need some care, but since this would only be for single columns (where there can be at most one null value) I think we could figure out some way to handle it

alamb · 2023-07-18T20:10:54Z

I ran some some tests of the various approaches on

fixed width row format, inlined into hash table: https://github.com/yahoNanJing/arrow-datafusion/tree/EBAY-KYLIN-4003-5 from @yahoNanJing (aka with the change yahoNanJing@a18ac07 embeds the group value directly into the table)
unsafe row access Test: inline everything and add row_unchecked arrow-rs#4539 / Test: Use unchecked group_value access in group hash #7010

EBAY-KYLIN-4003-5 (inlined group values): mixed results

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ EBAY-KYLIN-4003-5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  537.94ms │          544.54ms │     no change │
│ QQuery 2     │  161.75ms │          169.20ms │     no change │
│ QQuery 3     │  164.01ms │          169.22ms │     no change │
│ QQuery 4     │  114.47ms │          112.47ms │     no change │
│ QQuery 5     │  377.65ms │          389.09ms │     no change │
│ QQuery 6     │   41.12ms │           39.79ms │     no change │
│ QQuery 7     │  885.44ms │          882.59ms │     no change │
│ QQuery 8     │  237.45ms │          234.45ms │     no change │
│ QQuery 9     │  556.79ms │          556.93ms │     no change │
│ QQuery 10    │  313.01ms │          326.68ms │     no change │
│ QQuery 11    │  158.40ms │          168.29ms │  1.06x slower │
│ QQuery 12    │  171.57ms │          161.69ms │ +1.06x faster │
│ QQuery 13    │  298.77ms │          287.11ms │     no change │
│ QQuery 14    │   49.25ms │           47.53ms │     no change │
│ QQuery 15    │   49.57ms │           50.07ms │     no change │
│ QQuery 16    │  167.13ms │          171.66ms │     no change │
│ QQuery 17    │  950.03ms │          871.42ms │ +1.09x faster │
│ QQuery 18    │ 1576.97ms │         1706.98ms │  1.08x slower │
│ QQuery 19    │  178.84ms │          161.94ms │ +1.10x faster │
│ QQuery 20    │  331.30ms │          293.13ms │ +1.13x faster │
│ QQuery 21    │ 1120.20ms │         1055.25ms │ +1.06x faster │
│ QQuery 22    │   84.74ms │           86.40ms │     no change │
└──────────────┴───────────┴───────────────────┴───────────────┘

Unsafe Row Access: mixed results

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_peephole ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  548.62ms │       519.95ms │ +1.06x faster │
│ QQuery 2     │  154.00ms │       149.26ms │     no change │
│ QQuery 3     │  159.95ms │       164.44ms │     no change │
│ QQuery 4     │  113.52ms │       117.78ms │     no change │
│ QQuery 5     │  374.59ms │       381.38ms │     no change │
│ QQuery 6     │   39.37ms │        43.51ms │  1.11x slower │
│ QQuery 7     │  932.03ms │       947.92ms │     no change │
│ QQuery 8     │  231.40ms │       245.70ms │  1.06x slower │
│ QQuery 9     │  569.26ms │       551.04ms │     no change │
│ QQuery 10    │  325.29ms │       321.81ms │     no change │
│ QQuery 11    │  171.33ms │       156.11ms │ +1.10x faster │
│ QQuery 12    │  167.87ms │       158.86ms │ +1.06x faster │
│ QQuery 13    │  302.12ms │       282.03ms │ +1.07x faster │
│ QQuery 14    │   47.34ms │        47.49ms │     no change │
│ QQuery 15    │   52.53ms │        52.89ms │     no change │
│ QQuery 16    │  153.12ms │       149.93ms │     no change │
│ QQuery 17    │  914.18ms │       977.18ms │  1.07x slower │
│ QQuery 18    │ 1527.32ms │      1612.21ms │  1.06x slower │
│ QQuery 19    │  164.35ms │       152.92ms │ +1.07x faster │
│ QQuery 20    │  308.12ms │       312.37ms │     no change │
│ QQuery 21    │ 1047.68ms │      1080.79ms │     no change │
│ QQuery 22    │   88.36ms │        85.59ms │     no change │
└──────────────┴───────────┴────────────────┴───────────────┘

alamb · 2023-07-24T11:09:49Z

@tustvold has specialized group keys for single column primitive types and set the code structure for other specializations. I have filed the idea for special casing string columns in #7064

Thus closing this ticket as complete and we can track follow on work using other tickets

alamb added the enhancement New feature or request label Jul 14, 2023

This was referenced Jul 14, 2023

Complete Implement fast Vectorized grouping for high cardinality #6889

Closed

[EPIC] (Even More) Grouping / Group By / Aggregation Performance #7000

Open

alamb mentioned this issue Jul 17, 2023

Add FixedWidthRows for the arrow-row apache/arrow-rs#4524

Closed

alamb mentioned this issue Jul 18, 2023

Test: Use unchecked group_value access in group hash #7010

Closed

tustvold self-assigned this Jul 18, 2023

tustvold changed the title ~~Improve aggregate performance by special casting single group keys~~ Improve aggregate performance by special casing single group keys Jul 18, 2023

alamb mentioned this issue Jul 18, 2023

Test: inline everything and add row_unchecked apache/arrow-rs#4539

Closed

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jul 19, 2023

Extract GroupValues (apache#6969)

a961ad0

tustvold mentioned this issue Jul 19, 2023

Extract GroupValues (#6969) #7016

Merged

tustvold added a commit that referenced this issue Jul 19, 2023

Extract GroupValues (#6969) (#7016)

a6dcd94

tustvold mentioned this issue Jul 20, 2023

Specialize single column primitive group values #7043

Merged

alamb mentioned this issue Jul 24, 2023

Improve aggregate performance with specialized groups accumulator for single string group by #7064

Closed

alamb closed this as completed Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve aggregate performance by special casing single group keys #6969

Improve aggregate performance by special casing single group keys #6969

alamb commented Jul 14, 2023 •

edited

Loading

alamb commented Jul 17, 2023 •

edited

Loading

alamb commented Jul 17, 2023

alamb commented Jul 18, 2023 •

edited

Loading

alamb commented Jul 24, 2023

Improve aggregate performance by special casing single group keys #6969

Improve aggregate performance by special casing single group keys #6969

Comments

alamb commented Jul 14, 2023 • edited Loading

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

alamb commented Jul 17, 2023 • edited Loading

alamb commented Jul 17, 2023

alamb commented Jul 18, 2023 • edited Loading

EBAY-KYLIN-4003-5 (inlined group values): mixed results

Unsafe Row Access: mixed results

alamb commented Jul 24, 2023

alamb commented Jul 14, 2023 •

edited

Loading

alamb commented Jul 17, 2023 •

edited

Loading

alamb commented Jul 18, 2023 •

edited

Loading