Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve aggregate performance by special casing single group keys #6969

Closed
alamb opened this issue Jul 14, 2023 · 4 comments
Closed

Improve aggregate performance by special casing single group keys #6969

alamb opened this issue Jul 14, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jul 14, 2023

Is your feature request related to a problem or challenge?

This is another observation to make aggregation queries go faster

After #6904 the speed of grouping is much faster as much of the per-aggregate state overhead is removed.

Now, for queries with a single primitive type (e.g. Int64) a significant portion of their execution time is the converting the grouping key into the the arrow Row format: https://docs.rs/arrow-row/43.0.0/arrow_row/index.html to compare

An example of such a query is to find the number of interactions with some user id:

SELECT count(*)
FROM t
GROUP BY user_id

Describe the solution you'd like

As @Dandandan suggested on #6800 (comment)

we should also consider avoiding the conversion to the row-format as this likely will be one of the more expensive things now.

So that would mean that for group by with a single column, we could use the (native) representation of that column to store the group values rather than using the Arrow Row format.

This is the same approach taken today by Sort / SortPreservingMerge. It is implemented via a generic FieldCursor:

https://github.com/apache/arrow-datafusion/blob/d316702722e6c301fdb23a9698f7ec415ef548e9/datafusion/core/src/physical_plan/sorts/cursor.rs#L180-L191

Describe alternatives you've considered

@yahoNanJing has also been exploring a fixed width row cursor implementation that is optimized for comparison rather than ordering: apache/arrow-rs#4524

Additional context

There is some discussion in the ASF slack channel as well: https://the-asf.slack.com/archives/C01QUFS30TD/p1689301661826909

@alamb
Copy link
Contributor Author

alamb commented Jul 17, 2023

@yahoNanJing reports that with apache/arrow-rs#4523 / apache/arrow-rs#4524 and yahoNanJing@a18ac07 they see a significant performance improvement in TPHC queries

Screen Shot 2023-07-17 at 16 29 41

Thus, I think this idea holds significant promise

@alamb
Copy link
Contributor Author

alamb commented Jul 17, 2023

Just to be clear, what I was imagining for the group storage is not to change the contents of the RawTable (it will continue to contain group_indexes).

But instead of storing group_values using the arrow Row format

 stores "group       stores group values, 
    indexes"          in arrow_row format 
                                          
 ┌─────────────┐      ┌────────────┐      
 │   ┌─────┐   │      │ ┌────────┐ │      
 │   │  5  │   │ ┌────┼▶│  "A"   │ │      
 │   ├─────┤   │ │    │ ├────────┤ │      
 │   │  9  │   │ │    │ │  "Z"   │ │      
 │   └─────┘   │ │    │ └────────┘ │      
 │     ...     │ │    │            │      
 │   ┌─────┐   │ │    │    ...     │      
 │   │  1  │───┼─┘    │            │      
 │   ├─────┤   │      │            │      
 │   │ 13  │───┼─┐    │ ┌────────┐ │      
 │   └─────┘   │ └────┼▶│  "Q"   │ │      
 └─────────────┘      │ └────────┘ │      
                      │            │      
                      └────────────┘      
                                          
                                          
                                          
       map            group_values        
  (Hash Table)                            
                                                                                                
                                                                               

We would instead store the group values using a native type like Vec<T> like this

 stores "group               stored in a      
    indexes"                native Vec<T>     
                                              
 ┌─────────────┐            ┌──────────┐      
 │   ┌─────┐   │            │ ┌──────┐ │      
 │   │  5  │   │    ┌───────┼▶│  1   │ │      
 │   ├─────┤   │    │       │ ├──────┤ │      
 │   │  9  │   │    │       │ │  3   │ │      
 │   └─────┘   │    │       │ └──────┘ │      
 │     ...     │    │       │          │      
 │   ┌─────┐   │    │       │    ...   │      
 │   │  1  │───┼────┘       │          │      
 │   ├─────┤   │            │          │      
 │   │ 13  │───┼────┐       │ ┌──────┐ │      
 │   └─────┘   │    └───────┼▶│  5   │ │      
 └─────────────┘            │ └──────┘ │      
                            │          │      
                            └──────────┘      
                                              
                                              
                            group_values      
       map                                    
  (Hash Table)                                
                                              

Handling null values would need some care, but since this would only be for single columns (where there can be at most one null value) I think we could figure out some way to handle it

@alamb
Copy link
Contributor Author

alamb commented Jul 18, 2023

I ran some some tests of the various approaches on

EBAY-KYLIN-4003-5 (inlined group values): mixed results

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ EBAY-KYLIN-4003-5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  537.94ms │          544.54ms │     no change │
│ QQuery 2     │  161.75ms │          169.20ms │     no change │
│ QQuery 3     │  164.01ms │          169.22ms │     no change │
│ QQuery 4     │  114.47ms │          112.47ms │     no change │
│ QQuery 5     │  377.65ms │          389.09ms │     no change │
│ QQuery 6     │   41.12ms │           39.79ms │     no change │
│ QQuery 7     │  885.44ms │          882.59ms │     no change │
│ QQuery 8     │  237.45ms │          234.45ms │     no change │
│ QQuery 9     │  556.79ms │          556.93ms │     no change │
│ QQuery 10    │  313.01ms │          326.68ms │     no change │
│ QQuery 11    │  158.40ms │          168.29ms │  1.06x slower │
│ QQuery 12    │  171.57ms │          161.69ms │ +1.06x faster │
│ QQuery 13    │  298.77ms │          287.11ms │     no change │
│ QQuery 14    │   49.25ms │           47.53ms │     no change │
│ QQuery 15    │   49.57ms │           50.07ms │     no change │
│ QQuery 16    │  167.13ms │          171.66ms │     no change │
│ QQuery 17    │  950.03ms │          871.42ms │ +1.09x faster │
│ QQuery 18    │ 1576.97ms │         1706.98ms │  1.08x slower │
│ QQuery 19    │  178.84ms │          161.94ms │ +1.10x faster │
│ QQuery 20    │  331.30ms │          293.13ms │ +1.13x faster │
│ QQuery 21    │ 1120.20ms │         1055.25ms │ +1.06x faster │
│ QQuery 22    │   84.74ms │           86.40ms │     no change │
└──────────────┴───────────┴───────────────────┴───────────────┘

Unsafe Row Access: mixed results

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_peephole ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  548.62ms │       519.95ms │ +1.06x faster │
│ QQuery 2     │  154.00ms │       149.26ms │     no change │
│ QQuery 3     │  159.95ms │       164.44ms │     no change │
│ QQuery 4     │  113.52ms │       117.78ms │     no change │
│ QQuery 5     │  374.59ms │       381.38ms │     no change │
│ QQuery 6     │   39.37ms │        43.51ms │  1.11x slower │
│ QQuery 7     │  932.03ms │       947.92ms │     no change │
│ QQuery 8     │  231.40ms │       245.70ms │  1.06x slower │
│ QQuery 9     │  569.26ms │       551.04ms │     no change │
│ QQuery 10    │  325.29ms │       321.81ms │     no change │
│ QQuery 11    │  171.33ms │       156.11ms │ +1.10x faster │
│ QQuery 12    │  167.87ms │       158.86ms │ +1.06x faster │
│ QQuery 13    │  302.12ms │       282.03ms │ +1.07x faster │
│ QQuery 14    │   47.34ms │        47.49ms │     no change │
│ QQuery 15    │   52.53ms │        52.89ms │     no change │
│ QQuery 16    │  153.12ms │       149.93ms │     no change │
│ QQuery 17    │  914.18ms │       977.18ms │  1.07x slower │
│ QQuery 18    │ 1527.32ms │      1612.21ms │  1.06x slower │
│ QQuery 19    │  164.35ms │       152.92ms │ +1.07x faster │
│ QQuery 20    │  308.12ms │       312.37ms │     no change │
│ QQuery 21    │ 1047.68ms │      1080.79ms │     no change │
│ QQuery 22    │   88.36ms │        85.59ms │     no change │
└──────────────┴───────────┴────────────────┴───────────────┘

@tustvold tustvold self-assigned this Jul 18, 2023
@tustvold tustvold changed the title Improve aggregate performance by special casting single group keys Improve aggregate performance by special casing single group keys Jul 18, 2023
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jul 19, 2023
tustvold added a commit that referenced this issue Jul 19, 2023
@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2023

@tustvold has specialized group keys for single column primitive types and set the code structure for other specializations. I have filed the idea for special casing string columns in #7064

Thus closing this ticket as complete and we can track follow on work using other tickets

@alamb alamb closed this as completed Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants