Investigate performance of decimal math / aggregation #670

Closed
Tracked by #717
andygrove opened this issue Jul 15, 2024 · 9 comments

@andygrove
Member

What is the problem the feature request solves?

SQL

select sum(
    ss_wholesale_cost+
    ss_list_price+
    ss_sales_price+
    ss_ext_discount_amt+
    ss_ext_sales_price+
    ss_ext_wholesale_cost+
    ss_ext_list_price+
    ss_ext_tax+
    ss_coupon_amt+
    ss_net_paid+
    ss_net_paid_inc_tax+
    ss_net_profit
)
from store_sales;

Query time in seconds with Comet disabled:

[24.197060108184814, 22.000279426574707, 22.047854900360107, 22.033849954605103, 21.86830496788025]

With Comet enabled:

[32.81851553916931, 30.068572998046875, 31.110345125198364, 29.976839780807495, 29.35631513595581]

I do not see a slowdown if I add all the integer columns in the table, so this seems specific to decimal.
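One thing that may separate the decimal case from the integer case is that every decimal `+` widens the result type. Here is a minimal sketch of Spark's addition rule (precision = max(p1 - s1, p2 - s2) + max(s1, s2) + 1, scale = max(s1, s2)) applied to the query above. It assumes each column is `decimal(7,2)`, as in the standard TPC-DS schema; the issue does not state the column types. The cap at 38 digits is simplified, and the function names are illustrative only.

```python
MAX_PRECISION = 38  # Spark's maximum decimal precision


def add_result_type(left, right):
    """(precision, scale) of left + right under Spark's decimal addition rule."""
    p1, s1 = left
    p2, s2 = right
    scale = max(s1, s2)
    precision = max(p1 - s1, p2 - s2) + scale + 1
    # Simplified cap: real Spark may also adjust the scale on overflow.
    return (min(precision, MAX_PRECISION), scale)


def sum_result_type(input_type):
    """(precision, scale) of sum(x): Spark adds 10 digits of precision."""
    p, s = input_type
    return (min(p + 10, MAX_PRECISION), s)


COLUMN_TYPE = (7, 2)  # assumption: decimal(7,2), not stated in the issue

expr_type = COLUMN_TYPE
for _ in range(11):  # the query has 12 columns, so 11 additions
    expr_type = add_result_type(expr_type, COLUMN_TYPE)

agg_type = sum_result_type(expr_type)
```

Under that assumption the added expression ends up as `decimal(18,2)` and the aggregate as `decimal(28,2)`, so the intermediate values stop fitting in 9 digits after the second addition and the final sum needs more than 18 digits.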

Part of the issue here may be the creation of all of the intermediate result vectors.
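To make the intermediate-vector hypothesis concrete, here is a small sketch, not Comet's actual execution path. It compares evaluating the 12-column sum as a chain of binary column adds (each one materializing a new vector) against a single fused pass over the rows. Plain Python lists of unscaled integers stand in for decimal vectors, and both function names are made up for illustration.

```python
def add_column_at_a_time(columns):
    """Evaluate c0 + c1 + ... as a chain of binary adds.

    Every '+' materializes a full-length result vector, so n columns
    allocate n - 1 vectors, all but the last of which are thrown away.
    """
    acc = columns[0]
    vectors_allocated = 0
    for col in columns[1:]:
        acc = [a + b for a, b in zip(acc, col)]  # new vector per add
        vectors_allocated += 1
    return acc, vectors_allocated


def add_fused(columns):
    """Evaluate the same expression in one pass with a single output vector."""
    return [sum(row) for row in zip(*columns)]


# 12 columns of 4 rows each, holding unscaled decimal values (e.g. cents).
columns = [[row * 100 + col for row in range(4)] for col in range(12)]

chained, vectors_allocated = add_column_at_a_time(columns)
fused = add_fused(columns)
```

Both paths produce the same values, but the chained version allocates 11 vectors for 12 columns while the fused version allocates one. Whether this accounts for the measured gap is still an open question in this issue.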

Describe the potential solution

No response

Additional context

No response

@andygrove andygrove added enhancement New feature or request performance labels Jul 15, 2024
@andygrove andygrove changed the title Investigate performance of decimal math Investigate performance of decimal math / aggregation Jul 16, 2024
@andygrove
Member Author

microbenchmark results @ sf=1

AMD Ryzen 9 7950X3D 16-Core Processor
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_decimals                                   809            872          67          3.6         281.1       1.0X
add_many_decimals                                   770            788          16          3.7         267.5       1.1X
add_many_decimals: Comet (Scan)                     930            952          38          3.1         323.0       0.9X
add_many_decimals: Comet (Scan, Exec)              2021           2030          12          1.4         701.9       0.4X
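For anyone reading these tables, the Relative column is the first row's best time divided by each row's best time, so values below 1.0X are slower than the Spark baseline. A minimal sketch that recomputes the column from the rounded Best Time(ms) values printed above (the helper name is made up for illustration, and recomputing from rounded milliseconds could in principle differ in the last digit):

```python
def relative(baseline_best_ms, case_best_ms):
    """Speed of a case relative to the baseline row, formatted like the table."""
    return f"{baseline_best_ms / case_best_ms:.1f}X"


# Best Time(ms) values copied from the sf=1 add_many_decimals table.
baseline = 809
rows = {
    "add_many_decimals (second run)": 770,
    "add_many_decimals: Comet (Scan)": 930,
    "add_many_decimals: Comet (Scan, Exec)": 2021,
}

recomputed = {name: relative(baseline, best) for name, best in rows.items()}
```

This reproduces 1.1X, 0.9X and 0.4X for the three rows, i.e. Comet (Scan, Exec) takes roughly 2.5 times as long as the baseline at sf=1.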

@andygrove
Member Author

Benchmark runs @ sf=100 suggest that reading decimal from parquet could potentially be a performance issue.

TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_decimals                                 20502          20648         208         14.0          71.2       1.0X
add_many_decimals                                 20498          20544          65         14.1          71.2       1.0X
add_many_decimals: Comet (Scan)                   28143          28161          26         10.2          97.7       0.7X
add_many_decimals: Comet (Scan, Exec)             19323          19497         246         14.9          67.1       1.1X

TPCDS Micro Benchmarks:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
agg_sum_decimals_no_grouping                              10552          10583          44         27.3          36.6       1.0X
agg_sum_decimals_no_grouping                              10406          10450          61         27.7          36.1       1.0X
agg_sum_decimals_no_grouping: Comet (Scan)                46013          46278         375          6.3         159.8       0.2X
agg_sum_decimals_no_grouping: Comet (Scan, Exec)          13840          13956         164         20.8          48.1       0.8X

@kazuyukitanimura
Contributor

kazuyukitanimura commented Jul 20, 2024

33 iterations of sf=1

Pure scan without doing sum(decimals)

Screenshot 2024-07-19 at 11 55 27 PM

Scan with doing Spark sum(decimals)

Screenshot 2024-07-19 at 11 57 10 PM

So somehow once Sum is applied, Comet scan slows down...

@kazuyukitanimura
Contributor

kazuyukitanimura commented Jul 20, 2024

It looks like once Comet scan is enabled, GangWorker uses more time.

Pure Spark scan + sum

Screenshot 2024-07-20 at 12 13 36 AM

Comet scan + Spark sum

Screenshot 2024-07-20 at 12 13 56 AM

@andygrove
Member Author

> 33 iterations of sf=1
>
> Pure scan without doing sum(decimals)
>
> Screenshot 2024-07-19 at 11 57 10 PM
>
> Scan with doing Spark sum(decimals)
>
> Screenshot 2024-07-19 at 11 57 10 PM
>
> So somehow once Sum is applied, Comet scan slows down...

I think these two screen shots are identical?

@kazuyukitanimura
Contributor

kazuyukitanimura commented Jul 20, 2024

> I think these two screen shots are identical?

Thanks @andygrove, updated to the correct one.

kazuyukitanimura added a commit that referenced this issue Jul 24, 2024
## Which issue does this PR close?

Part of #679 and #670
Related #490

## Rationale for this change

For dictionary-encoded decimal vectors, values were being unpacked even for Int- and Long-backed decimals, which used more memory than necessary.

## What changes are included in this PR?

Unpack only for Decimal 128

## How are these changes tested?

Existing test
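
As a rough illustration of why this matters for memory, here is a sketch based on the usual Spark thresholds: decimals with up to 9 digits of precision fit in a 32-bit int, up to 18 digits fit in a 64-bit long, and anything wider needs 128 bits. It assumes that unpacking materializes every value at 16 bytes, which is one reading of the description above and not something confirmed from the Comet code. The helper names are hypothetical.

```python
MAX_INT_DIGITS = 9    # largest precision that fits in a 32-bit int
MAX_LONG_DIGITS = 18  # largest precision that fits in a 64-bit long


def native_width_bytes(precision):
    """Smallest fixed-width storage for a decimal of the given precision."""
    if precision <= MAX_INT_DIGITS:
        return 4
    if precision <= MAX_LONG_DIGITS:
        return 8
    return 16


def needs_128_bit(precision):
    """True only for decimals that cannot be held in an int or a long."""
    return precision > MAX_LONG_DIGITS


def materialized_bytes(num_rows, precision, always_128_bit):
    """Bytes needed to hold num_rows values (validity bitmap ignored).

    always_128_bit=True models the assumed old behaviour (every value
    widened to 16 bytes); False models keeping the native width.
    """
    width = 16 if always_128_bit else native_width_bytes(precision)
    return num_rows * width


rows = 1_000_000
before = materialized_bytes(rows, precision=7, always_128_bit=True)
after = materialized_bytes(rows, precision=7, always_128_bit=False)
```

For a `decimal(7,2)` column (the usual TPC-DS type for the `store_sales` amounts, assumed here) that is 16 MB versus 4 MB per million rows under these assumptions.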
@andygrove andygrove added this to the 0.2.0 milestone Jul 25, 2024
@kazuyukitanimura
Contributor

After #741 there are still some issues with if and case


OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_decimals                                 11255          11385         184         25.6          39.1       1.0X
add_many_decimals: Comet (Scan)                   14372          14545         245         20.0          49.9       0.8X
add_many_decimals: Comet (Scan, Exec)              9846           9933         123         29.3          34.2       1.1X

Running benchmark: TPCDS Micro Benchmarks
  Running case: add_many_integers
  Stopped after 2 iterations, 7870 ms
  Running case: add_many_integers: Comet (Scan)
  Stopped after 2 iterations, 6910 ms
  Running case: add_many_integers: Comet (Scan, Exec)
  Stopped after 2 iterations, 7138 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_integers                                  3866           3935          98         74.5          13.4       1.0X
add_many_integers: Comet (Scan)                    3450           3455           7         83.5          12.0       1.1X
add_many_integers: Comet (Scan, Exec)              3548           3569          30         81.2          12.3       1.1X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_high_cardinality
  Stopped after 2 iterations, 3620 ms
  Running case: agg_high_cardinality: Comet (Scan)
  Stopped after 2 iterations, 5411 ms
  Running case: agg_high_cardinality: Comet (Scan, Exec)
  Stopped after 2 iterations, 2126 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
agg_high_cardinality                               1769           1810          59         40.7          24.6       1.0X
agg_high_cardinality: Comet (Scan)                 2642           2706          90         27.3          36.7       0.7X
agg_high_cardinality: Comet (Scan, Exec)           1060           1063           4         67.9          14.7       1.7X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_low_cardinality
  Stopped after 5 iterations, 2032 ms
  Running case: agg_low_cardinality: Comet (Scan)
  Stopped after 3 iterations, 2179 ms
  Running case: agg_low_cardinality: Comet (Scan, Exec)
  Stopped after 7 iterations, 2089 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
agg_low_cardinality                                 371            406          34        194.2           5.1       1.0X
agg_low_cardinality: Comet (Scan)                   717            727          13        100.4          10.0       0.5X
agg_low_cardinality: Comet (Scan, Exec)             278            298          19        258.7           3.9       1.3X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_sum_decimals_no_grouping
  Stopped after 2 iterations, 14633 ms
  Running case: agg_sum_decimals_no_grouping: Comet (Scan)
  Stopped after 2 iterations, 78280 ms
  Running case: agg_sum_decimals_no_grouping: Comet (Scan, Exec)
  Stopped after 2 iterations, 16892 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
agg_sum_decimals_no_grouping                               7251           7317          93         39.7          25.2       1.0X
agg_sum_decimals_no_grouping: Comet (Scan)                38946          39140         274          7.4         135.2       0.2X
agg_sum_decimals_no_grouping: Comet (Scan, Exec)           8407           8446          55         34.3          29.2       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_sum_integers_no_grouping
  Stopped after 2 iterations, 8131 ms
  Running case: agg_sum_integers_no_grouping: Comet (Scan)
  Stopped after 2 iterations, 8375 ms
  Running case: agg_sum_integers_no_grouping: Comet (Scan, Exec)
  Stopped after 2 iterations, 9179 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
agg_sum_integers_no_grouping                               3914           4066         215         73.6          13.6       1.0X
agg_sum_integers_no_grouping: Comet (Scan)                 4079           4188         154         70.6          14.2       1.0X
agg_sum_integers_no_grouping: Comet (Scan, Exec)           4557           4590          47         63.2          15.8       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: case_when_column_or_null
  Stopped after 2 iterations, 3114 ms
  Running case: case_when_column_or_null: Comet (Scan)
  Stopped after 2 iterations, 5693 ms
  Running case: case_when_column_or_null: Comet (Scan, Exec)
  Stopped after 2 iterations, 3415 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------
case_when_column_or_null                               1440           1557         166        200.1           5.0       1.0X
case_when_column_or_null: Comet (Scan)                 2832           2847          20        101.7           9.8       0.5X
case_when_column_or_null: Comet (Scan, Exec)           1691           1708          23        170.3           5.9       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: case_when_scalar
  Stopped after 9 iterations, 2090 ms
  Running case: case_when_scalar: Comet (Scan)
  Stopped after 2 iterations, 2312 ms
  Running case: case_when_scalar: Comet (Scan, Exec)
  Stopped after 5 iterations, 2048 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
case_when_scalar                                    196            232          51        367.0           2.7       1.0X
case_when_scalar: Comet (Scan)                     1133           1156          33         63.5          15.7       0.2X
case_when_scalar: Comet (Scan, Exec)                376            410          38        191.5           5.2       0.5X

Running benchmark: TPCDS Micro Benchmarks
  Running case: filter_highly_selective
  Stopped after 11 iterations, 2132 ms
  Running case: filter_highly_selective: Comet (Scan)
  Stopped after 3 iterations, 2233 ms
  Running case: filter_highly_selective: Comet (Scan, Exec)
  Stopped after 10 iterations, 2054 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
filter_highly_selective                                154            194          85        466.1           2.1       1.0X
filter_highly_selective: Comet (Scan)                  734            745           9         98.0          10.2       0.2X
filter_highly_selective: Comet (Scan, Exec)            156            205         108        462.3           2.2       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: filter_less_selective
  Stopped after 7 iterations, 2091 ms
  Running case: filter_less_selective: Comet (Scan)
  Stopped after 3 iterations, 2093 ms
  Running case: filter_less_selective: Comet (Scan, Exec)
  Stopped after 10 iterations, 2272 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
filter_less_selective                                165            299         165        435.4           2.3       1.0X
filter_less_selective: Comet (Scan)                  691            698          11        104.3           9.6       0.2X
filter_less_selective: Comet (Scan, Exec)            193            227          39        373.7           2.7       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: if_column_or_null
  Stopped after 2 iterations, 2630 ms
  Running case: if_column_or_null: Comet (Scan)
  Stopped after 2 iterations, 3102 ms
  Running case: if_column_or_null: Comet (Scan, Exec)
  Stopped after 2 iterations, 5368 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
if_column_or_null                                  1315           1315           0        219.1           4.6       1.0X
if_column_or_null: Comet (Scan)                    1518           1551          46        189.8           5.3       0.9X
if_column_or_null: Comet (Scan, Exec)              2606           2684         111        110.5           9.0       0.5X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_anti
  Stopped after 2 iterations, 12404 ms
  Running case: join_anti: Comet (Scan)
  Stopped after 2 iterations, 12023 ms
  Running case: join_anti: Comet (Scan, Exec)
[528.105s][warning][gc,alloc] Executor task launch worker for task 2.0 in stage 802.0 (TID 14773): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 11999 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_anti                                          6128           6202         104         11.7          85.1       1.0X
join_anti: Comet (Scan)                            5774           6012         335         12.5          80.2       1.1X
join_anti: Comet (Scan, Exec)                      5971           6000          41         12.1          82.9       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_condition
  Stopped after 2 iterations, 4255 ms
  Running case: join_condition: Comet (Scan)
  Stopped after 2 iterations, 3245 ms
  Running case: join_condition: Comet (Scan, Exec)
  Stopped after 2 iterations, 3721 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_condition                                     2055           2128         103        264.4           3.8       1.0X
join_condition: Comet (Scan)                       1534           1623         125        354.2           2.8       1.3X
join_condition: Comet (Scan, Exec)                 1808           1861          75        300.6           3.3       1.1X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_exploding_output
  Stopped after 2 iterations, 3425 ms
  Running case: join_exploding_output: Comet (Scan)
  Stopped after 2 iterations, 3004 ms
  Running case: join_exploding_output: Comet (Scan, Exec)
  Stopped after 2 iterations, 3495 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
join_exploding_output                               1603           1713         155        339.0           2.9       1.0X
join_exploding_output: Comet (Scan)                 1413           1502         126        384.5           2.6       1.1X
join_exploding_output: Comet (Scan, Exec)           1722           1748          37        315.6           3.2       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_inner
  Stopped after 4 iterations, 2194 ms
  Running case: join_inner: Comet (Scan)
  Stopped after 5 iterations, 2324 ms
  Running case: join_inner: Comet (Scan, Exec)
  Stopped after 3 iterations, 2110 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_inner                                          512            549          36        562.2           1.8       1.0X
join_inner: Comet (Scan)                            462            465           3        623.9           1.6       1.1X
join_inner: Comet (Scan, Exec)                      694            703          10        415.1           2.4       0.7X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_left_outer
  Stopped after 2 iterations, 192095 ms
  Running case: join_left_outer: Comet (Scan)
  Stopped after 2 iterations, 192225 ms
  Running case: join_left_outer: Comet (Scan, Exec)
  Stopped after 2 iterations, 191166 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_left_outer                                   95996          96048          73          3.3         303.0       1.0X
join_left_outer: Comet (Scan)                     95378          96113        1038          3.3         301.1       1.0X
join_left_outer: Comet (Scan, Exec)               95377          95583         292          3.3         301.1       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_semi
[1448.983s][warning][gc,alloc] Executor task launch worker for task 0.0 in stage 1153.0 (TID 20715): Retried waiting for GCLocker too often allocating 134217730 words
[1463.584s][warning][gc,alloc] Executor task launch worker for task 2.0 in stage 1159.0 (TID 20772): Retried waiting for GCLocker too often allocating 134217730 words
[1476.740s][warning][gc,alloc] Executor task launch worker for task 0.0 in stage 1165.0 (TID 20825): Retried waiting for GCLocker too often allocating 134217730 words
[1476.754s][warning][gc,alloc] Executor task launch worker for task 1.0 in stage 1165.0 (TID 20826): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 26371 ms
  Running case: join_semi: Comet (Scan)
[1490.216s][warning][gc,alloc] Executor task launch worker for task 2.0 in stage 1171.0 (TID 20882): Retried waiting for GCLocker too often allocating 134217730 words
[1515.531s][warning][gc,alloc] Executor task launch worker for task 1.0 in stage 1183.0 (TID 20991): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 25521 ms
  Running case: join_semi: Comet (Scan, Exec)
[1541.053s][warning][gc,alloc] Executor task launch worker for task 2.0 in stage 1195.0 (TID 21102): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 25372 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_semi                                         13164          13186          30          5.5         182.8       1.0X
join_semi: Comet (Scan)                           12522          12761         338          5.8         173.9       1.1X
join_semi: Comet (Scan, Exec)                     12276          12686         580          5.9         170.5       1.1X

@kazuyukitanimura
Contributor

After merging with latest main

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_decimals                                 11828          12330         709         24.3          41.1       1.0X
add_many_decimals: Comet (Scan)                   15061          15181         171         19.1          52.3       0.8X
add_many_decimals: Comet (Scan, Exec)             12400          12885         686         23.2          43.1       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: add_many_integers
  Stopped after 2 iterations, 7415 ms
  Running case: add_many_integers: Comet (Scan)
  Stopped after 2 iterations, 6969 ms
  Running case: add_many_integers: Comet (Scan, Exec)
  Stopped after 2 iterations, 7371 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
add_many_integers                                  3590           3708         166         80.2          12.5       1.0X
add_many_integers: Comet (Scan)                    3467           3485          24         83.1          12.0       1.0X
add_many_integers: Comet (Scan, Exec)              3659           3686          38         78.7          12.7       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_high_cardinality
  Stopped after 2 iterations, 3604 ms
  Running case: agg_high_cardinality: Comet (Scan)
  Stopped after 2 iterations, 5400 ms
  Running case: agg_high_cardinality: Comet (Scan, Exec)
  Stopped after 2 iterations, 2131 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
agg_high_cardinality                               1793           1802          13         40.2          24.9       1.0X
agg_high_cardinality: Comet (Scan)                 2683           2700          24         26.8          37.3       0.7X
agg_high_cardinality: Comet (Scan, Exec)           1056           1066          14         68.2          14.7       1.7X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_low_cardinality
  Stopped after 6 iterations, 2078 ms
  Running case: agg_low_cardinality: Comet (Scan)
  Stopped after 3 iterations, 2226 ms
  Running case: agg_low_cardinality: Comet (Scan, Exec)
  Stopped after 8 iterations, 2066 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
agg_low_cardinality                                 333            346          10        216.5           4.6       1.0X
agg_low_cardinality: Comet (Scan)                   730            742          14         98.7          10.1       0.5X
agg_low_cardinality: Comet (Scan, Exec)             252            258           7        286.1           3.5       1.3X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_sum_decimals_no_grouping
  Stopped after 2 iterations, 15998 ms
  Running case: agg_sum_decimals_no_grouping: Comet (Scan)
  Stopped after 2 iterations, 87493 ms
  Running case: agg_sum_decimals_no_grouping: Comet (Scan, Exec)
  Stopped after 2 iterations, 16900 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
agg_sum_decimals_no_grouping                               7828           7999         243         36.8          27.2       1.0X
agg_sum_decimals_no_grouping: Comet (Scan)                42539          43747        1708          6.8         147.7       0.2X
agg_sum_decimals_no_grouping: Comet (Scan, Exec)           8441           8450          12         34.1          29.3       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: agg_sum_integers_no_grouping
  Stopped after 2 iterations, 7564 ms
  Running case: agg_sum_integers_no_grouping: Comet (Scan)
  Stopped after 2 iterations, 8300 ms
  Running case: agg_sum_integers_no_grouping: Comet (Scan, Exec)
  Stopped after 2 iterations, 9181 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------
agg_sum_integers_no_grouping                               3761           3782          30         76.6          13.1       1.0X
agg_sum_integers_no_grouping: Comet (Scan)                 4125           4150          36         69.8          14.3       0.9X
agg_sum_integers_no_grouping: Comet (Scan, Exec)           4523           4591          96         63.7          15.7       0.8X

Running benchmark: TPCDS Micro Benchmarks
  Running case: case_when_column_or_null
  Stopped after 2 iterations, 2203 ms
  Running case: case_when_column_or_null: Comet (Scan)
  Stopped after 2 iterations, 5703 ms
  Running case: case_when_column_or_null: Comet (Scan, Exec)
  Stopped after 2 iterations, 3428 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------
case_when_column_or_null                               1099           1102           4        262.1           3.8       1.0X
case_when_column_or_null: Comet (Scan)                 2827           2852          35        101.9           9.8       0.4X
case_when_column_or_null: Comet (Scan, Exec)           1681           1714          47        171.3           5.8       0.7X

Running benchmark: TPCDS Micro Benchmarks
  Running case: case_when_scalar
  Stopped after 10 iterations, 2146 ms
  Running case: case_when_scalar: Comet (Scan)
  Stopped after 2 iterations, 2433 ms
  Running case: case_when_scalar: Comet (Scan, Exec)
  Stopped after 5 iterations, 2019 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
case_when_scalar                                    193            215          32        373.6           2.7       1.0X
case_when_scalar: Comet (Scan)                     1210           1217           9         59.5          16.8       0.2X
case_when_scalar: Comet (Scan, Exec)                378            404          44        190.4           5.3       0.5X

Running benchmark: TPCDS Micro Benchmarks
  Running case: filter_highly_selective
  Stopped after 13 iterations, 2221 ms
  Running case: filter_highly_selective: Comet (Scan)
  Stopped after 3 iterations, 2297 ms
  Running case: filter_highly_selective: Comet (Scan, Exec)
  Stopped after 11 iterations, 2084 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
filter_highly_selective                                145            171          32        496.7           2.0       1.0X
filter_highly_selective: Comet (Scan)                  762            766           4         94.5          10.6       0.2X
filter_highly_selective: Comet (Scan, Exec)            171            190          18        420.4           2.4       0.8X

Running benchmark: TPCDS Micro Benchmarks
  Running case: filter_less_selective
  Stopped after 10 iterations, 2011 ms
  Running case: filter_less_selective: Comet (Scan)
  Stopped after 3 iterations, 2121 ms
  Running case: filter_less_selective: Comet (Scan, Exec)
  Stopped after 11 iterations, 2159 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
filter_less_selective                                170            201          42        422.4           2.4       1.0X
filter_less_selective: Comet (Scan)                  694            707          12        103.8           9.6       0.2X
filter_less_selective: Comet (Scan, Exec)            176            196          20        409.9           2.4       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: if_column_or_null
  Stopped after 2 iterations, 2282 ms
  Running case: if_column_or_null: Comet (Scan)
  Stopped after 2 iterations, 3123 ms
  Running case: if_column_or_null: Comet (Scan, Exec)
  Stopped after 2 iterations, 3524 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
if_column_or_null                                  1136           1141           7        253.6           3.9       1.0X
if_column_or_null: Comet (Scan)                    1473           1562         126        195.6           5.1       0.8X
if_column_or_null: Comet (Scan, Exec)              1751           1762          17        164.6           6.1       0.6X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_anti
  Stopped after 2 iterations, 12717 ms
  Running case: join_anti: Comet (Scan)
  Stopped after 2 iterations, 11808 ms
  Running case: join_anti: Comet (Scan, Exec)
  Stopped after 2 iterations, 11755 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_anti                                          6230           6359         183         11.6          86.5       1.0X
join_anti: Comet (Scan)                            5789           5904         163         12.4          80.4       1.1X
join_anti: Comet (Scan, Exec)                      5720           5878         222         12.6          79.4       1.1X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_condition
  Stopped after 2 iterations, 3323 ms
  Running case: join_condition: Comet (Scan)
  Stopped after 2 iterations, 3201 ms
  Running case: join_condition: Comet (Scan, Exec)
  Stopped after 2 iterations, 3376 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_condition                                     1568           1662         133        346.6           2.9       1.0X
join_condition: Comet (Scan)                       1479           1601         172        367.4           2.7       1.1X
join_condition: Comet (Scan, Exec)                 1688           1688           1        321.9           3.1       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_exploding_output
  Stopped after 2 iterations, 2796 ms
  Running case: join_exploding_output: Comet (Scan)
  Stopped after 2 iterations, 2732 ms
  Running case: join_exploding_output: Comet (Scan, Exec)
  Stopped after 2 iterations, 3220 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
join_exploding_output                               1395           1398           5        389.5           2.6       1.0X
join_exploding_output: Comet (Scan)                 1352           1366          20        401.8           2.5       1.0X
join_exploding_output: Comet (Scan, Exec)           1608           1610           4        338.0           3.0       0.9X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_inner
  Stopped after 4 iterations, 2045 ms
  Running case: join_inner: Comet (Scan)
  Stopped after 5 iterations, 2359 ms
  Running case: join_inner: Comet (Scan, Exec)
  Stopped after 3 iterations, 2083 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_inner                                          497            511          12        579.1           1.7       1.0X
join_inner: Comet (Scan)                            460            472           8        626.9           1.6       1.1X
join_inner: Comet (Scan, Exec)                      687            694           8        419.2           2.4       0.7X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_left_outer
  Stopped after 2 iterations, 193508 ms
  Running case: join_left_outer: Comet (Scan)
[1025.721s][warning][gc,alloc] Executor task launch worker for task 1.0 in stage 1178.0 (TID 21368): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 192356 ms
  Running case: join_left_outer: Comet (Scan, Exec)
  Stopped after 2 iterations, 191410 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_left_outer                                   96613          96754         200          3.3         305.0       1.0X
join_left_outer: Comet (Scan)                     95176          96178        1418          3.3         300.4       1.0X
join_left_outer: Comet (Scan, Exec)               94835          95705        1231          3.3         299.3       1.0X

Running benchmark: TPCDS Micro Benchmarks
  Running case: join_semi
[1485.285s][warning][gc,alloc] Executor task launch worker for task 3.0 in stage 1221.0 (TID 21917): Retried waiting for GCLocker too often allocating 134217730 words
[1510.652s][warning][gc,alloc] Executor task launch worker for task 3.0 in stage 1233.0 (TID 22027): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 25421 ms
  Running case: join_semi: Comet (Scan)
[1520.632s][warning][gc,alloc] Executor task launch worker for task 3.0 in stage 1239.0 (TID 22082): Retried waiting for GCLocker too often allocating 134217730 words
[1520.632s][warning][gc,alloc] Executor task launch worker for task 1.0 in stage 1239.0 (TID 22080): Retried waiting for GCLocker too often allocating 134217730 words
[1547.750s][warning][gc,alloc] Executor task launch worker for task 3.0 in stage 1251.0 (TID 22192): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 26686 ms
  Running case: join_semi: Comet (Scan, Exec)
[1573.897s][warning][gc,alloc] Executor task launch worker for task 2.0 in stage 1263.0 (TID 22301): Retried waiting for GCLocker too often allocating 134217730 words
  Stopped after 2 iterations, 25646 ms

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.5
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
join_semi                                         11986          12711        1025          6.0         166.5       1.0X
join_semi: Comet (Scan)                           12957          13343         546          5.6         180.0       0.9X
join_semi: Comet (Scan, Exec)                     12694          12823         183          5.7         176.3       0.9X
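
For anyone reading these tables: the `Relative` column appears to be the baseline's best time divided by each case's best time, so values below 1.0X mean the Comet run was slower than plain Spark. The sketch below checks that reading against the `case_when_scalar` rows. The `relative` helper is hypothetical and is not part of the benchmark harness.

```rust
/// Hypothetical helper: speed of a case relative to the baseline,
/// computed from the "Best Time(ms)" column as baseline / case.
fn relative(baseline_best_ms: f64, case_best_ms: f64) -> f64 {
    baseline_best_ms / case_best_ms
}

fn main() {
    // Best times copied from the case_when_scalar table above.
    let baseline = 193.0;
    let cases = [("Comet (Scan)", 1210.0), ("Comet (Scan, Exec)", 378.0)];
    for (name, best) in cases {
        // One decimal place, matching the table's formatting.
        println!("{name}: {:.1}X", relative(baseline, best));
    }
}
```

Rounded to one decimal place, these ratios match the 0.2X and 0.5X reported for that benchmark.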

kazuyukitanimura added a commit that referenced this issue Aug 1, 2024
…#741)

## Which issue does this PR close?

Part of #670

## Rationale for this change

This PR improves native execution performance for decimals with small precision.

## What changes are included in this PR?

This PR avoids promoting decimal128 to decimal256 when the precisions are small enough.
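
As a rough illustration of what "small enough" can mean, the sketch below applies Spark's result-type rule for decimal addition and checks whether the result still fits within the 38 digits a 128-bit decimal can hold. The helper names are hypothetical and this is not the code from the PR. The `decimal(7,2)` input type is an assumption based on the TPC-DS schema for columns such as `ss_list_price`.

```rust
/// Maximum precision (in decimal digits) of a 128-bit decimal.
const DECIMAL128_MAX_PRECISION: u8 = 38;

/// Hypothetical helper: precision and scale of `a + b` under Spark's
/// decimal addition rule, before any clamping to the maximum precision.
/// Assumes precision >= scale for both inputs.
fn add_result_type(p1: u8, s1: u8, p2: u8, s2: u8) -> (u8, u8) {
    let scale = s1.max(s2);
    let int_digits = (p1 - s1).max(p2 - s2);
    // One extra digit covers a possible carry.
    (int_digits + scale + 1, scale)
}

/// Hypothetical helper: true when the result fits in 128 bits, in which
/// case widening the inputs to decimal256 would be unnecessary work.
fn fits_in_decimal128(precision: u8) -> bool {
    precision <= DECIMAL128_MAX_PRECISION
}

fn main() {
    // Assumed input types: two decimal(7,2) columns.
    let (p, s) = add_result_type(7, 2, 7, 2);
    println!("decimal({p},{s}) fits in decimal128: {}", fits_in_decimal128(p));
}
```

Under this rule each addition grows the precision by at most one digit, so a chain of twelve `decimal(7,2)` additions like the query in this issue stays far below 38 digits.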

## How are these changes tested?

Existing tests
kazuyukitanimura added a commit that referenced this issue Aug 2, 2024
## Which issue does this PR close?

Part of #679 and #670

## Rationale for this change

The improvement may be negligible in real use cases, but I see some improvement in microbenchmarks.

## What changes are included in this PR?

Optimizations in some bit functions

## How are these changes tested?

Existing tests
@kazuyukitanimura
Contributor

We have landed many fixes, and I think decimals are no longer an issue. Closing for now.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this issue Sep 7, 2024
## Which issue does this PR close?

Part of apache#679 and apache#670
Related apache#490

## Rationale for this change

For dictionary decimal vectors, the code was unpacking the dictionary even for Int and Long decimals, which used more memory than necessary.

## What changes are included in this PR?

Unpack only for Decimal 128
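
A minimal sketch of that decision, assuming the usual precision thresholds (up to 9 digits fits a 32-bit integer, up to 18 digits fits a 64-bit integer, anything larger needs 128 bits). The type and function names are hypothetical and are not taken from the Comet code base.

```rust
/// Physical representation of a decimal value, chosen by precision.
#[derive(Debug, PartialEq)]
enum DecimalStorage {
    Int32,
    Int64,
    Decimal128,
}

/// Hypothetical helper: pick the narrowest storage for a given precision.
/// Thresholds are assumptions: 9 digits for Int, 18 digits for Long.
fn storage_for_precision(precision: u8) -> DecimalStorage {
    match precision {
        0..=9 => DecimalStorage::Int32,
        10..=18 => DecimalStorage::Int64,
        _ => DecimalStorage::Decimal128,
    }
}

/// Hypothetical helper: unpack the dictionary only when the values
/// need 128-bit storage, and keep Int/Long decimals dictionary-encoded.
fn should_unpack_dictionary(precision: u8) -> bool {
    storage_for_precision(precision) == DecimalStorage::Decimal128
}

fn main() {
    for precision in [7u8, 18, 19, 38] {
        println!(
            "precision {precision}: {:?}, unpack = {}",
            storage_for_precision(precision),
            should_unpack_dictionary(precision)
        );
    }
}
```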

## How are these changes tested?

Existing test

(cherry picked from commit c1b7c7d)
himadripal pushed a commit to himadripal/datafusion-comet that referenced this issue Sep 7, 2024
…apache#741)

## Which issue does this PR close?

Part of apache#670

## Rationale for this change

This PR improves native execution performance for decimals with small precision.

## What changes are included in this PR?

This PR avoids promoting decimal128 to decimal256 when the precisions are small enough.

## How are these changes tested?

Existing tests

(cherry picked from commit 25957dd)
himadripal pushed a commit to himadripal/datafusion-comet that referenced this issue Sep 7, 2024
## Which issue does this PR close?

Part of apache#679 and apache#670

## Rationale for this change

The improvement may be negligible in real use cases, but I see some improvement in microbenchmarks.

## What changes are included in this PR?

Optimizations in some bit functions

## How are these changes tested?

Existing tests

(cherry picked from commit ffb96c3)