[SPARK-42483][TESTS] Regenerate benchmark results #40072

dongjoon-hyun · 2023-02-17T22:53:33Z

What changes were proposed in this pull request?

This aims to regenerate the benchmark results on the master branch as a baseline for Spark 3.5.0 and as a way to compare against the Apache Spark 3.4.0 branch.

Why are the changes needed?

These are reference values. The environment has only minor changes:

- OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+ OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure

- OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
+ OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure

- OpenJDK 64-Bit Server VM 17.0.5+8 on Linux 5.15.0-1023-azure
+ OpenJDK 64-Bit Server VM 17.0.6+10 on Linux 5.15.0-1031-azure

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review.

dongjoon-hyun · 2023-02-18T00:20:00Z

core/benchmarks/ZStandardBenchmark-jdk11-results.txt

+Compression 10000 times at level 1 without buffer pool            605            812         220          0.0       60521.0       1.0X
+Compression 10000 times at level 2 without buffer pool            665            678          20          0.0       66512.5       0.9X
+Compression 10000 times at level 3 without buffer pool            890            903          20          0.0       88961.3       0.7X
+Compression 10000 times at level 1 with buffer pool               829            839          11          0.0       82940.2       0.7X


I'll take a look at this after this PR.

Java 8/17 doesn't have this regression.
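
The Relative column can be cross-checked against the Best Time column. Below is a minimal Python sketch. It assumes, as an interpretation not stated in this PR, that Relative is the first row's best time divided by each row's best time. `relative` and `best_ms` are hypothetical names. With the JDK 11 numbers above, level 1 with the buffer pool comes out at 0.7X, below level 1 without it:

```python
# Assumption: Relative = first row's best time / this row's best time,
# so values below 1.0X mean "slower than the first row".
def relative(baseline_best_ms: float, best_ms: float) -> str:
    """Format a speed ratio the way the results files print it, e.g. '0.7X'."""
    return f"{baseline_best_ms / best_ms:.1f}X"

# Best Time(ms) values copied from the JDK 11 ZStandard rows above.
best_ms = {
    "level 1 without buffer pool": 605,
    "level 2 without buffer pool": 665,
    "level 3 without buffer pool": 890,
    "level 1 with buffer pool": 829,
}

baseline = best_ms["level 1 without buffer pool"]
relatives = {name: relative(baseline, ms) for name, ms in best_ms.items()}

for name, rel in relatives.items():
    print(f"{name:30} {rel}")
```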

dongjoon-hyun · 2023-02-18T00:24:05Z

sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt

-Use HashSet                                           4              4           0        226.9           4.4       1.0X
-Use EnumSet                                           1              1           0        737.3           1.4       3.2X
+Use HashSet                                           0              1           0       2440.2           0.4       1.0X
+Use EnumSet                                           1              1           0        884.8           1.1       0.4X


We need to investigate this reversed ratio.

HashSet seems to have improved in this case, "contains use empty Set:". The other cases look within a reasonable range.
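
The reversed ratio can be reproduced from the Rate column alone. This sketch assumes that every case in one benchmark run processes the same number of values, so the ratio of best times equals the ratio of rates. The dictionaries and `enumset_relative` are hypothetical names holding the JDK 11 numbers quoted above:

```python
# Rate(M/s) values copied from the JDK 11 EnumTypeSetBenchmark rows above.
# Assumption: all cases process the same number of values, so
# rate ratio == best-time ratio (rate = values / best time).
old_rates = {"HashSet": 226.9, "EnumSet": 737.3}
new_rates = {"HashSet": 2440.2, "EnumSet": 884.8}

def enumset_relative(rates: dict) -> float:
    """EnumSet speed relative to the HashSet baseline (>1.0 means EnumSet is faster)."""
    return rates["EnumSet"] / rates["HashSet"]

old_ratio = enumset_relative(old_rates)
new_ratio = enumset_relative(new_rates)

# How much each implementation's own rate changed between the two runs.
hashset_gain = new_rates["HashSet"] / old_rates["HashSet"]
enumset_gain = new_rates["EnumSet"] / old_rates["EnumSet"]

print(f"before: {old_ratio:.1f}X, after: {new_ratio:.1f}X")
print(f"HashSet x{hashset_gain:.1f}, EnumSet x{enumset_gain:.1f}")
```

With these numbers, the HashSet rate rose about 10.8x while the EnumSet rate rose about 1.2x, which is why the ratio flipped.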

dongjoon-hyun · 2023-02-18T00:36:12Z

sql/catalyst/benchmarks/EnumTypeSetBenchmark-results.txt

-Use HashSet                                           5              5           0        209.4           4.8       1.0X
-Use EnumSet                                           2              2           0        459.8           2.2       2.2X
+Use HashSet                                           1              1           1       1972.0           0.5       1.0X
+Use EnumSet                                           2              2           0        444.0           2.3       0.2X


dongjoon-hyun · 2023-02-18T00:38:05Z

sql/catalyst/benchmarks/HashBenchmark-jdk11-results.txt

+interpreted version                                4933           4935           2        108.8           9.2       1.0X
+codegen version                                    5135           5141           9        104.6           9.6       1.0X
+codegen version 64-bit                             5071           5079          10        105.9           9.4       1.0X
+codegen HiveHash version                           4326           4326           0        124.1           8.1       1.1X


Now, the codegen HiveHash version is the fastest.
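
Here is a quick check of that claim, picking the smallest best time among the JDK 11 rows above. The Relative formula in the comment is an assumption, and `best_ms` is a hypothetical name:

```python
# Best Time(ms) values copied from the JDK 11 HashBenchmark rows above.
best_ms = {
    "interpreted version": 4933,
    "codegen version": 5135,
    "codegen version 64-bit": 5071,
    "codegen HiveHash version": 4326,
}

# The fastest case is the one with the smallest best time.
fastest = min(best_ms, key=best_ms.get)

# Assumption: Relative = first row's best time / this row's best time.
baseline = best_ms["interpreted version"]
speedup = baseline / best_ms[fastest]

print(fastest, f"{speedup:.1f}X")  # codegen HiveHash version 1.1X
```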

dongjoon-hyun · 2023-02-18T01:05:09Z

sql/core/benchmarks/UpdateFieldsBenchmark-results.txt

-To non-nullable StructTypes using performant method                             5520           5639         168          0.0      Infinity       1.0X
-To nullable StructTypes using performant method                                 2657           2708          72          0.0      Infinity       2.1X
+To non-nullable StructTypes using performant method                             3126           3150          34          0.0      Infinity       1.0X
+To nullable StructTypes using performant method                                 3136           4768        2309          0.0      Infinity       1.0X


This looks like a regression in Java 8. We need to take a look at this later.
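
Splitting the change per row shows what moved. The non-nullable case got faster while the nullable case got slower, so the nullable row's Relative fell from 2.1X to 1.0X. Below is a sketch of that arithmetic with the best times copied from the rows above. It assumes Relative is the first row's best time divided by each row's best time, and `pct_change` is a hypothetical helper. It ignores the large Stdev (2309 ms) of the new nullable row, so treat the percentages as rough:

```python
# Best Time(ms) values copied from the UpdateFieldsBenchmark rows above (Java 8 file).
old_best = {"non-nullable": 5520, "nullable": 2657}
new_best = {"non-nullable": 3126, "nullable": 3136}

def pct_change(old: float, new: float) -> float:
    """Percent change in best time; positive means the case got slower."""
    return (new - old) / old * 100.0

changes = {k: pct_change(old_best[k], new_best[k]) for k in old_best}

# Relative of the nullable row vs. the non-nullable baseline, before and after
# (assumption: Relative = baseline best time / row best time).
old_rel = old_best["non-nullable"] / old_best["nullable"]
new_rel = new_best["non-nullable"] / new_best["nullable"]

for k, v in changes.items():
    print(f"{k:13} {v:+.1f}%")
print(f"nullable relative: {old_rel:.1f}X -> {new_rel:.1f}X")
```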

dongjoon-hyun · 2023-02-18T01:13:20Z

sql/core/benchmarks/TPCDSQueryBenchmark-jdk11-results.txt

 Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 TPCDS Snappy:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-q3                                                  718            759          41          4.1         241.8       1.0X
+q3                                                  996           1035          55          3.0         335.3       1.0X


Maybe slower?
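
Both the best time and the per-row cost point to roughly the same increase for q3. Here is a small check with the numbers copied from the rows above. `slowdown_pct` is a hypothetical helper, and a single run per version cannot rule out noise:

```python
# q3 numbers copied from the JDK 11 TPCDSQueryBenchmark rows above.
old = {"best_ms": 718, "per_row_ns": 241.8}
new = {"best_ms": 996, "per_row_ns": 335.3}

def slowdown_pct(before: float, after: float) -> float:
    """How much larger `after` is than `before`, in percent."""
    return (after / before - 1.0) * 100.0

by_best_time = slowdown_pct(old["best_ms"], new["best_ms"])
by_per_row = slowdown_pct(old["per_row_ns"], new["per_row_ns"])

print(f"best time: +{by_best_time:.1f}%, per row: +{by_per_row:.1f}%")
```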

dongjoon-hyun · 2023-02-18T01:22:36Z

sql/core/benchmarks/SortBenchmark-jdk17-results.txt

+radix sort one byte                                 197            197           0        127.0           7.9      61.5X
+radix sort two bytes                                371            372           0         67.4          14.8      32.6X
+radix sort eight bytes                             1391           1397           8         18.0          55.7       8.7X
+radix sort key prefix array                        1914           1951          52         13.1          76.6       6.3X


In this benchmark, all Java 17 results are faster than Java 8.

dongjoon-hyun · 2023-02-18T02:07:16Z

sql/core/benchmarks/DataSourceReadBenchmark-results.txt

-SQL ORC MR                                         1654           1661           9          9.5         105.2       6.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+SQL CSV                                           13143          13363         311          1.2         835.6       1.0X


CSV seems to have become 30% slower.

Hmm, it's significant.

dongjoon-hyun · 2023-02-18T05:10:46Z

When you have some time, could you review this, @viirya? I want to merge this to proceed with further investigations.

dongjoon-hyun · 2023-02-18T06:51:22Z

Thank you so much, as always, for your help, @viirya!
Merged to master for Apache Spark 3.5.

[SPARK-42483][TESTS] Regenerate benchmark results

5e96c3e

github-actions bot added AVRO CORE MLLIB SQL labels Feb 17, 2023

dongjoon-hyun commented Feb 18, 2023


viirya approved these changes Feb 18, 2023


dongjoon-hyun closed this in 2552a79 Feb 18, 2023

dongjoon-hyun deleted the SPARK-42483 branch February 18, 2023 06:52

2 participants
