From 0f82794c67749c4dad65d5672a853d04096a9785 Mon Sep 17 00:00:00 2001
From: Peter Toth
Date: Sun, 7 Oct 2018 17:01:05 +0200
Subject: [PATCH 1/5] [SPARK-25662][TEST] Refactor DataSourceReadBenchmark to use main method

Change-Id: Icfd0484c8e0fef2ed0b184e09e52db9432e0a250
---
 .../DataSourceReadBenchmark-results.txt     | 290 +++++++++++++++++
 .../benchmark/DataSourceReadBenchmark.scala | 299 +++---------------
 2 files changed, 336 insertions(+), 253 deletions(-)
 create mode 100644 sql/core/benchmarks/DataSourceReadBenchmark-results.txt

diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
new file mode 100644
index 000000000000..e4b83f5c4ebd
--- /dev/null
+++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
@@ -0,0 +1,290 @@
+================================================================================================
+Single column scan
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     18420 / 18627          0.9        1171.1       1.0X
+SQL Json                                      7195 / 7199          2.2         457.4       2.6X
+SQL Parquet Vectorized                         118 /  125        133.8           7.5     156.7X
+SQL Parquet MR                                1607 / 1624          9.8         102.1      11.5X
+SQL ORC Vectorized                             180 /  205         87.2          11.5     102.1X
+SQL ORC Vectorized with copy                   219 /  266         71.8          13.9      84.0X
+SQL ORC MR                                    1251 / 1263         12.6          79.5      14.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       164 /  180         96.1          10.4       1.0X
+ParquetReader Vectorized -> Row                 90 /   92        174.8           5.7       1.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     18303 / 20552          0.9        1163.7       1.0X
+SQL Json                                      7744 / 7917          2.0         492.3       2.4X
+SQL Parquet Vectorized                         144 /  168        109.2           9.2     127.1X
+SQL Parquet MR                                1653 / 1773          9.5         105.1      11.1X
+SQL ORC Vectorized                             168 /  177         93.5          10.7     108.8X
+SQL ORC Vectorized with copy                   256 /  334         61.4          16.3      71.4X
+SQL ORC MR                                    1531 / 1574         10.3          97.3      12.0X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       218 /  249         72.3          13.8       1.0X
+ParquetReader Vectorized -> Row                172 /  182         91.4          10.9       1.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     21818 / 22203          0.7        1387.2       1.0X
+SQL Json                                      7667 / 7706          2.1         487.5       2.8X
+SQL Parquet Vectorized                         121 /  140        129.6           7.7     179.8X
+SQL Parquet MR                                1802 / 1959          8.7         114.6      12.1X
+SQL ORC Vectorized                             223 /  242         70.4          14.2      97.7X
+SQL ORC Vectorized with copy                   224 /  234         70.2          14.2      97.4X
+SQL ORC MR                                    1389 / 1492         11.3          88.3      15.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single INT Column Scan:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       209 /  236         75.2          13.3       1.0X
+ParquetReader Vectorized -> Row                195 /  206         80.5          12.4       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single BIGINT Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     24302 / 25372          0.6        1545.1       1.0X
+SQL Json                                    10114 / 10220          1.6         643.0       2.4X
+SQL Parquet Vectorized                         192 /  199         82.0          12.2     126.7X
+SQL Parquet MR                                1950 / 1975          8.1         124.0      12.5X
+SQL ORC Vectorized                             277 /  284         56.8          17.6      87.8X
+SQL ORC Vectorized with copy                   281 /  288         55.9          17.9      86.4X
+SQL ORC MR                                    1415 / 1444         11.1          90.0      17.2X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       276 /  310         57.0          17.5       1.0X
+ParquetReader Vectorized -> Row                262 /  271         60.1          16.6       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     20107 / 20228          0.8        1278.3       1.0X
+SQL Json                                      9748 / 9917          1.6         619.8       2.1X
+SQL Parquet Vectorized                         117 /  122        134.8           7.4     172.3X
+SQL Parquet MR                                1745 / 1757          9.0         110.9      11.5X
+SQL ORC Vectorized                             308 /  345         51.1          19.6      65.4X
+SQL ORC Vectorized with copy                   317 /  345         49.7          20.1      63.5X
+SQL ORC MR                                    1437 / 1449         10.9          91.4      14.0X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       192 /  224         81.9          12.2       1.0X
+ParquetReader Vectorized -> Row                186 /  206         84.7          11.8       1.0X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+SQL Single DOUBLE Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     24884 / 24896          0.6        1582.1       1.0X
+SQL Json                                    13202 / 13262          1.2         839.4       1.9X
+SQL Parquet Vectorized                         191 /  201         82.2          12.2     130.1X
+SQL Parquet MR                                1908 / 1951          8.2         121.3      13.0X
+SQL ORC Vectorized                             378 /  394         41.6          24.0      65.9X
+SQL ORC Vectorized with copy                   396 /  402         39.7          25.2      62.9X
+SQL ORC MR                                    1704 / 1709          9.2         108.3      14.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                       273 /  339         57.7          17.3       1.0X
+ParquetReader Vectorized -> Row                259 /  275         60.7          16.5       1.1X
+
+
+================================================================================================
+Int and String scan
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Int and String Scan:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     17383 / 17428          0.6        1657.8       1.0X
+SQL Json                                      9170 / 9249          1.1         874.6       1.9X
+SQL Parquet Vectorized                        1826 / 1853          5.7         174.2       9.5X
+SQL Parquet MR                                3773 / 3881          2.8         359.8       4.6X
+SQL ORC Vectorized                            1975 / 2111          5.3         188.4       8.8X
+SQL ORC Vectorized with copy                  2050 / 2122          5.1         195.5       8.5X
+SQL ORC MR                                    3521 / 3617          3.0         335.8       4.9X
+
+
+================================================================================================
+Repeated String scan
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Repeated String:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                      9976 / 10083          1.1         951.4       1.0X
+SQL Json                                      5550 / 5560          1.9         529.3       1.8X
+SQL Parquet Vectorized                         609 /  626         17.2          58.0      16.4X
+SQL Parquet MR                                1435 / 1490          7.3         136.8       7.0X
+SQL ORC Vectorized                             377 /  391         27.8          35.9      26.5X
+SQL ORC Vectorized with copy                   564 /  593         18.6          53.8      17.7X
+SQL ORC MR                                    1646 / 1654          6.4         157.0       6.1X
+
+
+================================================================================================
+Partitioned Table scan
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Partitioned Table:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+Data column - CSV                           24069 / 24070          0.7        1530.3       1.0X
+Data column - Json                            9732 / 9879          1.6         618.7       2.5X
+Data column - Parquet Vectorized               188 /  207         83.5          12.0     127.8X
+Data column - Parquet MR                      2797 / 2818          5.6         177.8       8.6X
+Data column - ORC Vectorized                   282 /  300         55.7          17.9      85.3X
+Data column - ORC Vectorized with copy         281 /  295         56.0          17.9      85.7X
+Data column - ORC MR                          1954 / 1958          8.1         124.2      12.3X
+Partition column - CSV                        5538 / 5575          2.8         352.1       4.3X
+Partition column - Json                       3919 / 3972          4.0         249.2       6.1X
+Partition column - Parquet Vectorized           49 /   57        318.2           3.1     486.9X
+Partition column - Parquet MR                 1411 / 1415         11.1          89.7      17.1X
+Partition column - ORC Vectorized               50 /   65        311.8           3.2     477.1X
+Partition column - ORC Vectorized with copy        50 /   60        315.0           3.2     482.0X
+Partition column - ORC MR                     1305 / 1318         12.1          83.0      18.4X
+Both columns - CSV                          23659 / 24426          0.7        1504.2       1.0X
+Both columns - Json                         12312 / 12494          1.3         782.8       2.0X
+Both columns - Parquet Vectorized              227 /  237         69.4          14.4     106.2X
+Both columns - Parquet MR                     3090 / 3157          5.1         196.5       7.8X
+Both columns - ORC Vectorized                  321 /  335         49.0          20.4      75.0X
+Both column - ORC Vectorized with copy         397 /  424         39.7          25.2      60.7X
+Both columns - ORC MR                         2081 / 2153          7.6         132.3      11.6X
+
+
+================================================================================================
+String with Nulls scan
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     12699 / 12724          0.8        1211.1       1.0X
+SQL Json                                      8031 / 8272          1.3         765.9       1.6X
+SQL Parquet Vectorized                        1173 / 1174          8.9         111.8      10.8X
+SQL Parquet MR                                3294 / 3382          3.2         314.1       3.9X
+ParquetReader Vectorized                       868 /  886         12.1          82.8      14.6X
+SQL ORC Vectorized                             882 /  915         11.9          84.1      14.4X
+SQL ORC Vectorized with copy                  1303 / 1379          8.0         124.3       9.7X
+SQL ORC MR                                    3100 / 3243          3.4         295.6       4.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     12150 / 12299          0.9        1158.8       1.0X
+SQL Json                                      6260 / 6318          1.7         597.0       1.9X
+SQL Parquet Vectorized                         869 /  924         12.1          82.9      14.0X
+SQL Parquet MR                                2310 / 2326          4.5         220.3       5.3X
+ParquetReader Vectorized                       847 /  869         12.4          80.8      14.3X
+SQL ORC Vectorized                             953 / 1012         11.0          90.9      12.7X
+SQL ORC Vectorized with copy                  1359 / 1381          7.7         129.6       8.9X
+SQL ORC MR                                    2607 / 2651          4.0         248.6       4.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                     10223 / 10363          1.0         974.9       1.0X
+SQL Json                                      3945 / 4019          2.7         376.2       2.6X
+SQL Parquet Vectorized                         184 /  200         57.1          17.5      55.7X
+SQL Parquet MR                                1433 / 1497          7.3         136.7       7.1X
+ParquetReader Vectorized                       175 /  201         60.1          16.6      58.6X
+SQL ORC Vectorized                             323 /  350         32.5          30.8      31.7X
+SQL ORC Vectorized with copy                   424 /  460         24.7          40.5      24.1X
+SQL ORC MR                                    1444 / 1495          7.3         137.7       7.1X
+
+
+================================================================================================
+Single Column Scan from multiple columns
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Single Column Scan from 10 columns:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                       2436 / 2475          0.4        2322.8       1.0X
+SQL Json                                      2089 / 2104          0.5        1992.4       1.2X
+SQL Parquet Vectorized                          43 /   47         24.3          41.2      56.4X
+SQL Parquet MR                                 184 /  209          5.7         175.7      13.2X
+SQL ORC Vectorized                              51 /   65         20.5          48.7      47.7X
+SQL ORC Vectorized with copy                    50 /   57         21.0          47.6      48.8X
+SQL ORC MR                                     248 /  292          4.2         236.2       9.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Single Column Scan from 50 columns:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                       5685 / 5808          0.2        5421.6       1.0X
+SQL Json                                      7570 / 7632          0.1        7219.7       0.8X
+SQL Parquet Vectorized                          60 /   68         17.5          57.0      95.1X
+SQL Parquet MR                                 191 /  201          5.5         182.2      29.8X
+SQL ORC Vectorized                              70 /   80         15.1          66.3      81.8X
+SQL ORC Vectorized with copy                    71 /   81         14.9          67.3      80.6X
+SQL ORC MR                                     738 /  800          1.4         704.1       7.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
+
+Single Column Scan from 100 columns:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+SQL CSV                                       9131 / 9214          0.1        8707.9       1.0X
+SQL Json                                    13728 / 13861          0.1       13092.1       0.7X
+SQL Parquet Vectorized                          86 /   91         12.2          82.1     106.1X
+SQL Parquet MR                                 202 /  219          5.2         192.4      45.2X
+SQL ORC Vectorized                              94 /  101         11.2          89.2      97.6X
+SQL ORC Vectorized with copy                    89 /   96         11.8          84.8     102.6X
+SQL ORC MR                                    1532 / 1540          0.7        1460.8       6.0X
+
+
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
index 51a7f9f1ef09..b38e3f4b8a0d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
@@ -22,7 +22,7 @@ import scala.collection.JavaConverters._
 import scala.util.Random

 import org.apache.spark.SparkConf
-import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
 import org.apache.spark.sql.{DataFrame, DataFrameWriter, Row, SparkSession}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.plans.SQLHelper
@@ -34,10 +34,15 @@ import org.apache.spark.sql.vectorized.ColumnVector

 /**
  * Benchmark to measure data source read performance.
- * To run this:
- * spark-submit --class
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *      Results will be written to "benchmarks/DataSourceReadBenchmark-results.txt".
+ * }}}
  */
-object DataSourceReadBenchmark extends SQLHelper {
+object DataSourceReadBenchmark extends BenchmarkBase with SQLHelper {
   val conf = new SparkConf()
     .setAppName("DataSourceReadBenchmark")
     // Since `spark.master` always exists, overrides this value
@@ -93,11 +98,16 @@ object DataSourceReadBenchmark extends SQLHelper {

   def numericScanBenchmark(values: Int, dataType: DataType): Unit = {
     // Benchmarks running through spark sql.
-    val sqlBenchmark = new Benchmark(s"SQL Single ${dataType.sql} Column Scan", values)
+    val sqlBenchmark = new Benchmark(
+      s"SQL Single ${dataType.sql} Column Scan",
+      values,
+      output = output)

     // Benchmarks driving reader component directly.
     val parquetReaderBenchmark = new Benchmark(
-      s"Parquet Reader Single ${dataType.sql} Column Scan", values)
+      s"Parquet Reader Single ${dataType.sql} Column Scan",
+      values,
+      output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -140,74 +150,6 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     22964 / 23096          0.7        1460.0       1.0X
-        SQL Json                                      8469 / 8593          1.9         538.4       2.7X
-        SQL Parquet Vectorized                         164 /  177         95.8          10.4     139.9X
-        SQL Parquet MR                                1687 / 1706          9.3         107.2      13.6X
-        SQL ORC Vectorized                             191 /  197         82.3          12.2     120.2X
-        SQL ORC Vectorized with copy                   215 /  219         73.2          13.7     106.9X
-        SQL ORC MR                                    1392 / 1412         11.3          88.5      16.5X
-
-
-        SQL Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     24090 / 24097          0.7        1531.6       1.0X
-        SQL Json                                      8791 / 8813          1.8         558.9       2.7X
-        SQL Parquet Vectorized                         204 /  212         77.0          13.0     117.9X
-        SQL Parquet MR                                1813 / 1850          8.7         115.3      13.3X
-        SQL ORC Vectorized                             226 /  230         69.7          14.4     106.7X
-        SQL ORC Vectorized with copy                   295 /  298         53.3          18.8      81.6X
-        SQL ORC MR                                    1526 / 1549         10.3          97.1      15.8X
-
-
-        SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     25637 / 25791          0.6        1629.9       1.0X
-        SQL Json                                      9532 / 9570          1.7         606.0       2.7X
-        SQL Parquet Vectorized                         181 /  191         86.8          11.5     141.5X
-        SQL Parquet MR                                2210 / 2227          7.1         140.5      11.6X
-        SQL ORC Vectorized                             309 /  317         50.9          19.6      83.0X
-        SQL ORC Vectorized with copy                   316 /  322         49.8          20.1      81.2X
-        SQL ORC MR                                    1650 / 1680          9.5         104.9      15.5X
-
-
-        SQL Single BIGINT Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     31617 / 31764          0.5        2010.1       1.0X
-        SQL Json                                    12440 / 12451          1.3         790.9       2.5X
-        SQL Parquet Vectorized                         284 /  315         55.4          18.0     111.4X
-        SQL Parquet MR                                2382 / 2390          6.6         151.5      13.3X
-        SQL ORC Vectorized                             398 /  403         39.5          25.3      79.5X
-        SQL ORC Vectorized with copy                   410 /  413         38.3          26.1      77.1X
-        SQL ORC MR                                    1783 / 1813          8.8         113.4      17.7X
-
-
-        SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     26679 / 26742          0.6        1696.2       1.0X
-        SQL Json                                    12490 / 12541          1.3         794.1       2.1X
-        SQL Parquet Vectorized                         174 /  183         90.4          11.1     153.3X
-        SQL Parquet MR                                2201 / 2223          7.1         140.0      12.1X
-        SQL ORC Vectorized                             415 /  429         37.9          26.4      64.3X
-        SQL ORC Vectorized with copy                   422 /  428         37.2          26.9      63.2X
-        SQL ORC MR                                    1767 / 1773          8.9         112.3      15.1X
-
-
-        SQL Single DOUBLE Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     34223 / 34324          0.5        2175.8       1.0X
-        SQL Json                                    17784 / 17785          0.9        1130.7       1.9X
-        SQL Parquet Vectorized                         277 /  283         56.7          17.6     123.4X
-        SQL Parquet MR                                2356 / 2386          6.7         149.8      14.5X
-        SQL ORC Vectorized                             533 /  536         29.5          33.9      64.2X
-        SQL ORC Vectorized with copy                   541 /  546         29.1          34.4      63.3X
-        SQL ORC MR                                    2166 / 2177          7.3         137.7      15.8X
-        */
         sqlBenchmark.run()

         // Driving the parquet reader in batch mode directly.
@@ -279,51 +221,13 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        Single TINYINT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       198 /  202         79.4          12.6       1.0X
-        ParquetReader Vectorized -> Row                119 /  121        132.3           7.6       1.7X
-
-
-        Single SMALLINT Column Scan:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       282 /  287         55.8          17.9       1.0X
-        ParquetReader Vectorized -> Row                246 /  247         64.0          15.6       1.1X
-
-
-        Single INT Column Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       258 /  262         60.9          16.4       1.0X
-        ParquetReader Vectorized -> Row                259 /  260         60.8          16.5       1.0X
-
-
-        Single BIGINT Column Scan:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       361 /  369         43.6          23.0       1.0X
-        ParquetReader Vectorized -> Row                361 /  371         43.6          22.9       1.0X
-
-
-        Single FLOAT Column Scan:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       253 /  261         62.2          16.1       1.0X
-        ParquetReader Vectorized -> Row                254 /  256         61.9          16.2       1.0X
-
-
-        Single DOUBLE Column Scan:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                       357 /  364         44.0          22.7       1.0X
-        ParquetReader Vectorized -> Row                358 /  366         44.0          22.7       1.0X
-        */
         parquetReaderBenchmark.run()
       }
     }
   }

   def intStringScanBenchmark(values: Int): Unit = {
-    val benchmark = new Benchmark("Int and String Scan", values)
+    val benchmark = new Benchmark("Int and String Scan", values, output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -368,26 +272,13 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        Int and String Scan:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     27145 / 27158          0.4        2588.7       1.0X
-        SQL Json                                    12969 / 13337          0.8        1236.8       2.1X
-        SQL Parquet Vectorized                        2419 / 2448          4.3         230.7      11.2X
-        SQL Parquet MR                                4631 / 4633          2.3         441.7       5.9X
-        SQL ORC Vectorized                            2412 / 2465          4.3         230.0      11.3X
-        SQL ORC Vectorized with copy                  2633 / 2675          4.0         251.1      10.3X
-        SQL ORC MR                                    4280 / 4350          2.4         408.2       6.3X
-        */
         benchmark.run()
       }
     }
   }

   def repeatedStringScanBenchmark(values: Int): Unit = {
-    val benchmark = new Benchmark("Repeated String", values)
+    val benchmark = new Benchmark("Repeated String", values, output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -432,26 +323,13 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        Repeated String:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     17345 / 17424          0.6        1654.1       1.0X
-        SQL Json                                      8639 / 8664          1.2         823.9       2.0X
-        SQL Parquet Vectorized                         839 /  854         12.5          80.0      20.7X
-        SQL Parquet MR                                1771 / 1775          5.9         168.9       9.8X
-        SQL ORC Vectorized                             550 /  569         19.1          52.4      31.6X
-        SQL ORC Vectorized with copy                   785 /  849         13.4          74.9      22.1X
-        SQL ORC MR                                    2168 / 2202          4.8         206.7       8.0X
-        */
         benchmark.run()
       }
     }
   }

   def partitionTableScanBenchmark(values: Int): Unit = {
-    val benchmark = new Benchmark("Partitioned Table", values)
+    val benchmark = new Benchmark("Partitioned Table", values, output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -562,40 +440,13 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        Partitioned Table:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        Data column - CSV                           32613 / 32841          0.5        2073.4       1.0X
-        Data column - Json                          13343 / 13469          1.2         848.3       2.4X
-        Data column - Parquet Vectorized               302 /  318         52.1          19.2     108.0X
-        Data column - Parquet MR                      2908 / 2924          5.4         184.9      11.2X
-        Data column - ORC Vectorized                   412 /  425         38.1          26.2      79.1X
-        Data column - ORC Vectorized with copy         442 /  446         35.6          28.1      73.8X
-        Data column - ORC MR                          2390 / 2396          6.6         152.0      13.6X
-        Partition column - CSV                        9626 / 9683          1.6         612.0       3.4X
-        Partition column - Json                     10909 / 10923          1.4         693.6       3.0X
-        Partition column - Parquet Vectorized           69 /   76        228.4           4.4     473.6X
-        Partition column - Parquet MR                 1898 / 1933          8.3         120.7      17.2X
-        Partition column - ORC Vectorized               67 /   74        236.0           4.2     489.4X
-        Partition column - ORC Vectorized with copy        65 /   72        241.9           4.1     501.6X
-        Partition column - ORC MR                     1743 / 1749          9.0         110.8      18.7X
-        Both columns - CSV                          35523 / 35552          0.4        2258.5       0.9X
-        Both columns - Json                         13676 / 13681          1.2         869.5       2.4X
-        Both columns - Parquet Vectorized              317 /  326         49.5          20.2     102.7X
-        Both columns - Parquet MR                     3333 / 3336          4.7         211.9       9.8X
-        Both columns - ORC Vectorized                  441 /  446         35.6          28.1      73.9X
-        Both column - ORC Vectorized with copy         517 /  524         30.4          32.9      63.1X
-        Both columns - ORC MR                         2574 / 2577          6.1         163.6      12.7X
-        */
         benchmark.run()
       }
     }
   }

   def stringWithNullsScanBenchmark(values: Int, fractionOfNulls: Double): Unit = {
-    val benchmark = new Benchmark("String with Nulls Scan", values)
+    val benchmark = new Benchmark("String with Nulls Scan", values, output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -673,51 +524,16 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     14875 / 14920          0.7        1418.6       1.0X
-        SQL Json                                    10974 / 10992          1.0        1046.5       1.4X
-        SQL Parquet Vectorized                        1711 / 1750          6.1         163.2       8.7X
-        SQL Parquet MR                                3838 / 3884          2.7         366.0       3.9X
-        ParquetReader Vectorized                      1155 / 1168          9.1         110.2      12.9X
-        SQL ORC Vectorized                            1341 / 1380          7.8         127.9      11.1X
-        SQL ORC Vectorized with copy                  1659 / 1716          6.3         158.2       9.0X
-        SQL ORC MR                                    3594 / 3634          2.9         342.7       4.1X
-
-
-        String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     17219 / 17264          0.6        1642.1       1.0X
-        SQL Json                                      8843 / 8864          1.2         843.3       1.9X
-        SQL Parquet Vectorized                        1169 / 1178          9.0         111.4      14.7X
-        SQL Parquet MR                                2676 / 2697          3.9         255.2       6.4X
-        ParquetReader Vectorized                      1068 / 1071          9.8         101.8      16.1X
-        SQL ORC Vectorized                            1319 / 1319          7.9         125.8      13.1X
-        SQL ORC Vectorized with copy                  1638 / 1639          6.4         156.2      10.5X
-        SQL ORC MR                                    3230 / 3257          3.2         308.1       5.3X
-
-
-        String with Nulls Scan:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     13976 / 14053          0.8        1332.8       1.0X
-        SQL Json                                      5166 / 5176          2.0         492.6       2.7X
-        SQL Parquet Vectorized                         274 /  282         38.2          26.2      50.9X
-        SQL Parquet MR                                1553 / 1555          6.8         148.1       9.0X
-        ParquetReader Vectorized                       241 /  246         43.5          23.0      57.9X
-        SQL ORC Vectorized                             476 /  479         22.0          45.4      29.3X
-        SQL ORC Vectorized with copy                   584 /  588         17.9          55.7      23.9X
-        SQL ORC MR                                    1720 / 1734          6.1         164.1       8.1X
-        */
         benchmark.run()
       }
     }
   }

   def columnsBenchmark(values: Int, width: Int): Unit = {
-    val benchmark = new Benchmark(s"Single Column Scan from $width columns", values)
+    val benchmark = new Benchmark(
+      s"Single Column Scan from $width columns",
+      values,
+      output = output)

     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
@@ -763,58 +579,35 @@ object DataSourceReadBenchmark extends SQLHelper {
           }
         }

-        /*
-        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
-        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-        Single Column Scan from 10 columns:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                       3478 / 3481          0.3        3316.4       1.0X
-        SQL Json                                      2646 / 2654          0.4        2523.6       1.3X
-        SQL Parquet Vectorized                          67 /   72         15.8          63.5      52.2X
-        SQL Parquet MR                                 207 /  214          5.1         197.6      16.8X
-        SQL ORC Vectorized                              69 /   76         15.2          66.0      50.3X
-        SQL ORC Vectorized with copy                    70 /   76         15.0          66.5      49.9X
-        SQL ORC MR                                     299 /  303          3.5         285.1      11.6X
-
-
-        Single Column Scan from 50 columns:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                       9214 / 9236          0.1        8786.7       1.0X
-        SQL Json                                      9943 / 9978          0.1        9482.7       0.9X
-        SQL Parquet Vectorized                          77 /   86         13.6          73.3     119.8X
-        SQL Parquet MR                                 229 /  235          4.6         218.6      40.2X
-        SQL ORC Vectorized                              84 /   96         12.5          80.0     109.9X
-        SQL ORC Vectorized with copy                    83 /   91         12.6          79.4     110.7X
-        SQL ORC MR                                     843 /  854          1.2         804.0      10.9X
-
-
-        Single Column Scan from 100 columns      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-        --------------------------------------------------------------------------------------------
-        SQL CSV                                     16503 / 16622          0.1       15738.9       1.0X
-        SQL Json                                    19109 / 19184          0.1       18224.2       0.9X
-        SQL Parquet Vectorized                          99 /  108         10.6          94.3     166.8X
-        SQL Parquet MR                                 253 /  264          4.1         241.6      65.1X
-        SQL ORC Vectorized                             107 /  114          9.8         101.6     154.8X
-        SQL ORC Vectorized with copy                   107 /  118          9.8         102.1     154.1X
-        SQL ORC MR                                    1526 / 1529          0.7        1455.3      10.8X
-        */
         benchmark.run()
       }
     }
   }

-  def main(args: Array[String]): Unit = {
-    Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType).foreach { dataType =>
-      numericScanBenchmark(1024 * 1024 * 15, dataType)
+  override def runBenchmarkSuite(): Unit = {
+    runBenchmark("SQL Single Numeric Column Scan") {
+      Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType).foreach {
+        dataType => numericScanBenchmark(1024 * 1024 * 15, dataType)
+      }
+    }
+    runBenchmark("Int and String Scan") {
+      intStringScanBenchmark(1024 * 1024 * 10)
     }
-    intStringScanBenchmark(1024 * 1024 * 10)
-    repeatedStringScanBenchmark(1024 * 1024 * 10)
-    partitionTableScanBenchmark(1024 * 1024 * 15)
-    for (fractionOfNulls <- List(0.0, 0.50, 0.95)) {
-      stringWithNullsScanBenchmark(1024 * 1024 * 10, fractionOfNulls)
+    runBenchmark("Repeated String Scan") {
+      repeatedStringScanBenchmark(1024 * 1024 * 10)
     }
-    for (columnWidth <- List(10, 50, 100)) {
-      columnsBenchmark(1024 * 1024 * 1, columnWidth)
+    runBenchmark("Partitioned Table Scan") {
+      partitionTableScanBenchmark(1024 * 1024 * 15)
+    }
+    runBenchmark("String with Nulls Scan") {
+      for (fractionOfNulls <- List(0.0, 0.50, 0.95)) {
+        stringWithNullsScanBenchmark(1024 * 1024 * 10, fractionOfNulls)
+      }
+    }
+    runBenchmark("Single Column Scan From Wide Columns") {
+      for (columnWidth <- List(10, 50, 100)) {
+        columnsBenchmark(1024 * 1024 * 1, columnWidth)
+      }
     }
   }
 }

From 2479e1a5eb25e52eb8bc3af498e1b4df20b828d2 Mon Sep 17 00:00:00 2001
From: Peter Toth
Date: Sun, 7 Oct 2018 20:55:09 +0200
Subject: [PATCH 2/5] [SPARK-25662][TEST] refresh results

Change-Id: I49fd66b225fa4cee6ed163a16f55b32506c00e59
---
 .../DataSourceReadBenchmark-results.txt | 280 +++++++++---------
 1 file changed, 140 insertions(+), 140 deletions(-)

diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
index e4b83f5c4ebd..7c6f346d4843 100644
--- a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
+++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
@@ -1,5 +1,5 @@
 ================================================================================================
-Single column scan
+SQL Single Numeric Column Scan
 ================================================================================================

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
@@ -7,130 +7,130 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

 SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV                                     18420 / 18627          0.9        1171.1       1.0X
-SQL Json                                      7195 / 7199          2.2         457.4       2.6X
-SQL Parquet Vectorized                         118 /  125        133.8           7.5     156.7X
-SQL Parquet MR                                1607 / 1624          9.8         102.1      11.5X
-SQL ORC Vectorized                             180 /  205         87.2          11.5     102.1X
-SQL ORC Vectorized with copy                   219 /  266         71.8          13.9      84.0X
-SQL ORC MR                                    1251 / 1263         12.6          79.5      14.7X
+SQL CSV                                     17061 / 17127          0.9        1084.7       1.0X
+SQL Json                                      7182 / 7351          2.2         456.6       2.4X
+SQL Parquet Vectorized                         121 /  146        130.4           7.7     141.4X
+SQL Parquet MR                                1406 / 1412         11.2          89.4      12.1X
+SQL ORC Vectorized                             118 /  148        133.2           7.5     144.5X
+SQL ORC Vectorized with copy                   162 /  196         96.9          10.3     105.2X
+SQL ORC MR                                    1176 / 1250         13.4          74.7      14.5X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

 Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                       164 /  180         96.1          10.4       1.0X
-ParquetReader Vectorized -> Row                 90 /   92        174.8           5.7       1.8X
+ParquetReader Vectorized                       159 /  199         99.2          10.1       1.0X
+ParquetReader Vectorized -> Row                 84 /   94        186.3           5.4       1.9X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

 SQL Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV                                     18303 / 20552          0.9        1163.7       1.0X
-SQL Json                                      7744 / 7917          2.0         492.3       2.4X
-SQL Parquet Vectorized                         144 /  168        109.2           9.2     127.1X
-SQL Parquet MR                                1653 / 1773          9.5         105.1      11.1X
-SQL ORC Vectorized                             168 /  177         93.5          10.7     108.8X
-SQL ORC Vectorized with copy                   256 /  334         61.4          16.3      71.4X
-SQL ORC MR                                    1531 / 1574         10.3          97.3      12.0X
+SQL CSV                                     17556 / 17671          0.9        1116.2       1.0X
+SQL Json                                      7260 / 7344          2.2         461.6       2.4X
+SQL Parquet Vectorized                         144 /  172        109.2           9.2     121.9X
+SQL Parquet MR                                1526 / 1526         10.3          97.0      11.5X
+SQL ORC Vectorized                             169 /  187         92.8          10.8     103.6X
+SQL ORC Vectorized with copy                   215 /  229         73.1          13.7      81.5X
+SQL ORC MR                                    1472 / 1582         10.7          93.6      11.9X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

 Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                       218 /  249         72.3          13.8       1.0X
-ParquetReader Vectorized -> Row                172 /  182         91.4          10.9       1.3X
+ParquetReader Vectorized                       215 /  246         73.2          13.7       1.0X
+ParquetReader Vectorized -> Row                168 /  175         93.8          10.7       1.3X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
 Intel(R) Core(TM) i7-4870HQ CPU
@ 2.50GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 21818 / 22203 0.7 1387.2 1.0X -SQL Json 7667 / 7706 2.1 487.5 2.8X -SQL Parquet Vectorized 121 / 140 129.6 7.7 179.8X -SQL Parquet MR 1802 / 1959 8.7 114.6 12.1X -SQL ORC Vectorized 223 / 242 70.4 14.2 97.7X -SQL ORC Vectorized with copy 224 / 234 70.2 14.2 97.4X -SQL ORC MR 1389 / 1492 11.3 88.3 15.7X +SQL CSV 18629 / 20491 0.8 1184.4 1.0X +SQL Json 8763 / 9045 1.8 557.1 2.1X +SQL Parquet Vectorized 140 / 181 112.1 8.9 132.7X +SQL Parquet MR 2057 / 2171 7.6 130.8 9.1X +SQL ORC Vectorized 271 / 294 58.0 17.2 68.7X +SQL ORC Vectorized with copy 272 / 317 57.8 17.3 68.5X +SQL ORC MR 1858 / 1941 8.5 118.1 10.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 209 / 236 75.2 13.3 1.0X -ParquetReader Vectorized -> Row 195 / 206 80.5 12.4 1.1X +ParquetReader Vectorized 277 / 338 56.8 17.6 1.0X +ParquetReader Vectorized -> Row 250 / 335 63.0 15.9 1.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 24302 / 25372 0.6 1545.1 1.0X -SQL Json 10114 / 10220 1.6 643.0 2.4X -SQL Parquet Vectorized 192 / 199 82.0 12.2 126.7X -SQL Parquet MR 1950 / 1975 8.1 124.0 12.5X -SQL ORC Vectorized 277 / 284 56.8 17.6 87.8X -SQL ORC Vectorized with copy 281 / 288 55.9 17.9 86.4X -SQL ORC MR 1415 / 1444 11.1 90.0 17.2X +SQL CSV 22969 / 23041 0.7 1460.3 1.0X +SQL Json 9781 / 9900 1.6 621.8 2.3X 
+SQL Parquet Vectorized 213 / 229 73.7 13.6 107.6X +SQL Parquet MR 2026 / 2038 7.8 128.8 11.3X +SQL ORC Vectorized 298 / 348 52.8 19.0 77.1X +SQL ORC Vectorized with copy 293 / 335 53.7 18.6 78.4X +SQL ORC MR 1735 / 1766 9.1 110.3 13.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 276 / 310 57.0 17.5 1.0X -ParquetReader Vectorized -> Row 262 / 271 60.1 16.6 1.1X +ParquetReader Vectorized 376 / 442 41.9 23.9 1.0X +ParquetReader Vectorized -> Row 287 / 377 54.8 18.2 1.3X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 20107 / 20228 0.8 1278.3 1.0X -SQL Json 9748 / 9917 1.6 619.8 2.1X -SQL Parquet Vectorized 117 / 122 134.8 7.4 172.3X -SQL Parquet MR 1745 / 1757 9.0 110.9 11.5X -SQL ORC Vectorized 308 / 345 51.1 19.6 65.4X -SQL ORC Vectorized with copy 317 / 345 49.7 20.1 63.5X -SQL ORC MR 1437 / 1449 10.9 91.4 14.0X +SQL CSV 19398 / 19410 0.8 1233.3 1.0X +SQL Json 9516 / 9612 1.7 605.0 2.0X +SQL Parquet Vectorized 135 / 157 116.4 8.6 143.6X +SQL Parquet MR 1770 / 1772 8.9 112.5 11.0X +SQL ORC Vectorized 325 / 343 48.4 20.7 59.6X +SQL ORC Vectorized with copy 336 / 372 46.8 21.4 57.7X +SQL ORC MR 1612 / 1635 9.8 102.5 12.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 192 / 224 
81.9 12.2 1.0X -ParquetReader Vectorized -> Row 186 / 206 84.7 11.8 1.0X +ParquetReader Vectorized 222 / 267 71.0 14.1 1.0X +ParquetReader Vectorized -> Row 185 / 194 85.0 11.8 1.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 24884 / 24896 0.6 1582.1 1.0X -SQL Json 13202 / 13262 1.2 839.4 1.9X -SQL Parquet Vectorized 191 / 201 82.2 12.2 130.1X -SQL Parquet MR 1908 / 1951 8.2 121.3 13.0X -SQL ORC Vectorized 378 / 394 41.6 24.0 65.9X -SQL ORC Vectorized with copy 396 / 402 39.7 25.2 62.9X -SQL ORC MR 1704 / 1709 9.2 108.3 14.6X +SQL CSV 23579 / 23621 0.7 1499.1 1.0X +SQL Json 13196 / 13234 1.2 839.0 1.8X +SQL Parquet Vectorized 244 / 327 64.4 15.5 96.6X +SQL Parquet MR 2066 / 2113 7.6 131.3 11.4X +SQL ORC Vectorized 404 / 427 39.0 25.7 58.4X +SQL ORC Vectorized with copy 414 / 462 38.0 26.3 57.0X +SQL ORC MR 1677 / 1772 9.4 106.6 14.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 273 / 339 57.7 17.3 1.0X -ParquetReader Vectorized -> Row 259 / 275 60.7 16.5 1.1X +ParquetReader Vectorized 409 / 474 38.4 26.0 1.0X +ParquetReader Vectorized -> Row 288 / 356 54.5 18.3 1.4X ================================================================================================ -Int and String scan +Int and String Scan ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 @@ -138,17 +138,17 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Int and String 
Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 17383 / 17428 0.6 1657.8 1.0X -SQL Json 9170 / 9249 1.1 874.6 1.9X -SQL Parquet Vectorized 1826 / 1853 5.7 174.2 9.5X -SQL Parquet MR 3773 / 3881 2.8 359.8 4.6X -SQL ORC Vectorized 1975 / 2111 5.3 188.4 8.8X -SQL ORC Vectorized with copy 2050 / 2122 5.1 195.5 8.5X -SQL ORC MR 3521 / 3617 3.0 335.8 4.9X +SQL CSV 17376 / 17595 0.6 1657.1 1.0X +SQL Json 9431 / 9511 1.1 899.4 1.8X +SQL Parquet Vectorized 2028 / 2070 5.2 193.4 8.6X +SQL Parquet MR 4025 / 4057 2.6 383.9 4.3X +SQL ORC Vectorized 2448 / 2549 4.3 233.4 7.1X +SQL ORC Vectorized with copy 2594 / 2598 4.0 247.4 6.7X +SQL ORC MR 3500 / 3700 3.0 333.8 5.0X ================================================================================================ -Repeated String scan +Repeated String Scan ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 @@ -156,17 +156,17 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 9976 / 10083 1.1 951.4 1.0X -SQL Json 5550 / 5560 1.9 529.3 1.8X -SQL Parquet Vectorized 609 / 626 17.2 58.0 16.4X -SQL Parquet MR 1435 / 1490 7.3 136.8 7.0X -SQL ORC Vectorized 377 / 391 27.8 35.9 26.5X -SQL ORC Vectorized with copy 564 / 593 18.6 53.8 17.7X -SQL ORC MR 1646 / 1654 6.4 157.0 6.1X +SQL CSV 10550 / 10706 1.0 1006.1 1.0X +SQL Json 5747 / 5751 1.8 548.1 1.8X +SQL Parquet Vectorized 651 / 671 16.1 62.1 16.2X +SQL Parquet MR 1417 / 1445 7.4 135.2 7.4X +SQL ORC Vectorized 406 / 423 25.9 38.7 26.0X +SQL ORC Vectorized with copy 650 / 677 16.1 62.0 16.2X +SQL ORC MR 1705 / 1764 6.2 162.6 6.2X 
================================================================================================ -Partitioned Table scan +Partitioned Table Scan ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 @@ -174,31 +174,31 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Data column - CSV 24069 / 24070 0.7 1530.3 1.0X -Data column - Json 9732 / 9879 1.6 618.7 2.5X -Data column - Parquet Vectorized 188 / 207 83.5 12.0 127.8X -Data column - Parquet MR 2797 / 2818 5.6 177.8 8.6X -Data column - ORC Vectorized 282 / 300 55.7 17.9 85.3X -Data column - ORC Vectorized with copy 281 / 295 56.0 17.9 85.7X -Data column - ORC MR 1954 / 1958 8.1 124.2 12.3X -Partition column - CSV 5538 / 5575 2.8 352.1 4.3X -Partition column - Json 3919 / 3972 4.0 249.2 6.1X -Partition column - Parquet Vectorized 49 / 57 318.2 3.1 486.9X -Partition column - Parquet MR 1411 / 1415 11.1 89.7 17.1X -Partition column - ORC Vectorized 50 / 65 311.8 3.2 477.1X -Partition column - ORC Vectorized with copy 50 / 60 315.0 3.2 482.0X -Partition column - ORC MR 1305 / 1318 12.1 83.0 18.4X -Both columns - CSV 23659 / 24426 0.7 1504.2 1.0X -Both columns - Json 12312 / 12494 1.3 782.8 2.0X -Both columns - Parquet Vectorized 227 / 237 69.4 14.4 106.2X -Both columns - Parquet MR 3090 / 3157 5.1 196.5 7.8X -Both columns - ORC Vectorized 321 / 335 49.0 20.4 75.0X -Both column - ORC Vectorized with copy 397 / 424 39.7 25.2 60.7X -Both columns - ORC MR 2081 / 2153 7.6 132.3 11.6X +Data column - CSV 24900 / 25018 0.6 1583.1 1.0X +Data column - Json 10203 / 10224 1.5 648.7 2.4X +Data column - Parquet Vectorized 248 / 270 63.5 15.7 100.5X +Data column - Parquet MR 2700 / 2774 5.8 171.6 9.2X +Data column - ORC Vectorized 314 / 377 50.1 20.0 79.3X +Data 
column - ORC Vectorized with copy 348 / 350 45.3 22.1 71.6X +Data column - ORC MR 2149 / 2150 7.3 136.6 11.6X +Partition column - CSV 5350 / 5452 2.9 340.1 4.7X +Partition column - Json 4050 / 4096 3.9 257.5 6.1X +Partition column - Parquet Vectorized 98 / 100 159.8 6.3 252.9X +Partition column - Parquet MR 1395 / 1422 11.3 88.7 17.8X +Partition column - ORC Vectorized 96 / 105 163.2 6.1 258.3X +Partition column - ORC Vectorized with copy 97 / 105 161.6 6.2 255.9X +Partition column - ORC MR 1393 / 1400 11.3 88.6 17.9X +Both columns - CSV 23599 / 23897 0.7 1500.4 1.1X +Both columns - Json 10743 / 10794 1.5 683.0 2.3X +Both columns - Parquet Vectorized 252 / 268 62.5 16.0 98.9X +Both columns - Parquet MR 2981 / 3007 5.3 189.5 8.4X +Both columns - ORC Vectorized 337 / 353 46.7 21.4 74.0X +Both column - ORC Vectorized with copy 385 / 394 40.9 24.5 64.7X +Both columns - ORC MR 2163 / 2241 7.3 137.5 11.5X ================================================================================================ -String with Nulls scan +String with Nulls Scan ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 @@ -206,46 +206,46 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 12699 / 12724 0.8 1211.1 1.0X -SQL Json 8031 / 8272 1.3 765.9 1.6X -SQL Parquet Vectorized 1173 / 1174 8.9 111.8 10.8X -SQL Parquet MR 3294 / 3382 3.2 314.1 3.9X -ParquetReader Vectorized 868 / 886 12.1 82.8 14.6X -SQL ORC Vectorized 882 / 915 11.9 84.1 14.4X -SQL ORC Vectorized with copy 1303 / 1379 8.0 124.3 9.7X -SQL ORC MR 3100 / 3243 3.4 295.6 4.1X +SQL CSV 13422 / 13552 0.8 1280.0 1.0X +SQL Json 8135 / 8330 1.3 775.8 1.6X +SQL Parquet Vectorized 1253 / 1310 8.4 119.5 10.7X +SQL Parquet MR 3163 / 3230 3.3 301.7 4.2X 
+ParquetReader Vectorized 851 / 931 12.3 81.1 15.8X +SQL ORC Vectorized 880 / 1005 11.9 83.9 15.3X +SQL ORC Vectorized with copy 1670 / 1718 6.3 159.3 8.0X +SQL ORC MR 3348 / 3384 3.1 319.3 4.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 12150 / 12299 0.9 1158.8 1.0X -SQL Json 6260 / 6318 1.7 597.0 1.9X -SQL Parquet Vectorized 869 / 924 12.1 82.9 14.0X -SQL Parquet MR 2310 / 2326 4.5 220.3 5.3X -ParquetReader Vectorized 847 / 869 12.4 80.8 14.3X -SQL ORC Vectorized 953 / 1012 11.0 90.9 12.7X -SQL ORC Vectorized with copy 1359 / 1381 7.7 129.6 8.9X -SQL ORC MR 2607 / 2651 4.0 248.6 4.7X +SQL CSV 10922 / 10952 1.0 1041.6 1.0X +SQL Json 6010 / 6039 1.7 573.2 1.8X +SQL Parquet Vectorized 903 / 1022 11.6 86.1 12.1X +SQL Parquet MR 2458 / 2479 4.3 234.4 4.4X +ParquetReader Vectorized 773 / 822 13.6 73.8 14.1X +SQL ORC Vectorized 1123 / 1129 9.3 107.1 9.7X +SQL ORC Vectorized with copy 1449 / 1461 7.2 138.2 7.5X +SQL ORC MR 2737 / 2810 3.8 261.0 4.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 10223 / 10363 1.0 974.9 1.0X -SQL Json 3945 / 4019 2.7 376.2 2.6X -SQL Parquet Vectorized 184 / 200 57.1 17.5 55.7X -SQL Parquet MR 1433 / 1497 7.3 136.7 7.1X -ParquetReader Vectorized 175 / 201 60.1 16.6 58.6X -SQL ORC Vectorized 323 / 350 32.5 30.8 31.7X -SQL ORC Vectorized with copy 424 / 460 24.7 40.5 24.1X -SQL ORC MR 1444 / 1495 7.3 137.7 7.1X +SQL CSV 9403 / 9438 1.1 896.7 1.0X +SQL Json 3809 / 3813 2.8 363.3 2.5X +SQL Parquet Vectorized 181 / 190 57.9 17.3 51.9X +SQL Parquet MR 1442 / 1446 7.3 
137.5 6.5X +ParquetReader Vectorized 176 / 187 59.5 16.8 53.3X +SQL ORC Vectorized 342 / 352 30.6 32.6 27.5X +SQL ORC Vectorized with copy 425 / 465 24.7 40.5 22.1X +SQL ORC MR 1382 / 1388 7.6 131.8 6.8X ================================================================================================ -Single Column Scan from multiple columns +Single Column Scan From Wide Columns ================================================================================================ Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 @@ -253,38 +253,38 @@ Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Single Column Scan from 10 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 2436 / 2475 0.4 2322.8 1.0X -SQL Json 2089 / 2104 0.5 1992.4 1.2X -SQL Parquet Vectorized 43 / 47 24.3 41.2 56.4X -SQL Parquet MR 184 / 209 5.7 175.7 13.2X -SQL ORC Vectorized 51 / 65 20.5 48.7 47.7X -SQL ORC Vectorized with copy 50 / 57 21.0 47.6 48.8X -SQL ORC MR 248 / 292 4.2 236.2 9.8X +SQL CSV 2544 / 2572 0.4 2426.5 1.0X +SQL Json 2015 / 2018 0.5 1921.7 1.3X +SQL Parquet Vectorized 48 / 57 21.8 45.9 52.9X +SQL Parquet MR 180 / 198 5.8 171.4 14.2X +SQL ORC Vectorized 55 / 66 18.9 52.9 45.9X +SQL ORC Vectorized with copy 56 / 67 18.6 53.8 45.1X +SQL ORC MR 262 / 319 4.0 250.2 9.7X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Single Column Scan from 50 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 5685 / 5808 0.2 5421.6 1.0X -SQL Json 7570 / 7632 0.1 7219.7 0.8X -SQL Parquet Vectorized 60 / 68 17.5 57.0 95.1X -SQL Parquet MR 191 / 201 5.5 182.2 29.8X -SQL ORC Vectorized 70 / 80 15.1 66.3 81.8X -SQL ORC Vectorized with copy 71 / 81 14.9 67.3 80.6X -SQL ORC MR 738 / 800 1.4 704.1 7.7X +SQL CSV 5721 / 
5724 0.2 5456.2 1.0X +SQL Json 7332 / 7334 0.1 6992.6 0.8X +SQL Parquet Vectorized 64 / 74 16.4 60.8 89.8X +SQL Parquet MR 200 / 204 5.2 190.6 28.6X +SQL ORC Vectorized 73 / 83 14.4 69.4 78.7X +SQL ORC Vectorized with copy 72 / 91 14.6 68.7 79.4X +SQL ORC MR 930 / 962 1.1 887.0 6.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 9131 / 9214 0.1 8707.9 1.0X -SQL Json 13728 / 13861 0.1 13092.1 0.7X -SQL Parquet Vectorized 86 / 91 12.2 82.1 106.1X -SQL Parquet MR 202 / 219 5.2 192.4 45.2X -SQL ORC Vectorized 94 / 101 11.2 89.2 97.6X -SQL ORC Vectorized with copy 89 / 96 11.8 84.8 102.6X -SQL ORC MR 1532 / 1540 0.7 1460.8 6.0X +SQL CSV 9475 / 9485 0.1 9035.7 1.0X +SQL Json 13623 / 13695 0.1 12991.8 0.7X +SQL Parquet Vectorized 94 / 100 11.2 89.4 101.0X +SQL Parquet MR 226 / 234 4.6 215.3 42.0X +SQL ORC Vectorized 96 / 103 10.9 91.9 98.3X +SQL ORC Vectorized with copy 98 / 122 10.7 93.5 96.6X +SQL ORC MR 1409 / 1467 0.7 1343.3 6.7X From cf61f1c4df40b681f2db8cf233b8fbc0df88598b Mon Sep 17 00:00:00 2001 From: Peter Toth Date: Wed, 10 Oct 2018 17:40:55 +0200 Subject: [PATCH 3/5] [SPARK-25662][TEST] fix spark-submit command --- .../spark/sql/execution/benchmark/DataSourceReadBenchmark.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala index b38e3f4b8a0d..e1d21940617e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala @@ -36,7 +36,7 @@ import 
org.apache.spark.sql.vectorized.ColumnVector * Benchmark to measure data source read performance. * To run this benchmark: * {{{ - * 1. without sbt: bin/spark-submit --class + * 1. without sbt: bin/spark-submit --class --jars , * 2. build/sbt "sql/test:runMain " * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " * Results will be written to "benchmarks/DataSourceReadBenchmark-results.txt". From aee046de54e9c75fc97ea44439544d6f57c2696d Mon Sep 17 00:00:00 2001 From: Dongjoon Hyun Date: Thu, 11 Oct 2018 01:25:49 +0000 Subject: [PATCH 4/5] Update result --- .../DataSourceReadBenchmark-results.txt | 393 +++++++++--------- 1 file changed, 186 insertions(+), 207 deletions(-) diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt index 7c6f346d4843..2d3bae442cc5 100644 --- a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt +++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt @@ -2,289 +2,268 @@ SQL Single Numeric Column Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 17061 / 17127 0.9 1084.7 1.0X -SQL Json 7182 / 7351 2.2 456.6 2.4X -SQL Parquet Vectorized 121 / 146 130.4 7.7 141.4X -SQL Parquet MR 1406 / 1412 11.2 89.4 12.1X -SQL ORC Vectorized 118 / 148 133.2 7.5 144.5X -SQL ORC Vectorized with copy 162 / 196 96.9 10.3 105.2X -SQL ORC MR 1176 / 1250 13.4 74.7 14.5X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL 
CSV 21508 / 22112 0.7 1367.5 1.0X +SQL Json 8705 / 8825 1.8 553.4 2.5X +SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X +SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X +SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X +SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X +SQL ORC MR 1448 / 1492 10.9 92.0 14.9X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 159 / 199 99.2 10.1 1.0X -ParquetReader Vectorized -> Row 84 / 94 186.3 5.4 1.9X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz +ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X +ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 17556 / 17671 0.9 1116.2 1.0X -SQL Json 7260 / 7344 2.2 461.6 2.4X -SQL Parquet Vectorized 144 / 172 109.2 9.2 121.9X -SQL Parquet MR 1526 / 1526 10.3 97.0 11.5X -SQL ORC Vectorized 169 / 187 92.8 10.8 103.6X -SQL ORC Vectorized with copy 215 / 229 73.1 13.7 81.5X -SQL ORC MR 1472 / 1582 10.7 93.6 11.9X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 23282 / 23312 0.7 1480.2 1.0X +SQL Json 9187 / 9189 1.7 584.1 2.5X +SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X +SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X +SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X +SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X +SQL ORC MR 1541 / 1549 10.2 98.0 15.1X + 
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 215 / 246 73.2 13.7 1.0X -ParquetReader Vectorized -> Row 168 / 175 93.8 10.7 1.3X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz +ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X +ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 18629 / 20491 0.8 1184.4 1.0X -SQL Json 8763 / 9045 1.8 557.1 2.1X -SQL Parquet Vectorized 140 / 181 112.1 8.9 132.7X -SQL Parquet MR 2057 / 2171 7.6 130.8 9.1X -SQL ORC Vectorized 271 / 294 58.0 17.2 68.7X -SQL ORC Vectorized with copy 272 / 317 57.8 17.3 68.5X -SQL ORC MR 1858 / 1941 8.5 118.1 10.0X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 24990 / 25012 0.6 1588.8 1.0X +SQL Json 9837 / 9865 1.6 625.4 2.5X +SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X +SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X +SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X +SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X +SQL ORC MR 1667 / 1674 9.4 106.0 15.0X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader 
Vectorized 277 / 338 56.8 17.6 1.0X -ParquetReader Vectorized -> Row 250 / 335 63.0 15.9 1.1X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz +ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X +ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 22969 / 23041 0.7 1460.3 1.0X -SQL Json 9781 / 9900 1.6 621.8 2.3X -SQL Parquet Vectorized 213 / 229 73.7 13.6 107.6X -SQL Parquet MR 2026 / 2038 7.8 128.8 11.3X -SQL ORC Vectorized 298 / 348 52.8 19.0 77.1X -SQL ORC Vectorized with copy 293 / 335 53.7 18.6 78.4X -SQL ORC MR 1735 / 1766 9.1 110.3 13.2X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 32537 / 32554 0.5 2068.7 1.0X +SQL Json 12610 / 12668 1.2 801.7 2.6X +SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X +SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X +SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X +SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X +SQL ORC MR 1797 / 1819 8.8 114.3 18.1X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 376 / 442 41.9 23.9 1.0X -ParquetReader Vectorized -> Row 287 / 377 54.8 18.2 1.3X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz +ParquetReader Vectorized 352 / 368 44.7 22.4 1.0X +ParquetReader Vectorized -> Row 351 / 359 44.8 22.3 1.0X 
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 19398 / 19410 0.8 1233.3 1.0X -SQL Json 9516 / 9612 1.7 605.0 2.0X -SQL Parquet Vectorized 135 / 157 116.4 8.6 143.6X -SQL Parquet MR 1770 / 1772 8.9 112.5 11.0X -SQL ORC Vectorized 325 / 343 48.4 20.7 59.6X -SQL ORC Vectorized with copy 336 / 372 46.8 21.4 57.7X -SQL ORC MR 1612 / 1635 9.8 102.5 12.0X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 27179 / 27184 0.6 1728.0 1.0X +SQL Json 12578 / 12585 1.3 799.7 2.2X +SQL Parquet Vectorized 161 / 171 97.5 10.3 168.5X +SQL Parquet MR 2361 / 2395 6.7 150.1 11.5X +SQL ORC Vectorized 473 / 480 33.3 30.0 57.5X +SQL ORC Vectorized with copy 478 / 483 32.9 30.4 56.8X +SQL ORC MR 1858 / 1859 8.5 118.2 14.6X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 222 / 267 71.0 14.1 1.0X -ParquetReader Vectorized -> Row 185 / 194 85.0 11.8 1.2X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz +ParquetReader Vectorized 251 / 255 62.7 15.9 1.0X +ParquetReader Vectorized -> Row 255 / 259 61.8 16.2 1.0X +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 23579 / 23621 0.7 
1499.1 1.0X -SQL Json 13196 / 13234 1.2 839.0 1.8X -SQL Parquet Vectorized 244 / 327 64.4 15.5 96.6X -SQL Parquet MR 2066 / 2113 7.6 131.3 11.4X -SQL ORC Vectorized 404 / 427 39.0 25.7 58.4X -SQL ORC Vectorized with copy 414 / 462 38.0 26.3 57.0X -SQL ORC MR 1677 / 1772 9.4 106.6 14.1X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 34797 / 34830 0.5 2212.3 1.0X +SQL Json 17806 / 17828 0.9 1132.1 2.0X +SQL Parquet Vectorized 260 / 269 60.6 16.5 134.0X +SQL Parquet MR 2512 / 2534 6.3 159.7 13.9X +SQL ORC Vectorized 582 / 593 27.0 37.0 59.8X +SQL ORC Vectorized with copy 576 / 584 27.3 36.6 60.4X +SQL ORC MR 2309 / 2313 6.8 146.8 15.1X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 409 / 474 38.4 26.0 1.0X -ParquetReader Vectorized -> Row 288 / 356 54.5 18.3 1.4X +ParquetReader Vectorized 350 / 363 44.9 22.3 1.0X +ParquetReader Vectorized -> Row 350 / 366 44.9 22.3 1.0X ================================================================================================ Int and String Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 17376 / 17595 0.6 1657.1 1.0X -SQL Json 9431 / 9511 1.1 899.4 1.8X -SQL Parquet Vectorized 2028 / 2070 5.2 193.4 8.6X -SQL Parquet 
MR 4025 / 4057 2.6 383.9 4.3X -SQL ORC Vectorized 2448 / 2549 4.3 233.4 7.1X -SQL ORC Vectorized with copy 2594 / 2598 4.0 247.4 6.7X -SQL ORC MR 3500 / 3700 3.0 333.8 5.0X +SQL CSV 22486 / 22590 0.5 2144.5 1.0X +SQL Json 14124 / 14195 0.7 1347.0 1.6X +SQL Parquet Vectorized 2342 / 2347 4.5 223.4 9.6X +SQL Parquet MR 4660 / 4664 2.2 444.4 4.8X +SQL ORC Vectorized 2378 / 2379 4.4 226.8 9.5X +SQL ORC Vectorized with copy 2548 / 2571 4.1 243.0 8.8X +SQL ORC MR 4206 / 4211 2.5 401.1 5.3X ================================================================================================ Repeated String Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 10550 / 10706 1.0 1006.1 1.0X -SQL Json 5747 / 5751 1.8 548.1 1.8X -SQL Parquet Vectorized 651 / 671 16.1 62.1 16.2X -SQL Parquet MR 1417 / 1445 7.4 135.2 7.4X -SQL ORC Vectorized 406 / 423 25.9 38.7 26.0X -SQL ORC Vectorized with copy 650 / 677 16.1 62.0 16.2X -SQL ORC MR 1705 / 1764 6.2 162.6 6.2X +SQL CSV 12150 / 12178 0.9 1158.7 1.0X +SQL Json 7012 / 7014 1.5 668.7 1.7X +SQL Parquet Vectorized 792 / 796 13.2 75.5 15.3X +SQL Parquet MR 1961 / 1975 5.3 187.0 6.2X +SQL ORC Vectorized 482 / 485 21.8 46.0 25.2X +SQL ORC Vectorized with copy 710 / 715 14.8 67.7 17.1X +SQL ORC MR 2081 / 2083 5.0 198.5 5.8X ================================================================================================ Partitioned Table Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 
10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Data column - CSV 24900 / 25018 0.6 1583.1 1.0X -Data column - Json 10203 / 10224 1.5 648.7 2.4X -Data column - Parquet Vectorized 248 / 270 63.5 15.7 100.5X -Data column - Parquet MR 2700 / 2774 5.8 171.6 9.2X -Data column - ORC Vectorized 314 / 377 50.1 20.0 79.3X -Data column - ORC Vectorized with copy 348 / 350 45.3 22.1 71.6X -Data column - ORC MR 2149 / 2150 7.3 136.6 11.6X -Partition column - CSV 5350 / 5452 2.9 340.1 4.7X -Partition column - Json 4050 / 4096 3.9 257.5 6.1X -Partition column - Parquet Vectorized 98 / 100 159.8 6.3 252.9X -Partition column - Parquet MR 1395 / 1422 11.3 88.7 17.8X -Partition column - ORC Vectorized 96 / 105 163.2 6.1 258.3X -Partition column - ORC Vectorized with copy 97 / 105 161.6 6.2 255.9X -Partition column - ORC MR 1393 / 1400 11.3 88.6 17.9X -Both columns - CSV 23599 / 23897 0.7 1500.4 1.1X -Both columns - Json 10743 / 10794 1.5 683.0 2.3X -Both columns - Parquet Vectorized 252 / 268 62.5 16.0 98.9X -Both columns - Parquet MR 2981 / 3007 5.3 189.5 8.4X -Both columns - ORC Vectorized 337 / 353 46.7 21.4 74.0X -Both column - ORC Vectorized with copy 385 / 394 40.9 24.5 64.7X -Both columns - ORC MR 2163 / 2241 7.3 137.5 11.5X +Data column - CSV 31789 / 31791 0.5 2021.1 1.0X +Data column - Json 12873 / 12918 1.2 818.4 2.5X +Data column - Parquet Vectorized 267 / 280 58.9 17.0 119.1X +Data column - Parquet MR 3387 / 3402 4.6 215.3 9.4X +Data column - ORC Vectorized 391 / 453 40.2 24.9 81.2X +Data column - ORC Vectorized with copy 392 / 398 40.2 24.9 81.2X +Data column - ORC MR 2508 / 2512 6.3 159.4 12.7X +Partition column - CSV 6965 / 6977 2.3 442.8 4.6X +Partition column - Json 5563 / 5576 2.8 
353.7 5.7X +Partition column - Parquet Vectorized 65 / 78 241.1 4.1 487.2X +Partition column - Parquet MR 1811 / 1811 8.7 115.1 17.6X +Partition column - ORC Vectorized 66 / 73 239.0 4.2 483.0X +Partition column - ORC Vectorized with copy 65 / 70 241.1 4.1 487.3X +Partition column - ORC MR 1775 / 1778 8.9 112.8 17.9X +Both columns - CSV 30032 / 30113 0.5 1909.4 1.1X +Both columns - Json 13941 / 13959 1.1 886.3 2.3X +Both columns - Parquet Vectorized 312 / 330 50.3 19.9 101.7X +Both columns - Parquet MR 3858 / 3862 4.1 245.3 8.2X +Both columns - ORC Vectorized 431 / 437 36.5 27.4 73.8X +Both column - ORC Vectorized with copy 523 / 529 30.1 33.3 60.7X +Both columns - ORC MR 2712 / 2805 5.8 172.4 11.7X ================================================================================================ String with Nulls Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 13422 / 13552 0.8 1280.0 1.0X -SQL Json 8135 / 8330 1.3 775.8 1.6X -SQL Parquet Vectorized 1253 / 1310 8.4 119.5 10.7X -SQL Parquet MR 3163 / 3230 3.3 301.7 4.2X -ParquetReader Vectorized 851 / 931 12.3 81.1 15.8X -SQL ORC Vectorized 880 / 1005 11.9 83.9 15.3X -SQL ORC Vectorized with copy 1670 / 1718 6.3 159.3 8.0X -SQL ORC MR 3348 / 3384 3.1 319.3 4.0X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 13525 / 13823 0.8 1289.9 1.0X +SQL Json 9913 / 9921 1.1 945.3 1.4X +SQL Parquet Vectorized 1517 / 1517 6.9 144.7 8.9X +SQL Parquet MR 3996 / 4008 2.6 381.1 3.4X +ParquetReader 
Vectorized 1120 / 1128 9.4 106.8 12.1X +SQL ORC Vectorized 1203 / 1224 8.7 114.7 11.2X +SQL ORC Vectorized with copy 1639 / 1646 6.4 156.3 8.3X +SQL ORC MR 3720 / 3780 2.8 354.7 3.6X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 10922 / 10952 1.0 1041.6 1.0X -SQL Json 6010 / 6039 1.7 573.2 1.8X -SQL Parquet Vectorized 903 / 1022 11.6 86.1 12.1X -SQL Parquet MR 2458 / 2479 4.3 234.4 4.4X -ParquetReader Vectorized 773 / 822 13.6 73.8 14.1X -SQL ORC Vectorized 1123 / 1129 9.3 107.1 9.7X -SQL ORC Vectorized with copy 1449 / 1461 7.2 138.2 7.5X -SQL ORC MR 2737 / 2810 3.8 261.0 4.0X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 15860 / 15877 0.7 1512.5 1.0X +SQL Json 7676 / 7688 1.4 732.0 2.1X +SQL Parquet Vectorized 1072 / 1084 9.8 102.2 14.8X +SQL Parquet MR 2890 / 2897 3.6 275.6 5.5X +ParquetReader Vectorized 1052 / 1053 10.0 100.4 15.1X +SQL ORC Vectorized 1248 / 1248 8.4 119.0 12.7X +SQL ORC Vectorized with copy 1627 / 1637 6.4 155.2 9.7X +SQL ORC MR 3365 / 3369 3.1 320.9 4.7X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 9403 / 9438 1.1 896.7 1.0X -SQL Json 3809 / 3813 2.8 363.3 2.5X -SQL Parquet Vectorized 181 / 190 57.9 17.3 51.9X -SQL Parquet MR 1442 / 1446 7.3 137.5 6.5X -ParquetReader Vectorized 176 / 187 59.5 16.8 53.3X -SQL ORC Vectorized 342 / 352 30.6 32.6 27.5X -SQL ORC Vectorized with copy 425 / 465 24.7 40.5 22.1X -SQL ORC MR 1382 / 1388 7.6 131.8 6.8X +SQL CSV 13401 / 13561 0.8 
1278.1 1.0X +SQL Json 5253 / 5303 2.0 500.9 2.6X +SQL Parquet Vectorized 233 / 242 45.0 22.2 57.6X +SQL Parquet MR 1791 / 1796 5.9 170.8 7.5X +ParquetReader Vectorized 236 / 238 44.4 22.5 56.7X +SQL ORC Vectorized 453 / 473 23.2 43.2 29.6X +SQL ORC Vectorized with copy 573 / 577 18.3 54.7 23.4X +SQL ORC MR 1846 / 1850 5.7 176.0 7.3X ================================================================================================ Single Column Scan From Wide Columns ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Single Column Scan from 10 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 2544 / 2572 0.4 2426.5 1.0X -SQL Json 2015 / 2018 0.5 1921.7 1.3X -SQL Parquet Vectorized 48 / 57 21.8 45.9 52.9X -SQL Parquet MR 180 / 198 5.8 171.4 14.2X -SQL ORC Vectorized 55 / 66 18.9 52.9 45.9X -SQL ORC Vectorized with copy 56 / 67 18.6 53.8 45.1X -SQL ORC MR 262 / 319 4.0 250.2 9.7X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 3147 / 3148 0.3 3001.1 1.0X +SQL Json 2666 / 2693 0.4 2542.9 1.2X +SQL Parquet Vectorized 54 / 58 19.5 51.3 58.5X +SQL Parquet MR 220 / 353 4.8 209.9 14.3X +SQL ORC Vectorized 63 / 77 16.8 59.7 50.3X +SQL ORC Vectorized with copy 63 / 66 16.7 59.8 50.2X +SQL ORC MR 317 / 321 3.3 302.2 9.9X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Single Column Scan from 50 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ 
-SQL CSV 5721 / 5724 0.2 5456.2 1.0X -SQL Json 7332 / 7334 0.1 6992.6 0.8X -SQL Parquet Vectorized 64 / 74 16.4 60.8 89.8X -SQL Parquet MR 200 / 204 5.2 190.6 28.6X -SQL ORC Vectorized 73 / 83 14.4 69.4 78.7X -SQL ORC Vectorized with copy 72 / 91 14.6 68.7 79.4X -SQL ORC MR 930 / 962 1.1 887.0 6.2X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz - +SQL CSV 7902 / 7921 0.1 7536.2 1.0X +SQL Json 9467 / 9491 0.1 9028.6 0.8X +SQL Parquet Vectorized 73 / 79 14.3 69.8 108.0X +SQL Parquet MR 239 / 247 4.4 228.0 33.1X +SQL ORC Vectorized 78 / 84 13.4 74.6 101.0X +SQL ORC Vectorized with copy 78 / 88 13.4 74.4 101.3X +SQL ORC MR 910 / 918 1.2 867.6 8.7X + +OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 9475 / 9485 0.1 9035.7 1.0X -SQL Json 13623 / 13695 0.1 12991.8 0.7X -SQL Parquet Vectorized 94 / 100 11.2 89.4 101.0X -SQL Parquet MR 226 / 234 4.6 215.3 42.0X -SQL ORC Vectorized 96 / 103 10.9 91.9 98.3X -SQL ORC Vectorized with copy 98 / 122 10.7 93.5 96.6X -SQL ORC MR 1409 / 1467 0.7 1343.3 6.7X +SQL CSV 13539 / 13543 0.1 12912.0 1.0X +SQL Json 17420 / 17446 0.1 16613.1 0.8X +SQL Parquet Vectorized 103 / 120 10.2 98.1 131.6X +SQL Parquet MR 250 / 258 4.2 238.9 54.1X +SQL ORC Vectorized 99 / 104 10.6 94.6 136.5X +SQL ORC Vectorized with copy 100 / 106 10.5 95.6 135.1X +SQL ORC MR 1653 / 1659 0.6 1576.3 8.2X From 0b6f11729340a7ddb4fa13382f9e7d74456a0f7a Mon Sep 17 00:00:00 2001 From: Peter Toth Date: Thu, 11 Oct 2018 08:29:56 +0200 Subject: [PATCH 5/5] [SPARK-25662][TEST] fix scalastyle Change-Id: If4fcfc27eb808c08246a8f7779fbe38a437a41a4 --- .../sql/execution/benchmark/DataSourceReadBenchmark.scala | 3 ++- 1 file changed, 2 insertions(+), 1 
deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala index e1d21940617e..a1e7f9e36f4b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala @@ -36,7 +36,8 @@ import org.apache.spark.sql.vectorized.ColumnVector * Benchmark to measure data source read performance. * To run this benchmark: * {{{ - * 1. without sbt: bin/spark-submit --class --jars , + * 1. without sbt: bin/spark-submit --class + * --jars , * 2. build/sbt "sql/test:runMain " * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " * Results will be written to "benchmarks/DataSourceReadBenchmark-results.txt".