diff --git a/README.md b/README.md index f128d17d3..b166f41d1 100644 --- a/README.md +++ b/README.md @@ -43,16 +43,16 @@ speedup compared to Spark. Comet is not yet achieving full DataFusion speeds in all cases, but with future work we aim to provide a 2x-4x speedup for a broader set of queries. -![](docs/source/_static/images/tpch_allqueries.png) +![](docs/source/_static/images/benchmark-results/2024-07-19/tpch_allqueries.png) Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each TPC-H query. -![](docs/source/_static/images/tpch_queries_compare.png) +![](docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_compare.png) The following chart shows how much Comet currently accelerates each query from the benchmark. Performance optimization is an ongoing task, and we welcome contributions from the community to help achieve even greater speedups in the future. -![](docs/source/_static/images/tpch_queries_speedup.png) +![](docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_speedup.png) These benchmarks can be reproduced in any environment using the documentation in the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). 
We encourage diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_allqueries.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_allqueries.png new file mode 100644 index 000000000..3a7017a09 Binary files /dev/null and b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_allqueries.png differ diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_compare.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_compare.png new file mode 100644 index 000000000..f49ef68bd Binary files /dev/null and b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_compare.png differ diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_speedup.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_speedup.png new file mode 100644 index 000000000..4a6042737 Binary files /dev/null and b/docs/source/_static/images/benchmark-results/2024-07-19/tpcds_queries_speedup.png differ diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpch_allqueries.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_allqueries.png new file mode 100644 index 000000000..dccee77b8 Binary files /dev/null and b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_allqueries.png differ diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_compare.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_compare.png new file mode 100644 index 000000000..ce0a5870e Binary files /dev/null and b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_compare.png differ diff --git a/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_speedup.png b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_speedup.png new file mode 100644 index 000000000..57a102faf Binary files /dev/null and 
b/docs/source/_static/images/benchmark-results/2024-07-19/tpch_queries_speedup.png differ diff --git a/docs/source/_static/images/tpch_allqueries.png b/docs/source/_static/images/tpch_allqueries.png deleted file mode 100644 index c6fa4e065..000000000 Binary files a/docs/source/_static/images/tpch_allqueries.png and /dev/null differ diff --git a/docs/source/_static/images/tpch_queries_compare.png b/docs/source/_static/images/tpch_queries_compare.png deleted file mode 100644 index a74c1acca..000000000 Binary files a/docs/source/_static/images/tpch_queries_compare.png and /dev/null differ diff --git a/docs/source/_static/images/tpch_queries_speedup.png b/docs/source/_static/images/tpch_queries_speedup.png deleted file mode 100644 index 69c29ac9d..000000000 Binary files a/docs/source/_static/images/tpch_queries_speedup.png and /dev/null differ diff --git a/docs/source/contributor-guide/benchmark-results/2024-06-29/spark-8-exec-5-runs.json b/docs/source/contributor-guide/benchmark-results/2024-06-29/spark-8-exec-5-runs.json deleted file mode 100644 index 012b05c3a..000000000 --- a/docs/source/contributor-guide/benchmark-results/2024-06-29/spark-8-exec-5-runs.json +++ /dev/null @@ -1,184 +0,0 @@ -{ - "engine": "datafusion-comet", - "benchmark": "tpch", - "data_path": "/mnt/bigdata/tpch/sf100/", - "query_path": "../../tpch/queries", - "spark_conf": { - "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", - "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse", - "spark.app.id": "app-20240528090804-0041", - "spark.app.submitTime": "1716908883258", - "spark.executor.memory": "8G", - "spark.master": "spark://woody:7077", - "spark.executor.id": "driver", - "spark.executor.instances": "8", - "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS", - "spark.driver.memory": "8G", - "spark.rdd.compress": "True", - "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", - "spark.serializer.objectStreamReset": "100", - "spark.cores.max": "8", - "spark.submit.pyFiles": "", - "spark.executor.cores": "1", - "spark.submit.deployMode": "client", - "spark.sql.autoBroadcastJoinThreshold": "-1", - "spark.eventLog.enabled": "false", - "spark.app.startTime": "1716908883579", - 
"spark.driver.port": "33725", - "spark.driver.host": "woody.lan" - }, - "1": [ - 76.91316103935242, - 79.55859923362732, - 81.10397529602051, - 79.01998662948608, - 79.1286551952362 - ], - "2": [ - 23.977370262145996, - 22.214473247528076, - 22.686659812927246, - 22.016682386398315, - 21.766324520111084 - ], - "3": [ - 22.700742721557617, - 21.980144739151, - 21.876065969467163, - 21.661516189575195, - 21.69345998764038 - ], - "4": [ - 17.377647638320923, - 16.249598264694214, - 16.15747308731079, - 16.128843069076538, - 16.04338026046753 - ], - "5": [ - 44.38863182067871, - 45.47764492034912, - 45.76063895225525, - 45.16393995285034, - 60.848369121551514 - ], - "6": [ - 3.2041075229644775, - 2.970944881439209, - 2.891291856765747, - 2.9719409942626953, - 3.0702600479125977 - ], - "7": [ - 24.369274377822876, - 24.684266567230225, - 24.146574020385742, - 24.023175716400146, - 30.56047773361206 - ], - "8": [ - 46.46081209182739, - 45.9838604927063, - 46.341185092926025, - 45.833823919296265, - 46.61182403564453 - ], - "9": [ - 67.67960548400879, - 67.34667444229126, - 70.34601259231567, - 71.24095153808594, - 84.38811421394348 - ], - "10": [ - 19.16477870941162, - 19.081010580062866, - 19.501060009002686, - 19.165698528289795, - 20.216782331466675 - ], - "11": [ - 17.158706426620483, - 17.05184030532837, - 17.714542150497437, - 17.004602909088135, - 17.700096130371094 - ], - "12": [ - 11.654477834701538, - 11.805298805236816, - 11.822469234466553, - 12.79678750038147, - 13.64478850364685 - ], - "13": [ - 20.430822372436523, - 20.18759250640869, - 21.26596975326538, - 21.234288454055786, - 20.189200162887573 - ], - "14": [ - 5.60215950012207, - 5.160705089569092, - 5.080057382583618, - 4.937625408172607, - 5.853632688522339 - ], - "15": [ - 14.17775845527649, - 13.898571729660034, - 14.215840578079224, - 14.316090106964111, - 14.356236457824707 - ], - "16": [ - 6.252386808395386, - 6.010213375091553, - 6.054978370666504, - 5.886059522628784, - 5.923115253448486 - ], 
- "17": [ - 71.41593313217163, - 70.25399804115295, - 72.07622528076172, - 72.27566242218018, - 72.20579051971436 - ], - "18": [ - 65.72738265991211, - 65.47461080551147, - 67.14260482788086, - 65.95489883422852, - 69.51795554161072 - ], - "19": [ - 7.1520891189575195, - 6.516514301300049, - 6.580992698669434, - 6.486274242401123, - 6.418147087097168 - ], - "20": [ - 12.619760036468506, - 12.235978126525879, - 12.116347551345825, - 12.161245584487915, - 12.30910348892212 - ], - "21": [ - 60.795483350753784, - 60.484593629837036, - 61.27316427230835, - 60.475560426712036, - 81.21473670005798 - ], - "22": [ - 8.926804065704346, - 8.113754034042358, - 8.029133796691895, - 7.99291467666626, - 8.439452648162842 - ] -} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpcds.json b/docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpcds.json new file mode 100644 index 000000000..a4bab0a94 --- /dev/null +++ b/docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpcds.json @@ -0,0 +1,342 @@ +{ + "engine": "datafusion-comet", + "benchmark": "tpcds", + "data_path": "/mnt/bigdata/tpcds/sf100/", + "query_path": "../../tpcds/queries-spark", + "spark_conf": { + "spark.eventLog.enabled": "true", + "spark.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", + "spark.comet.cast.allowIncompatible": "true", + "spark.app.initial.jar.urls": "spark://woody.lan:41193/jars/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", + "spark.app.id": "app-20240718163301-0003", + "spark.executor.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", + "spark.comet.exec.shuffle.enabled": "true", + "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS", + "spark.serializer.objectStreamReset": "100", + "spark.submit.deployMode": "client", + "spark.sql.autoBroadcastJoinThreshold": "-1", 
+ "spark.comet.exec.all.enabled": "true", + "spark.executor.cores": "8", + "spark.driver.port": "41193", + "spark.driver.host": "woody.lan", + "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.comet.shuffle.enforceMode.enabled": "true", + "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse", + "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager", + "spark.comet.exec.enabled": "true", + "spark.app.submitTime": "1721341980775", + "spark.repl.local.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", + "spark.executor.id": "driver", + "spark.master": "spark://woody:7077", + "spark.app.startTime": "1721341981073", + "spark.comet.exec.shuffle.mode": "auto", + "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions", + "spark.driver.memory": "8G", + "spark.driver.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", + "spark.sql.adaptive.coalescePartitions.enabled": "true", + 
"spark.executor.memory": "32G", + "spark.rdd.compress": "True", + "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.executor.instances": "1", + "spark.cores.max": "8", + "spark.comet.enabled": "true", + "spark.submit.pyFiles": "" + }, + "1": [ + 5.936143159866333 + ], + "2": [ + 8.26771068572998 + ], + "3": [ + 3.143160104751587 + ], + "4": [ + 99.57721495628357 + ], + "5": [ + 20.31959581375122 + ], + "6": [ + 2.659390687942505 + ], + "7": [ + 6.321574449539185 + ], + "8": [ + 3.3775765895843506 + ], + "9": [ + 10.297355890274048 + ], + "10": [ + 5.794115304946899 + ], + "11": [ + 41.01704978942871 + ], + "12": [ + 6.774289131164551 + ], + "13": [ + 8.915542364120483 + ], + "14": [ + 86.61133527755737 + ], + "15": [ + 13.07849407196045 + ], + "16": [ + 19.245990991592407 + ], + "17": [ + 5.388644218444824 + ], + "18": [ + 13.891231536865234 + ], + "19": [ + 3.869602680206299 + ], + "20": [ + 11.179648160934448 + ], + "21": [ + 26.43477487564087 + ], + "22": [ + 12.071606397628784 + ], + "23": [ + 217.38866710662842 + ], + "24": [ + 22.737252235412598 + ], + "25": [ + 6.0180275440216064 + ], + "26": [ + 10.12047791481018 + ], + 
"27": [ + 5.269314527511597 + ], + "28": [ + 13.336988687515259 + ], + "29": [ + 9.294469594955444 + ], + "30": [ + 2.487182378768921 + ], + "31": [ + 12.829366683959961 + ], + "32": [ + 8.359683275222778 + ], + "33": [ + 9.587389707565308 + ], + "34": [ + 4.090600252151489 + ], + "35": [ + 7.087569713592529 + ], + "36": [ + 5.502429962158203 + ], + "37": [ + 5.975481033325195 + ], + "38": [ + 8.321046113967896 + ], + "39": [ + 86.69561338424683 + ], + "40": [ + 18.024808168411255 + ], + "41": [ + 0.14025306701660156 + ], + "42": [ + 2.7543857097625732 + ], + "43": [ + 5.10432767868042 + ], + "44": [ + 3.023483991622925 + ], + "45": [ + 8.575555086135864 + ], + "46": [ + 6.691994905471802 + ], + "47": [ + 14.728558540344238 + ], + "48": [ + 6.201830148696899 + ], + "49": [ + 11.18767786026001 + ], + "50": [ + 10.928320407867432 + ], + "51": [ + 16.93299388885498 + ], + "52": [ + 2.7706453800201416 + ], + "53": [ + 3.6064443588256836 + ], + "54": [ + 8.893964052200317 + ], + "55": [ + 2.645658254623413 + ], + "56": [ + 9.719430446624756 + ], + "57": [ + 17.113372802734375 + ], + "58": [ + 30.33391761779785 + ], + "59": [ + 14.553935050964355 + ], + "60": [ + 10.117067813873291 + ], + "61": [ + 9.30464506149292 + ], + "62": [ + 11.765451192855835 + ], + "63": [ + 3.6449074745178223 + ], + "64": [ + 23.456274271011353 + ], + "65": [ + 18.001898050308228 + ], + "66": [ + 37.000144958496094 + ], + "67": [ + 46.503966331481934 + ], + "68": [ + 7.1533238887786865 + ], + "69": [ + 5.305586814880371 + ], + "70": [ + 12.200140714645386 + ], + "71": [ + 6.394218683242798 + ], + "72": [ + 51.827319860458374 + ], + "73": [ + 4.081210136413574 + ], + "74": [ + 27.13532018661499 + ], + "75": [ + 26.004110097885132 + ], + "76": [ + 4.606779336929321 + ], + "77": [ + 13.787659645080566 + ], + "78": [ + 41.01244926452637 + ], + "79": [ + 6.395780801773071 + ], + "80": [ + 36.78436040878296 + ], + "81": [ + 2.570908784866333 + ], + "82": [ + 8.755934238433838 + ], + "83": [ + 
2.8941328525543213 + ], + "84": [ + 1.1297607421875 + ], + "85": [ + 2.922498941421509 + ], + "86": [ + 3.0235934257507324 + ], + "87": [ + 8.466246366500854 + ], + "88": [ + 21.329773664474487 + ], + "89": [ + 3.9919168949127197 + ], + "90": [ + 2.072303295135498 + ], + "91": [ + 2.06138014793396 + ], + "92": [ + 5.129335880279541 + ], + "93": [ + 15.211457967758179 + ], + "94": [ + 10.024892091751099 + ], + "95": [ + 34.040268898010254 + ], + "96": [ + 2.6338326930999756 + ], + "97": [ + 12.165382146835327 + ], + "98": [ + 3.7391271591186523 + ], + "99": [ + 20.151211977005005 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/2024-06-29/comet-8-exec-5-runs.json b/docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpch.json similarity index 53% rename from docs/source/contributor-guide/benchmark-results/2024-06-29/comet-8-exec-5-runs.json rename to docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpch.json index f139578eb..3f8ace8ec 100644 --- a/docs/source/contributor-guide/benchmark-results/2024-06-29/comet-8-exec-5-runs.json +++ b/docs/source/contributor-guide/benchmark-results/2024-07-19/comet-tpch.json @@ -4,199 +4,150 @@ "data_path": "/mnt/bigdata/tpch/sf100/", "query_path": "../../tpch/queries", "spark_conf": { - "spark.comet.explainFallback.enabled": "true", "spark.eventLog.enabled": "true", "spark.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", "spark.comet.cast.allowIncompatible": "true", - "spark.app.startTime": "1719691158901", + "spark.app.startTime": "1721402269156", + "spark.app.id": "app-20240719151749-0002", "spark.executor.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", "spark.comet.exec.shuffle.enabled": "true", + "spark.driver.port": "38495", "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS", - "spark.app.id": 
"app-20240629135919-0008", - "spark.comet.batchSize": "8192", + "spark.app.submitTime": "1721402268875", "spark.serializer.objectStreamReset": "100", - "spark.driver.host": "10.0.0.118", "spark.submit.deployMode": "client", - "spark.sql.autoBroadcastJoinThreshold": "-1", "spark.comet.exec.all.enabled": "true", "spark.executor.cores": "8", + "spark.driver.host": "woody.lan", "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", "spark.comet.shuffle.enforceMode.enabled": "true", "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse", "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager", - "spark.app.submitTime": "1719691158623", - "spark.repl.local.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", "spark.comet.exec.enabled": "true", + "spark.repl.local.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", "spark.executor.id": "driver", + "spark.app.initial.jar.urls": 
"spark://woody.lan:38495/jars/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", "spark.master": "spark://woody:7077", "spark.comet.exec.shuffle.mode": "auto", - "spark.driver.port": "34629", "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions", "spark.driver.memory": "8G", "spark.driver.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", - "spark.sql.adaptive.coalescePartitions.enabled": "true", "spark.executor.memory": "32G", "spark.rdd.compress": "True", "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", "spark.executor.instances": "1", "spark.cores.max": "8", "spark.comet.enabled": "true", - "spark.submit.pyFiles": "", - "spark.app.initial.jar.urls": "spark://10.0.0.118:34629/jars/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar", - "spark.comet.cbo.enabled": "false" + "spark.submit.pyFiles": "" }, "1": [ - 28.735982179641724, - 27.904003858566284, - 27.98918342590332, - 27.998026847839355, - 27.7985897064209 + 28.722949028015137, + 27.5392906665802, + 27.496358633041382 ], "2": [ - 15.840301513671875, - 15.137918710708618, - 15.086657047271729, - 
15.252221584320068, - 15.093742370605469 + 9.163376569747925, + 8.315577268600464, + 8.380951404571533 ], "3": [ - 18.124080181121826, - 18.498253345489502, - 18.420130252838135, - 18.309802055358887, - 18.46897006034851 + 17.556873083114624, + 17.497890949249268, + 17.48395323753357 ], "4": [ - 9.55617070198059, - 9.518851518630981, - 9.514896392822266, - 9.583910465240479, - 9.444581985473633 + 9.459511756896973, + 9.35229229927063, + 9.526184320449829 ], "5": [ - 33.23771286010742, - 33.053247690200806, - 32.84638738632202, - 32.790276765823364, - 32.90981197357178 + 32.742212772369385, + 32.53394556045532, + 32.72231674194336 ], "6": [ - 3.34500789642334, - 2.9966821670532227, - 3.0137181282043457, - 2.9657068252563477, - 2.919524908065796 + 3.11678147315979, + 2.9885473251342773, + 3.089282274246216 ], "7": [ - 20.84096646308899, - 20.373249053955078, - 20.337918519973755, - 20.32623314857483, - 20.321190357208252 + 19.92447853088379, + 19.889236450195312, + 20.12239646911621 ], "8": [ - 36.99943470954895, - 36.097434520721436, - 36.08603119850159, - 36.26709461212158, - 36.22776746749878 + 36.376123666763306, + 36.56628465652466, + 36.08057117462158 ], "9": [ - 58.00954031944275, - 56.75375247001648, - 57.23253607749939, - 57.04572892189026, - 57.06179666519165 + 55.983471155166626, + 55.55835008621216, + 55.05353879928589 ], "10": [ - 19.51328682899475, - 19.17092227935791, - 19.110991716384888, - 19.05888819694519, - 19.292072534561157 + 18.320690870285034, + 17.953410148620605, + 17.839956283569336 ], "11": [ - 12.222111463546753, - 12.186187267303467, - 12.177972316741943, - 12.100908517837524, - 12.061741828918457 + 7.945342302322388, + 7.947471618652344, + 7.870803356170654 ], "12": [ - 7.657347679138184, - 7.598176002502441, - 7.568347930908203, - 7.4833292961120605, - 7.551736116409302 + 7.797280550003052, + 7.611949920654297, + 7.625390291213989 ], "13": [ - 9.64631199836731, - 9.536576509475708, - 9.564186096191406, - 9.570204496383667, - 
9.662892580032349 + 11.711121082305908, + 11.817307949066162, + 11.80425763130188 ], "14": [ - 6.022975921630859, - 5.84771203994751, - 6.049532175064087, - 5.998222827911377, - 5.899066925048828 + 6.106103897094727, + 5.755920171737671, + 5.686336040496826 ], "15": [ - 10.946545600891113, - 10.68128228187561, - 10.473867416381836, - 10.72830843925476, - 10.45834231376648 + 11.505656242370605, + 11.023053884506226, + 10.784588813781738 ], "16": [ - 7.951048851013184, - 6.773421049118042, - 6.630566120147705, - 6.826274633407593, - 6.515024185180664 + 8.015074729919434, + 6.408633232116699, + 6.315148591995239 ], "17": [ - 46.03706979751587, - 42.801599740982056, - 42.59856081008911, - 42.84500861167908, - 42.899412870407104 + 34.54281425476074, + 35.28704261779785, + 34.809094190597534 ], "18": [ - 34.244925022125244, - 31.239882469177246, - 31.353251695632935, - 31.224499940872192, - 31.53875970840454 + 31.605079889297485, + 31.74153995513916, + 31.957621574401855 ], "19": [ - 7.07506251335144, - 6.813824892044067, - 6.79759407043457, - 6.941055059432983, - 6.83566427230835 + 6.488569974899292, + 6.363597869873047, + 6.369234323501587 ], "20": [ - 10.964829683303833, - 10.757019996643066, - 10.806366205215454, - 10.990953922271729, - 10.887315273284912 + 8.480429410934448, + 8.437273502349854, + 8.329411745071411 ], "21": [ - 44.07762622833252, - 44.03535461425781, - 43.978052377700806, - 43.928617000579834, - 43.93204379081726 + 41.91543126106262, + 41.532346963882446, + 42.07075548171997 ], "22": [ - 4.871158599853516, - 4.810696601867676, - 4.873842239379883, - 4.817774534225464, - 4.927582740783691 + 4.895018100738525, + 4.865969657897949, + 4.903195858001709 ] } \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/2024-06-29/datafusion-python-8-cores.json b/docs/source/contributor-guide/benchmark-results/2024-07-19/datafusion-tpch.json similarity index 100% rename from 
docs/source/contributor-guide/benchmark-results/2024-06-29/datafusion-python-8-cores.json rename to docs/source/contributor-guide/benchmark-results/2024-07-19/datafusion-tpch.json diff --git a/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpcds.json b/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpcds.json new file mode 100644 index 000000000..f7ab268be --- /dev/null +++ b/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpcds.json @@ -0,0 +1,327 @@ +{ + "engine": "datafusion-comet", + "benchmark": "tpcds", + "data_path": "/mnt/bigdata/tpcds/sf100/", + "query_path": "../../tpcds/queries-spark", + "spark_conf": { + "spark.eventLog.enabled": "true", + "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse", + "spark.driver.port": "39959", + "spark.app.startTime": "1721343961227", + "spark.executor.id": "driver", + "spark.master": "spark://woody:7077", + "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS", + "spark.driver.memory": "8G", + 
"spark.app.submitTime": "1721343960912", + "spark.app.id": "app-20240718170601-0004", + "spark.executor.memory": "32G", + "spark.rdd.compress": "True", + "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.serializer.objectStreamReset": "100", + "spark.executor.instances": "1", + "spark.cores.max": "8", + "spark.submit.pyFiles": "", + "spark.submit.deployMode": "client", + "spark.sql.autoBroadcastJoinThreshold": "-1", + "spark.executor.cores": "8", + "spark.driver.host": "woody.lan" + }, + "1": [ + 7.800227403640747 + ], + "2": [ + 10.467795133590698 + ], + "3": [ + 2.8007187843322754 + ], + "4": [ + 91.76162981987 + ], + "5": [ + 23.099835872650146 + ], + "6": [ + 3.365507125854492 + ], + "7": [ + 5.432553291320801 + ], + "8": [ + 3.8576300144195557 + ], + "9": [ + 6.3153862953186035 + ], + "10": [ + 9.874432802200317 + ], + "11": [ + 33.87704801559448 + ], + "12": [ + 8.301852226257324 + ], + "13": [ + 5.269356966018677 + ], + "14": [ + 150.8186900615692 + ], + "15": [ + 15.294623136520386 + ], + "16": [ + 19.310945749282837 + ], + "17": [ + 6.411248207092285 + ], + "18": [ + 14.292846918106079 + ], + "19": [ + 
3.2587430477142334 + ], + "20": [ + 13.411643266677856 + ], + "21": [ + 35.572487115859985 + ], + "22": [ + 22.02106499671936 + ], + "23": [ + 249.43153548240662 + ], + "24": [ + 21.229645490646362 + ], + "25": [ + 5.656827211380005 + ], + "26": [ + 11.043184041976929 + ], + "27": [ + 3.837435722351074 + ], + "28": [ + 10.440573692321777 + ], + "29": [ + 13.529588222503662 + ], + "30": [ + 2.973015308380127 + ], + "31": [ + 12.835312604904175 + ], + "32": [ + 12.979064464569092 + ], + "33": [ + 10.316180229187012 + ], + "34": [ + 3.0397789478302 + ], + "35": [ + 11.558360815048218 + ], + "36": [ + 4.719287872314453 + ], + "37": [ + 9.003387689590454 + ], + "38": [ + 17.37705898284912 + ], + "39": [ + 113.10317611694336 + ], + "40": [ + 25.56372880935669 + ], + "41": [ + 0.2145369052886963 + ], + "42": [ + 2.0899407863616943 + ], + "43": [ + 5.081417798995972 + ], + "44": [ + 1.0103275775909424 + ], + "45": [ + 11.752943992614746 + ], + "46": [ + 4.572746276855469 + ], + "47": [ + 17.91774606704712 + ], + "48": [ + 9.562321901321411 + ], + "49": [ + 10.740519046783447 + ], + "50": [ + 15.341339349746704 + ], + "51": [ + 23.100194454193115 + ], + "52": [ + 2.0887205600738525 + ], + "53": [ + 3.6534769535064697 + ], + "54": [ + 14.758203268051147 + ], + "55": [ + 1.812988042831421 + ], + "56": [ + 10.357929229736328 + ], + "57": [ + 21.42304801940918 + ], + "58": [ + 46.502179861068726 + ], + "59": [ + 15.086335182189941 + ], + "60": [ + 10.964382886886597 + ], + "61": [ + 6.710279703140259 + ], + "62": [ + 15.012513160705566 + ], + "63": [ + 3.904587984085083 + ], + "64": [ + 37.42494583129883 + ], + "65": [ + 22.118192434310913 + ], + "66": [ + 33.776007890701294 + ], + "67": [ + 83.05218243598938 + ], + "68": [ + 4.215189456939697 + ], + "69": [ + 9.373996257781982 + ], + "70": [ + 13.401167631149292 + ], + "71": [ + 5.064984560012817 + ], + "72": [ + 55.99225473403931 + ], + "73": [ + 2.3760015964508057 + ], + "74": [ + 30.810500144958496 + ], + "75": [ + 
31.74701237678528 + ], + "76": [ + 4.408271789550781 + ], + "77": [ + 13.593222379684448 + ], + "78": [ + 55.4714629650116 + ], + "79": [ + 4.276215314865112 + ], + "80": [ + 51.73041367530823 + ], + "81": [ + 3.35856556892395 + ], + "82": [ + 15.728789806365967 + ], + "83": [ + 5.831252336502075 + ], + "84": [ + 2.2465932369232178 + ], + "85": [ + 4.967155694961548 + ], + "86": [ + 4.595327377319336 + ], + "87": [ + 15.662192344665527 + ], + "88": [ + 28.15371608734131 + ], + "89": [ + 5.115787744522095 + ], + "90": [ + 4.091581583023071 + ], + "91": [ + 3.3242697715759277 + ], + "92": [ + 7.603022813796997 + ], + "93": [ + 34.54590129852295 + ], + "94": [ + 10.030377388000488 + ], + "95": [ + 31.767929792404175 + ], + "96": [ + 1.7932329177856445 + ], + "97": [ + 19.545964002609253 + ], + "98": [ + 4.293541431427002 + ], + "99": [ + 26.280267000198364 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpch.json b/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpch.json new file mode 100644 index 000000000..cf50da5cf --- /dev/null +++ b/docs/source/contributor-guide/benchmark-results/2024-07-19/spark-tpch.json @@ -0,0 +1,138 @@ +{ + "engine": "datafusion-comet", + "benchmark": "tpch", + "data_path": "/mnt/bigdata/tpch/sf100/", + "query_path": "../../tpch/queries", + "spark_conf": { + "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse", + "spark.executor.id": "driver", + "spark.master": "spark://woody:7077", + "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS", + "spark.driver.memory": "8G", + "spark.app.submitTime": "1721400283873", + "spark.executor.memory": "32G", + "spark.app.id": "app-20240719144444-0001", + "spark.rdd.compress": "True", + "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false", + "spark.serializer.objectStreamReset": "100", + "spark.executor.instances": "1", + "spark.driver.port": "36055", + "spark.cores.max": "8", + "spark.submit.pyFiles": "", + "spark.app.startTime": "1721400284146", + "spark.submit.deployMode": "client", + "spark.executor.cores": "8", + "spark.driver.host": "woody.lan" + }, + "1": [ + 
102.50705313682556, + 102.79903650283813, + 102.37824273109436 + ], + "2": [ + 11.939732789993286, + 11.57072114944458, + 11.101150751113892 + ], + "3": [ + 22.83867835998535, + 22.278138160705566, + 21.799394607543945 + ], + "4": [ + 17.29210591316223, + 16.73092555999756, + 16.37605381011963 + ], + "5": [ + 44.469959020614624, + 44.03915023803711, + 43.95619344711304 + ], + "6": [ + 3.5601518154144287, + 3.5911924839019775, + 3.436091423034668 + ], + "7": [ + 20.750574588775635, + 20.520857334136963, + 19.94058084487915 + ], + "8": [ + 32.82963514328003, + 33.52391338348389, + 33.31922125816345 + ], + "9": [ + 67.38808369636536, + 67.7869827747345, + 66.43161416053772 + ], + "10": [ + 19.35655426979065, + 18.340266466140747, + 18.23655128479004 + ], + "11": [ + 12.413910865783691, + 11.624657392501831, + 11.516592264175415 + ], + "12": [ + 13.084538698196411, + 12.356904029846191, + 12.170117616653442 + ], + "13": [ + 21.03442144393921, + 20.202510356903076, + 20.323060274124146 + ], + "14": [ + 5.823769569396973, + 5.6380836963653564, + 5.555809497833252 + ], + "15": [ + 16.080876350402832, + 15.773152112960815, + 15.420385599136353 + ], + "16": [ + 6.3961169719696045, + 6.1880176067352295, + 6.201446056365967 + ], + "17": [ + 57.85423827171326, + 58.72210907936096, + 56.86491131782532 + ], + "18": [ + 71.47632312774658, + 71.85094976425171, + 70.76138210296631 + ], + "19": [ + 6.9238176345825195, + 6.903416633605957, + 6.856832504272461 + ], + "20": [ + 9.688754320144653, + 9.626489400863647, + 9.615416049957275 + ], + "21": [ + 59.68360757827759, + 59.52283453941345, + 59.335086822509766 + ], + "22": [ + 8.247804880142212, + 8.269869565963745, + 8.185250282287598 + ] +} \ No newline at end of file diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md index d315c559e..713f47b45 100644 --- a/docs/source/contributor-guide/benchmarking.md +++ b/docs/source/contributor-guide/benchmarking.md @@ -22,6 +22,8 @@ 
under the License. To track progress on performance, we regularly run benchmarks derived from TPC-H and TPC-DS. Data generation and benchmarking documentation and scripts are available in the [DataFusion Benchmarks](https://github.com/apache/datafusion-benchmarks) GitHub repository. +We also have many micro benchmarks, located [here](https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark), that can be run from an IDE. + Here are example commands for running the benchmarks against a Spark cluster. This command will need to be adapted based on the Spark environment and location of data files. @@ -34,15 +36,15 @@ repository. $SPARK_HOME/bin/spark-submit \ --master $SPARK_MASTER \ --conf spark.driver.memory=8G \ + --conf spark.executor.instances=1 \ --conf spark.executor.memory=32G \ --conf spark.executor.cores=8 \ --conf spark.cores.max=8 \ - --conf spark.sql.autoBroadcastJoinThreshold=-1 \ tpcbench.py \ --benchmark tpch \ --data /mnt/bigdata/tpch/sf100/ \ --queries ../../tpch/queries \ - --iterations 5 + --iterations 3 ``` ## Running Benchmarks Against Apache Spark with Apache DataFusion Comet Enabled @@ -55,7 +57,6 @@ $SPARK_HOME/bin/spark-submit \ --conf spark.executor.memory=32G \ --conf spark.executor.cores=8 \ --conf spark.cores.max=8 \ - --conf spark.sql.autoBroadcastJoinThreshold=-1 \ --jars $COMET_JAR \ --conf spark.driver.extraClassPath=$COMET_JAR \ --conf spark.executor.extraClassPath=$COMET_JAR \ @@ -64,9 +65,6 @@ $SPARK_HOME/bin/spark-submit \ --conf spark.comet.exec.enabled=true \ --conf spark.comet.exec.all.enabled=true \ --conf spark.comet.cast.allowIncompatible=true \ - --conf spark.comet.explainFallback.enabled=true \ - --conf spark.comet.parquet.io.enabled=false \ - --conf spark.comet.batchSize=8192 \ --conf spark.comet.exec.shuffle.enabled=true \ --conf spark.comet.exec.shuffle.mode=auto \ --conf spark.comet.shuffle.enforceMode.enabled=true \ @@ -75,7 +73,7 @@ $SPARK_HOME/bin/spark-submit \ 
--benchmark tpch \ --data /mnt/bigdata/tpch/sf100/ \ --queries ../../tpch/queries \ - --iterations 5 + --iterations 3 ``` ## Current Performance @@ -87,20 +85,44 @@ The following benchmarks were performed on a Linux workstation with PCIe 5, AMD data stored locally on NVMe storage. Performance characteristics will vary in different environments and we encourage you to run these benchmarks in your own environments. -![](../../_static/images/tpch_allqueries.png) +### TPC-H + +Comet currently provides a 35% speedup for TPC-H @ SF=100GB. + +![](../../_static/images/benchmark-results/2024-07-19/tpch_allqueries.png) -Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each TPC-H query. +Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each query. -![](../../_static/images/tpch_queries_compare.png) +![](../../_static/images/benchmark-results/2024-07-19/tpch_queries_compare.png) -The following chart shows how much Comet currently accelerates each query from the benchmark. Performance optimization -is an ongoing task, and we welcome contributions from the community to help achieve even greater speedups in the future. +The following chart shows how much Comet currently accelerates each query from the benchmark. -![](../../_static/images/tpch_queries_speedup.png) +![](../../_static/images/benchmark-results/2024-07-19/tpch_queries_speedup.png) The raw results of these benchmarks in JSON format is available here: -- [Spark](./benchmark-results/2024-06-29/spark-8-exec-5-runs.json) -- [Comet](./benchmark-results/2024-06-29/comet-8-exec-5-runs.json) -- [DataFusion](./benchmark-results/2024-06-29/datafusion-python-8-cores.json) +- [Spark](./benchmark-results/2024-07-19/spark-tpch.json) +- [Comet](./benchmark-results/2024-07-19/comet-tpch.json) +- [DataFusion](./benchmark-results/2024-07-19/datafusion-tpch.json) +### TPC-DS + +Comet currently provides an 18% speedup for TPC-DS @ SF=100GB. 
Note that we used an optimized version of +query 72 with a better join order for these benchmarks since the focus of Spark (and Comet) is not on join +reordering algorithms but raw execution speed. + +![](../../_static/images/benchmark-results/2024-07-19/tpcds_allqueries.png) + +Here is a breakdown showing relative performance of Spark and Comet for each query. DataFusion is +not included here because it currently only supports around 90% of the TPC-DS queries. + +![](../../_static/images/benchmark-results/2024-07-19/tpcds_queries_compare.png) + +The following chart shows how much Comet currently accelerates each query from the benchmark. + +![](../../_static/images/benchmark-results/2024-07-19/tpcds_queries_speedup.png) + +The raw results of these benchmarks in JSON format are available here: + +- [Spark](./benchmark-results/2024-07-19/spark-tpcds.json) +- [Comet](./benchmark-results/2024-07-19/comet-tpcds.json)
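
The raw result files added in this change share a simple layout: metadata keys such as `engine` and `spark_conf`, plus one numeric string key per query that maps to a list of wall-clock times in seconds. The following is a minimal sketch, not part of the benchmark tooling, of how two such files could be compared. The helper names `load_timings` and `summarize` are hypothetical, and taking the median across iterations is an assumption; the charts above may aggregate runs differently, so the numbers this produces need not match the published percentages.

```python
import json
from statistics import median


def load_timings(path):
    """Return {query_number: [seconds, ...]} from one result file."""
    with open(path) as f:
        raw = json.load(f)
    # Query entries use numeric string keys ("1", "2", ...);
    # every other key ("engine", "spark_conf", ...) is metadata.
    return {int(k): v for k, v in raw.items() if k.isdigit()}


def summarize(baseline, candidate):
    """Return (per-query time ratios, ratio of total times).

    Each ratio is baseline time / candidate time, so a value above 1.0
    means the candidate was faster. The median per query is an assumed
    aggregation, not necessarily the one used for the charts.
    """
    common = sorted(set(baseline) & set(candidate))
    per_query = {q: median(baseline[q]) / median(candidate[q]) for q in common}
    total_baseline = sum(median(baseline[q]) for q in common)
    total_candidate = sum(median(candidate[q]) for q in common)
    return per_query, total_baseline / total_candidate


if __name__ == "__main__":
    # Paths are relative to the repository root.
    results_dir = "docs/source/contributor-guide/benchmark-results/2024-07-19"
    spark = load_timings(f"{results_dir}/spark-tpch.json")
    comet = load_timings(f"{results_dir}/comet-tpch.json")
    per_query, overall = summarize(spark, comet)
    for query, ratio in per_query.items():
        print(f"q{query}: {ratio:.2f}x")
    print(f"overall: {overall:.2f}x")
```

Only queries present in both files are compared, which keeps the totals meaningful when one engine does not complete every query.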