
CometHashAggregate prefixed with ! in explain plan #2214

@rishvin


Describe the bug

I ran the following in spark-shell (Spark version 3.5.6).

$SPARK_HOME/bin/spark-shell $COMET/spark/target/comet-spark-spark3.5_2.12-0.10.0-SNAPSHOT --conf spark.plugins=org.apache.spark.CometPlugin --conf spark.comet.enabled=true --conf spark.comet.exec.enabled=true
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row

val schema = StructType(Seq(StructField("id", IntegerType, nullable = false), StructField("value", IntegerType, nullable = false)))
val data = Seq(Row(1, 10), Row(2, 20), Row(3, 10), Row(4, 30), Row(5, 20), Row(6, 10))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
val out = "groupby"
df.write.mode("overwrite").parquet(out)
val parquetDF = spark.read.parquet(out)
val grouped = parquetDF.groupBy("id").count()
grouped.explain()

The explain output shows CometHashAggregate prefixed with !.

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[id#137], functions=[count(1)])
   +- Exchange hashpartitioning(id#137, 4), ENSURE_REQUIREMENTS, [plan_id=420]
      +- !CometHashAggregate [id#137], Partial, [id#137], [partial_count(1)]
         +- CometNativeScan parquet [id#137] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/testing/groupby], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>

There were no correctness issues or failures.

A comment in the Spark codebase here says that "!" is used to indicate an invalid plan, and "'" to indicate an unresolved plan. However, I haven't verified whether this is the only place in the code where ! gets added to the plan.

I filed this issue after seeing that comment in the Spark code, to bring the prefix to the maintainers' attention.
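
For illustration, here is a minimal sketch of the prefix rule as I understand it: a non-leaf node gets "!" when it references attributes that none of its children output. This is a toy model, not Spark's actual code. PlanNode, StatePrefixSketch and the attribute names count#1 and buf#2 are made up for the example, and whether this rule is what marks CometHashAggregate here is an assumption I have not confirmed.

```scala
// Toy model of a plan node; these are NOT Spark's classes.
final case class PlanNode(
    name: String,
    references: Set[String], // attributes the node's expressions read
    output: Set[String],     // attributes the node produces
    children: Seq[PlanNode] = Nil) {

  // Attributes made available by the children.
  def inputSet: Set[String] = children.flatMap(_.output).toSet

  // Referenced attributes that no child provides.
  def missingInput: Set[String] = references -- inputSet

  // Assumed rule: "!" when a non-leaf node has missing input.
  def statePrefix: String =
    if (missingInput.nonEmpty && children.nonEmpty) "!" else ""

  // Render the node and its children, one per line, with the prefix.
  def treeString(indent: Int = 0): String = {
    val self = (" " * indent) + statePrefix + name
    (self +: children.map(_.treeString(indent + 3))).mkString("\n")
  }
}

object StatePrefixSketch {
  // Leaf node: never prefixed, because it has no children.
  val scan = PlanNode("Scan", Set.empty, Set("id#137"))

  // Reads only what the scan outputs, so no prefix.
  val ok = PlanNode("Aggregate", Set("id#137"), Set("id#137", "count#1"), Seq(scan))

  // Also reads buf#2 (hypothetical), which the scan does not output.
  val bad = PlanNode("Aggregate", Set("id#137", "buf#2"), Set("id#137", "count#1"), Seq(scan))

  def main(args: Array[String]): Unit = {
    println(ok.treeString())
    println(bad.treeString())
  }
}
```

In this model only the second aggregate is printed with a leading !, because buf#2 is missing from its input. A check along these lines against the real plan nodes might help tell whether the prefix is cosmetic or points to a real attribute mismatch.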

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)
