
[SPARK-50235][SQL] Clean up ColumnVector resource after processing all rows in ColumnarToRowExec #48767

Closed
wants to merge 1 commit

Conversation

viirya
Member

@viirya viirya commented Nov 5, 2024

What changes were proposed in this pull request?

This patch cleans up ColumnVector resources after processing all rows in ColumnarToRowExec. It only covers the codegen implementation of ColumnarToRowExec. The non-codegen path should be relatively rare in practice, and no good approach has been proposed for it yet, so it is left to a follow-up.

Why are the changes needed?

Currently we only assign null to the ColumnarBatch object, but that does not release the resources held by the vectors in the batch. For OnHeapColumnVector, the Java arrays can be garbage collected by the JVM, but for OffHeapColumnVector, the allocated off-heap memory will be leaked.

For custom ColumnVector implementations such as Arrow-based ones, this can also cause memory-safety issues if the underlying buffers are reused across batches, because when ColumnarToRowExec begins to fill values for the next batch, the arrays from the previous batch are still held.
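The leak described above can be illustrated with a minimal, self-contained sketch. These are simplified stand-ins, not Spark's actual OffHeapColumnVector/ColumnarBatch classes: the point is only that dropping the Java reference frees nothing off-heap, while an explicit close() does.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulated off-heap vector: the GC never reclaims this "memory"; it must be freed explicitly. */
class OffHeapVector implements AutoCloseable {
    static final List<OffHeapVector> liveAllocations = new ArrayList<>();
    private boolean closed = false;

    OffHeapVector() { liveAllocations.add(this); }  // simulated malloc

    @Override
    public void close() {  // simulated free
        if (!closed) { closed = true; liveAllocations.remove(this); }
    }
}

/** Stand-in for ColumnarBatch: closing it closes every vector it holds. */
class Batch implements AutoCloseable {
    final OffHeapVector[] vectors;
    Batch(int n) {
        vectors = new OffHeapVector[n];
        for (int i = 0; i < n; i++) vectors[i] = new OffHeapVector();
    }
    @Override
    public void close() { for (OffHeapVector v : vectors) v.close(); }
}

public class LeakSketch {
    public static void main(String[] args) {
        Batch closedBatch = new Batch(2);
        closedBatch.close();  // explicit cleanup: off-heap "memory" released
        System.out.println("after close: " + OffHeapVector.liveAllocations.size());

        Batch leakedBatch = new Batch(2);
        leakedBatch = null;   // only drops the reference: off-heap "memory" leaks
        System.out.println("after null-out: " + OffHeapVector.liveAllocations.size());
    }
}
```

This mirrors the PR description: OnHeapColumnVector's plain Java arrays would be collectible once unreferenced, but anything backed by explicit allocation needs the close() call the patch adds.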

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 5, 2024
@viirya viirya changed the title [SPARK-XXXXX][SQL] Clean up ColumnVector resource after processing all rows in ColumnarToRowExec [SPARK-50235][SQL] Clean up ColumnVector resource after processing all rows in ColumnarToRowExec Nov 5, 2024
Comment on lines +202 to +204
|if ($batch != null) {
| $batch.close();
|}
Member Author


This looks like a memory leak for the off-heap case.

@@ -194,9 +194,14 @@ case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition w
| $shouldStop
| }
| $idx = $numRows;
| $batch.closeIfNotWritable();
Member Author


Writable column vectors are reused across batches, so we cannot close them until all batches are finished (see below).
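The closeIfNotWritable name comes from the diff above; the semantics the comment describes can be sketched with simplified stand-in classes (not Spark's real ColumnVector hierarchy): per-batch cleanup closes only non-writable vectors, and writable ones stay open for reuse until a final close() after the last batch.

```java
interface Vector extends AutoCloseable {
    boolean isClosed();
    @Override void close();
}

class ReadOnlyVector implements Vector {
    private boolean closed = false;
    public boolean isClosed() { return closed; }
    public void close() { closed = true; }
}

/** Stand-in for Spark's WritableColumnVector: reused across batches. */
class WritableVector extends ReadOnlyVector {}

class MiniBatch implements AutoCloseable {
    final Vector[] vectors;
    MiniBatch(Vector... vs) { vectors = vs; }

    /** Close everything; used once, after the last batch is consumed. */
    public void close() { for (Vector v : vectors) v.close(); }

    /** Per-batch cleanup: close only vectors that are not writable. */
    public void closeIfNotWritable() {
        for (Vector v : vectors) {
            if (!(v instanceof WritableVector)) v.close();
        }
    }
}

public class CloseSketch {
    public static void main(String[] args) {
        WritableVector reused = new WritableVector();
        MiniBatch batch = new MiniBatch(new ReadOnlyVector(), reused);
        batch.closeIfNotWritable();  // per-batch: the reused vector survives
        System.out.println("reused still open: " + !reused.isClosed());
        batch.close();               // final cleanup after all batches
        System.out.println("reused closed: " + reused.isClosed());
    }
}
```

The instanceof check here is only an illustrative dispatch mechanism; the key design point is the two-tier cleanup, matching the generated code's per-batch closeIfNotWritable plus the final close.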

@yaooqinn yaooqinn closed this in 800faf0 Nov 6, 2024
yaooqinn pushed a commit that referenced this pull request Nov 6, 2024
…l rows in ColumnarToRowExec

### What changes were proposed in this pull request?

This patch cleans up ColumnVector resources after processing all rows in ColumnarToRowExec. It only covers the codegen implementation of ColumnarToRowExec. The non-codegen path should be relatively rare in practice, and no good approach has been proposed for it yet, so it is left to a follow-up.

### Why are the changes needed?

Currently we only assign null to the ColumnarBatch object, but that does not release the resources held by the vectors in the batch. For OnHeapColumnVector, the Java arrays can be garbage collected by the JVM, but for OffHeapColumnVector, the allocated off-heap memory will be leaked.

For custom ColumnVector implementations such as Arrow-based ones, this can also cause memory-safety issues if the underlying buffers are reused across batches, because when ColumnarToRowExec begins to fill values for the next batch, the arrays from the previous batch are still held.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48767 from viirya/close_if_not_writable.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
(cherry picked from commit 800faf0)
Signed-off-by: Kent Yao <yao@apache.org>
@yaooqinn
Member

yaooqinn commented Nov 6, 2024

Merged to master and 3.5

Thank you @viirya @dongjoon-hyun

@viirya
Member Author

viirya commented Nov 6, 2024

Thanks @dongjoon-hyun @yaooqinn

@viirya viirya deleted the close_if_not_writable branch November 6, 2024 14:47
@bersprockets
Contributor

Starting with this commit (800faf0), I get an error with the following commands:

val testDf = spark.range(200000).selectExpr("id as a", "concat('x', string(id % 2)) as b")
testDf.write.mode("overwrite").partitionBy("b").format("parquet").save("test1")

spark.read.parquet("test1").createOrReplaceTempView("test1")
sql("select * from test1 limit 12 offset 20000").collect

The collect results in this error:

Exception in task 0.0 in stage 2.0 (TID 17)
java.lang.NullPointerException: Cannot invoke "org.apache.spark.unsafe.types.UTF8String.getBaseObject()" because "input" is null
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:111) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]

If I build with the commit previous to 800faf0, I get actual results rather than an error.
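The failure mode in this report can be modeled abstractly (this is an illustrative sketch, not Spark's actual code or the actual root cause analysis): if a consumer reads a value lazily out of a batch's buffer, closing the batch before the value is materialized hands the consumer a null, which surfaces downstream much like the NullPointerException on UTF8String.getBaseObject above.

```java
/** Stand-in for a string column whose accessor reads from a closeable buffer. */
class StringVector implements AutoCloseable {
    private String[] data = {"x0", "x1"};

    public void close() { data = null; }  // backing buffer released

    /** Lazy accessor: yields null once the backing buffer is gone. */
    public String get(int i) { return data == null ? null : data[i]; }
}

public class UseAfterCloseSketch {
    public static void main(String[] args) {
        StringVector vec = new StringVector();
        String beforeClose = vec.get(0);  // value still backed by the buffer
        vec.close();                      // batch closed too early
        String afterClose = vec.get(0);   // null: would NPE downstream
        System.out.println("before: " + beforeClose + ", after: " + afterClose);
    }
}
```

This is why the ordering of the close call relative to row consumption matters, and it is consistent with the fix being a follow-up change (#49021, mentioned below) rather than a revert.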

@viirya
Member Author

viirya commented Dec 5, 2024

It was fixed by #49021.

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 10, 2024

Just for the record, the Iceberg community seems to have reported a bug against this patch.

@viirya
Member Author

viirya commented Dec 10, 2024

Thanks @dongjoon-hyun. Please see #49131 (comment) for some discussion.

@dongjoon-hyun
Member

Thank you for clarifying that swiftly!
