Commit bb26bdb
[SPARK-23399][SQL] Register a task completion listener first for OrcColumnarBatchReader
This PR aims to resolve an open file leakage issue reported at [SPARK-23390](https://issues.apache.org/jira/browse/SPARK-23390) by moving the listener registration position. Currently, the sequence is as follows.
1. Create `batchReader`
2. `batchReader.initialize` opens an ORC file.
3. `batchReader.initBatch` may take a long time to allocate memory in some environments and may cause errors.
4. Register the task completion listener: `Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))`
This PR moves step 4 before steps 2 and 3, so the listener is already registered if opening the file or allocating memory fails. To sum up, the new sequence is 1 -> 4 -> 2 -> 3.
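The effect of the reordering can be illustrated with a small self-contained sketch. It does not use Spark: `FakeTaskContext` and `FakeBatchReader` are hypothetical stand-ins invented for illustration, and the listener signature is simplified compared with Spark's `addTaskCompletionListener`. The sketch only assumes that completion listeners run when the task ends, whether or not it failed.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context: runs registered callbacks when the task ends.
final class FakeTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = { listeners += f }
  def markTaskCompleted(): Unit = { listeners.foreach(f => f()) }
}

// Hypothetical stand-in for the batch reader: initialize opens a file, initBatch fails.
final class FakeBatchReader {
  var isOpen: Boolean = false
  def initialize(): Unit = { isOpen = true }
  def initBatch(): Unit = { throw new OutOfMemoryError("simulated allocation failure") }
  def close(): Unit = { isOpen = false }
}

object ListenerOrderSketch {
  // Returns true if the reader is still open (leaked) after the task completes.
  def leaksWhen(registerFirst: Boolean): Boolean = {
    val ctx = new FakeTaskContext
    val reader = new FakeBatchReader // step 1
    try {
      // New order: step 4 runs before steps 2 and 3.
      if (registerFirst) ctx.addTaskCompletionListener(() => reader.close())
      reader.initialize() // step 2: the file is now open
      reader.initBatch()  // step 3: throws in this sketch
      // Old order: step 4 is never reached when step 3 throws.
      if (!registerFirst) ctx.addTaskCompletionListener(() => reader.close())
    } catch {
      case _: OutOfMemoryError => // the task fails here
    } finally {
      ctx.markTaskCompleted()
    }
    reader.isOpen
  }

  def main(args: Array[String]): Unit = {
    println(s"old order leaks: ${leaksWhen(registerFirst = false)}")
    println(s"new order leaks: ${leaksWhen(registerFirst = true)}")
  }
}
```

In this sketch, the old order leaves the reader open because the listener was never registered, while the new order closes it when the task completes.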
Tested manually. The following test case intentionally triggers an OOM to cause a leaked filesystem connection in the current code base. With this patch, the leak does not occur.
```scala
// This should be tested manually because it raises OOM intentionally
// in order to cause `Leaked filesystem connection`.
test("SPARK-23399 Register a task completion listener first for OrcColumnarBatchReader") {
  withSQLConf(SQLConf.ORC_VECTORIZED_READER_BATCH_SIZE.key -> s"${Int.MaxValue}") {
    withTempDir { dir =>
      val basePath = dir.getCanonicalPath
      Seq(0).toDF("a").write.format("orc").save(new Path(basePath, "first").toString)
      Seq(1).toDF("a").write.format("orc").save(new Path(basePath, "second").toString)
      val df = spark.read.orc(
        new Path(basePath, "first").toString,
        new Path(basePath, "second").toString)
      val e = intercept[SparkException] {
        df.collect()
      }
      assert(e.getCause.isInstanceOf[OutOfMemoryError])
    }
  }
}
```
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #20590 from dongjoon-hyun/SPARK-23399.
(cherry picked from commit 357babd)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 4f6a457 commit bb26bdb
1 file changed, 6 additions and 2 deletions, in sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc: six lines added at new lines 190-195, two lines removed at original lines 198-199.