Commit 2c54aae
[SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error
When the join key is a long or an int in a broadcast join, Spark uses `LongToUnsafeRowMap` to store the key-value pairs of the table that will be broadcast. But when the broadcast `LongToUnsafeRowMap` is too big to hold in memory on an executor, it is spilled to disk. At that point, `write` uses the variable `cursor` to decide how many bytes of the map's `page` should be written out, and because `cursor` was not restored during deserialization, the executor writes nothing from the page to disk.
## What changes were proposed in this pull request?
Restore cursor value when deserializing.
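To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern, using a toy class in place of the real `LongToUnsafeRowMap` (all names and layout here are illustrative, not the real Spark code): the serialized form records how many bytes of `page` are in use, so the reader must restore `cursor` from that count, otherwise a later `write` on the executor emits an empty page.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Toy stand-in for LongToUnsafeRowMap's page/cursor bookkeeping.
// `page` is the backing buffer, `cursor` tracks how many bytes are in use,
// and write() persists only the first `cursor` bytes.
class ToyRowPage {
  private var page: Array[Byte] = new Array[Byte](64)
  private var cursor: Int = 0

  def append(bytes: Array[Byte]): Unit = {
    if (cursor + bytes.length > page.length) {
      page = java.util.Arrays.copyOf(page, math.max(page.length * 2, cursor + bytes.length))
    }
    System.arraycopy(bytes, 0, page, cursor, bytes.length)
    cursor += bytes.length
  }

  def write(out: DataOutputStream): Unit = {
    out.writeInt(cursor)        // how many bytes of `page` are in use
    out.write(page, 0, cursor)
  }

  def read(in: DataInputStream): Unit = {
    val used = in.readInt()
    page = new Array[Byte](used)
    in.readFully(page)
    // The essence of the fix: restore `cursor` from the amount of data just
    // read. Without this line, a later write() on the executor would persist
    // zero bytes of `page`, silently dropping all values.
    cursor = used
  }

  def usedBytes: Int = cursor
}

object ToyRowPage {
  private def roundTrip(src: ToyRowPage): ToyRowPage = {
    val bytes = new ByteArrayOutputStream()
    src.write(new DataOutputStream(bytes))
    val dst = new ToyRowPage
    dst.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    dst
  }

  def main(args: Array[String]): Unit = {
    val original = new ToyRowPage
    original.append("row-1".getBytes("UTF-8"))
    original.append("row-2".getBytes("UTF-8"))

    // Broadcast to an executor (first round trip), then spill/re-serialize on
    // that executor (second round trip). Without the cursor restore, the
    // second write would lose everything because cursor stays at 0.
    val onExecutor = roundTrip(original)
    val respilled = roundTrip(onExecutor)
    assert(respilled.usedBytes == original.usedBytes)
    println(s"bytes survived both round trips: ${respilled.usedBytes}")
  }
}
```

The same idea applies in the real class: `read` learns the page length while deserializing, and the fix sets `cursor` from it so the map can be serialized again on the executor.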
Author: liulijia <liutang123@yeah.net>
Closes #21772 from liutang123/SPARK-24809.
File tree (2 files changed: +31 / -0)
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins
Lines changed: 2 additions & 0 deletions (main source under sql/core/src/main/scala/org/apache/spark/sql/execution/joins; two lines added at new-file lines 775-776).
Lines changed: 29 additions & 0 deletions (test source under sql/core/src/test/scala/org/apache/spark/sql/execution/joins; a new test added at new-file lines 281-309).
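The 29 new test lines are not reproduced on this page. As a hedged sketch, a regression test for this bug can be written as a double serialize/deserialize round trip that asserts the values are still readable afterwards. The suite name, memory-manager wiring, and choice of serializer below are illustrative assumptions modeled on the existing HashedRelationSuite, not the verbatim test from the PR:

```scala
package org.apache.spark.sql.execution.joins

import org.apache.spark.{SparkConf, SparkFunSuite}
import org.apache.spark.memory.{StaticMemoryManager, TaskMemoryManager}
import org.apache.spark.serializer.JavaSerializer
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{UnsafeProjection, UnsafeRow}
import org.apache.spark.sql.types.{DataType, LongType}

class LongToUnsafeRowMapRoundTripSuite extends SparkFunSuite {

  test("LongToUnsafeRowMap keeps its data after a second serialization") {
    // Task memory manager wiring, modeled on what HashedRelationSuite does.
    val conf = new SparkConf().set("spark.memory.offHeap.enabled", "false")
    val mm = new TaskMemoryManager(
      new StaticMemoryManager(conf, Long.MaxValue, Long.MaxValue, 1), 0)

    // Build a small map with two long keys pointing at single-column rows.
    val toUnsafe = UnsafeProjection.create(Array[DataType](LongType))
    val original = new LongToUnsafeRowMap(mm, 1)
    original.append(1L, toUnsafe(InternalRow(42L)))
    original.append(2L, toUnsafe(InternalRow(87L)))
    original.optimize()

    // Serialize and deserialize twice. The second pass mimics an executor
    // that has to re-serialize the broadcast map (e.g. to spill it to disk);
    // before the fix, the page contents were lost here because `cursor` was
    // not restored by the first deserialization.
    val ser = new JavaSerializer(conf).newInstance()
    val once = ser.deserialize[LongToUnsafeRowMap](ser.serialize(original))
    val twice = ser.deserialize[LongToUnsafeRowMap](ser.serialize(once))

    val row = new UnsafeRow(1)
    assert(twice.getValue(1L, row).getLong(0) == 42L)
    assert(twice.getValue(2L, row).getLong(0) == 87L)

    original.free()
    once.free()
    twice.free()
  }
}
```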