Description
ParquetWriter.getDataSize() works normally while the writer is open, but after I call ParquetWriter.close(), subsequent calls to getDataSize() throw a NullPointerException.
java.lang.NullPointerException
at org.apache.parquet.hadoop.InternalParquetRecordWriter.getDataSize(InternalParquetRecordWriter.java:132)
at org.apache.parquet.hadoop.ParquetWriter.getDataSize(ParquetWriter.java:314)
at FileBufferState.getFileSizeInBytes(FileBufferState.scala:83)
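For context, a minimal reproduction sketch (not from the original report; the class name, output path, and schema are illustrative, and the ExampleParquetWriter builder API may differ slightly across Parquet versions):

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class GetDataSizeAfterClose {
  public static void main(String[] args) throws Exception {
    MessageType schema = MessageTypeParser.parseMessageType(
        "message example { required int32 id; }");

    ParquetWriter<Group> writer = ExampleParquetWriter
        .builder(new Path("/tmp/getdatasize-repro.parquet")) // hypothetical path
        .withType(schema)
        .build();

    writer.write(new SimpleGroupFactory(schema).newGroup().append("id", 1));
    System.out.println("before close: " + writer.getDataSize()); // works
    writer.close();
    System.out.println("after close:  " + writer.getDataSize()); // NPE thrown here
  }
}
```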
The NPE appears to originate in InternalParquetRecordWriter.getDataSize, which assumes that columnStore is not null. However, close() calls flushRowGroupToStore(), which sets columnStore = null.
I'm guessing that once the file is closed we can just return lastRowGroupEndPos, since there should be no more buffered data, but I don't fully understand how this class works; a sketch of that guess follows.
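If that guess is right, the guard might look like the sketch below. This is an assumption on my part, based on getDataSize summing lastRowGroupEndPos and the column store's buffered size; it is not a confirmed patch:

```java
// Sketch of InternalParquetRecordWriter.getDataSize() with a null guard.
public long getDataSize() {
  // Assumption: flushRowGroupToStore() nulls out columnStore during close();
  // after that point all data has been flushed, so the last row-group end
  // position should be the full data size.
  if (columnStore == null) {
    return lastRowGroupEndPos;
  }
  return lastRowGroupEndPos + columnStore.getBufferedSize();
}
```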
Environment: Linux prim 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
openjdk version "1.8.0_112"
OpenJDK Runtime Environment (build 1.8.0_112-b15)
OpenJDK 64-Bit Server VM (build 25.112-b15, mixed mode)
Reporter: Mike Mintz
Note: This issue was originally created as PARQUET-860. Please see the migration documentation for further details.