PARQUET-77: ByteBuffer use in read and write paths #267
jaltekruse wants to merge 102 commits into apache:master from
Conversation
…copy through read path.
Use reflection to call the new API to keep compatibility.
Fix bugs in Binary.
Add a compatible initFromPage method in ValueReaders. Add a toByteBuffer method in ByteBufferInputStream. Add a V21FileAPI class to encapsulate v21 APIs and make it a singleton. Add ByteBuffer-based equals and compareTo methods in Binary (see the sketch after the commit list below).
Add compatibility function to read directly into a byte buffer
… memory can be released before stats are written.
…tor to allocate the ByteBuffer.
Conflicts:
parquet-column/src/main/java/parquet/column/ColumnWriteStore.java
parquet-column/src/main/java/parquet/column/ColumnWriter.java
parquet-column/src/main/java/parquet/column/ParquetProperties.java
parquet-column/src/main/java/parquet/column/impl/ColumnWriteStoreV1.java
parquet-column/src/main/java/parquet/column/impl/ColumnWriterV1.java
parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesWriter.java
parquet-column/src/main/java/parquet/column/values/rle/RunLengthBitPackingHybridValuesWriter.java
parquet-column/src/test/java/parquet/column/values/dictionary/TestDictionary.java
parquet-column/src/test/java/parquet/io/TestColumnIO.java
parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java
parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordWriter.java
parquet-hadoop/src/main/java/parquet/hadoop/ParquetRecordWriter.java
parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
Conflicts: parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesWriter.java
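For reference, a minimal sketch of what a ByteBuffer-based equals/compareTo could look like, assuming a plain unsigned lexicographic comparison over the buffers' remaining bytes without copying into byte arrays; the class and method names here are illustrative, not the actual Binary implementation:

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: compares the remaining bytes of two buffers
// lexicographically (as unsigned bytes) without copying them into byte[] arrays.
final class ByteBufferCompare {
  static int compare(ByteBuffer a, ByteBuffer b) {
    int aPos = a.position(), bPos = b.position();
    int aRem = a.remaining(), bRem = b.remaining();
    int n = Math.min(aRem, bRem);
    for (int i = 0; i < n; i++) {
      // absolute-index reads leave the buffers' positions untouched
      int cmp = (a.get(aPos + i) & 0xFF) - (b.get(bPos + i) & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return aRem - bRem; // a shorter buffer sorts first when it is a prefix
  }

  static boolean equal(ByteBuffer a, ByteBuffer b) {
    return a.remaining() == b.remaining() && compare(a, b) == 0;
  }
}
```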
This will lose the original cause of the error.
I'm not sure what you mean here. We didn't get an error out of the method we called; throwing here will at least give a stack trace, but I don't see where we are going to get more information about why it failed.
Sorry, that was not very clear. I was referring to the catch block at line 98: if we catch an exception there, then res == 0 and we throw an exception without the cause. The other case is when line 104 returns 0, in which case maybe there is a better message than "Null ByteBuffer returned".
I don't think the condition checking if res == 0 is valid, considering what the docs for this method say about zero-length requests. In this case it is a zero-length result, possibly from a non-zero-length request, but it seems like we should not consider this return value erroneous. https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataInputStream.html#read%28java.nio.ByteBuffer%29
I have removed this check.
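For context, here is a minimal sketch of a fill loop that follows the documented contract of FSDataInputStream.read(ByteBuffer), where a return value of 0 is not an error and only -1 signals end of stream; the helper class and method names are illustrative, not the actual patch code:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative sketch: keep reading until the buffer is full, treating a
// 0-byte result as "no bytes transferred this call" rather than a failure.
final class ByteBufferReads {
  static void readFully(FSDataInputStream in, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
      int res = in.read(buf);
      if (res < 0) {
        throw new EOFException(
            "Reached end of stream with " + buf.remaining() + " bytes left to read");
      }
      // res == 0 simply means nothing was read on this call; try again
    }
  }
}
```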
…to Hadoop 2.0 compression APIs.
…od for getting a compressor.
…e got lost somewhere.
…red in the newer version.
Does it need to be public? Make it package-private if possible.
Drill creates one of these to compress and decompress page data itself. I could add a factory method on CodecFactory that takes an allocator, to allow creating one without exposing the whole class. We don't need to subclass this over in Drill or anything.
I made the change described above and pushed a new commit.
This looks fine to me.
… on the allocators used by a DirectCodecFactory. Moved the DirectCodecFactory class to package private access and added a factory method to create one on the CodecFactory class.
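A rough sketch of the access pattern described in this thread: the direct implementation stays package-private and is exposed only through a static factory method on the public CodecFactory class. The allocator interface and constructor signature below are assumptions for illustration, not the exact merged API.

```java
import org.apache.hadoop.conf.Configuration;

// Public entry point: external users (e.g. Drill) obtain the direct codec
// factory only through this static method, never via the class itself.
public class CodecFactory {

  /** Hypothetical allocator hook so callers control where buffers come from. */
  public interface ByteBufferAllocator {
    java.nio.ByteBuffer allocate(int size);
    void release(java.nio.ByteBuffer buffer);
  }

  public static CodecFactory createDirectCodecFactory(Configuration config,
                                                      ByteBufferAllocator allocator,
                                                      int pageSize) {
    return new DirectCodecFactory(config, allocator, pageSize);
  }
}

// Package-private: not part of the public API surface.
class DirectCodecFactory extends CodecFactory {
  DirectCodecFactory(Configuration config,
                     CodecFactory.ByteBufferAllocator allocator,
                     int pageSize) {
    // compressor/decompressor setup using the supplied allocator would go here
  }
}
```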
… and is no longer accessible in this class.
+1
I had been a little too aggressive hiding things from the outside world; we still need access to the codec factory itself in Drill. Most of the new code has been hidden from the public interface.
…m that does not implement the byte buffer based read method in the Hadoop 2.x API.
… name for the class that was being used to detect if the Hadoop 2.x API was available. Additionally, the check for an actual implementation of the read method was not functioning properly: the UnsupportedOperationException thrown from the method will actually be wrapped in an InvocationTargetException now that the method is invoked with reflection. The code to detect if a call fails has been moved back down to where the actual read method is called, because making it work properly in the static block was too much of a headache; creating an instance of an FSDataInputStream that fulfilled the correct interfaces would have required more reflection hacks. I do properly set the flag used to track availability of the API, to avoid the previous behavior of always relying on exception-based control flow for fallback; it just happens more lazily than was attempted with the earlier work to simplify this class.
…g is very wrong, so it shouldn't get wrapped in a ShouldNeverHappenException. This InvocationTargetException will wrap any kind of exception coming out of the method, including an IOException.
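To illustrate the lazy detection described above, here is a hedged sketch of invoking read(ByteBuffer) reflectively, unwrapping the InvocationTargetException so an underlying IOException keeps its original cause, and flipping a flag on the first UnsupportedOperationException so later reads fall back without exception-based control flow. Class, field, and helper names are hypothetical, not the actual patch code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Illustrative sketch of reflection-based compatibility with the Hadoop 2.x
// ByteBuffer read API, with a lazily set flag for whether it actually works.
final class ByteBufferReadCompat {
  private static final Method READ_BUF = lookupReadBuffer();
  // null = not yet probed; TRUE/FALSE = result of the first real invocation
  private volatile Boolean byteBufferReadSupported;

  private static Method lookupReadBuffer() {
    try {
      // FSDataInputStream.read(ByteBuffer) only exists in the Hadoop 2.x API
      return Class.forName("org.apache.hadoop.fs.FSDataInputStream")
          .getMethod("read", ByteBuffer.class);
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return null; // older Hadoop on the classpath: always fall back
    }
  }

  int read(InputStream in, ByteBuffer dst, byte[] scratch) throws IOException {
    if (READ_BUF != null
        && READ_BUF.getDeclaringClass().isInstance(in)
        && !Boolean.FALSE.equals(byteBufferReadSupported)) {
      try {
        int res = (Integer) READ_BUF.invoke(in, dst);
        byteBufferReadSupported = Boolean.TRUE;
        return res;
      } catch (IllegalAccessException e) {
        byteBufferReadSupported = Boolean.FALSE;
      } catch (InvocationTargetException e) {
        Throwable cause = e.getCause();
        if (cause instanceof UnsupportedOperationException) {
          // this stream does not implement the ByteBuffer read; remember that
          byteBufferReadSupported = Boolean.FALSE;
        } else if (cause instanceof IOException) {
          throw (IOException) cause; // preserve the original cause
        } else {
          throw new IOException(cause);
        }
      }
    }
    // fallback path: read into a scratch byte[] and copy into the ByteBuffer
    int n = in.read(scratch, 0, Math.min(scratch.length, dst.remaining()));
    if (n > 0) {
      dst.put(scratch, 0, n);
    }
    return n;
  }
}
```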
This work is based on the GSoC project from the summer of 2014. We have expanded on it to fix bugs and change the write path to use ByteBuffers as well. This PR replaces several earlier PRs.
closes #6, closes #49, closes #50, closes #267