
PARQUET-77: ByteBuffer use in read and write paths #267

Closed
jaltekruse wants to merge 102 commits into apache:master from jaltekruse:1.6.0rc3-drill-r0.3-merge

Conversation

@jaltekruse
Contributor

This work is based on the GSoC project from the summer of 2014. We have expanded on it to fix bugs and to change the write path to use ByteBuffers as well. This PR replaces several earlier PRs.

closes #6, closes #49, closes #50, closes #267

gerashegalov and others added 30 commits on February 10, 2015 at 14:18
Use reflect to call new API to keep compatible.
add compatible method initFromPage in ValueReaders.
add toByteBuffer method in ByteBufferInputStream.
add V21FileAPI class to encapsulate v21 APIs and make it a singleton.
add ByteBuffer based equal and compareTo methods in Binary.
Add compatibility function to read directly into a byte buffer
… memory can be released before stats are written.
…tor to allocate the ByteBuffer.

Conflicts:
	parquet-column/src/main/java/parquet/column/ColumnWriteStore.java
	parquet-column/src/main/java/parquet/column/ColumnWriter.java
	parquet-column/src/main/java/parquet/column/ParquetProperties.java
	parquet-column/src/main/java/parquet/column/impl/ColumnWriteStoreV1.java
	parquet-column/src/main/java/parquet/column/impl/ColumnWriterV1.java
	parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesWriter.java
	parquet-column/src/main/java/parquet/column/values/rle/RunLengthBitPackingHybridValuesWriter.java
	parquet-column/src/test/java/parquet/column/values/dictionary/TestDictionary.java
	parquet-column/src/test/java/parquet/io/TestColumnIO.java
	parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java
	parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordWriter.java
	parquet-hadoop/src/main/java/parquet/hadoop/ParquetRecordWriter.java
	parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
Conflicts:
	parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesWriter.java
Member

This will lose the original cause of the error.

Contributor Author

I'm not sure what you mean here. We didn't get an error out of the method we called; throwing here will at least give a stack trace, but I don't see where we would get more information about why it failed.

Member

Sorry, that was not very clear. I was referring to the catch block at line 98: if we catch an exception there, then res == 0 and we throw a new exception without the original as its cause.
The other case is line 104 returning 0, in which case maybe there is a better message than "Null ByteBuffer returned".

Contributor Author

I don't think the condition checking whether res == 0 is valid, considering what the docs for this method say about how it should behave for zero-length requests. In this case it is a zero-length result, possibly from a non-zero-length request, but it seems like we should not treat this return value as erroneous. https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataInputStream.html#read%28java.nio.ByteBuffer%29

Contributor Author

I have removed this check.

Member

Does it need to be public? Make it package-private if possible.

Contributor Author

Drill creates one of these to compress and decompress page data itself. I could add a factory method on CodecFactory that takes an allocator, to allow creating one without exposing the whole class. We don't need to subclass this over in Drill or anything.

Contributor Author

I made the change described above and pushed a new commit.

@julienledem
Member

This looks fine to me.
@jaltekruse I made a few final comments; please take a look.
I'm planning to merge this soon after you take care of those.
Thanks!

… on the allocators used by a DirectCodecFactory.

Moved the DirectCodecFactory class to package-private access and added a factory method on the CodecFactory class to create one.
@julienledem
Member

+1

I had been a little too aggressive hiding things from the outside world; we still need access to the codec factory itself in Drill. Most of the new code has been hidden from the public interface.
@jaltekruse changed the title from "ByteBuffer use in read and write paths" to "PARQUET-77: ByteBuffer use in read and write paths" on Nov 3, 2015
…m that does not implement the byte buffer based read method in the Hadoop 2.x API.
… name for the class that was being used to detect if the Hadoop 2.x API was available.

Additionally, the check for an actual implementation of the read method was not functioning properly. The UnsupportedOperationException thrown from the method is now wrapped in an InvocationTargetException, because the method is invoked via reflection. The code to detect whether a call fails has been moved back down to where the actual read method is called; making it work properly in the static block was too much of a headache, since creating an instance of an FSDataInputStream that fulfilled the correct interfaces would have required more reflection hacks. I do properly set the flag used to track availability of the API, to avoid the previous behavior of always relying on exception-based control flow for fallback; it just happens more lazily than was attempted with the earlier work to simplify this class.
…g is very wrong, so it shouldn't get wrapped in a ShouldNeverHappenException. This InvocationTargetException will wrap any kind of exception coming out of the method, including an IOException.
@asfgit closed this in 6b605a4 on Nov 4, 2015