Restore reading of split LZO files with an index #136

Open
Wants to merge 1 commit into base: master

AbnerSunyh

I use Presto to read a Hive table that is stored with DeprecatedLzoTextInputFormat.

DeprecatedLzoTextInputFormat.isSplitable throws a NullPointerException because indexesMap does not contain the file's path when 'isSplitable' is called. Perhaps 'listStatus', which fills the map, has not been executed by that point.

Caused by: java.lang.NullPointerException
        at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:103)
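
The failure mode above (a map lookup that returns null because the entry was never populated) can be guarded by loading the index on demand. The following is only a minimal, self-contained sketch of that idea, not the hadoop-lzo code: the class `LazyIndexLookup`, the `loader` function, and the use of a `long[]` of block start offsets as the "index" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LazyIndexLookup {
    // Hypothetical stand-in for the per-file index: block start offsets.
    private final Map<String, long[]> indexes = new HashMap<>();

    // Hypothetical loader that reads the index for a path, or returns
    // null when the file has no index.
    private final Function<String, long[]> loader;

    public LazyIndexLookup(Function<String, long[]> loader) {
        this.loader = loader;
    }

    // Null-safe check: if the path was never registered (for example
    // because the listing step did not run), load the index now instead
    // of dereferencing a missing map entry.
    public boolean isSplitable(String path) {
        long[] index = indexes.get(path);
        if (index == null) {
            index = loader.apply(path);
            if (index == null) {
                index = new long[0]; // no index: treat as not splittable
            }
            indexes.put(path, index);
        }
        return index.length > 0;
    }
}
```

With this shape, a file without an index is simply read as a single split rather than causing an exception.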

After I fixed the NullPointerException, LzopInputStream.getCompressedData threw an IOException:

Caused by: java.io.IOException: Compressed length 916527927 exceeds max block size 67108864 (probably corrupt file)
        at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:295)

At first I guessed that LzopInputStream was calculating the size of the compressed chunk incorrectly.

After careful examination, I found that in the DeprecatedLzoLineRecordReader constructor the FSDataInputStream seeks to a wrong 'start', one that is not the start offset of an LZO block.

It works normally when I seek to a correct 'start', i.e. the start offset of an LZO block.
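
To illustrate what seeking to a correct 'start' means: if the reader starts in the middle of a block, the bytes it interprets as a block header are arbitrary data, which would explain an implausible compressed length like the one in the exception. A rough sketch of aligning a split start to a block boundary is below. It is not the code from this pull request; `BlockAlign`, `alignStart`, and the convention that a split owns the first block starting at or after its nominal start are assumptions for illustration.

```java
import java.util.Arrays;

public class BlockAlign {
    // Returned when no block starts at or after the requested offset.
    public static final long NOT_FOUND = -1L;

    // blockOffsets: sorted start offsets of the compressed blocks, as an
    // index file would provide them (hypothetical representation).
    // Returns the first block start that is >= start, or NOT_FOUND.
    public static long alignStart(long[] blockOffsets, long start) {
        int pos = Arrays.binarySearch(blockOffsets, start);
        if (pos >= 0) {
            return blockOffsets[pos]; // start is already a block boundary
        }
        int insertion = -pos - 1; // index of the first offset > start
        if (insertion == blockOffsets.length) {
            return NOT_FOUND; // no block begins inside or after this split
        }
        return blockOffsets[insertion];
    }
}
```

The record reader would then seek to the aligned offset rather than the raw split start, so that decompression always begins at a block header.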

Please review the code. Is my modification right?

Thanks.

@CLAassistant

CLAassistant commented Apr 23, 2018

CLA assistant check
All committers have signed the CLA.
