restore read split lzo file with index #136

AbnerSunyh · 2018-04-23T06:44:30Z

I use Presto to read a Hive table stored with DeprecatedLzoTextInputFormat.

DeprecatedLzoTextInputFormat.isSplitable throws a NullPointerException because indexesMap does not contain the path being checked. Perhaps listStatus was never called before isSplitable.
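One defensive option is to load the index on a cache miss instead of assuming listStatus already populated the map. The sketch below is a hypothetical, dependency-free model of that idea, not hadoop-lzo's actual code: paths are plain strings, an index is a `long[]` of block offsets, and `indexLoader` is an assumed stand-in for reading the `.index` file from the filesystem.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the isSplitable() index lookup (not hadoop-lzo's API).
public class LazyIndexLookup {
    // Per-path cache of block offsets, normally filled by listStatus().
    private final Map<String, long[]> indexes = new HashMap<>();
    // Assumed loader standing in for reading "<path>.index"; returns null if absent.
    private final Function<String, long[]> indexLoader;

    public LazyIndexLookup(Function<String, long[]> indexLoader) {
        this.indexLoader = indexLoader;
    }

    // Mirrors what listStatus() would do when it sees a file.
    public void cacheIndex(String path, long[] offsets) {
        indexes.put(path, offsets);
    }

    public boolean isSplitable(String path) {
        // On a cache miss, load the index instead of dereferencing null.
        // computeIfAbsent stores nothing when the loader returns null.
        long[] offsets = indexes.computeIfAbsent(path, indexLoader);
        // Without a (non-empty) index the file has to be read as a single split.
        return offsets != null && offsets.length > 0;
    }
}
```

With this shape, a caller such as Presto that reaches isSplitable without going through listStatus gets a "not splittable" answer for unindexed files rather than an exception.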

Caused by: java.lang.NullPointerException
        at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:103)

After fixing the NullPointerException, LzopInputStream.getCompressedData throws an IOException:

Caused by: java.io.IOException: Compressed length 916527927 exceeds max block size 67108864 (probably corrupt file)
        at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:295)

My first guess was that LzopInputStream computes the size of the compressed chunk incorrectly.

After closer examination, I found that in the DeprecatedLzoLineRecordReader constructor, the FSDataInputStream seeks to the wrong 'start': an offset that is not the start offset of an LZO block.
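Why a misaligned seek produces an absurd "compressed length" can be shown with a simplified model. The sketch assumes a block starts with a 4-byte big-endian uncompressed length followed by a 4-byte big-endian compressed length (real LZOP blocks may also carry checksums, omitted here); `fakeBlock` and `readCompressedLength` are hypothetical helpers. Reading a "header" from the middle of a block interprets payload bytes as lengths, which trips the same 64 MiB sanity check seen in the error above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified model of an LZOP block header (checksums omitted).
public class MisalignedHeaderDemo {
    // Same limit as in the error message: 64 MiB.
    public static final int MAX_BLOCK_SIZE = 64 * 1024 * 1024;

    // Reads the compressed length of the block assumed to start at 'offset'.
    public static int readCompressedLength(byte[] file, int offset) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(file); // big-endian by default
        // Bytes [offset, offset + 4) hold the uncompressed length; skipped here.
        int compressedLen = buf.getInt(offset + 4);
        if (compressedLen > MAX_BLOCK_SIZE) {
            throw new IOException("Compressed length " + compressedLen
                    + " exceeds max block size " + MAX_BLOCK_SIZE + " (probably corrupt file)");
        }
        return compressedLen;
    }

    // Builds one fake block whose "compressed" payload is plain ASCII text.
    public static byte[] fakeBlock(String payload) {
        byte[] data = payload.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(8 + data.length);
        buf.putInt(data.length); // uncompressed length (dummy value)
        buf.putInt(data.length); // compressed length
        buf.put(data);
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = fakeBlock("ABCDEFGHIJKLMNOP");
        System.out.println(readCompressedLength(file, 0)); // aligned read: 16
        try {
            // Misaligned: payload bytes "EFGH" are read as the compressed length.
            readCompressedLength(file, 8);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy file, the misaligned read yields 1162233672 (0x45464748, the ASCII bytes "EFGH"). The 916527927 in the reported stack trace is consistent with the same kind of misread, although the actual bytes involved are unknown.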

It works normally when I seek to the correct 'start', i.e. the start offset of an LZO block.
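Choosing the right 'start' amounts to moving the split start forward to the first block boundary recorded in the index. Below is a minimal sketch, assuming the index is available as a `long[]` of block start offsets sorted ascending; `SplitAligner`, `alignStartToBlock`, and `NOT_FOUND` are hypothetical names, not part of hadoop-lzo.

```java
import java.util.Arrays;

// Hypothetical helper: align a split's start to an LZO block boundary.
public class SplitAligner {
    public static final long NOT_FOUND = -1L;

    // Returns the first block offset in [start, end), or NOT_FOUND if none.
    public static long alignStartToBlock(long[] blockOffsets, long start, long end) {
        int i = Arrays.binarySearch(blockOffsets, start);
        if (i < 0) {
            i = -i - 1; // insertion point: first offset greater than start
        }
        // No block begins inside this split, so it has nothing to read.
        if (i >= blockOffsets.length || blockOffsets[i] >= end) {
            return NOT_FOUND;
        }
        return blockOffsets[i];
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 262144L, 524288L};
        System.out.println(alignStartToBlock(offsets, 100000L, 400000L)); // 262144
        System.out.println(alignStartToBlock(offsets, 0L, 100000L));      // 0
        System.out.println(alignStartToBlock(offsets, 300000L, 400000L)); // -1
    }
}
```

Rounding forward (rather than backward) keeps splits disjoint: a block that straddles a split boundary is read by the split in which it starts.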

Please review the code. Is my modification right?

Thanks.

CLAassistant · 2018-04-23T06:44:37Z

All committers have signed the CLA.

release_0_4_20-restore_read_split_lzo_file_with_index.patch

a0ab7cf
