Off-by-1 error in FASTQ InputFormat start positioning code #1383

Closed
fnothaft opened this issue Feb 6, 2017 · 2 comments

@fnothaft
Member

fnothaft commented Feb 6, 2017

See: https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/java/org/bdgenomics/adam/io/SingleFastqInputFormat.java#L65. This causes an ArrayIndexOutOfBoundsException:

java.lang.ArrayIndexOutOfBoundsException: 0
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.checkBuffer(SingleFastqInputFormat.java:66)
	at org.bdgenomics.adam.io.FastqRecordReader.positionAtFirstRecord(FastqRecordReader.java:169)
	at org.bdgenomics.adam.io.FastqRecordReader.<init>(FastqRecordReader.java:126)
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.<init>(SingleFastqInputFormat.java:49)
	at org.bdgenomics.adam.io.SingleFastqInputFormat.createRecordReader(SingleFastqInputFormat.java:107)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)

This bug is in both the single and interleaved code.

fnothaft added the bug label Feb 6, 2017
fnothaft added this to the 0.21.1 milestone Feb 6, 2017
fnothaft self-assigned this Feb 6, 2017
@A-Tsai
Contributor

A-Tsai commented Feb 17, 2017

I found that this happens when the first character of the block accessed by the FileSplit is '\n'. In that scenario, bufferLength is 0 and the buffer is empty. The ArrayIndexOutOfBoundsException is thrown because the code tries to read buffer.getBytes()[0] on line 66 of SingleFastqInputFormat.java.
If we remove the '=' on line 65 of SingleFastqInputFormat.java, where it checks "bufferLength >= 0", the issue is solved. I'm not sure whether this is the right solution, but it works on my pipeline.
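
To make the failure mode concrete, here is a minimal standalone sketch of the check described above. It is not the ADAM source: the class name CheckBufferSketch and both method names are hypothetical, and a plain byte[] stands in for the Hadoop Text buffer. It assumes only what this thread states, namely that the check tests "bufferLength >= 0" before reading the first byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the check discussed in this issue; not the ADAM code.
class CheckBufferSketch {

    // Condition as reported: a length of 0 passes the guard, so index 0
    // of an empty array is read and the JVM throws.
    static boolean checkBufferBuggy(int bufferLength, byte[] buffer) {
        return bufferLength >= 0 && buffer[0] == '@';
    }

    // Proposed change: require at least one byte before reading index 0.
    static boolean checkBufferFixed(int bufferLength, byte[] buffer) {
        return bufferLength > 0 && buffer[0] == '@';
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0]; // what a line consisting only of '\n' leaves behind
        byte[] header = "@read1".getBytes(StandardCharsets.US_ASCII);

        System.out.println("fixed, header line: " + checkBufferFixed(header.length, header));
        System.out.println("fixed, empty line:  " + checkBufferFixed(0, empty));

        try {
            checkBufferBuggy(0, empty);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy, empty line threw: " + e);
        }
    }
}
```

With the strict comparison, an empty line is simply reported as "not a record start", so the caller can keep scanning instead of crashing. Whether the real reader handles that return value correctly is not something this sketch can confirm.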

@heuermh
Member

heuermh commented Feb 21, 2017

Thank you for the feedback, @A-Tsai
