Added in 985e5d8. Between when I last tested this before merging and when we merged it, I made a tweak that broke this. A BGZF-compressed file gets split correctly by the input format, but the record reader then reads the config, sees `false` for the `FILE_SPLITTABLE` flag, and reads the whole file.
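
A BGZF file can be split where a plain gzip file cannot because of its block structure. Per the SAM/BAM format specification, each BGZF block is a complete gzip member whose header has the FEXTRA flag set and carries a `BC` extra subfield holding the block size. The sketch below recognizes that header from the first 18 bytes of a block and decodes the block size. `BgzfSniffer`, `looksLikeBgzf`, and `blockSize` are hypothetical names. This is not necessarily how the input format discussed here performs the check.

```java
// Hypothetical helper: recognizes a BGZF block header as laid out in the SAM/BAM spec.
final class BgzfSniffer {
    // Fixed BGZF block header length, through the 2-byte BSIZE field.
    static final int BGZF_HEADER_LEN = 18;

    private BgzfSniffer() {}

    /** True if the bytes look like the start of a BGZF block rather than plain gzip. */
    static boolean looksLikeBgzf(byte[] h) {
        if (h == null || h.length < BGZF_HEADER_LEN) return false;
        return (h[0] & 0xff) == 31                          // gzip ID1
            && (h[1] & 0xff) == 139                         // gzip ID2
            && (h[2] & 0xff) == 8                           // CM = deflate
            && (h[3] & 0x04) != 0                           // FLG.FEXTRA set
            && (h[10] & 0xff) == 6 && (h[11] & 0xff) == 0   // XLEN = 6 (little-endian)
            && h[12] == 'B' && h[13] == 'C'                 // subfield ID "BC"
            && (h[14] & 0xff) == 2 && (h[15] & 0xff) == 0;  // SLEN = 2
    }

    /**
     * Total size in bytes of the block; BSIZE stores (size - 1).
     * Only meaningful after looksLikeBgzf(h) has returned true.
     */
    static int blockSize(byte[] h) {
        return ((h[16] & 0xff) | ((h[17] & 0xff) << 8)) + 1;
    }
}
```

Because every block states its own compressed size, a reader can hop from one block boundary to the next. A plain gzip file, by contrast, is a single deflate stream and can only be decoded from its start.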
Resolves bigdatagenomics#1635. Instead of passing whether a FASTQ file is
splittable via the config, we now check whether its compression codec is
splittable, which is more reliable.
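
Checking the codec instead of a config flag can be sketched as below. The rule mirrors the one Hadoop's own `TextInputFormat.isSplitable` applies: an uncompressed file (no codec) is splittable, and a compressed file is splittable only if its codec implements `SplittableCompressionCodec`. The interfaces here are minimal stand-ins for Hadoop's `org.apache.hadoop.io.compress` types, so the sketch compiles without Hadoop. `ExampleGzipCodec`, `ExampleBgzfCodec`, and `SplitPolicy` are hypothetical, and this is not claimed to be the code in this change.

```java
// Minimal stand-ins for Hadoop's CompressionCodec and SplittableCompressionCodec,
// so this sketch compiles without Hadoop on the classpath.
interface CompressionCodec {
    String getDefaultExtension();
}

interface SplittableCompressionCodec extends CompressionCodec {}

// Hypothetical codecs, for illustration only.
class ExampleGzipCodec implements CompressionCodec {
    public String getDefaultExtension() { return ".gz"; }
}

class ExampleBgzfCodec implements SplittableCompressionCodec {
    public String getDefaultExtension() { return ".bgz"; }
}

final class SplitPolicy {
    private SplitPolicy() {}

    // Derive splittability from the codec itself rather than from a flag
    // that another component had to set in the config.
    // A null codec means the file is uncompressed, which can always be split.
    static boolean isSplittable(CompressionCodec codec) {
        return codec == null || codec instanceof SplittableCompressionCodec;
    }
}
```

If both the input format and the record reader call the same predicate on the same codec, they cannot disagree the way a separately propagated config flag can.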
For a .gz file, BGZFEnhancedGZipCodec handles this edge case by checking the
stream type. Together with our explicit check of the stream when picking
splits, this ensures that we never try to create an invalid GZIP split.

Additionally, I found and fixed a bug in the old FASTQ code. When identifying
the position of the first valid record in a split, it did a seek on the
uncompressed input stream to backtrack if it saw a line of quality scores that
began with @. Instead, we now check for two successive lines that start with @.
This indicates that the first line contains quality scores and the second line
contains a read name.
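
The two-successive-`@`-lines rule can be sketched over lines already read from a split. The sketch assumes three things: the partial line at the split's start has been discarded, each record occupies exactly four lines (no wrapped sequence or quality), and a quality line may begin with `@` (Phred+33 Q31). `FastqSplitStart` and `firstRecordIndex` are hypothetical names. A real record reader works on byte offsets in a Hadoop stream, which this sketch ignores.

```java
import java.util.List;

// Hypothetical helper illustrating how to find the first FASTQ record in a split
// without seeking backwards.
final class FastqSplitStart {
    private FastqSplitStart() {}

    /**
     * Returns the index of the first line that starts a FASTQ record,
     * or -1 if no record start can be confirmed.
     */
    static int firstRecordIndex(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (!lines.get(i).startsWith("@")) continue;
            // A sequence line never starts with '@', so a read name is always
            // followed by a non-'@' line. Two '@' lines in a row therefore mean
            // quality scores followed by the next read name.
            return lines.get(i + 1).startsWith("@") ? i + 1 : i;
        }
        return -1;
    }
}
```

Because the decision needs only one line of lookahead, the reader never has to back up in the stream once it has seen an ambiguous `@` line.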