scala.MatchError RegExp does not catch colons in value part properly #1061

pauca · 2016-06-30T15:03:20Z

The regex at line 31 (commit b29d204) of adam/adam-core/src/main/scala/org/bdgenomics/adam/util/AttributeUtils.scala,

val attrRegex = RegExp("([^:]{2,4}):([AifZHB]):([cCiIsSf]{1},)?(.*)")

does not properly handle alignment records with attributes like
OQ:Z:C55/15D:::::::.7GFFAFDA442.40F=AGHHE
i.e., attributes that have colons in the value part.

Some problematic reads are contained in the GATK bundle file CEUTrio.HiSeq.WGS.b37.NA12878.bam:

scala> BamWriter.adamSAMSave( "output.bam", bam.sequences, bam.recordGroups , true, true ,false)
2016-06-30 17:01:41 ERROR Utils:95 - Aborting task
scala.MatchError: Z:C, (of class java.lang.String)
    at org.bdgenomics.adam.util.AttributeUtils$.createAttribute(AttributeUtils.scala:92)
    at org.bdgenomics.adam.util.AttributeUtils$.parseAttribute(AttributeUtils.scala:74)
    at org.bdgenomics.adam.util.AttributeUtils$$anonfun$parseAttributes$2.apply(AttributeUtils.scala:61)


fnothaft · 2016-07-01T15:34:42Z

Thanks for reporting this @pauca! We will look into this in the next week. We have some separate logic to extract the OQ field, and I think this isn't getting handled properly.

We had a bug in `org.bdgenomics.adam.util.AttributeUtils` where the regex for splitting out the formatting string for array attributes was applied to all attributes. In an array attribute (SAM "B" tags), the type of the array elements is encoded before the attribute values, and is split off by commas. E.g., "B:i,1,2,3". If the attribute is a string (SAM "Z" tags), commas are allowed. To resolve this, I split this regex into two regexes. We only apply the regex for splitting out the array type if we are working on an array attribute. This resolves bigdatagenomics#1061.
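The two-regex split described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in #1080: the object name `AttrRegexSketch`, the names `tagRegex`, `arrayRegex`, `split`, and `oldArrayGroup`, the exact replacement patterns, and the sample attribute strings are all assumptions made for the example. Only `oldRegex` is taken from the issue text.

```scala
import scala.util.matching.Regex

// Hypothetical sketch; not the actual ADAM implementation.
object AttrRegexSketch {
  // Pattern quoted in the issue: the optional array-type group is tried for every tag type.
  val oldRegex: Regex = "([^:]{2,4}):([AifZHB]):([cCiIsSf]{1},)?(.*)".r

  // Assumed replacement 1: split tag, type, and the raw remainder only.
  val tagRegex: Regex = "([^:]{2,4}):([AifZHB]):(.*)".r
  // Assumed replacement 2: split the element type off an array ("B") value.
  val arrayRegex: Regex = "([cCiIsSf]),(.*)".r

  /** What the old pattern captures as the array type group, if anything. */
  def oldArrayGroup(encoded: String): Option[String] = encoded match {
    case oldRegex(_, _, arr, _) => Option(arr) // an unmatched optional group is null
    case _                      => None
  }

  /** Returns (tag, type, optional array element type, value). */
  def split(encoded: String): (String, String, Option[String], String) =
    encoded match {
      // Apply the array regex only when the tag type is "B".
      case tagRegex(tag, "B", rest) =>
        rest match {
          case arrayRegex(elemType, values) => (tag, "B", Some(elemType), values)
          case _ =>
            throw new IllegalArgumentException(s"Malformed array attribute: $encoded")
        }
      // Every other type keeps its value untouched, commas and colons included.
      case tagRegex(tag, tpe, value) => (tag, tpe, None, value)
      case _ =>
        throw new IllegalArgumentException(s"Malformed attribute: $encoded")
    }
}
```

With a made-up string attribute whose value begins with a type character followed by a comma, `AttrRegexSketch.oldArrayGroup("OQ:Z:C,55/15D:::.7GFF")` yields `Some("C,")`, which combined with the tag type gives the `Z:C,` seen in the stack trace. `AttrRegexSketch.split` on the same input keeps the whole value, while `AttrRegexSketch.split("XB:B:i,1,2,3")` still separates the element type `i` from `1,2,3`.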

fnothaft added the bug label Jul 1, 2016

fnothaft added this to the 0.20.0 milestone Jul 1, 2016

fnothaft self-assigned this Jul 16, 2016

fnothaft mentioned this issue Jul 17, 2016

ADAM to BAM conversion failing on 1000G file #1013

Closed

fnothaft mentioned this issue Jul 17, 2016

[ADAM-1061] Clean up attributes regex and denormalized fields #1080

Merged

heuermh closed this as completed in #1080 Jul 19, 2016
