Fastq reader clips long reads at 10,000 bp #1992

heuermh · 2018-05-24T17:54:58Z

For example, the HG002 ultralong Oxford Nanopore reads at ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/Ultralong_OxfordNanopore/ include reads far longer than 10,000 bp:

```
$ dsh-bio fastq-sequence-length -i combined_2018-05-18.fastq.gz | sort -n -r | head
322099
298433
283861
262525
247501
```
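To cross-check read lengths independently of ADAM, the `dsh-bio fastq-sequence-length` step can be approximated in plain Scala. This is a minimal sketch, not dsh-bio's implementation: `FastqLengths` is a hypothetical name, and it assumes unwrapped four-line FASTQ records (header, sequence, `+`, quality). Java's `BufferedReader.readLine` has no length cap, so it reports the full sequence length.

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.util.zip.GZIPInputStream

// Hypothetical helper, not part of dsh-bio or ADAM.
object FastqLengths {
  // Assumes unwrapped FASTQ: exactly one sequence line per four-line record.
  // Groups whose first line is not an '@' header, and any trailing partial
  // group, are skipped rather than reported.
  def sequenceLengths(lines: Iterator[String]): Iterator[Int] =
    lines.grouped(4).collect {
      case Seq(header, seq, _, _) if header.startsWith("@") => seq.length
    }

  // Prints the ten longest sequence lengths from a gzipped FASTQ file,
  // mirroring `sort -n -r | head`.
  def main(args: Array[String]): Unit = {
    val reader = new BufferedReader(new InputStreamReader(
      new GZIPInputStream(new FileInputStream(args(0)))))
    try {
      val lines = Iterator.continually(reader.readLine()).takeWhile(_ != null)
      sequenceLengths(lines).toVector
        .sorted(Ordering[Int].reverse)
        .take(10)
        .foreach(println)
    } finally reader.close()
  }
}
```

Passing the gzipped FASTQ path as the first argument should list the same longest lengths as the shell pipeline if the file is unwrapped FASTQ.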

```
scala> val reads = sc.loadAlignments("combined_2018-05-18.fastq.gz")
reads: org.bdgenomics.adam.rdd.read.AlignmentRecordRDD =
ParquetUnboundAlignmentRecordRDD with 0 reference sequences, 0 read groups,
and 0 processing steps

scala> val lengths = reads.rdd.map(_.sequence.length()).collect()
lengths: Array[Int] = Array(641, 3082, 10000, 10000, 10000, ...

scala> lengths.sortBy(-1 * _).take(10)
res3: Array[Int] = Array(10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, ...
```
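The symptom here is many reads sharing exactly the same maximum length, which real ultralong data would rarely show. A small heuristic check for that pile-up, sketched in plain Scala; `ClipCheck`, `pileUpAtMax`, and the default `minCount` threshold are hypothetical choices, not part of ADAM:

```scala
// Hypothetical diagnostic, not part of ADAM.
object ClipCheck {
  // Returns (maxLength, count) when at least `minCount` reads share the
  // maximum length, a hint that a reader may be clipping at a fixed cap.
  // The threshold is an arbitrary assumption; tune it for the data set.
  def pileUpAtMax(lengths: Array[Int], minCount: Int = 2): Option[(Int, Int)] =
    if (lengths.isEmpty) None
    else {
      val max = lengths.max
      val count = lengths.count(_ == max)
      if (count >= minCount) Some((max, count)) else None
    }
}
```

Applied to the `lengths` array in the session above, this should report a pile-up at 10000. Reads genuinely 10,000 bp long would also be counted, so it flags a suspicion rather than proving clipping.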



fnothaft pushed a commit to fnothaft/adam that referenced this issue Jul 6, 2018

[ADAM-1992] Make maximum FASTQ read length configurable.

cf1204d

Resolves bigdatagenomics#1992.

fnothaft mentioned this issue Jul 6, 2018

[ADAM-1992] Make maximum FASTQ read length configurable. #2011

Closed

fnothaft added this to the 0.24.1 milestone Jul 6, 2018

heuermh pushed a commit to heuermh/adam that referenced this issue Nov 5, 2018

[ADAM-1992] Make maximum FASTQ read length configurable.

2ca826e

Resolves bigdatagenomics#1992.

heuermh mentioned this issue Nov 5, 2018

[ADAM-1992] Make maximum FASTQ read length configurable. #2077

Merged

akmorrow13 closed this as completed in #2077 Nov 7, 2018
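The thread does not spell out how the clipping happened. One plausible mechanism, assuming the record reader reads each line with a fixed maximum length (Hadoop's `LineReader.readLine(Text, int)` documents that bytes beyond `maxLineLength` are silently discarded), is sketched below in plain Scala. `CappedLines`, `readCappedLine`, `readRecord`, and `maxReadLength` are hypothetical names standing in for the configurable limit added by the fix; they are not ADAM's API.

```scala
import java.io.{BufferedReader, StringReader}

// Hypothetical sketch of capped line reading, not ADAM's implementation.
object CappedLines {
  // Reads one '\n'-terminated line, keeping at most maxLineLength chars and
  // silently discarding the rest of the line (the clipping seen in #1992).
  def readCappedLine(reader: BufferedReader, maxLineLength: Int): Option[String] = {
    var c = reader.read()
    if (c == -1) None
    else {
      val sb = new StringBuilder
      while (c != -1 && c != '\n') {
        if (sb.length < maxLineLength) sb.append(c.toChar) // excess dropped
        c = reader.read()
      }
      Some(sb.toString)
    }
  }

  // Reads one four-line FASTQ record with a caller-supplied limit, which
  // plays the role of a configurable maximum read length.
  def readRecord(reader: BufferedReader, maxReadLength: Int): Option[(String, String, String)] =
    for {
      header <- readCappedLine(reader, maxReadLength)
      seq    <- readCappedLine(reader, maxReadLength)
      _      <- readCappedLine(reader, maxReadLength)
      qual   <- readCappedLine(reader, maxReadLength)
    } yield (header, seq, qual)
}
```

With a limit of 10000 a 12,000 bp read comes back as 10,000 bp; raising the limit above the longest read in the data set (322,099 bp in the listing above) keeps it whole. A design alternative would be to fail loudly when a line exceeds the limit instead of truncating it silently.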
