normalize-by-median skip consuming kmers of PE reads #1000

drtamermansour · 2015-05-14T00:33:53Z

In PE mode, the normalize-by-median script keeps both reads of a pair even if only one of them is below the coverage cutoff. However, the script then consumes the k-mers from only one end. This is why, if we run abundance-dist.py on the output k-mer counting table, the first line shows some number of k-mers with 0 frequency.
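To make the expected behaviour concrete, here is a minimal sketch using a toy dict-based counter rather than khmer's counting table. The names `kmers`, `median_count` and `normalize_pair` are hypothetical and not part of khmer, and the sketch ignores canonical (reverse-complement) k-mers. The point is only that when a pair is kept, both ends should be consumed:

```python
from collections import Counter
from statistics import median

K = 4  # toy k-mer size, chosen only for illustration


def kmers(seq, k=K):
    """Return all overlapping k-mers of seq (no reverse complements)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def median_count(counts, seq, k=K):
    """Median abundance of the k-mers in seq; 0 if seq is shorter than k."""
    km = kmers(seq, k)
    return median(counts[x] for x in km) if km else 0


def normalize_pair(counts, read1, read2, cutoff, k=K):
    """Keep the pair if either end is below cutoff.

    When the pair is kept, consume the k-mers of BOTH ends, so that
    no k-mer of a kept read is left with a zero count.
    """
    keep = any(median_count(counts, s, k) < cutoff for s in (read1, read2))
    if keep:
        for s in (read1, read2):
            counts.update(kmers(s, k))
    return keep
```

With this logic, every k-mer of a kept read has a count of at least 1, so an abundance histogram over the kept reads would have no entries at 0.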

drtamermansour · 2015-05-14T01:32:12Z

Another important point: the current implementation replaces 'N's with 'A's and then uses this modified sequence to consume k-mers. I think load-into-counting.py (as well as filter-abund-single.py and abundance-dist-single.py) ignores reads with 'N's.

While fixing this we need to take care of one thing: normalize-by-median uses the function consume(seq), which does not check for DNA validity and thus will count k-mers containing 'N's. I think this function needs to be fixed to call the check_and_process_read function.
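Roughly what I have in mind, as a sketch only: `is_valid_dna` and `consume_checked` are hypothetical stand-ins and not khmer's `consume` or `check_and_process_read`, and the counter is a plain `Counter` instead of a counting table. A read with any non-ACGT character is skipped instead of being rewritten:

```python
from collections import Counter

VALID_BASES = set("ACGT")


def is_valid_dna(seq):
    """True if seq is non-empty and contains only A, C, G, T (any case)."""
    return bool(seq) and set(seq.upper()) <= VALID_BASES


def consume_checked(counts, seq, k=4):
    """Count the k-mers of seq only if the read is valid DNA.

    Returns the number of k-mers consumed; reads containing 'N'
    (or any other non-ACGT character) are ignored, not rewritten.
    """
    if not is_valid_dna(seq) or len(seq) < k:
        return 0
    seq = seq.upper()
    n = len(seq) - k + 1
    counts.update(seq[i:i + k] for i in range(n))
    return n
```

Skipping the read avoids inflating the counts of k-mers that only exist because 'N' was turned into 'A'.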

drtamermansour · 2015-05-14T01:36:10Z

Both issues raised here would, of course, affect the behavior of the downstream filter-abund.py script.

SensibleSalmon · 2015-06-01T20:28:29Z

#1010 resolves the bug in normalize-by-median.

drtamermansour mentioned this issue May 15, 2015

Diginorm skip kmers #1001

Closed

ctb mentioned this issue Jun 1, 2015

Normalize-by-median refactor: Stage 0 #1010

Merged

ctb closed this as completed Jun 1, 2015
