normalize-by-median skip consuming kmers of PE reads #1000

Closed
drtamermansour opened this issue May 14, 2015 · 3 comments

@drtamermansour
Member

In PE mode, the normalize-by-median script keeps both reads of a pair if only one of them is below the coverage cutoff. However, the script then consumes the k-mers from only one end. This is why, if we run abundance-dist.py on the output k-mer counting table, the first line shows some number of k-mers with a frequency of 0.
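
To make the mismatch concrete, here is a minimal, self-contained sketch of the paired-end decision described above. It is not khmer's code: a plain `Counter` stands in for the counting table, and `K`, `kmers`, `median_count`, `normalize_pair`, and the `consume_both` flag are all hypothetical names chosen for illustration. It also assumes that the end being skipped is the one at or above the cutoff.

```python
from collections import Counter
from statistics import median

K = 4  # toy k-mer size, for illustration only


def kmers(seq, k=K):
    """Return every k-mer of seq, in order (assumes len(seq) >= k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def median_count(counts, seq):
    """Median abundance of the k-mers of seq in the toy table."""
    return median(counts[km] for km in kmers(seq))


def normalize_pair(counts, read1, read2, cutoff, consume_both=True):
    """Keep the pair if either read is below the cutoff.

    consume_both=False models the reported behavior (assumption: only
    the read that is below the cutoff gets counted).
    consume_both=True counts both reads of every kept pair.
    Returns True if the pair is kept.
    """
    reads = (read1, read2)
    # Evaluate both reads before consuming anything, so the second
    # read's median is not affected by counting the first.
    below = [median_count(counts, r) < cutoff for r in reads]
    if not any(below):
        return False
    for read, is_below in zip(reads, below):
        if consume_both or is_below:
            counts.update(kmers(read))
    return True
```

With a table that already holds `AAAA` five times, the pair `("ACGTACGT", "AAAAAAAC")`, and `cutoff=3`, the pair is kept either way because the first read is novel. With `consume_both=False`, the k-mer `AAAC` from the second read ends up in the output reads but keeps a count of 0 in the table, which is the kind of entry that would show up in the zero-frequency row of an abundance distribution.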

@drtamermansour
Member Author

Another important point: the current implementation replaces 'N's with 'A's and then uses this modified sequence to consume k-mers. I think load-into-counting.py (as well as filter-abund-single.py and abundance-dist-single.py) ignores reads with 'N's.

While fixing this, we need to take care of the following: normalize-by-median uses the function consume(seq), which does not check for DNA validity and thus will count k-mers containing 'N's. I think this function needs to be fixed to call the check_and_process_read function.
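
The difference between the two policies can be sketched in a few lines. This is only an illustration under stated assumptions, not khmer's implementation: `is_valid_dna`, `consume_checked`, and `consume_n_to_a` are hypothetical helpers, a `Counter` stands in for the counting table, and "valid" is assumed to mean "contains only A, C, G, T".

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")  # assumption: anything else is invalid


def is_valid_dna(seq):
    """True if seq is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= VALID_BASES


def consume_checked(counts, seq, k=4):
    """Count the k-mers of seq only if the whole read is valid DNA.

    Returns the number of k-mers consumed (0 for a rejected read).
    """
    if len(seq) < k or not is_valid_dna(seq):
        return 0
    seq = seq.upper()
    n = len(seq) - k + 1
    counts.update(seq[i:i + k] for i in range(n))
    return n


def consume_n_to_a(counts, seq, k=4):
    """Model of the N-to-A policy: rewrite 'N' as 'A', then count.

    This adds k-mers to the table that never occurred in the read.
    """
    seq = seq.upper().replace("N", "A")
    if len(seq) < k:
        return 0
    n = len(seq) - k + 1
    counts.update(seq[i:i + k] for i in range(n))
    return n
```

For the read `ACGNACGT`, `consume_checked` leaves the table untouched, while `consume_n_to_a` counts the k-mers of `ACGAACGT`, including `CGAA`, which is not present in the original read. Two scripts that disagree on this policy will therefore disagree on the contents of the table for the same input.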

@drtamermansour
Member Author

Both issues raised here would, of course, also affect the behavior of the downstream filter-abund.py script.

@SensibleSalmon
Contributor

#1010 resolves the bug in normalize-by-median.

@ctb closed this as completed on Jun 1, 2015