
N-masked sequence may be ignored, no track feature created #16

Open
jduvick opened this issue Feb 18, 2016 · 0 comments

jduvick commented Feb 18, 2016

A dataset of Volvox carteri genome scaffolds containing N-spacers (3.7% of total sequence, according to xGDBvm's file validation script) was parsed incorrectly for what we term 'N-masked' regions.

Specifically, the script parseGsegMask.pl parsed no N-masked regions, resulting in an empty ~mask.fa file and WARNING flag 6.40 in Pipeline_Procedure.log. I can reproduce the problem but haven't found the source yet. One difference from the Example 1 benchmark for N-mask parsing is that the V. carteri genome segments include lowercase (gatc) bases, although the interspersed N-masked sequence is uppercase.
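To help narrow this down, here is a minimal, standalone sketch for checking which N runs a mixed-case scaffold file actually contains, independent of the pipeline. It is not the logic of parseGsegMask.pl, which I have not traced yet. The function names, the `min_len` threshold, and the example sequence are all hypothetical and chosen only for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_n_runs(sequence, min_len=1, ignore_case=False):
    """Return 0-based, half-open (start, end) spans for each run of N.

    min_len is a hypothetical threshold; the value the pipeline uses
    (if any) is not known here.
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"N{%d,}" % min_len, flags)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Made-up scaffold: lowercase bases with uppercase N spacers,
# split across two lines as in a wrapped FASTA file.
example = [">scaffold_1", "gatcgatcNNNNNNNNNNgatc", "gatcNNNNNgatc"]

for name, seq in read_fasta(example):
    print(name, find_n_runs(seq, min_len=5))
```

If a check like this reports N runs in the V. carteri scaffolds while ~mask.fa stays empty, that would suggest the input is fine and the problem lies in how the lowercase flanking bases are handled during parsing.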
