Minialign ONT parameters #11

Open
npavlovikj opened this issue Jun 1, 2018 · 6 comments

Comments

@npavlovikj

Hi,

I am comparing a few nanopore aligners on ONT 1D and ONT 2D data, and I would like to verify that the general commands below are correct for those read types:
minialign -d ref_index.mai ref.fa
minialign -x ont.r9.2d -O sam -T MD,AS,NM ref_index.mai input.fasta > output.sam
minialign -x ont.r9.1d -O sam -T MD,AS,NM ref_index.mai input.fasta > output.sam

Or do I need to specify other parameters as well?

I would highly appreciate your input on this.

Thank you,
Natasha

@ocxtal
Owner

ocxtal commented Jun 6, 2018

Sorry for the late reply, and thank you for testing minialign.

Everything looks correct, provided input.fasta contains Nanopore reads. If you want to change index parameters such as k-mer length (-k) and window size (-w), they must be specified when the index is created.

Thanks,

Hajime Suzuki

@npavlovikj
Author

npavlovikj commented Jun 6, 2018

Thanks, @ocxtal!
My input data are nanopore reads, and I think I will use the default index parameters for now.

Another question: one of my genomes is circular. Is adding "-c '*'" to "minialign -d" enough?

@ocxtal
Owner

ocxtal commented Jun 7, 2018

Yes, -c is only needed (and only takes effect) when the index is built. However, you may need to modify the argument, because -c '*' marks all sequences as circular. If you want to mark only specific ones, such as the mitochondrial and chloroplast sequences, -c chrM,chrC (comma-separated, without spaces) would be more appropriate.
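
As a rough sketch of the index step with only some sequences marked as circular: the join_names helper below is hypothetical (it is not part of minialign) and only builds the comma-separated, space-free list. The names chrM and chrC are the examples from above, and placing -c before -d is an assumption based on the indexing command earlier in this thread.

```shell
# Hypothetical helper: join its arguments with commas and no spaces,
# which is the form the -c argument is described as taking.
join_names() (
    IFS=,
    printf '%s\n' "$*"
)

circular=$(join_names chrM chrC)
echo "$circular"

# Build the index only if minialign is installed and ref.fa is present.
# Option order (-c before -d) is an assumption.
if command -v minialign >/dev/null 2>&1 && [ -f ref.fa ]; then
    minialign -c "$circular" -d ref_index.mai ref.fa
fi
```

Because -c takes effect only at index time, the mapping commands from the first post should not need to change.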

@npavlovikj
Author

npavlovikj commented Jun 7, 2018

Is the sequence name after "-c" the name of the reference sequence?
For example, suppose one of the reference sequences I want to mark as circular is:

>U00096.3 Escherichia coli str. K-12 substr.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA

Should I use "-c U00096.3 Escherichia coli str. K-12 substr."?

I apologize for the question, but I didn't find much information about the proper syntax of "-c" in the manual.

@ocxtal
Owner

ocxtal commented Jun 7, 2018

In this case the correct argument is -c U00096.3. The FASTA/FASTQ parser first splits the header line on spaces, then treats the first field as the sequence name and the remaining fields as comments. The comments are saved together in the CO:Z tag when the -T CO option is specified. (I found a bug in the -T CO option and have just fixed it. Sorry for the inconvenience.)
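
To check which names can be passed to -c, something like the following may help. It is a sketch that approximates the rule described above rather than minialign's own parser: the fasta_names helper is hypothetical, and awk's default field splitting also breaks on tabs, which may differ from how minialign treats them.

```shell
# Hypothetical helper: print the first whitespace-delimited field of each
# FASTA header line, with the leading ">" removed.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$@"
}

# Example using the header from the question; prints U00096.3
fasta_names <<'EOF'
>U00096.3 Escherichia coli str. K-12 substr.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA
EOF
```

On a real reference, fasta_names ref.fa lists one name per sequence.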

@npavlovikj
Author

This is really useful information. Thank you so much, @ocxtal!
