Help needed in understanding Saute output #40

Open
swarnalilouha opened this issue Jul 16, 2023 · 6 comments

@swarnalilouha

I used SAUTE to assemble the reference FASTA sequence '>CRYPT1020_1' from Illumina reads and got two assemblies:

CRYPT1020_1:1:1:87926 2 4 1
CRYPT1020_1:1:2:87918 2 3 1

Why are there two assemblies, and what do the numbers in the FASTA headers mean?

@souvorov
Collaborator

Assemblies often have multiple variants; the simplest case is SNPs. SAUTE arranges all variants in a graph, which is written out with the --gfa option. You can analyze this graph with Bandage (https://rrwick.github.io/Bandage/). SAUTE also writes up to 1000 variants in FASTA format to the --all_variants output.
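Bandage is the easiest way to inspect the graph. For a quick scripted look, the following is a minimal Python sketch that reads the --gfa output. It assumes the file uses standard GFA1 S (segment) and L (link) records; the function names `read_gfa_graph` and `branch_points` are hypothetical and not part of SAUTE.

```python
from collections import defaultdict


def read_gfa_graph(lines):
    """Collect segments (S records) and links (L records) from GFA1 text.

    Assumes the GFA1 layout:
      S<TAB>name<TAB>sequence
      L<TAB>from<TAB>from_orient<TAB>to<TAB>to_orient<TAB>overlap
    Other record types (H, P, ...) and blank lines are skipped.
    """
    segments = {}
    links = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # Store (from_orient, to, to_orient) under the source segment.
            links[fields[1]].append((fields[2], fields[3], fields[4]))
    return segments, dict(links)


def branch_points(links):
    """Segments with more than one outgoing link as written in the file.

    These are candidate points where variant paths diverge; links that are
    only reachable through the reverse orientation are not considered.
    """
    return sorted(name for name, targets in links.items() if len(targets) > 1)
```

Usage might look like `segments, links = read_gfa_graph(open("graph.gfa"))`, where `graph.gfa` stands for whatever file name was passed to --gfa. A SNP typically shows up as one segment linking to two alternative segments.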
The first part of the FASTA ID has the form target name:graph number:contig number:estimated k-mer count. It is followed by the numbers of the graph nodes used by that variant, separated by spaces.
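The header layout described here can be split mechanically. The following is a minimal Python sketch, assuming node numbers are integers and that a target name may itself contain ':' (hence `rsplit` from the right); `VariantHeader` and `parse_variant_header` are hypothetical names, not SAUTE functions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantHeader:
    target: str
    graph: int
    contig: int
    kmer_count: int
    nodes: List[int]


def parse_variant_header(header: str) -> VariantHeader:
    """Parse 'target:graph:contig:kmer_count node node ...'.

    A leading '>' is optional. rsplit(':', 3) keeps any ':' that belongs
    to the target name itself.
    """
    fields = header.lstrip(">").split()
    target, graph, contig, kmers = fields[0].rsplit(":", 3)
    return VariantHeader(
        target=target,
        graph=int(graph),
        contig=int(contig),
        kmer_count=int(kmers),
        nodes=[int(n) for n in fields[1:]],
    )
```

For the first header in this issue, `parse_variant_header("CRYPT1020_1:1:1:87926 2 4 1")` gives target `CRYPT1020_1`, graph 1, contig 1, an estimated k-mer count of 87926, and nodes `[2, 4, 1]`.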
From what you posted, your graph has two variants, and the difference between them is represented by nodes 3 and 4. To understand what kind of difference it is, either look at the graph or align the two contigs.
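Finding which nodes distinguish two variants amounts to a set difference on their node lists. The following is a minimal Python sketch, assuming integer node numbers after the first whitespace-separated token of each header; `differing_nodes` is a hypothetical helper.

```python
def differing_nodes(header_a: str, header_b: str):
    """Return (nodes only in A, nodes only in B) for two SAUTE variant headers.

    Only the node list (everything after the first token) is compared, as
    sets, so node order and repeated nodes are ignored.
    """
    nodes_a = {int(n) for n in header_a.split()[1:]}
    nodes_b = {int(n) for n in header_b.split()[1:]}
    return sorted(nodes_a - nodes_b), sorted(nodes_b - nodes_a)
```

Applied to the two headers in this issue, it returns `([4], [3])`, matching the observation that nodes 3 and 4 carry the difference. Comparing only makes sense for variants from the same graph, so it may be worth checking that the graph numbers match first.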

@atongsa

atongsa commented Jan 27, 2024

Can SAUTE be used to assemble whole genome sequencing (WGS) data for humans?

@souvorov
Collaborator

souvorov commented Jan 30, 2024 via email

@atongsa

atongsa commented Feb 1, 2024

Yes, I don't mean the whole genome, only specific genes in the human genome, assembled with SAUTE from human WGS data.

@souvorov
Collaborator

souvorov commented Feb 4, 2024

Try target sequences that extend slightly beyond the gene of interest. It should work unless there are large insertions, deletions, or rearrangements inside the gene's introns.
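Building such a padded target can be scripted. The following is a minimal Python sketch, assuming 0-based, half-open gene coordinates on a chromosome sequence already loaded as a string. The 500 bp default flank is a hypothetical value, not one recommended in this thread, and `padded_target`/`to_fasta` are illustrative helpers.

```python
def padded_target(sequence: str, gene_start: int, gene_end: int, flank: int = 500) -> str:
    """Return the gene region [gene_start, gene_end) plus `flank` bases on each side.

    The padding is clamped so it never runs past either end of `sequence`.
    """
    if not 0 <= gene_start < gene_end <= len(sequence):
        raise ValueError("gene coordinates fall outside the sequence")
    start = max(0, gene_start - flank)
    end = min(len(sequence), gene_end + flank)
    return sequence[start:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

The resulting FASTA record can then be used as a SAUTE target in place of the bare gene sequence.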

@atongsa

atongsa commented Feb 7, 2024

thank you very much
