Help needed in understanding Saute output #40

swarnalilouha · 2023-07-16T02:00:09Z

I used Saute to assemble a reference fasta sequence '>CRYPT1020_1' from Illumina reads. I got 2 assemblies:

CRYPT1020_1:1:1:87926 2 4 1
CRYPT1020_1:1:2:87918 2 3 1

Why are there 2 assemblies and what do the numbers in the fasta headers mean?

souvorov · 2023-07-16T17:08:08Z

Assemblies often have multiple variants; the simplest case is a SNP. SAUTE arranges all variants in a graph (graph output is controlled by the --gfa option). You can analyze this graph if you install BANDAGE (https://rrwick.github.io/Bandage/). With --all_variants, SAUTE prints up to 1000 variants in FASTA format.
The first part of the FASTA ID has the form Target name:graph number:contig number:estimated k-mer count. After that, the numbers of the graph nodes used are printed, separated by spaces.
From what you posted, your graph has two variants. The difference is represented by nodes 3 and 4. You should either look at the graph or align the two contigs to understand what kind of difference they have.
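
The header layout described above can be checked with a short script. This is only a sketch, not part of SAUTE: the `SauteHeader` class and `parse_header` function are hypothetical names, and it assumes the node numbers are plain integers separated by whitespace, as in the two headers posted in this issue.

```python
from dataclasses import dataclass


@dataclass
class SauteHeader:
    target: str        # target name
    graph: int         # graph number
    contig: int        # contig number
    kmer_count: int    # estimated k-mer count
    nodes: list        # graph node numbers listed after the ID


def parse_header(line: str) -> SauteHeader:
    """Split a SAUTE FASTA header into its documented parts."""
    fields = line.lstrip(">").split()
    ident, node_fields = fields[0], fields[1:]
    # Split from the right so a target name containing ':' stays intact.
    target, graph, contig, kmer_count = ident.rsplit(":", 3)
    return SauteHeader(
        target=target,
        graph=int(graph),
        contig=int(contig),
        kmer_count=int(kmer_count),
        nodes=[int(n) for n in node_fields],
    )


a = parse_header("CRYPT1020_1:1:1:87926 2 4 1")
b = parse_header("CRYPT1020_1:1:2:87918 2 3 1")

# Nodes used by one variant but not by the other.
only_a = [n for n in a.nodes if n not in b.nodes]
only_b = [n for n in b.nodes if n not in a.nodes]
print(only_a, only_b)  # [4] [3]
```

Both contigs belong to graph 1 and share nodes 2 and 1, so the variation sits in node 4 versus node 3. The script only locates the differing nodes; the sequences still have to be aligned or viewed in the graph to see what the difference is.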

atongsa · 2024-01-27T11:31:48Z

Can SAUTE be used to assemble whole genome sequencing (WGS) data for humans?

souvorov · 2024-01-30T15:27:03Z

SAUTE was designed for assembling bacterial genes. It is not appropriate for assembling the human genome.

atongsa · 2024-02-01T11:27:14Z

Yes, I did not mean the whole genome, only assembling specific genes in the human genome with SAUTE from human WGS reads.

souvorov · 2024-02-04T23:35:06Z

Try target sequences that extend slightly beyond the gene of interest. It should work unless there are large insertions, deletions, or rearrangements inside the gene's introns.
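
One way to prepare such a target is to pad the gene interval with a flank on each side before cutting it out of the reference. The sketch below is an illustration under assumptions, not a SAUTE feature: `padded_region` and `extract_target` are hypothetical helpers, coordinates are 0-based and half-open, and the flank size is arbitrary because no specific value is given in this thread.

```python
def padded_region(gene_start: int, gene_end: int, flank: int, seq_len: int):
    """Extend a 0-based, half-open gene interval by `flank` bases on each
    side, clamped to the bounds of the reference sequence."""
    if not 0 <= gene_start < gene_end <= seq_len:
        raise ValueError("gene interval must lie inside the sequence")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return max(0, gene_start - flank), min(seq_len, gene_end + flank)


def extract_target(reference: str, gene_start: int, gene_end: int, flank: int) -> str:
    """Return the gene plus flanking sequence to use as the target."""
    start, end = padded_region(gene_start, gene_end, flank, len(reference))
    return reference[start:end]


# Toy reference: a 15-base "gene" at positions 10..25 between two 10-base runs.
reference = "A" * 10 + "CGT" * 5 + "T" * 10
target = extract_target(reference, gene_start=10, gene_end=25, flank=3)
print(target)  # AAACGTCGTCGTCGTCGTTTT
```

The resulting sequence would then be written to a FASTA file and given to SAUTE as the target.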

atongsa · 2024-02-07T11:25:25Z

thank you very much
