Summary table and summary image

pedroscampoy edited this page Sep 18, 2018 · 3 revisions

Summary table

After running PlasmidID on one or more samples, run the following command:

summary_table.sh GROUP_FOLDER

A folder named 00_summary is generated, containing a summary table in TSV format. The table includes extended information such as accession number, length, species, description, the number of samples within the group that carry the same plasmid, and the percentage of length covered.
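Since the table is plain TSV, it can be inspected with any tab-separated reader. The following is a minimal sketch in Python, not part of PlasmidID. The column names (`id`, `length`, `fraction_covered`) and the example rows are assumptions made up for illustration; check the header of the file generated in 00_summary for the real names.

```python
import csv
import io


def load_summary(handle):
    """Read a tab-separated summary table into a list of dicts, one per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_coverage(rows, column, minimum):
    """Keep the rows whose numeric value in `column` is at least `minimum`."""
    return [row for row in rows if float(row[column]) >= minimum]


# Made-up example data: the column names and values are illustrative
# assumptions, not the actual header written by summary_table.sh.
example_table = io.StringIO(
    "id\tlength\tfraction_covered\n"
    "plasmid_A\t5000\t0.97\n"
    "plasmid_B\t120000\t0.82\n"
)

rows = load_summary(example_table)
well_covered = filter_by_coverage(rows, "fraction_covered", 0.90)
print([row["id"] for row in well_covered])
```

To use it on a real table, open the generated file with `open(path, newline="")` and pass the handle to `load_summary`, adjusting the column name to the one found in the header.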

Summary image

PlasmidID outputs a summary image inside the images folder. This image shows all plasmids that meet the user-defined thresholds: by default, at least 90% of their length is covered, and plasmids are clustered at 90% identity.

The aim of this summary image is to give an overview of the sample plasmidome. However, the clustering process is yet to be adapted to the available plasmid database, which lacks several necessary features:

  • Even though it is a RefSeq database, a plasmid GenBank submission is propagated to the RefSeq collection without manual curation if it is part of a larger registered genome sequencing project.
  • There is no consensus on whether the sequence should start at the replication start sequence. This is important for small circular sequences.
  • There is no plasmid taxonomy and, since plasmids are modular, it is hard to decide how to cluster them by features.

These issues make the clustering process much harder:

  • cd-hit doesn't handle circular sequences (not even with the circular mode in psi-cd-hit).
  • Erroneous submissions cause larger, wrongly assembled sequences to become the reference for plasmids that are actually in the sample.
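The effect of an arbitrary start position on circular sequences can be illustrated with a toy example. The sketch below is only an illustration of the problem, not how cd-hit or PlasmidID compare sequences: it shows that two records of the same circular molecule, written from different start positions, look different in a naive position-by-position comparison, while a rotation-aware check recognises them as the same. The sequence is made up, and reverse-complement orientation is not considered.

```python
def same_circular_sequence(a, b):
    """Return True if b is a rotation of a (same circular molecule, same strand)."""
    return len(a) == len(b) and b in a + a


def positional_identity(a, b):
    """Fraction of identical positions between two equal-length sequences,
    compared as linear strings without any alignment or rotation."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Made-up toy sequence; the second record is the same circle "opened"
# five bases further along.
plasmid = "ATGCCGTAAGCT"
rotated = plasmid[5:] + plasmid[:5]

print(positional_identity(plasmid, rotated))     # well below 1.0
print(same_circular_sequence(plasmid, rotated))  # True
```

Real tools work with alignments rather than exact string matches, but the same start-position ambiguity applies.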

To detect plasmids that were not properly clustered, the summary image is drawn with links between identical contigs, so that only the most appropriate plasmids are chosen. The summary image is constructed as follows:

  1. The selected plasmids are placed in a single circle:

01_img_guide

  2. Links are established between copies of the same contig when it is present in two different plasmids.

01_img_guide

  3. The user determines the number of different plasmids present in the analyzed sample.

01_img_guide
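The linking step can be sketched as a small data-structure exercise. The code below is a hypothetical illustration, not PlasmidID's implementation: given a mapping from plasmid to the contigs that matched it (the names and input format are assumptions), it lists every pair of plasmids that share at least one contig, which is what a link in the image represents.

```python
from collections import defaultdict
from itertools import combinations


def shared_contig_links(contigs_by_plasmid):
    """Return {(plasmid_1, plasmid_2): [shared contigs]} for every pair of
    plasmids that have at least one contig in common."""
    # Invert the mapping: for each contig, which plasmids does it appear in?
    plasmids_by_contig = defaultdict(set)
    for plasmid, contigs in contigs_by_plasmid.items():
        for contig in contigs:
            plasmids_by_contig[contig].add(plasmid)

    # Every pair of plasmids sharing a contig gets one link per shared contig.
    links = defaultdict(list)
    for contig, plasmids in plasmids_by_contig.items():
        for pair in combinations(sorted(plasmids), 2):
            links[pair].append(contig)
    return {pair: sorted(contigs) for pair, contigs in links.items()}


# Made-up example: plasmid and contig names are illustrative only.
matches = {
    "plasmid_A": {"contig_1", "contig_2"},
    "plasmid_B": {"contig_2", "contig_3"},
    "plasmid_C": {"contig_4"},
}
links = shared_contig_links(matches)
print(links)
```

In this toy case plasmid_A and plasmid_B are linked through contig_2, suggesting they may be alternative references for the same plasmid, while plasmid_C has no links and stands on its own.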


The final determination is manual: the user has to decide on the best plasmid configuration. With all the information supplied by PlasmidID and this guide, this task should be easier 😊.