Skip to content
forked from patrickwest/EukRep

Classification of Eukaryotic and Prokaryotic sequences from metagenomic datasets

License

Notifications You must be signed in to change notification settings

lfdelzam/EukRep

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EukRep

Classification of Eukaryotic and Prokaryotic sequences from metagenomic datasets

Installation

  • Requires Python3
  • Easy install with pip
$ pip install EukRep

Example Usage

  • Identify and output sequences predicted to be of eukaryotic origin from a fasta file:
$ EukRep -i <Sequences in Fasta format> -o <Eukaryote sequence output file>
  • Identify and output both sequences of eukaryotic and prokaryotic origin from a fasta file:
$ EukRep -i <Sequences in Fasta format> -o <Eukaryote sequence output file> --prokarya <Prokaryote sequence output file>

Obtaining Eukaryotic Bins

EukRep is intended to be used as one part of a larger pipeline. For obtaining high quality gene predictions and binning identified eukaryotic contigs as described in "Genome-reconstruction for eukaryotes from complex natural microbial communities" (West et al. in review), see methods section https://doi.org/10.1101/171355

-or-

See a provided example workflow (work in progess) https://github.com/patrickwest/EukRep_Pipeline

Adjusting Stringency

The stringency of identifying eukaryotic contigs can be adjusted with -m. The false positive rate (FPR) and false negative rate (FNR) for the strict, balanced, and lenient modes are shown below. Default is balanced. Prior to version 0.6.5, lenient was the default.

20kb

5kb

Data was obtained by running EukRep on 20kb and 5kb fragmented scaffolds from genomes from mock novel phyla.

Important Caveat

In our experience, most metagenomes do not have a eukaryotic genome present; however, EukRep has a false positive rate and you will still receive output in these cases.

About

Classification of Eukaryotic and Prokaryotic sequences from metagenomic datasets

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%