Phylogeny pipeline

Description

Snakemake scripts for building a phylogeny.

Every FASTA file must contain at least one record from each of the samples in the study.
Sequence IDs in the FASTA files must have only 4 letters, in the form >XXXX| (an OrthoMCL requirement).
For translation to protein, either transeq or Biopython can be used (default: Biopython).
Duplicates can be removed in two ways: drop the shorter sequences or drop sequences at random (default: random).
Default number of RAxML bootstrap replicates: 100.
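
The two duplicate-removal options can be illustrated with a minimal sketch. It is not the code used in the pipeline: it assumes that a "duplicate" means more than one record from the same sample in a group, and that the sample code is the text before the first `|` in the ID. The names `sample_of` and `remove_duplicates` are hypothetical.

```python
import random
from collections import defaultdict


def sample_of(record_id):
    """Return the sample code: the text before the first '|' in the ID (assumed layout)."""
    return record_id.split("|", 1)[0]


def remove_duplicates(records, mode="random", seed=None):
    """Keep one (record_id, sequence) pair per sample.

    mode="random" keeps a randomly chosen record per sample;
    mode="longest" discards the shorter records and keeps the longest one.
    """
    by_sample = defaultdict(list)
    for record_id, sequence in records:
        by_sample[sample_of(record_id)].append((record_id, sequence))

    rng = random.Random(seed)
    kept = []
    for sample in sorted(by_sample):
        candidates = by_sample[sample]
        if mode == "longest":
            kept.append(max(candidates, key=lambda rec: len(rec[1])))
        elif mode == "random":
            kept.append(rng.choice(candidates))
        else:
            raise ValueError("mode must be 'random' or 'longest'")
    return kept


# Made-up records: sample AAAA appears twice, sample BBBB once.
records = [
    ("AAAA|gene1", "ATGAAATAG"),
    ("AAAA|gene2", "ATGAAACCCTAG"),
    ("BBBB|gene3", "ATGTAG"),
]
print(remove_duplicates(records, mode="longest"))
```

Passing a `seed` makes the random choice reproducible, which may help when comparing runs.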

Prerequisites

transeq from EMBOSS, only if you prefer it to Biopython for the protein translation.
Biopython and bioawk are required in any case.
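
Before launching a run, it may help to confirm that the prerequisites are visible in the current environment. The sketch below is only a convenience check, not part of the pipeline; it assumes that `bioawk` and `transeq` are expected on the `PATH` under those names and that Biopython is importable as `Bio`. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites(use_transeq=False):
    """Map each required tool to True/False depending on whether it was found."""
    found = {
        # Biopython installs the top-level package "Bio".
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Executables are looked up on the PATH.
        "bioawk": shutil.which("bioawk") is not None,
    }
    if use_transeq:
        # transeq is only needed when it replaces Biopython for translation.
        found["transeq"] = shutil.which("transeq") is not None
    return found


status = check_prerequisites(use_transeq=True)
for name, ok in sorted(status.items()):
    print("{}: {}".format(name, "found" if ok else "MISSING"))
```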

Usage (slurm example)

$ srun --partition=compute --time 7-0 --mem=10G --cpus-per-task=24 --ntasks=1 --pty bash
$ module load python/3.5.0
$ snakemake --snakefile snk_phylo.py --config fastadir=CDSgroups/ -j 24 -np    # dry run
$ snakemake --snakefile snk_phylo.py --config fastadir=CDSgroups/ -j 24
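
For scripted runs, the same invocation can be assembled programmatically. This is a sketch under stated assumptions: the helper `snakemake_command` is hypothetical, and it only reuses the options shown in the commands above (`--snakefile`, `--config fastadir=...`, `-j`, `-np`).

```python
import shlex


def snakemake_command(fastadir, jobs=24, dry_run=False, snakefile="snk_phylo.py"):
    """Build the argument list for the snakemake call shown in the usage example."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--config", "fastadir={}".format(fastadir),
        "-j", str(jobs),
    ]
    if dry_run:
        # -n: dry run, -p: print the shell commands that would be executed.
        cmd.append("-np")
    return cmd


# Print the dry-run command as a shell-safe string.
print(" ".join(shlex.quote(arg) for arg in snakemake_command("CDSgroups/", dry_run=True)))
```

The returned list can be passed directly to `subprocess.run` once the dry run looks correct.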

Steps

Obtain ortholog groups with snk_orthomcl, extract the sequences with snk_groups2fasta, and finally run snk_phylo to build the phylogenetic tree.

Input files: FASTA files (proteins and CDS).
Output file: RAxML tree.
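
The requirements on the input FASTA files (4-letter IDs in the form `>XXXX|`, and every sample present in every file) can be checked before running the pipeline. The following is a minimal sketch, not code from the repository; the names `ID_PATTERN`, `samples_in_fasta` and `missing_samples` are hypothetical, and it assumes the 4-letter code consists of ASCII letters only.

```python
import re

# A header must start with '>', exactly 4 letters, then '|'.
ID_PATTERN = re.compile(r"^>([A-Za-z]{4})\|")


def samples_in_fasta(lines):
    """Return the set of 4-letter sample codes found in FASTA header lines.

    Raises ValueError on the first header that does not start with '>XXXX|'.
    """
    samples = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        match = ID_PATTERN.match(line)
        if match is None:
            raise ValueError("bad FASTA ID: {}".format(line.strip()))
        samples.add(match.group(1))
    return samples


def missing_samples(lines, expected_samples):
    """Return the expected sample codes that have no record in this file."""
    return set(expected_samples) - samples_in_fasta(lines)


# Made-up file content: sample CCCC has no record.
example = [">AAAA|gene1\n", "ATGAAATAG\n", ">BBBB|gene7\n", "ATGTAG\n"]
print(missing_samples(example, {"AAAA", "BBBB", "CCCC"}))
```

For a real file, the open file handle can be passed as `lines`, since the functions only iterate over it line by line.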

Coming Soon

Run MEGA calibration on the RAxML results.
Add an alternative aligner (PRANK) to snk_phylo.
Run PAML analysis on the snk_groups2fasta output.

migrau/phylopipe

Folders and files

Latest commit

History

Repository files navigation

Phylogeny pipeline

Description

Contents

snk_orthomcl

snk_groups2fasta

snk_phylo

Steps

Coming Soon

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages