Skip to content

miwipe/getLCA

 
 

Repository files navigation

getLCA

################################# GUIDE TO RUN THE SCRIPT getLCA.py #################################

  1. Download the NCBI taxonomy files, names.dmp and nodes.dmp

    mkdir taxdump wget ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz tar -xzf taxdump.tar.gz -C taxdump rm taxdump.tar.gz

  2. Update the paths to names.dmp and nodes.dmp on line 3 and 4 in the script get_LCA_functions.py

  3. Map reads to DB using bowtie (or a mapper of your choice) - see Prepare_ref_DB_guide.md

    bowtie2 -k 500 -p 24 -f -x $DB -U $infile --no-unal > $filename.unsorted.sam

  4. Sort samfile

    sort -k1 $filename.unsorted.sam > $filename.sam rm $filename.unsorted.sam

  5. Assign LCA

    python get_LCA.py $filename.sam

OPTIONAL: 6) Report a sorted list of vertebrate species assigned below family level:

cat $filename.getLCA|grep -v 'NOMATCH'|grep 'genus\|family\|species\|subfamily\|subspecies\|subgenus'|grep 'Vertebrata'|awk '{print $2}'|sort|uniq -c|sort -nk1|awk -F ':' '{print $1}'

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%