Entry points into the database
This page describes the scripts you will most often use to begin your pipelines when starting a comparative genomics analysis. They return results in a format suitable for piping into other ITEP scripts.
Before running any ITEP scripts, source the SourceMe.sh file (in the root directory of the repository) to set up your paths. This lets you run the ITEP scripts from any directory on your machine.
$ source SourceMe.sh
You can pull out all genes that match a search string using the following command:
$ db_getGenesWithAnnotation.py "Search_string"
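If you only need the matching gene IDs (for example, to feed into another tool), a small filter on the piped output is enough. The sketch below is a minimal, hypothetical helper (`gene_ids.py` is not part of ITEP). It assumes the output is tab-delimited with the ITEP gene ID in the first column; check this against your own output before relying on it.

```python
import sys
from typing import Iterable, List


def gene_ids(lines: Iterable[str]) -> List[str]:
    """Return unique gene IDs, preserving the order they first appear.

    Assumption: each row is tab-delimited and the ITEP gene ID is the
    first column of db_getGenesWithAnnotation.py output.
    """
    seen = set()
    ids: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        gene_id = line.split("\t", 1)[0]
        if gene_id not in seen:
            seen.add(gene_id)
            ids.append(gene_id)
    return ids


if __name__ == "__main__":
    # Hypothetical usage:
    #   db_getGenesWithAnnotation.py "Search_string" | python gene_ids.py
    for gid in gene_ids(sys.stdin):
        print(gid)
```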
You can get a list of clusters containing genes that match a search string using:
$ db_getClustersWithAnnotation.py "Search_string"
It returns a table containing the run ID, cluster ID, gene IDs, and annotations for clusters that matched the search string. Note that only the genes that matched the search string are included; to get all of the genes in each cluster, pipe the results into db_getGenesInClusters.py.
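Because several matching genes can come from the same cluster, the table can list each (run ID, cluster ID) pair more than once. The following minimal sketch (a hypothetical `run_cluster_pairs.py`, not an ITEP script) collapses the table to unique pairs. It assumes tab-delimited rows with the run ID in column 1 and the cluster ID in column 2, following the column order described above.

```python
import sys
from typing import Iterable, List, Tuple


def run_cluster_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (run ID, cluster ID) pairs in first-seen order.

    Assumption: rows are tab-delimited as run ID, cluster ID, gene ID,
    annotation (the order described for db_getClustersWithAnnotation.py).
    """
    seen = set()
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        pair = (fields[0], fields[1])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


if __name__ == "__main__":
    # Hypothetical usage:
    #   db_getClustersWithAnnotation.py "Search_string" | python run_cluster_pairs.py
    for run_id, cluster_id in run_cluster_pairs(sys.stdin):
        print(f"{run_id}\t{cluster_id}")
```

Whether db_getGenesInClusters.py needs this deduplication, or which columns it reads, is not stated on this page; check its help text before chaining the two.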
You can search for aliases in the same manner as searching for annotations:
$ db_getGenesWithAnnotation.py "Alias"
$ db_getClustersWithAnnotation.py "Alias"
If locus tags are present in the source GenBank files, they become searchable automatically when the files are loaded. Otherwise, to make a name searchable, add it to the aliases file ($root/aliases/aliases) and re-run setup_step1.sh.
You can get a list of the cluster run IDs currently loaded into ITEP with the db_getAllClusterRuns.py script:
$ db_getAllClusterRuns.py
You can then choose one run ID from the list and pipe it into, or pass it to, the other ITEP scripts for further analyses.
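Once you have chosen a run ID, a common next step is to restrict a cluster table to that run. The sketch below is a hypothetical helper (`rows_for_run.py` is not part of ITEP). It assumes tab-delimited rows with the run ID in the first column, as in the output of db_getClustersWithAnnotation.py described above.

```python
import sys
from typing import Iterable, Iterator


def rows_for_run(lines: Iterable[str], run_id: str) -> Iterator[str]:
    """Yield only the rows whose first tab-delimited field equals run_id.

    Assumption: the run ID is in column 1 of each row.
    """
    for line in lines:
        if line.rstrip("\n").split("\t", 1)[0] == run_id:
            yield line


if __name__ == "__main__":
    # Hypothetical usage (RUN_ID taken from db_getAllClusterRuns.py output):
    #   db_getClustersWithAnnotation.py "Search_string" | python rows_for_run.py RUN_ID
    chosen = sys.argv[1]
    for row in rows_for_run(sys.stdin, chosen):
        sys.stdout.write(row)
```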
You can get a list of valid ITEP contig IDs using:
$ db_getContigs.py
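One use of this list is checking that contig IDs you plan to pass to other scripts actually exist in the database. The sketch below is a hypothetical `check_contigs.py`, not an ITEP script. It assumes each line of db_getContigs.py output carries the contig ID as its first tab-delimited field.

```python
import sys
from typing import Iterable, List, Set


def load_contig_ids(lines: Iterable[str]) -> Set[str]:
    """Collect contig IDs, assuming the ID is the first tab-delimited field."""
    return {line.rstrip("\n").split("\t", 1)[0] for line in lines if line.strip()}


def invalid_contigs(candidates: Iterable[str], valid: Set[str]) -> List[str]:
    """Return the candidate IDs that are not in the valid set, in input order."""
    return [c for c in candidates if c not in valid]


if __name__ == "__main__":
    # Hypothetical usage:
    #   db_getContigs.py | python check_contigs.py CONTIG_A CONTIG_B
    valid = load_contig_ids(sys.stdin)
    bad = invalid_contigs(sys.argv[1:], valid)
    for c in bad:
        print(f"not a known contig ID: {c}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```

Exiting with a non-zero status when any ID is unknown lets the check stop a shell pipeline early (for example with `set -e`).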