The whole pipeline is driven by pipeline.sh, the main program, which runs 4 sub-programs:
getfiles.sh, create.sh, intersect.sh, table.sh (and lollipop.sh as a bonus for diagrams)
###pipeline.sh
This is the hub. Here you can use -g to get files, -c to create the domain and non-domain files, -i to run the intersection, -t to build the tables, and -l to generate lollipop plots. You can also pass optional arguments through to each sub-script.
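As an illustration of the flags above, here is a minimal sketch of how a hub script could map them to steps with `getopts`. The function name `parse_steps` and the step labels it prints are hypothetical; the real pipeline.sh may parse and order its steps differently.

```shell
# Hypothetical sketch only: the actual pipeline.sh may differ.
# parse_steps prints, one per line, the step selected by each flag,
# in the order the flags were given.
parse_steps() {
    local OPTIND=1 opt
    while getopts "gcitl" opt; do
        case "$opt" in
            g) echo "getfiles" ;;
            c) echo "create" ;;
            i) echo "intersect" ;;
            t) echo "tables" ;;
            l) echo "lollipop" ;;
            *) echo "usage: pipeline.sh [-g] [-c] [-i] [-t] [-l]" >&2
               return 1 ;;
        esac
    done
}

# Example: select the first two steps.
parse_steps -g -c
```

A real hub would call the matching sub-script instead of echoing a label, and would forward any optional arguments to it.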
###getfiles.sh
This is where the initial files are collected and formatted (if they don't already exist). VEP annotation is optional because it takes a very long time, but properly annotated data is required to complete the pipeline.
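The "if they don't already exist" behavior can be sketched as a small guard that runs a command only when its target file is missing. The helper name `ensure_file` and the example file are hypothetical, and the command shown (`touch`) merely stands in for whatever download or formatting step getfiles.sh really performs.

```shell
# Hypothetical sketch only: not the actual getfiles.sh logic.
# ensure_file TARGET CMD [ARGS...]
# Runs CMD only when TARGET does not exist yet.
ensure_file() {
    local target="$1"
    shift
    if [ -e "$target" ]; then
        echo "skip: $target already exists"
    else
        "$@" && echo "created: $target"
    fi
}

workdir="$(mktemp -d)"
ensure_file "$workdir/example.bed" touch "$workdir/example.bed"   # file is missing, so the command runs
ensure_file "$workdir/example.bed" touch "$workdir/example.bed"   # file now exists, so the command is skipped
```

A guard like this is what makes re-running the step cheap: expensive work such as annotation is not repeated once its output is on disk.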
###create.sh
This is where the files for the domain regions and the non-domain regions (or nodoms) are created. These are used further down the pipeline to generate the tables and intersections. The coverage value is an optional argument; the default is 5.
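One common way to give an optional argument a default in shell is parameter expansion. The sketch below assumes the coverage value arrives as the first positional argument, which may not match how create.sh actually receives it; the function name `coverage_or_default` is hypothetical.

```shell
# Hypothetical sketch only: create.sh may receive coverage differently.
# coverage_or_default [VALUE]
# Echoes VALUE, or the documented default of 5 when VALUE is missing or empty.
coverage_or_default() {
    echo "${1:-5}"
}

coverage_or_default       # no argument: falls back to 5
coverage_or_default 10    # explicit value: 10
```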
###intersect.sh
In this program, the files generated by create.sh (the doms and nodoms) are intersected with VEP-annotated ExAC data to generate a single combined BED file of all SNP intersections across all exons, annotated by region.
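To show what intersecting means here, the sketch below is a naive awk overlap check between a region file and a variant file. It is an illustration only, not the tool intersect.sh uses. The column layouts (regions as `chrom start end name`, variants as `chrom start end`, BED-style half-open coordinates) and the function name `intersect_regions` are assumptions.

```shell
# Hypothetical sketch only: a naive O(regions x variants) overlap check.
# intersect_regions REGIONS_FILE VARIANTS_FILE
# Prints each variant followed by the name of every region it overlaps.
intersect_regions() {
    awk 'BEGIN { OFS = "\t" }
    # First file: load the regions into arrays.
    NR == FNR {
        chrom[NR] = $1; rbeg[NR] = $2 + 0; rend[NR] = $3 + 0; name[NR] = $4
        n = NR
        next
    }
    # Second file: report every region that overlaps the current variant.
    {
        for (i = 1; i <= n; i++)
            if ($1 == chrom[i] && ($2 + 0) < rend[i] && ($3 + 0) > rbeg[i])
                print $1, $2, $3, name[i]
    }' "$1" "$2"
}

demo="$(mktemp -d)"
printf 'chr1\t100\t200\tdomA\nchr1\t200\t300\tnodomB\n' > "$demo/regions.bed"
printf 'chr1\t150\t151\nchr1\t200\t201\n' > "$demo/variants.bed"
intersect_regions "$demo/regions.bed" "$demo/variants.bed"
```

With half-open coordinates, a variant at 200-201 falls in the region starting at 200 and not in the one ending at 200, so each SNP in the demo is assigned to exactly one region.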
###tables.sh
Here, a region-based table is generated with variant density, domain prevalence (1 for nodoms), FVRV (fraction of very rare variants), dN, dS, and dN/dS. The table is written in a mock-BED12 format, so that each row encompasses all exons for a domain autoreg (a unique domain identifier). A gene-based table is also generated and is formatted in much the same way.
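The per-region ratios can be sketched with awk as below. The input column layout, the function name `region_metrics`, and the exact definitions are assumptions made for illustration: variant density is taken as variants per base pair, FVRV as very rare variants divided by all variants, and dN/dS as a plain ratio of precomputed dN and dS values. The actual tables.sh may define or normalize these differently.

```shell
# Hypothetical sketch only: definitions and column layout are assumed.
# Input columns: region  length_bp  n_variants  n_very_rare  dN  dS
# Output columns: region  variant_density  FVRV  dN/dS
region_metrics() {
    awk 'BEGIN { OFS = "\t" }
    {
        # Guard every division so empty regions yield NA instead of an error.
        density = ($2 > 0) ? $3 / $2 : "NA"
        fvrv    = ($3 > 0) ? $4 / $3 : "NA"
        ratio   = ($6 > 0) ? $5 / $6 : "NA"
        print $1, density, fvrv, ratio
    }'
}

printf 'dom1\t100\t10\t4\t0.2\t0.4\n' | region_metrics
```

Regions with no variants or with dS equal to 0 produce NA in this sketch, which keeps the table rectangular.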
###lollipop.sh
This part of the pipeline is entirely optional, but it is useful for visualizing a protein's variation: the height of each lollipop indicates the rarity of a variant, the color indicates its type, and the lollipops are placed by amino acid (AA) coordinates.
END OF README
***