This software has been completely rewritten. Please use shapemapper2 unless you have a good reason to use this version instead. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides extensive diagnostic information useful for experiment analysis and troubleshooting.
Weeks-UNC/ShapeMapper_v1.2
###################################################################################
ShapeMapper installation, execution, and troubleshooting.
Steven Busan 2014
###################################################################################

Requirements:
===================================================================================
python 2.7
===================================================================================
RNAstructure 2 (only required if performing structure prediction)
 - Download command-line applications for your platform
   Extract to home directory
   Add the following 2 lines to ~/.bash_profile
       export PATH=$PATH:$HOME/RNAstructure/exe
       export DATAPATH=$HOME/RNAstructure/data_tables
===================================================================================
Bowtie2 (required for sequence alignment)
 - If on a cluster environment with the Modules package installed:
       Run the command "module initadd bowtie2"
       Then log off and log back in.
   Otherwise, to install locally:
       Download bowtie2 binary for your platform
       Extract to any directory
       Add folder location to PATH in ~/.bash_profile, just as for RNAstructure
===================================================================================
matplotlib (python module required for .pdf figure rendering)
 - Download source
   Extract to any directory
   cd to the extracted directory
   Run the command "python setup.py install --user"
===================================================================================
httplib2 (python module only required if rendering structures)
 - Download httplib2-0.7.6.tar.gz (or later version)
   Extract to any directory
   cd to httplib2 directory
   Run the command "python setup.py install --user"
===================================================================================
ShapeMapper itself
 - Extract files to any directory
   Add ShapeMapper directory location to PATH
   Build the C and C++ modules:
       cd to the ShapeMapper directory
       Run the command "make"
   Make sure the files ShapeMapper.py and pvclient.py are both executable
   from your account. If not, cd to the ShapeMapper directory and run the
   commands "chmod +x pvclient.py" and "chmod +x ShapeMapper.py"
###################################################################################

###################################################################################
Execution instructions:

Setup:
   Optional: Obtain the example dataset containing bacterial ribosome and the
   TPP riboswitch from the Sequence Read Archive, accession SRP052065.

   Create a folder on a filesystem with space available for the large
   intermediate files generated by the pipeline. Make sure no spaces are in any
   of the folder names parent to this directory (bowtie2's Perl wrapper seems
   to fail otherwise).

   Put .fastq files in this directory. If they are compressed, uncompress them.

   Create a .fa FASTA-formatted sequence file for each target sequence:
       The filename must exactly match the 1st line of the file after the ">"
       char.
       There should be no whitespace between the ">" character and the title.
       Use "T" not "U", and all capital letters for sequence.

   Copy the EXAMPLE.cfg file to a new name in the same folder as the FASTQ
   files.

To run locally:
   cd to directory containing .fastq read files, .fa reference sequence files,
   and a .cfg file
   Run the command "ShapeMapper.py yourfile.cfg"

To run in a cluster environment with LSF:
   cd to directory containing .fastq read files, .fa reference sequence files,
   and a .cfg file
   Run the command:
   "bsub -q week -n 6 -o run.out -R span[ptile=6] ShapeMapper.py yourfile.cfg"
###################################################################################

Output description and troubleshooting:

Outputs are listed in the order of execution:

A "log.txt" file will be created in the run folder. If running in a cluster
environment, also check the file "run.out" for memory errors, etc., which will
not be recorded in the log file.

A folder "temp" is created that stores subprocess stdout and stderr during
pipeline execution. It can be safely deleted after pipeline completion.

A file "temp_config.pickle" is created to store configuration options for easy
loading by pipeline stages. It can be safely deleted after pipeline completion.

A folder "output" is created that will hold the bulk of the pipeline output.

Quality trimmed reads (made by running trimPhred) are written to
   output/trimmed_reads/

Bowtie2 reference sequence indices are written to
   output/bowtie_index/

Sequence alignment files are written to
   output/aligned_reads/

Parsed and simplified alignments are written to
   output/mutation_strings/

Mutation counts and sequencing depths are written to comma-separated files in
   output/counted_mutations/
The same files are also written in column form to
   output/counted_mutations_columns/

SHAPE reactivity files (.shape) are written to
   output/reactivity_profiles/

Tab-separated column files containing per-nucleotide depths, total mutation
rates, reactivities, and standard error estimates are written to .tab files in
   output/reactivity_profiles/

SHAPE-MaP reactivity files (.map) are written to
   output/reactivity_profiles/
These are the same as SHAPE files but contain 2 additional columns: standard
error and nucleotide sequence.

Reactivity profile images and sequencing depth images are written to .pdf
files in
   output/reactivity_profiles/

   Check the depth image to troubleshoot.
      If using directed primers (i.e. not random priming):
         The depths should be flat or very nearly so on a log scale plot
         (unfortunately, log scale is currently disabled because of a
         matplotlib bug). Bumpy depths indicate off-target primer binding,
         something the pipeline is not currently set up to handle. A workaround
         in this case is to increase the minLength config option to ensure
         that all reads included in the analysis completely cover the desired
         region.
      If using random primers:
         Regions of low depth are undesirable. For some RNAs with pockets of
         low GC content, the distribution of coverage can be improved by using
         specially designed random primers. See the supplemental information
         in the first SHAPE-MaP publication for details.

   Check the reactivity profiles to troubleshoot:
      Many or large negative peaks may indicate little signal above
      background. This can result from high mutation rates in the background
      condition, or from low mutation rates in the +reagent condition -
      sometimes from DNA contamination.
      Large error bars indicate that more sequencing depth is needed for
      accurate reactivity determination. Do not use SHAPE profiles with large
      error bars for structure modeling!

Images of histograms of mutation rates, sequencing depths, and reactivities
are written to .pdf files in
   output/reactivity_profiles/

   Check these images to troubleshoot an experiment.
      Background mutation rates should peak near 0.
      +Reagent mutation rates should peak above those of the background.
      Sequencing depths should mostly fall above 5-10k. More is better.
      Reactivities should be mostly positive.

If performing structure prediction:
   Sequence files (.seq) are written to
      output/folds/
   Structure predictions are written to .ct files in
      output/folds/

If rendering structures:
   Postscript image files (.eps) for the lowest predicted free energy
   structure for each RNA are written to
      output/folds/
   XRNA files (.xrna) for the lowest predicted free energy structure are
   written to
      output/folds/
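
The FASTA requirements listed under "Setup" (title matches the filename, no
whitespace after ">", "T" not "U", all capital letters) can be checked before
starting a run. The following is a minimal sketch and not part of ShapeMapper:
the function name check_fasta is hypothetical, and it assumes that the title is
compared against the filename without its ".fa" extension, which the
instructions above do not state explicitly.

```python
import os


def check_fasta(path):
    """Return a list of problems found in one single-sequence FASTA file."""
    with open(path) as handle:
        lines = [line.rstrip("\r\n") for line in handle]
    if not lines or not lines[0].startswith(">"):
        return ["first line does not start with '>'"]

    problems = []
    title = lines[0][1:]
    if title[:1].isspace():
        problems.append("whitespace between '>' and the title")

    # Assumption: the title is compared to the filename without ".fa".
    expected = os.path.splitext(os.path.basename(path))[0]
    if title != expected:
        problems.append("title %r does not match filename %r" % (title, expected))

    # Join all remaining lines so wrapped sequences are handled too.
    sequence = "".join(line.strip() for line in lines[1:])
    if "U" in sequence.upper():
        problems.append("sequence contains U; use T")
    if sequence != sequence.upper():
        problems.append("sequence contains lowercase letters")
    return problems
```

An empty list means the file passed every check. Calling check_fasta on each
.fa file in the run folder before launching ShapeMapper.py would catch these
formatting problems early.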