GitHub - Weeks-UNC/ShapeMapper_v1.2: This software has been completely rewritten. Please use shapemapper2 unless you have a good reason to use this version instead. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides extensive diagnostic information useful for experiment analysis and troubleshooting.


weekslab.com/software/

GPL-3.0 license

Repository files:
    .gitattributes
    16S.fa
    CHANGELOG
    COPYING
    EXAMPLE.cfg
    Makefile
    README
    ShapeMapper.py
    TPP_riboswitch.fa
    conf.py
    countMutations.cpp
    countMutations.py
    defaults.cfg
    generateReactivityProfiles.py
    makeOldMutationStrings.py
    parseAlignment.cpp
    parseConfigFile.py
    parsed_reads.h
    pivotCSV.py
    pvclient.py
    ref_seq.h
    string_funcs.h
    trimPhred.c


###################################################################################
ShapeMapper installation, execution, and troubleshooting.
Steven Busan 2014

###################################################################################
Requirements:

===================================================================================
python 2.7

===================================================================================
RNAstructure 2 (only required if performing structure prediction) -

    Download the command-line applications for your platform
    Extract to your home directory
    Add the following 2 lines to ~/.bash_profile:

        export PATH=$PATH:$HOME/RNAstructure/exe
        export DATAPATH=$HOME/RNAstructure/data_tables
    
===================================================================================
Bowtie2 (required for sequence alignment) - 

If on a cluster environment with the Modules package installed:
    run the command "module initadd bowtie2"
    Then log off and log back in.

Otherwise, to install locally:
    Download bowtie2 binary for your platform
    Extract to any directory
    Add the folder location to PATH in ~/.bash_profile, just as for RNAstructure
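
The PATH and DATAPATH settings described above can be checked before a run.
The following is a minimal Python 3 sketch, not part of ShapeMapper. The
function name check_path_setup is hypothetical, and the choice of "Fold" (the
folding program shipped with RNAstructure) as the executable to look for is an
assumption about which RNAstructure program matters.

```python
import os
import shutil


def check_path_setup(environ=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    environ = os.environ if environ is None else environ
    problems = []
    # bowtie2 is needed for alignment; Fold is RNAstructure's folding program
    # and only matters when structure prediction is enabled (assumption).
    for exe in ("bowtie2", "Fold"):
        if which(exe) is None:
            problems.append("not found on PATH: " + exe)
    # RNAstructure reads its thermodynamic tables from the DATAPATH directory.
    datapath = environ.get("DATAPATH")
    if not datapath:
        problems.append("DATAPATH is not set")
    elif not os.path.isdir(datapath):
        problems.append("DATAPATH is not a directory: " + datapath)
    return problems


for problem in check_path_setup():
    print("WARNING: " + problem)
```

Run it from a fresh login shell so that the edits to ~/.bash_profile are in
effect. No output means both executables and DATAPATH were found.
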

===================================================================================
matplotlib (python module required for .pdf figure rendering) - 

    Download source
    Extract to any directory
    cd to the extracted directory
    run the command "python setup.py install --user"

===================================================================================
httplib2 (python module only required if rendering structures) - 

    Download httplib2-0.7.6.tar.gz (or later version)
    Extract to any directory
    cd to httplib2 directory
    run the command "python setup.py install --user"

===================================================================================
ShapeMapper itself - 

    Extract files to any directory
    Add ShapeMapper directory location to PATH

    Build the C and C++ modules:
        cd to the ShapeMapper directory
        run the command "make"

    Make sure the files ShapeMapper.py and pvclient.py are both executable 
    from your account. If not, cd to the ShapeMapper directory and run the commands 
    "chmod +x pvclient.py" and "chmod +x ShapeMapper.py"

###################################################################################

###################################################################################    
Execution instructions:


Setup:
    Optional: Obtain the example dataset covering the bacterial ribosome and the
    TPP riboswitch from the Sequence Read Archive, accession SRP052065.

    Create a folder on a filesystem with enough free space for the large
    intermediate files generated by the pipeline. Make sure that no folder name
    in the path to this directory contains a space (bowtie2's perl wrapper seems
    to fail otherwise).
    Put the .fastq files in this directory. If they are compressed, uncompress them.
    Create a FASTA-formatted .fa sequence file for each target sequence:
        The filename must exactly match the 1st line of the file after the ">"
        character.
        There should be no whitespace between the ">" character and the title.
        Use "T", not "U", and all capital letters for the sequence.
    Copy the EXAMPLE.cfg file to a new name in the same folder as the FASTQ files.
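
The rules above for reference sequence files can be checked mechanically. The
sketch below is one possible check, not part of ShapeMapper. It assumes that
the title is compared with the filename minus its .fa extension and that only
the letters A, C, G, and T are expected in the sequence. The function name
check_reference_fasta is hypothetical.

```python
import os
import re


def check_reference_fasta(path):
    """Return a list of problems found in a single-sequence .fa file."""
    problems = []
    # A space anywhere in the absolute path is reported (bowtie2 wrapper issue).
    if " " in os.path.abspath(path):
        problems.append("path contains a space")

    with open(path) as handle:
        lines = [line.rstrip("\r\n") for line in handle]
    if not lines or not lines[0].startswith(">"):
        return problems + ["first line does not start with '>'"]

    title = lines[0][1:]
    if title != title.strip():
        problems.append("whitespace around the title")
    # Assumption: the title must equal the filename without its extension.
    stem = os.path.splitext(os.path.basename(path))[0]
    if title.strip() != stem:
        problems.append("title %r does not match filename %r" % (title.strip(), stem))

    # Assumption: anything other than capital A, C, G, T is flagged.
    sequence = "".join(lines[1:])
    bad = sorted(set(re.findall(r"[^ACGT]", sequence)))
    if bad:
        problems.append("unexpected sequence characters: " + "".join(bad))
    return problems
```

For example, check_reference_fasta("TPP_riboswitch.fa") returns an empty list
when the file starts with ">TPP_riboswitch" and holds only capital A, C, G, T.
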

To run locally:
    cd to directory containing .fastq read files, .fa reference sequence files, 
    and a .cfg file
    run the command "ShapeMapper.py yourfile.cfg"

To run in a cluster environment with LSF:
    cd to directory containing .fastq read files, .fa reference sequence files, 
    and a .cfg file
    run the command: 
     "bsub -q week -n 6 -o run.out -R span[ptile=6] ShapeMapper.py yourfile.cfg"


###################################################################################
Output description and troubleshooting:


Outputs are listed in the order of execution:
    
A "log.txt" file will be created in the run folder. If running in a cluster 
environment, also check the file "run.out" for errors (memory errors, etc.) that
are not recorded in the log file.

A folder "temp" is created that stores subprocess stdout and stderr during pipeline
execution. It can be safely deleted after pipeline completion.

A file "temp_config.pickle" is created to store configuration options for easy
loading by pipeline stages. It can be safely deleted after pipeline completion.

A folder "output" is created that will hold the bulk of the pipeline output.

Quality trimmed reads (made by running trimPhred) are written to 
output/trimmed_reads/

Bowtie2 reference sequence indices are written to output/bowtie_index/

Sequence alignment files are written to output/aligned_reads/

Parsed and simplified alignments are written to output/mutation_strings/

Mutation counts and sequencing depths are written to comma-separated files in
output/counted_mutations/
The same files are also written in column form to output/counted_mutations_columns/
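
The row-to-column conversion can be illustrated with a small transposition.
This is a Python 3 sketch that assumes "column form" means each
comma-separated row becomes one column. It is not the code in pivotCSV.py, and
the row labels in the usage line below are hypothetical.

```python
import csv
import io
from itertools import zip_longest


def pivot_csv(text):
    """Transpose comma-separated text so that each input row becomes a column."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # zip_longest pads short rows with empty cells so that no value is dropped.
    writer.writerows(zip_longest(*rows, fillvalue=""))
    return out.getvalue()
```

Calling pivot_csv("name,1,2,3\ndepth,10,20\n") yields four lines: "name,depth",
"1,10", "2,20", and "3,".
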

SHAPE reactivity files (.shape) are written to output/reactivity_profiles/

Tab-separated column files containing per-nucleotide depths, total mutation rates, 
reactivities, and standard error estimates are written to .tab files in 
output/reactivity_profiles/

SHAPE-MaP reactivity files (.map) are written to output/reactivity_profiles/
These are the same as SHAPE files but contain 2 additional columns: standard error
and nucleotide sequence.
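
To make the relationship between the two file types concrete, here is a
minimal parsing sketch. It assumes whitespace-separated columns in the order
position, reactivity, standard error, nucleotide. The column order and the
names MapRow, parse_map, and to_shape are assumptions for illustration.

```python
from collections import namedtuple

MapRow = namedtuple("MapRow", "position reactivity stderr nucleotide")


def parse_map(text):
    """Parse whitespace-separated .map text into a list of MapRow tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError("expected 4 columns, got %d: %r" % (len(fields), line))
        position, reactivity, stderr, nucleotide = fields
        rows.append(MapRow(int(position), float(reactivity), float(stderr), nucleotide))
    return rows


def to_shape(rows):
    """Drop the two extra columns, leaving (position, reactivity) pairs."""
    return [(row.position, row.reactivity) for row in rows]
```

Usage: rows = parse_map(open("example.map").read()), where "example.map" is a
placeholder filename.
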

Reactivity profile images and sequencing depth images are written to .pdf files in
output/reactivity_profiles/
Check the depth image to troubleshoot.
    If using directed primers (i.e. not random priming):
        The depths should be flat, or very nearly so, on a log-scale plot.
        (Unfortunately, log scale is currently disabled because of a matplotlib
        bug.) Bumpy depths indicate off-target primer binding, which the
        pipeline is not currently set up to handle. A workaround in this case is
        to increase the minLength config option so that all reads included
        in the analysis completely cover the desired region.
    If using random primers:
        Regions of low depth are undesirable. For some RNAs with pockets of low GC
        content, the distribution of coverage can be improved by using specially
        designed random primers. See the supplemental information in the first
        SHAPE-MaP publication for details.
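
Because the depth plot is not drawn on a log scale, flatness and low-depth
regions can also be judged numerically from the per-nucleotide depths. The
sketch below is an illustration only. The function names are hypothetical, any
cutoff passed as min_depth is the user's choice, and ignoring zero-depth
positions when measuring the spread is an assumption.

```python
import math


def depth_spread_log10(depths):
    """Return log10(max depth) - log10(min depth) over nonzero-depth positions."""
    covered = [d for d in depths if d > 0]
    if not covered:
        raise ValueError("no covered positions")
    return math.log10(max(covered)) - math.log10(min(covered))


def low_depth_positions(depths, min_depth):
    """Return 1-based positions whose depth is below min_depth."""
    return [i for i, d in enumerate(depths, start=1) if d < min_depth]
```

A spread near 0 corresponds to a flat profile. A spread of 2 means the best
covered position has 100 times the depth of the worst covered one.
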
Check the reactivity profiles to troubleshoot:
    Many or large negative peaks may indicate little signal above background. This
    can result from high mutation rates in the background condition, or from low
    mutation rates in the +reagent condition, sometimes caused by DNA contamination.
    Large error bars indicate that more sequencing depth is needed for accurate
    reactivity determination. Do not use SHAPE profiles with large error bars for
    structure modeling!
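
The two warning signs above can be listed per nucleotide from the reactivity
and standard error columns. This is a rough sketch. The function name
flag_profile and both default cutoffs are hypothetical and should be tuned to
the experiment.

```python
def flag_profile(reactivities, stderrs, negative_cutoff=-0.2, stderr_cutoff=0.3):
    """Return 1-based positions with strongly negative reactivity or large error."""
    negative, noisy = [], []
    for position, (value, err) in enumerate(zip(reactivities, stderrs), start=1):
        if value < negative_cutoff:
            negative.append(position)  # candidate negative peak
        if err > stderr_cutoff:
            noisy.append(position)     # error bar too large to trust
    return {"negative": negative, "noisy": noisy}
```

Long lists under either key suggest the profile should not be used for
structure modeling without more sequencing depth or a repeat experiment.
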

Images of histograms of mutation rates, sequencing depths, and reactivities are 
written to .pdf files in output/reactivity_profiles/
Check these images to troubleshoot an experiment.
    Background mutation rates should peak near 0.
    +Reagent mutation rates should peak above those of the background.
    Sequencing depths should mostly fall above 5,000-10,000. More is better.
    Reactivities should be mostly positive.
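
The same checks can be approximated without looking at the histograms. The
Python 3 sketch below uses medians and simple majorities as stand-ins for the
histogram peaks, which is a simplification rather than what the pipeline
plots. The function name histogram_checks and the default min_depth of 5000
are assumptions.

```python
from statistics import median


def histogram_checks(background_rates, reagent_rates, depths, reactivities,
                     min_depth=5000):
    """Return a list of warnings; an empty list means no check was triggered."""
    warnings = []
    # +reagent rates should sit above background rates.
    if median(reagent_rates) <= median(background_rates):
        warnings.append("+reagent mutation rates are not above background")
    # Most positions should reach the minimum depth.
    below = sum(1 for d in depths if d < min_depth)
    if below * 2 > len(depths):
        warnings.append("most positions fall below the minimum depth")
    # Most reactivities should be positive.
    negative = sum(1 for r in reactivities if r < 0)
    if negative * 2 > len(reactivities):
        warnings.append("most reactivities are negative")
    return warnings
```

The input lists would come from the per-nucleotide .tab files. Their column
names are not described here, so loading them is left out of the sketch.
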

If performing structure prediction:
    Sequence files (.seq) are written to output/folds/
    Structure predictions are written to .ct files in output/folds/

If rendering structures:
    Postscript image files (.eps) for the lowest predicted free energy structure
    for each RNA are written to output/folds/
    XRNA files (.xrna) for the lowest predicted free energy structure are
    written to output/folds/
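
Since the outputs are listed in order of execution, the first missing folder
hints at the stage where a run stopped. A minimal sketch of such a check
follows. The folder names are taken from the list above, while the function
name missing_outputs and its arguments are hypothetical.

```python
import os

# Output subfolders in the order the pipeline is described as writing them.
EXPECTED_DIRS = [
    "trimmed_reads", "bowtie_index", "aligned_reads", "mutation_strings",
    "counted_mutations", "counted_mutations_columns", "reactivity_profiles",
]
OPTIONAL_DIRS = ["folds"]  # only when predicting or rendering structures


def missing_outputs(run_dir, include_folds=False):
    """Return the expected output subfolders that do not exist, in order."""
    names = EXPECTED_DIRS + (OPTIONAL_DIRS if include_folds else [])
    return [name for name in names
            if not os.path.isdir(os.path.join(run_dir, "output", name))]
```

For example, missing_outputs(".") run from the folder holding the .cfg file
returns an empty list after a complete run without structure prediction.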