Ac homer #379

Merged
merged 48 commits into from
Mar 20, 2018
4be5d06
I have added a homer implimentation
Oct 30, 2017
149cda5
pep8 changes
Oct 31, 2017
b2cb4d0
updated homer reporting
Nov 6, 2017
1344884
I have removed mutiqc from report temporarily
Nov 6, 2017
181e342
Updated homer documentation
Nov 9, 2017
6e4e3cb
Adressed pep8 errors
Nov 9, 2017
b079be1
Revert "I have removed mutiqc from report temporarily"
Nov 9, 2017
b90798d
Revert "Adressed pep8 errors"
Nov 9, 2017
1efc71d
Revert "Revert "I have removed mutiqc from report temporarily""
Nov 9, 2017
a2c60e9
Revert "Updated homer documentation"
Nov 9, 2017
405f368
added changes to pipline.ini
jencyw Nov 10, 2017
7981b29
documentation for pipeline_homer pipeline.ini
jencyw Nov 23, 2017
54859cd
I have added active if statememts for the homer pipline
Nov 23, 2017
bc69667
I have implimented the first fuction for deeptools
Nov 23, 2017
70a0f3e
I have added the function for fingerprints; going to test
jencyw Nov 23, 2017
930b6b5
deeptools_Fingerprint is done; it works
jencyw Nov 23, 2017
6264b00
Added fragment size function
Nov 24, 2017
58460b4
I have added functions for converting bams to bigwigs
Nov 24, 2017
7931a82
merge changes from MA
Nov 24, 2017
c3a06cf
I have added several functions in deeptools
jencyw Nov 29, 2017
f29616b
the updated pipeline.ini for pipeline_homer.py
jencyw Nov 29, 2017
cee8ed4
Add some functions for computeMatrix, plotHeatmap and plotProfile tho…
jencyw Nov 30, 2017
2010a07
Merge branch 'master' into AC-homer
Nov 30, 2017
dc7b0d5
Changed name of pipeline
Nov 30, 2017
409b9bc
have added extra functions computeMatrix and plotHeatmap
Nov 30, 2017
1b899eb
have finished the required functions of deeptools; currently testing …
jencyw Dec 4, 2017
edcb57b
some updated changes but still some bugs retained
jencyw Dec 11, 2017
e61cb62
I have fixed the bamCompare function so the correct input is now parsed
Dec 11, 2017
01573d1
have fixed bugs and pipelie seems to work. Will add test data
Dec 11, 2017
b122a91
pep8
Dec 11, 2017
f43dd7c
have added various bux fixes and it now works and conforms to pep8
Dec 12, 2017
b27c241
I have removed the pipeline that managed to find its way into the dir…
Dec 12, 2017
a3ce233
have placed samples for homer in Tag.dir
Dec 12, 2017
6487f80
have added example design.tsv and various bug fixes
Dec 12, 2017
172a132
fix to get around lack of design file in Travis testing
Dec 12, 2017
966891d
forgot sample_dict for testing too
Dec 12, 2017
242d126
pep8
Dec 12, 2017
f1eb479
I have updated jupyter notebooks and renamed pipeline_docs folder
Dec 12, 2017
ad3c7e8
fix for report
Dec 12, 2017
e6e6616
Merge branch 'master' into AC-homer
Dec 19, 2017
257d475
various small fixes
Jan 19, 2018
045d6db
updated docmentation
Jan 19, 2018
fa22022
have added a pause in findPeaks to avoid missing file ruffus errors
Feb 19, 2018
cc388fb
increased sleep for chiptools
sebastian-luna-valero Mar 6, 2018
0ec1db4
add sleeps to pipeline_chiptools.py; ruffus file not found issue
sebastian-luna-valero Mar 20, 2018
d70a197
add deeptools, homer, picard to scripts/cgat_conda_deps.sh
sebastian-luna-valero Mar 20, 2018
14bc226
merge with master
sebastian-luna-valero Mar 20, 2018
185d344
update conda environments
sebastian-luna-valero Mar 20, 2018
971 changes: 971 additions & 0 deletions CGATPipelines/pipeline_chiptools.py

Large diffs are not rendered by default.

Empty file.
4 changes: 4 additions & 0 deletions CGATPipelines/pipeline_chiptools/example_design.tsv
@@ -0,0 +1,4 @@
SampleID	Tissue	Factor	Condition	Treatment	Replicate	bamReads	ControlID	bamControl
S1	Th17	K4	DMSO	DMSO	1	hsTh1-Tbet-R1.bam	C1	hsTh1-TbetInput-R1.bam
S3	Th17	K4	J4	J4	2	neural-SMC3-1.bam	C3	neural-input-1.bam
S4	Th17	K4	J4	J4	2	neural-SMC3-2.bam	C4	neural-input-1.bam
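
For readers who want to see how a design table of this shape can be consumed, here is a minimal sketch in Python. It is not taken from pipeline_chiptools.py (that diff is not rendered on this page); the tab delimiter and the helper names `read_design` and `chip_to_control` are assumptions made for illustration.

```python
import csv
import io

# Rows copied from example_design.tsv above; the tab delimiter is an assumption.
DESIGN_TEXT = (
    "SampleID\tTissue\tFactor\tCondition\tTreatment\tReplicate\t"
    "bamReads\tControlID\tbamControl\n"
    "S1\tTh17\tK4\tDMSO\tDMSO\t1\thsTh1-Tbet-R1.bam\tC1\thsTh1-TbetInput-R1.bam\n"
    "S3\tTh17\tK4\tJ4\tJ4\t2\tneural-SMC3-1.bam\tC3\tneural-input-1.bam\n"
    "S4\tTh17\tK4\tJ4\tJ4\t2\tneural-SMC3-2.bam\tC4\tneural-input-1.bam\n"
)


def read_design(handle):
    """Return one dict per sample row, keyed by the header names."""
    return list(csv.DictReader(handle, delimiter="\t"))


def chip_to_control(rows):
    """Map each ChIP BAM file to its input (control) BAM file."""
    return {row["bamReads"]: row["bamControl"] for row in rows}


rows = read_design(io.StringIO(DESIGN_TEXT))
pairs = chip_to_control(rows)

for chip_bam, control_bam in pairs.items():
    print(chip_bam, "->", control_bam)
```

Note that two ChIP samples may share one control, as S3 and S4 do here, so the mapping is keyed by the ChIP BAM rather than by the control.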
279 changes: 279 additions & 0 deletions CGATPipelines/pipeline_chiptools/pipeline.ini
@@ -0,0 +1,279 @@
######################################################################################################
# Pipeline pipeline_chiptools.py configuration file
#
######################################################################################################
## general options

[general]
database = csvdb

# Set each tool to 1 (run) or 0 (skip); homer and deeptools are currently implemented
homer=1
deeptools=1

######################################################################################################
## Homer ##
######################################################################################################

[homer]

# Run: perl /ifs/apps/bio/homer-4.9//configureHomer.pl -list
# to check which packages are installed and which are available to download.
# Install the packages you need before running the pipeline
# (e.g.: perl /ifs/apps/bio/homer-4.9//configureHomer.pl -install hg19)

######################################################################################################
# MakeTagDirectory #
# Creating a "Tag Directory" to facilitate the analysis of ChIP-Seq #
######################################################################################################

# Alignment files must be in one of the following formats: BED, SAM, BAM
# For BAM input, HOMER runs "samtools view file.BAM > file.SAM" to convert to SAM,
# so "samtools" must be available
# The "-checkGC" option needs the genome, named as HOMER names it
# Make sure that HOMER has the genome installed (e.g. hg19)
# Genomes currently installed: hg19, mm10

maketagdir_genome=hg19

######################################################################################################
# #
# findpeaks: Finding Enriched Peaks, Regions, and Transcripts #
# #
######################################################################################################

# findpeaks_style: histone (broad peaks) or factor (sharp peaks)
# factor: a FIXED-width peak size is used; suitable for sharp peaks
# histone: peak finding for broad regions of enrichment; finds variable-width peaks
# output for factor: peaks.txt
# output for histone: regions.txt
# findpeaks_output: change "auto" to <filename.txt> to name the output yourself

findpeaks_style=histone
findpeaks_output=auto
findpeaks_options=

######################################################################################################
# BED-conversion #
# Converting peak files to BED files for uploading peak files to the UCSC Genome Browser #
######################################################################################################

BED_options=

######################################################################################################
# AnnotatePeaks #
# Annotating Regions in the Genome #
# Quantifying Data and Motifs and Comparing Peaks/Regions in the Genome #
######################################################################################################

# The program (annotatePeaks.pl) can also:
# associate peaks with nearby genes, run Gene Ontology analysis and genomic feature association analysis (Genome Ontology),
# associate peaks with gene expression data, calculate ChIP-Seq tag densities from different experiments,
# and find motif occurrences in peaks
# By default, annotatePeaks.pl assigns peaks to the nearest TSS

annotatepeaks_genome=hg19

# getDiffExpression.pl
# Quantifying Differential Features (Enrichment/Expression)
# Raw read counts are first written to countTable.txt
# countTable.txt is then used to find differentially regulated features (output: diffOutput.txt)


diffannotat_raw=1

annotate_raw_region=tss
annotate_raw_options=-raw
annotate_raw_genome=hg19

######################################################################################################
# motif
# Finding Enriched Motifs in Genomic Regions
######################################################################################################

# motif_size is mandatory: it sets the size of the regions analysed
# Values: 200 (default) or given (to find motifs using your peaks at their exact sizes)
# For transcription factor (TF) peaks, most motifs are found within +/- 50-75 bp of the peak center,
# so for TFs a fixed size is better than relying on the peak size

motif_genome=hg19
motif_size=200
motif_options=peakAnalysis

######################################################################################################
# diff_expr
# Quantifying Differential Features (Enrichment/Expression)
######################################################################################################

# Fill in the sample details in the design table (design.tsv)
# Treatment: control / drug
# diff_expr_group example: Mock Mock WNT WNT. Groups must be listed in the same order as in the countTable file
diff_expr=1
diff_expr_group=DMSO DMSO J4 J4
diff_expr_options=

######################################################################################################
# getDiffPeaksReplicates
# Identifying peaks from replicates
######################################################################################################
# Ultimately passes these values to the R/Bioconductor package DESeq2
# to calculate enrichment values for each peak
# Return only peaks that pass a given fold enrichment (default: 2-fold) and FDR cutoff (default 5%)

diff_repeats=0
diff_repeats_options=
diff_repeats_genome=hg19


######################################################################################################
## Deeptools ##
# http://deeptools.readthedocs.io/en/latest/index.html #
# deepTools is a suite of Python tools developed for the efficient analysis #
# of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq #
# Input files are indexed BAM files, listed in "design.tsv" #
# Outputs of all plot-related functions are saved in the folder "Plot.dir" #
# Plots are mainly in eps format, which can be edited further in Adobe Illustrator. #
## Functions are specified ##
######################################################################################################

[deep]

# Specify which bigwig conversion tool you would like to use for your samples
# options are bamCompare or bamCoverage
# bamCoverage: generate bigwig files from bam files
# bamCompare: generate a bigwig file based on two BAM files that are compared to
# each other while being simultaneously normalized for sequencing depth

bam_compare=1
bam_coverage=0

# are the BAM files from paired-end sequencing? set as 1 or 0
paired_end=0

# ignore duplicates for the plotCoverage, plotFingerprint and multiBamSummary
# plotCoverage: to assess the sequencing depth of a given sample
# plotFingerprint: the quality control for ChIP-seq experiments. ChIP samples compared with input.

ignore_dups=1

# minimum mapping quality
# parameter used in plotCoverage and plotFingerprint
# default is 10
mapping_qual=10

########################################################################################################
# For PEFragmentSize #
########################################################################################################
# calculates the fragment sizes for read pairs given a BAM file from paired-end sequencing
# logscale: set to 1 to plot on a log scale, 0 otherwise

logscale=0


########################################################################################################
# For bamCoverage and bamCompare #
# Output file is set as bigwig files; store in the "Bwfiles.dir" #
########################################################################################################
# binsize: default is 10, but a value must be set
# this parameter is shared with multiBamSummary
binsize=10

# A list of space-delimited chromosome names containing those chromosomes that should be excluded
# for computing the normalization.
# Useful when considering samples with unequal coverage across chromosomes, like male samples.
# e.g. chrX chrY.
# please set to None if not required
ignore_norm=chrX chrY

# This parameter allows the extension of reads to fragment size.
# If set, each read is extended, without exception. NOTE: This feature
# is generally NOT recommended for spliced-read data, such as RNA-seq,
# as it would extend reads over skipped regions
# set as 1 (True) or 0 (False)
extendreads=0

# Optional extra bamCoverage arguments, specified as you would on the command line
bamcoverage_options=

# These are optional bamCompare arguments and are specified as
# you would on the command line
bamcompare_options=

########################################################################################################
# For multiBamSummary and multiBigwigSummary #
########################################################################################################

# multiBamSummary: computes read coverage over genomic regions, typically for two or more BAM files
# multiBigwigSummary: computes the average score in every genomic region, typically for two or more bigWig files

# mode_setting: bins or BED-file
# set to None (bins, the default) or to a file (e.g.: ooo.bed)
# bins: consecutive bins of equal size (10 kilobases by default)
# The bin size is shared with bamCoverage (binsize)
# Distance between bins can be adjusted
# ignore_dups is shared with plotCoverage and plotFingerprint

mode_setting=None
summary_options=

########################################################################################################
# For plotCorrelation and plotPCA #
########################################################################################################
# cormethod: correlation method; spearman or pearson
# plot: plot type; heatmap or scatterplot
# colormap: e.g. RdBu_r; see http://matplotlib.org/users/colormaps.html
# filetype: png, pdf, svg, eps
# plot_options: optional extra arguments; see the deeptools documentation

cormethod=spearman
plot=scatterplot
colormap=RdBu
filetype=pdf
plot_options=


########################################################################################################
# For computeMatrix and plotHeatmap, plotProfile #
########################################################################################################
# startfactor: scale-regions or reference-point
# regions: TSS, TES or center; default is TSS
# region_length: used when startfactor is set to scale-regions; default is 1000
# bedfile: BED file of regions (e.g.: xxx.bed)
# out_namematrix: name of the output matrix file (.tab; directory path needed); the tab file can be loaded into R
# out_sorted: name of the output BED file (directory path needed); regions in the selected sorting order

startfactor=reference-point
regions=TSS
region_length=1000
bedfile=/ifs/projects/adam/homer_test/cpgislands.bed
# Just the name of what you want the matrix to be called.
out_namematrix=test.tab
out_sorted=test.bed

#########################################################################################################
# Advanced arguments for computeMatrix #
#########################################################################################################
# brslength(BeforeRegionStartLength): e.g.:1000
# arslength(AfterRegionStartLength)

brslength=1000
arslength=0
matrix_options=

#########################################################################################################
# For plotHeatmap and plotProfile #
#########################################################################################################

# dpi: set the resolution for figures
# legendlocation: best, upper-right, upper-left, upper-center, lower-left, lower-right,
# lower-center, center, center-left, center-right, none (does not work for profile)
# refpointlabel: label shown for the reference-point; TSS, TES, center, peak start
# plottype: for plotProfile, options: line or heatmaplines, fill, se, std, overlapped_lines, heatmap
# pergroup: plots all samples by group of regions

kmeans=3
dpi=300
legendlocation=upper-left
refpointlabel="center"
plottype=heatmap
pergroup=
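
To make the role of these options concrete, the following is a rough sketch of how values from this file could be turned into HOMER and deepTools command lines. It is not the code in pipeline_chiptools.py, which is not rendered on this page. It assumes the file is read with Python's standard `configparser`; the helper names, the tag directory names `S1.dir` and `C1.dir`, and the output name `S1.bw` are hypothetical. The command-line flags used (`findPeaks -style/-o/-i`, `bamCompare -b1/-b2/-o/--binSize/--ignoreForNormalization/--extendReads`) are the tools' documented options.

```python
import configparser
import shlex

# Excerpt of the pipeline.ini above, trimmed to the keys this sketch uses.
INI_EXCERPT = """
[general]
homer=1
deeptools=1

[homer]
findpeaks_style=histone
findpeaks_output=auto
findpeaks_options=

[deep]
bam_compare=1
binsize=10
ignore_norm=chrX chrY
extendreads=0
bamcompare_options=
"""


def build_findpeaks_command(config, tag_dir, control_dir):
    """Assemble a HOMER findPeaks argument list from the [homer] section."""
    homer = config["homer"]
    cmd = ["findPeaks", tag_dir,
           "-style", homer["findpeaks_style"],
           "-o", homer["findpeaks_output"],
           "-i", control_dir]
    # Free-form extras are split the way a shell would split them.
    return cmd + shlex.split(homer.get("findpeaks_options", ""))


def build_bamcompare_command(config, chip_bam, input_bam, out_bw):
    """Assemble a deepTools bamCompare argument list from the [deep] section."""
    deep = config["deep"]
    cmd = ["bamCompare", "-b1", chip_bam, "-b2", input_bam, "-o", out_bw,
           "--binSize", deep["binsize"]]
    if deep["ignore_norm"] != "None":  # the file uses the literal string None
        cmd += ["--ignoreForNormalization"] + deep["ignore_norm"].split()
    if deep.getboolean("extendreads"):
        cmd.append("--extendReads")
    return cmd + shlex.split(deep.get("bamcompare_options", ""))


config = configparser.ConfigParser()
config.read_string(INI_EXCERPT)

findpeaks_cmd = []
bamcompare_cmd = []
if config["general"].getboolean("homer"):
    # S1.dir and C1.dir are hypothetical tag directory names.
    findpeaks_cmd = build_findpeaks_command(config, "S1.dir", "C1.dir")
if config["general"].getboolean("deeptools") and config["deep"].getboolean("bam_compare"):
    bamcompare_cmd = build_bamcompare_command(
        config, "hsTh1-Tbet-R1.bam", "hsTh1-TbetInput-R1.bam", "S1.bw")

print(" ".join(findpeaks_cmd))
print(" ".join(bamcompare_cmd))
```

Two details are worth keeping in mind when reading the file this way: `configparser` lower-cases option names by default, so `BED_options` is looked up as `bed_options`, and it keeps surrounding quotes, so `refpointlabel="center"` is returned with the quote characters included.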
