Update README.md
ay-lab authored Sep 21, 2018 (commit 233c758, parent 275a344)
1 changed file (README.md): 84 additions and 46 deletions

Output files generated by the script "MergeBAMInferPeak.sh" (folder "Imp_Scripts"):
1) ${OutBaseDir}/merged_input.bam: merged ChIP-seq alignment file
2) ${OutBaseDir}/MACS2_Out/out.macs2_peaks.narrowPeak: Peak file (derived by MACS2) generated from
the merged ChIP-seq alignment.
3) ${OutBaseDir}/ChIPCoverage1.bed, ${OutBaseDir}/ChIPCoverage2.bed, ..., ${OutBaseDir}/ChIPCoverageN.bed,
where N is the number of input ChIP-seq alignment files. Each of these output files contains the
coverage of the corresponding input alignment file (with respect to the specified bin size).
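These coverage files are later passed to DiffAnalysisHiChIP.r through --BinCoverageList. The following minimal Python sketch assembles that value; it is not part of the repository, the helper names are hypothetical, and it assumes one MergeBAMInferPeak.sh run per category (each with its own ${OutBaseDir}), listed in the same category and replicate order as the other inputs.

```python
def coverage_files(out_base_dir, n):
    """Paths of the n coverage files written under one OutBaseDir (ChIPCoverage1.bed ... ChIPCoverageN.bed)."""
    base = out_base_dir.rstrip("/")
    return [f"{base}/ChIPCoverage{i}.bed" for i in range(1, n + 1)]


def bin_coverage_list(runs, sep=","):
    """Join the coverage files of several runs, given as (out_base_dir, n) pairs, in run order."""
    files = []
    for out_base_dir, n in runs:
        files.extend(coverage_files(out_base_dir, n))
    return sep.join(files)


# Hypothetical layout: 2 replicates in the first category, 1 in the second.
print(bin_coverage_list([("Cat1_Out/", 2), ("Cat2_Out", 1)]))
# prints: Cat1_Out/ChIPCoverage1.bed,Cat1_Out/ChIPCoverage2.bed,Cat2_Out/ChIPCoverage1.bed
```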
The R script "DiffAnalysisHiChIP.r" within the folder "Imp_Scripts" is the code ...
FitHiChIP loops, having M and N replicates, respectively. These two categories may correspond to two different cell types
or cell lines. However, all of these interactions must be computed with the same bin size.

Parameters associated with this script are as follows:

--AllLoopList FILELIST Comma or colon separated list of FitHiChIP loop files from all the replicates
of both input categories. Each file should be of the form
PREFIX.interactions_FitHiC.bed, i.e. it contains all significant and
non-significant interactions of one replicate (no FDR threshold applied).
MANDATORY PARAMETER.
--FDRThrLoop Threshold FDR threshold used for FitHiChIP loops. Default = 0.01 (the same value used in
the FitHiChIP implementation).
--OutDir DirName Base output directory under which all results will be stored.
MANDATORY PARAMETER.
--UseRawCC 0/1 If 1, uses the raw contact count c for differential analysis. If 0, combines the raw
and expected contact counts: c is multiplied by the ratio c/e (up to the nearest integer),
where e is the expected contact count obtained from the spline fit and bias regression
model of FitHiChIP (field exp_cc_bias of the FitHiChIP loop file).
Default = 0 (Recommended).
--PeakFileCat1 filename ChIP-seq peak file obtained by merging the ChIP-seq replicates of the first category.
The script "MergeBAMInferPeak.sh" can be used to produce such a file.
MANDATORY PARAMETER.
--PeakFileCat2 filename ChIP-seq peak file obtained by merging the ChIP-seq replicates of the second category.
The script "MergeBAMInferPeak.sh" can be used to produce such a file.
MANDATORY PARAMETER.
--CategoryList Names Comma or colon separated list of two strings naming the two categories,
for example the names of two cell lines or cell types.
Default: Category1:Category2
--ReplicaCount Counts Comma or colon separated list of two integer values: the number of
replicates in each input category. Default: 1:1, meaning that both
categories have a single replicate.
--ReplicaLabels1 Names Comma or colon separated list of the replicate labels for the first
category. Default: R1:R2:R3, etc. (as per the replicate count)
--ReplicaLabels2 Names Comma or colon separated list of the replicate labels for the second
category. Default: R1:R2:R3, etc. (as per the replicate count)
--BinCoverageList FILELIST List of files storing the ChIP-seq coverage of individual alignment files (of
the different replicates of both categories), computed with the specified bin size.
The script "MergeBAMInferPeak.sh" can be used to produce these files.
MANDATORY PARAMETER.
--InpTSSFile FileName File containing the TSS information of the reference genome.
See the note below on how to generate this file. MANDATORY PARAMETER.
--GeneExprFileList TwoFileNames Comma or colon separated list of two files storing the gene expression
values for the individual categories. THIS PARAMETER is OPTIONAL; if provided,
the gene expression associated with the differential loops is also analyzed.
--GeneNameColList Counts Comma or colon separated list of two integer values: the column numbers of
the individual gene expression files (given in the parameter
--GeneExprFileList) storing the gene names.
THIS PARAMETER is OPTIONAL; required only if the parameter
--GeneExprFileList is provided.
--ExprValColList Counts Comma or colon separated list of two integer values: the column numbers of
the individual gene expression files (given in the parameter
--GeneExprFileList) storing the gene expression values.
THIS PARAMETER is OPTIONAL; required only if the parameter
--GeneExprFileList is provided.
--FoldChangeThr integer Fold change threshold employed in EdgeR; its log2 value is compared with the
log2 fold change. Default = 3, meaning log2(3) (about 1.58) is used as the threshold.
--FDRThr threshold FDR threshold for determining the significance of EdgeR results. Default = 0.05,
meaning that loops with FDR < 0.05 and log2 fold change >= log2(FoldChangeThr)
are considered differential.
--bcv value If EdgeR is used with single samples (replicate count = 1 for either
category), this value is the square root of the dispersion (biological coefficient of variation).
Typical values for datasets arising from well-controlled experiments are 0.4 for human data,
0.1 for data on genetically identical model organisms, and
0.01 for technical replicates. For details, see the edgeR manual.
Default = 0.4. Used only if a category contains a single replicate.
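For reference, edgeR models counts with a negative binomial distribution whose variance is mu + phi * mu^2, where the dispersion phi is the square of the BCV; the edgeR user's guide suggests supplying bcv^2 as a fixed dispersion when no replicates are available. The Python sketch below only illustrates that relationship (the function name is hypothetical); how DiffAnalysisHiChIP.r passes --bcv to edgeR internally is not shown here.

```python
def nb_variance(mu, bcv=0.4):
    """Negative binomial variance for mean count mu, using edgeR's parameterisation.

    The dispersion phi equals bcv ** 2, so the variance is mu + phi * mu ** 2.
    """
    phi = bcv ** 2
    return mu + phi * mu ** 2


# With the default bcv = 0.4 (phi = 0.16), a loop with mean count 100 has
# variance 100 + 0.16 * 100**2 = 1700, far above the Poisson variance of 100.
for bcv in (0.4, 0.1, 0.01):
    print(bcv, round(nb_variance(100, bcv), 2))
```

Smaller BCV values (technical replicates) shrink the extra variance term, which makes EdgeR call more loops differential for the same counts.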

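The significance rule stated for --FDRThr and --FoldChangeThr can be written as a small predicate. This Python sketch is illustrative only: the function name is hypothetical, and treating increases and decreases alike (via the absolute log2 fold change) is an assumption, since the parameter description states only the threshold itself.

```python
import math


def is_differential(fdr, log2_fc, fdr_thr=0.05, fold_change_thr=3):
    """Return True if a loop passes both thresholds.

    fdr     : EdgeR FDR of the loop
    log2_fc : EdgeR log2 fold change between the two categories
    A loop is called differential when fdr < fdr_thr and
    |log2_fc| >= log2(fold_change_thr); the absolute value is an assumption.
    """
    return fdr < fdr_thr and abs(log2_fc) >= math.log2(fold_change_thr)


print(is_differential(0.01, 2.0))   # True: 2.0 >= log2(3), about 1.585
print(is_differential(0.01, 1.0))   # False: fold change too small
print(is_differential(0.05, 4.0))   # False: FDR must be strictly below 0.05
```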

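With --UseRawCC 0, the value used for a loop with observed count c and expected count e is c multiplied by c/e and made integral. A minimal Python sketch follows; the function name is hypothetical, and it assumes that the product (not the ratio) is rounded, half up, since the source only says the value is taken up to the nearest integer.

```python
import math


def diff_contact_count(c, e, use_raw_cc=0):
    """Contact count fed to the differential analysis for one loop (sketch).

    c : observed (raw) contact count
    e : expected contact count from FitHiChIP's bias regression model
    use_raw_cc == 1 returns c unchanged; otherwise c * (c / e) is
    rounded half up to an integer (the rounding rule is an assumption).
    """
    if use_raw_cc == 1:
        return int(c)
    if e <= 0:
        raise ValueError("expected contact count must be positive")
    return math.floor(c * c / e + 0.5)


print(diff_contact_count(10, 5))      # 20: twice the expected count doubles the value
print(diff_contact_count(3, 4))       # 2: fewer contacts than expected are down-weighted
print(diff_contact_count(10, 5, 1))   # 10: raw count
```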
**** An example of differential loop calling with the parameters described above, using the provided test data,
is given in the file "DiffAnalysisHiChIP_script.sh" within the folder "Imp_Scripts".
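When adapting that example to other data, the category and replicate options must agree with each other. The Python sketch below (hypothetical helper names, not part of FitHiChIP) splits comma or colon separated values, fills in the documented defaults, and checks the total replicate count against the number of files in --AllLoopList; it assumes one loop file per replicate, as the --AllLoopList description implies. The category names in the usage line are illustrative.

```python
import re


def split_list(value):
    """Split a comma or colon separated option value into its non-empty items."""
    return [item for item in re.split(r"[,:]", value) if item]


def replicate_layout(all_loop_list, category_list=None, replica_count=None,
                     labels1=None, labels2=None):
    """Map each category name to its replicate labels, applying the documented defaults.

    Raises ValueError when the options are inconsistent with one another.
    """
    categories = split_list(category_list) if category_list else ["Category1", "Category2"]
    counts = [int(x) for x in split_list(replica_count)] if replica_count else [1, 1]
    if len(categories) != 2 or len(counts) != 2:
        raise ValueError("exactly two categories and two replicate counts are expected")

    layout = {}
    for name, given, n in zip(categories, (labels1, labels2), counts):
        labels = split_list(given) if given else [f"R{i}" for i in range(1, n + 1)]
        if len(labels) != n:
            raise ValueError(f"{name}: {len(labels)} labels for {n} replicates")
        layout[name] = labels

    # Assumption: --AllLoopList holds one FitHiChIP loop file per replicate (M + N files).
    n_files = len(split_list(all_loop_list))
    if n_files != sum(counts):
        raise ValueError(f"{n_files} loop files for {sum(counts)} replicates")
    return layout


print(replicate_layout("a.bed:b.bed:c.bed", "GM12878:K562", "2:1"))
# prints: {'GM12878': ['R1', 'R2'], 'K562': ['R1']}
```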

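The column parameters select which columns of each gene expression file hold the gene names (--GeneNameColList) and the expression values (--ExprValColList). A minimal Python sketch of reading one such file follows; the function name is hypothetical, and the tab delimiter, single header line, and 1-based column numbers are assumptions about the file format rather than documented behavior.

```python
import csv


def load_expression(path, name_col, value_col, delimiter="\t", skip_header=True):
    """Map gene name -> expression value from one gene expression file.

    name_col and value_col are 1-based column numbers, mirroring one entry of
    --GeneNameColList and --ExprValColList (1-based indexing is an assumption).
    """
    expr = {}
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        if skip_header:
            next(reader, None)  # discard the assumed header line
        for row in reader:
            if not row:
                continue  # ignore blank lines
            expr[row[name_col - 1]] = float(row[value_col - 1])
    return expr
```

For a hypothetical file whose first column holds gene names and third column holds expression values, the matching options would be --GeneNameColList 1:1 and --ExprValColList 3:3 (if both files share that layout), and the sketch would be called as load_expression(path, 1, 3).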
**** This file also contains sample scripts to generate the TSS file (used in the parameter --InpTSSFile)
for different reference genomes (such as hg19, mm9, mm10, etc.). Users should check the script
in detail to understand the parameters.


Sample logs from the console, corresponding to the TestData