Update README.md

ay-lab · Sep 21, 2018 · 233c758
1 parent 275a344
Showing 1 changed file with 84 additions and 46 deletions.
diff --git a/README.md b/README.md
@@ -567,7 +567,8 @@ The script "MergeBAMInferPeak.sh" within the folder "Imp_Scripts" processes a se
 
 	Output files generated from this script:
 		1) ${OutBaseDir}/merged_input.bam: merged ChIP-seq alignment file
-		2) ${OutBaseDir}/MACS2_Out/out.macs2_peaks.narrowPeak: Peak (derived by MACS2) file generated by merged ChIP-seq alignment.
+		2) ${OutBaseDir}/MACS2_Out/out.macs2_peaks.narrowPeak: Peak file (derived by MACS2) 
+		generated from the merged ChIP-seq alignment.
 		3) ${OutBaseDir}/ChIPCoverage1.bed, ${OutBaseDir}/ChIPCoverage2.bed, ..., ${OutBaseDir}/ChIPCoverageN.bed, 
 		where N is the number of input ChIP-seq alignment files. Each of these output files contains the 
 		coverage of corresponding input alignment file (with respect to the specified bin size).
@@ -579,57 +580,94 @@ The R script "DiffAnalysisHiChIP.r" within the folder "Imp_Scripts" is the code
 FitHiChIP loops, having M and N replicates, respectively. These two categories may correspond to two different cell types 
 or cell lines. However, bin size employed for all of these interactions should be identical.
 
+Parameters associated with this script are as follows:
 
+	--AllLoopList	FILELIST	Comma or colon separated list of FitHiChIP loop files from all the replicates 
+					of both input categories, where individual interaction files are of the 
+					format: PREFIX.interactions_FitHiC.bed. That is, all significant and 
+					non-significant interactions of individual categories and replicates are 
+					used as the input (without applying any FDR threshold). MANDATORY PARAMETER.
+					
+	--FDRThrLoop	Threshold	FDR threshold used for FitHiChIP loops. Default = 0.01 (same as used in the 
+					FitHiChIP implementation).
 
-
-        make_option(c("--AllLoopList"), type="character", default=NULL, help="Comma or colon separated list of all possible interactions, along with their FItHiChIP significance value (would include both significant and insignificant loops as well). Mandatory parameter."),
-        make_option(c("--FDRThrLoop"), type="numeric", default=0.01, help="FDR threshold used for FitHiChIP loops. Default = 0.01 (same as FitHiChIP implementation)."),
-
-        make_option(c("--OutDir"), type="character", default=NULL, help="Base Output directory. Mandatory parameter."),
-
-        make_option(c("--UseRawCC"), type="integer", action="store", default=0, help="If 1, uses the raw contact count for replicate analysis. If 0, uses the observed contact count (c), multiplied by the ratio (c/e) (upto nearest integer) where e is the expected contact count of this locus pair, according to the spline fit and bias regression model of FitHiChIP (listed in the field exp_cc_bias of FitHiChIP loop file). Default value of this parameter is 0."),
-
-        # make_option(c("--BiasFileList"), type="character", default=NULL, help="Comma or colon separated list of bias files generated by FitHiChIP loops. Mandatory parameter."),      
-
-        make_option(c("--PeakFileCat1"), type="character", default=NULL, help="Peak file used for the samples in the first category (to infer peak specific HiChIP loops). Mandatory parameter."),
-        make_option(c("--PeakFileCat2"), type="character", default=NULL, help="Peak file used for the samples in the second category (to infer peak specific HiChIP loops). Mandatory parameter."),
-
-        make_option(c("--CategoryList"), type="character", default=NULL, help="Comma or colon separated list of the two main categories (whose replicates are present). Default: Category1, Category2."),
-
-        make_option(c("--ReplicaCount"), type="character", default=NULL, help="Comma or colon separated list of the count of replicates for individual categories. Default: 1,1 (means that we are considering one replicate per sample)."),
-        make_option(c("--ReplicaLabels1"), type="character", default=NULL, help="Comma or colon separated list of the label of replicates for the first category. Default: R1,R2, etc."),
-        make_option(c("--ReplicaLabels2"), type="character", default=NULL, help="Comma or colon separated list of the label of replicates for the second category. Default: R1,R2, etc."),
+	--OutDir	DirName		Base Output directory under which all results will be stored. 
+					MANDATORY PARAMETER.
+					
+	--UseRawCC	0/1		If 1, uses the raw contact count for differential analysis. Otherwise, uses both 
+					the raw and expected contact counts (the latter obtained from the bias regression 
+					model) for differential analysis. Default = 0 (Recommended).
+					
+	--PeakFileCat1	filename	ChIP-seq peak file obtained by merging ChIP-seq replicates of the first category. 
+					User may use the script "MergeBAMInferPeak.sh" for producing such a file. 
+					MANDATORY PARAMETER.
+					
+	--PeakFileCat2	filename	ChIP-seq peak file obtained by merging ChIP-seq replicates of the second category. 
+					User may use the script "MergeBAMInferPeak.sh" for producing such a file. 
+					MANDATORY PARAMETER.
+					
+	--CategoryList	Names		Comma or colon separated list of strings depicting the names of two categories. 
+					User may provide the names of two cell lines or cell types. 
+					Default: Category1:Category2
+					
+	--ReplicaCount	Counts		Comma or colon separated list of two integer values - 
+					the number of replicates belonging to individual 
+					input categories. Default: 1:1, meaning that both categories have a single 
+					replicate.
+					
+	--ReplicaLabels1 Names		Comma or colon separated list of the label of replicates for the first 
+					category. Default: R1:R2:R3 etc (as per the replicate counts)
+					
+	--ReplicaLabels2 Names		Comma or colon separated list of the label of replicates for the second 
+					category. Default: R1:R2:R3 etc (as per the replicate counts)
+					
+	--BinCoverageList FILELIST	List of files storing the ChIP-seq coverage of individual alignment files (of 
+					different replicates of both categories) according to the specified bin size. 
+					User may use the script "MergeBAMInferPeak.sh" for producing these files. 
+					MANDATORY PARAMETER.
+					
+	--InpTSSFile	FileName	Name of file containing TSS information of the reference genome. 
+					Please check the description mentioned below to know how to 
+					generate this file. MANDATORY PARAMETER.
+					
+	--GeneExprFileList TwoFileNames	Comma or colon separated list of two files, storing the gene expression 
+					values for individual categories. THIS PARAMETER is OPTIONAL; if provided, 
+					gene expression of the differential loops is also analyzed.
+					
+	--GeneNameColList Counts	Comma or colon separated list of two integer values - the column numbers of 
+					individual gene expression files (mentioned in the parameter 
+					--GeneExprFileList) storing the name of corresponding genes. 
+					THIS PARAMETER is OPTIONAL; required only if the parameter 
+					--GeneExprFileList is provided.
+					
+	--ExprValColList  Counts	Comma or colon separated list of two integer values - the column numbers of 
+					individual gene expression files (mentioned in the parameter 
+					--GeneExprFileList) storing the gene expression values. 
+					THIS PARAMETER is OPTIONAL; required only if the parameter 
+					--GeneExprFileList is provided.
+					
+	--FoldChangeThr	  integer	Fold change threshold employed in EdgeR (log2 scale). Default = 3, meaning 
+					log2(3) is used as the fold change threshold.
+					
+	--FDRThr 	threshold	FDR threshold for determining the significance of EdgeR. Default = 0.05, 
+					meaning that loops with FDR < 0.05 and fold change >= log2(FoldChangeThr) 
+					are considered as differential.
 
-	make_option(c("--BinCoverageList"), type="character", default=NULL, help="List of files storing the ChIP-seq coverage for the specified bin size. Mandatory parameter."),
-
-        make_option(c("--InpTSSFile"), type="character", default=NULL, help="TSS containing file for the reference genome. Mandatory parameter."),
-
-        make_option(c("--GeneExprFileList"), type="character", default=NULL, help="Comma or colon separated list of gene expression containing files, for the cell lines checked."),
-
-        make_option(c("--GeneNameColList"), type="character", default=NULL, help="Comma or colon separated list of columns containing the gene names, for the above mentioned gene expression files."),
-
-        make_option(c("--ExprValColList"), type="character", default=NULL, help="Comma or colon separated list of columns containing the gene expression values, for the above mentioned gene expression files."),
-
-        # make_option(c("--UseDESeq"), type="integer", action="store", default=0, help="If 1, DESeq2 is used for differential analysis. Else, EdgeR is used. Default value of this parameter is 0 (means EdgeR is used)."),     
-
-        make_option(c("--FoldChangeThr"), type="integer", action="store", default=3, help="DESeq / EdgeR fold change threshold - log2 of this value is used. Default = 3, means that log2(3) is used as the fold change threshold."),
-
-        make_option(c("--FDRThr"), type="numeric", default=0.05, help="FDR threshold for DESeq / EdgeR. Default is 0.05, means that loops with FDR < 0.05, and fold change >= log2(FoldChangeThr) would be considered as differential."),
-
-        make_option(c("--bcv"), type="numeric", default=0.4, help="If EdgeR is used with single samples (replica count = 1 for any of the categories), this value is the square-root-dispersion. For datasets arising from well-controlled experiments are 0.4 for human data, 0.1 for data on genetically identical model organisms or 0.01 for technical replicates. For details, see the edgeR manual. By default, the value is set as 0.4.")
-);
-
-
-
-
-
-
-
-
-
+	--bcv		threshold	If EdgeR is used with single samples (replica count = 1 for any of the 
+					categories), this value is the square-root-dispersion.  
+					Typical values for datasets arising from well-controlled experiments are 0.4 
+					for human data, 0.1 for data on genetically identical model organisms, or 
+					0.01 for technical replicates. For details, see the edgeR manual. 
+					By default, the value is set as 0.4. Used only if a category contains a 
+					single replicate.
 
 
+**** An example of differential loop calling using the above mentioned parameters, with respect to the test data provided, 
+is given in the file "DiffAnalysisHiChIP_script.sh" placed within the folder "Imp_Scripts".
 
+**** This file also contains sample scripts to generate the TSS containing file (used in the parameter --InpTSSFile) 
+with respect to different reference genomes (such as hg19, mm9, and mm10). Users are requested to check the script 
+in detail to understand the parameters.
 
 
 Sample logs from the console, corresponding to the TestData