-
Notifications
You must be signed in to change notification settings - Fork 3
/
README
executable file
·74 lines (52 loc) · 5.17 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
###################################################################################################################################################
# Copyright (c) 2016, Yu Hu and Mingyao Li (Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine) #
# All rights reserved. #
###################################################################################################################################################
######################
# PennDiff Tutorials #
######################
REQUIREMENTS:
• Python v2.6+
• R with packages "gcmr" and "betareg" installed
USAGE:
python PennDiff.py -ref <RefSeqAnnotation> -est <isoformRelativeAbundances>
REQUIRED OPTIONS:
-ref ---RefSeqAnnotation file can be downloaded from UCSC Genome Browsers. The file "RefSeqAnnotation_ECE1_example" is an example of RefSeqAnnotation file
RefSeqAnnotation_ECE1_example:
749 NM_001397 chr1 - 21543739 21616982 21546447 21616907 19 21543739,21548239,21551742,21553651,21554423,21560050,21562342,21563238,21564626,21571481,21573713,21582439,21584017,21585185,21586763,21599191,21605683,21616562,21616856, 21546624,21548335,21551933,21553719,21554534,21560154,21562420,21563337,21564737,21571596,21573856,21582631,21584083,21585332,21586885,21599404,21605825,21616649,21616982, 0 ECE1 cmpl cmpl 0,0,1,2,2,0,0,0,0,2,0,0,0,0,1,1,0,0,0,
93 NM_001113348 chr1 - 21543739 21672034 21546447 21671871 19 21543739,21548239,21551742,21553651,21554423,21560050,21562342,21563238,21564626,21571481,21573713,21582439,21584017,21585185,21586763,21599191,21605683,21616562,21671868, 21546624,21548335,21551933,21553719,21554534,21560154,21562420,21563337,21564737,21571596,21573856,21582631,21584083,21585332,21586885,21599404,21605825,21616649,21672034, 0 ECE1 cmpl cmpl 0,0,1,2,2,0,0,0,0,2,0,0,0,0,1,1,0,0,0,
749 NM_001113349 chr1 - 21543739 21616766 21546447 21616691 18 21543739,21548239,21551742,21553651,21554423,21560050,21562342,21563238,21564626,21571481,21573713,21582439,21584017,21585185,21586763,21599191,21605683,21616562, 21546624,21548335,21551933,21553719,21554534,21560154,21562420,21563337,21564737,21571596,21573856,21582631,21584083,21585332,21586885,21599404,21605825,21616766, 0 ECE1 cmpl cmpl 0,0,1,2,2,0,0,0,0,2,0,0,0,0,1,1,0,0,
749 NM_001113347 chr1 - 21543739 21606183 21546447 21605927 17 21543739,21548239,21551742,21553651,21554423,21560050,21562342,21563238,21564626,21571481,21573713,21582439,21584017,21585185,21586763,21599191,21605683, 21546624,21548335,21551933,21553719,21554534,21560154,21562420,21563337,21564737,21571596,21573856,21582631,21584083,21585332,21586885,21599404,21606183, 0 ECE1 cmpl cmpl 0,0,1,2,2,0,0,0,0,2,0,0,0,0,1,1,0,
-est ---Isoform relative abundances can be obtained by using existing software such as PennSeq or Cufflinks. However, the format of results is consistent across different software.Thus we ask user to give a isoformRelativeAbundances file with format as example file "isoformRelativeAbundances_ECE1_example" shows:
• Column 1: gene name
• Column 2: isoform name
• Column 3: sample number
• Column 4: estimated isoform relative abundances
• Column 5: condition group (1 or 2)
isoformRelativeAbundances_ECE1_example:
ECE1 NM_001113347 0 0.257339787046751 1
ECE1 NM_001113349 0 0.0899339615416256 1
ECE1 NM_001113348 0 0.608750982445272 1
ECE1 NM_001397 0 0.0439752689663516 1
ECE1 NM_001113347 1 0.294457146326302 1
ECE1 NM_001113349 1 0.087303489040162 1
ECE1 NM_001113348 1 0.571130340575692 1
ECE1 NM_001397 1 0.047109024057844 1
OUTPUT:
"geneDASResults" file: Gene-based test results file with 3 columns:
• Column 1: gene name
• Column 2: test P-Value
• Column 3: Hellinger distance which measures the splicing difference between two conditions for a gene in terms of isoform relative abundances
"exonDASResults" file: Exon-based test results file with 10 columns:
• Column 1: gene name
• Column 2: DAS testing group where exons within same group share same P-Value
• Column 3: exon location
• Column 4: test P-Value
• Column 5: exon inclusion level difference which measures splicing difference between two conditions for an spliced exon
• Column 6: "+"/"-" if exon inclusion level of condition 1 is greater/smaller than condition 2.
• Column 7: number of samples in condition 1 group
• Column 8: exon inclusion levels of each sample in condition 1 group
• Column 9: number of samples in condition 2 group
• Column 10: exon inclusion levels of each sample in condition 2 group
CONTACT INFORMATION:
For more information, such as bug reports and comments, please contact Yu Hu via huyu1@mail.med.upenn.edu.