-
Notifications
You must be signed in to change notification settings - Fork 14
Additional options (gprofile)
In release 1.0.5 we have added additional options for gprofile.
- --perCategory option reports assignment of each read in the following format:
readName,chr,category, geneID,geneName
4029592_h_0_TCTGATG_CGAT,10,CDS,ENSMUSG00000015202,Cnksr3
- --mouse option allows to run gprofile for mouse samples. Make sure to allign to genome/transcriptome from NCBIM37 release.
This is an example how to run gprofile with --perCategory
and --mouse
options:
You need to create gProfile15 before running this command.
python /u/home/s/serghei/collab/code/rop/gprofile.py mapped_SJV040_CGAT.bam gProfile15/mapped_SJV040_CGAT.csv --perCategory --mouse
Create samples.txt with all the file names (with no extension). One way do it is to use awk
:
ls *bam | awk -F ".bam" '{print $1}'>samples.txt
Then run :
mkdir gprofileOut
while read line ;do echo ". /u/local/Modules/default/init/modules.sh" >run_${line}.sh; echo "module load python/2.7.3" >>run_${line}.sh; echo "python /u/home/s/serghei/collab/code/rop/gprofile.py /u/home/s/svandrie/nobackup-kmartin/FinalBams/${line}.bam /u/home/s/svandrie/nobackup-kmartin/gprofileOut/gprofileOut/${line}.csv --perCategory --mouse">>run_${line}.sh;done<samples.txt
Please follow the following steps to submit jobs per samples:
ls run*sh | awk '{i+=1;print "qsub -cwd -V -N gprofile"i" -l h_data=16G,time=24:00:00 "$1}' > all.sh
Now you have all.sh with individual qsub command per sample. To run all the qsubs from all.sh
do the following:
chmod u+x all.sh
nohup ./all.sh &
Alternatively, you can use runGprofile.sh
(to be available in 1.0.6 release), as follows:
~/code/SarahSeq/SarahSeq-master/runGprofile.sh /u/home/s/svandrie/nobackup-kmartin/testBam/ $PWD/testGprofile/
##Output of the gprofile
By using --perCategory option gprofile.py
will generate the following:
- A summary of the read assignment per chromosome. A separate file per chromosome will be created in
mapped_SJV040_CGAT_perCategory
directory.
Those are the files to be generated in mapped_SJV040_CGAT_perCategory
:
mapped_SJV040_CGAT.10.genomicFeature mapped_SJV040_CGAT.3.genomicFeature
mapped_SJV040_CGAT.11.genomicFeature mapped_SJV040_CGAT.4.genomicFeature
mapped_SJV040_CGAT.12.genomicFeature mapped_SJV040_CGAT.5.genomicFeature
mapped_SJV040_CGAT.13.genomicFeature mapped_SJV040_CGAT.6.genomicFeature
mapped_SJV040_CGAT.14.genomicFeature mapped_SJV040_CGAT.7.genomicFeature
mapped_SJV040_CGAT.15.genomicFeature mapped_SJV040_CGAT.8.genomicFeature
mapped_SJV040_CGAT.16.genomicFeature mapped_SJV040_CGAT.9.genomicFeature
mapped_SJV040_CGAT.17.genomicFeature mapped_SJV040_CGAT.MT.genomicFeature
mapped_SJV040_CGAT.18.genomicFeature mapped_SJV040_CGAT.X.genomicFeature
mapped_SJV040_CGAT.19.genomicFeature mapped_SJV040_CGAT.Y.genomicFeature
mapped_SJV040_CGAT.1.genomicFeature perGeneSummary
mapped_SJV040_CGAT.2.genomicFeature
The format is the following:
readName,chr,category, geneID,geneName
3730084_h_0_CGTGCTA_CGAT,19,junction,ENSMUSG00000024831,Ighmbp2
3098258_h_1_GAACTTA_CGAT,19,CDS,ENSMUSG00000024831,Ighmbp2
11015546_h_0_TTGGTTC_CGAT,19,CDS,ENSMUSG00000024831,Ighmbp2
- We will also summarize the number of
junction
,CDS
,UTR3
,UTR5
,UTR_
,Intron
per gene in themapped_SJV040_CGAT_perCategory/perGeneSummary
directory.
Those are the files to be generated in mapped_SJV040_CGAT_perCategory/perGeneSummary
:
mapped_SJV040_CGAT.10.perGeneSummary mapped_SJV040_CGAT.2.perGeneSummary
mapped_SJV040_CGAT.11.perGeneSummary mapped_SJV040_CGAT.3.perGeneSummary
mapped_SJV040_CGAT.12.perGeneSummary mapped_SJV040_CGAT.4.perGeneSummary
mapped_SJV040_CGAT.13.perGeneSummary mapped_SJV040_CGAT.5.perGeneSummary
mapped_SJV040_CGAT.14.perGeneSummary mapped_SJV040_CGAT.6.perGeneSummary
mapped_SJV040_CGAT.15.perGeneSummary mapped_SJV040_CGAT.7.perGeneSummary
mapped_SJV040_CGAT.16.perGeneSummary mapped_SJV040_CGAT.8.perGeneSummary
mapped_SJV040_CGAT.17.perGeneSummary mapped_SJV040_CGAT.9.perGeneSummary
mapped_SJV040_CGAT.18.perGeneSummary mapped_SJV040_CGAT.X.perGeneSummary
mapped_SJV040_CGAT.19.perGeneSummary mapped_SJV040_CGAT.Y.perGeneSummary
mapped_SJV040_CGAT.1.perGeneSummary
The format is the following:
geneName,chr,nJunction,nCDS,nUTR3,nUTR5,nUTR_,nIntron
Plekhg1,2,22,0,0,0
Cand1,16,86,0,0,0
2510003E04Rik,16,76,1,0,0
Tmcc3,0,8,0,2,0
Spryd4,0,1,0,0,0
Msl3l2,0,4,0,0,0
Asf1a,0,8,0,0,0
F630110N24Rik,1,2,0,0,0
##What program will output while running?
This is expected output of the program:
Mouse annotations are used ...
Load gene annotations ...
Load /u/home/galaxy/collaboratory/serghei/code/rop/source/gprofile/annotations/mouse/bedPrepared/UTR3_NCBIM37_prepared.bed
Load /u/home/galaxy/collaboratory/serghei/code/rop/source/gprofile/annotations/mouse/bedPrepared/UTR5_NCBIM37_prepared.bed
Load /u/home/galaxy/collaboratory/serghei/code/rop/source/gprofile/annotations/mouse/bedPrepared/CDS_NCBIM37_prepared.bed
Load /u/home/galaxy/collaboratory/serghei/code/rop/source/gprofile/annotations/mouse/bedPrepared/geneCoordinates_NCBIM37.bed
Open bam file /u/home/s/svandrie/nobackup-kmartin/FinalBams/SJV040_TCGA_PCRDuplicatesCollapsed_sort.bam
Process chr 1
Process chr 2
...
Don’t let your unmapped reads go to waste
- Main
- About ROP Tutorial
- What is ROP?
- How ROP works?
- How to prepare unmapped reads
- How to customize tools used by ROP
- Unix Tutorial
- Get started
- Targeted analysis
- ROP analysis: one RNA-Seq sample
- How to run ROP for mouse
- ROP analysis via qsub
- ROP analysis of multiple samples via qsub array
- Immune profiling by ROP (ImReP)
- ImRep across multiple samples
- ROP input details
- ROP output details
- Source of every last read
- Additional options
- How to calculate immune diversity?
- How to run hyper editing pipeline?