Commit

FuSeq v1.1.3
nghiavtr committed Aug 19, 2020
1 parent 1ee10b7 commit 31f9406
Showing 6 changed files with 4,435 additions and 29 deletions.
1 change: 1 addition & 0 deletions R/FuSeq.R
@@ -216,3 +216,4 @@ if (validatedCommand){
cat("\n Done! \n")
}


19 changes: 11 additions & 8 deletions R/processSplitRead.R
@@ -387,15 +387,18 @@ processSplitRead <-function(inPath,geneAnno, anntxdb, txFastaFile, FuSeq.params)
rmID=which(myFusion$ssEnd>=3 & myFusion$ssStart>=3)
if (length(rmID)>0) myFusion=myFusion[-rmID,]

-cat("\n\n Checking possible paralogs...")
+cat("\n\n Checking possible paralogs...")
+gp=paste(geneParalog[,1],"-",geneParalog[,2],sep="")
+rmID=myFusion$name12 %in% gp | myFusion$name21 %in% gp
+rmID=!rmID

-#remove paralogs from database
-rmID=unlist(lapply(c(1:nrow(myFusion)), function(x){
-par1=c(as.character(myFusion$gene1[x]),geneParalog[which(geneParalog[,1]==as.character(myFusion$gene1[x])),2])
-par2=c(as.character(myFusion$gene2[x]),geneParalog[which(geneParalog[,1]==as.character(myFusion$gene2[x])),2])
-if (length(par1)>0 & length(par2)>0) return(length(intersect(par1,par2))>0)
-return(FALSE)
-}))
+# #remove paralogs from database
+# rmID=unlist(lapply(c(1:nrow(myFusion)), function(x){
+# par1=c(as.character(myFusion$gene1[x]),geneParalog[which(geneParalog[,1]==as.character(myFusion$gene1[x])),2])
+# par2=c(as.character(myFusion$gene2[x]),geneParalog[which(geneParalog[,1]==as.character(myFusion$gene2[x])),2])
+# if (length(par1)>0 & length(par2)>0) return(length(intersect(par1,par2))>0)
+# return(FALSE)
+# }))

myFusion$paralog=rep(0,nrow(myFusion))
myFusion$paralog[rmID]=1
45 changes: 26 additions & 19 deletions README.md
@@ -1,8 +1,14 @@
#####################################
-# Documents for FuSeq version 1.1.2
+# Documents for FuSeq version 1.1.3
#####################################

## Update news
+#### 19 Aug 2020: version 1.1.3
+1) Speed up the paralog check in processSplitRead()
+2) Export fragmentDist.txt for data with a low mapped-read rate
+3) Fix the error when installing from sources caused by the fmtlib dependency, reported in https://github.com/nghiavtr/FuSeq/issues/1
+4) Fix the error when running the binary version on Ubuntu

#### 09 Sep 2019: version 1.1.2
1) Fix small bugs in processSplitRead.R and postProcessSplitRead.R
2) Add excludeDiscordantTx.R (details in Section 7) to remove discordant transcripts, i.e. transcripts that exist in the cDNA fasta file but not in the gtf file. These discordant transcripts can crash the split-read pipeline. This step is recommended for any untested or new annotation, such as Homo_sapiens.GRCh38.
@@ -61,6 +67,7 @@ We also prepare annotation RData files for several annotations from Ensembl (ens
The latest version and information of FuSeq are updated at https://github.com/nghiavtr/FuSeq

The older versions can be found here:
+- Version 1.1.2: https://github.com/nghiavtr/FuSeq/releases/tag/v1.1.2
- Version 1.1.1: https://github.com/nghiavtr/FuSeq/releases/tag/v1.1.1
- Version 1.1.0: https://github.com/nghiavtr/FuSeq/releases/tag/v1.1.0
- Version 1.0.0: https://github.com/nghiavtr/FuSeq/releases/tag/v1.0.0
@@ -69,30 +76,30 @@ The older versions can be found here:

## 2. Download and installation
If you use the binary version of FuSeq:
-- Download the latest binary version from FuSeq website: [FuSeq_v1.1.2_linux_x86-64](https://github.com/nghiavtr/FuSeq/releases/download/v1.1.2/FuSeq_v1.1.2_linux_x86-64.tar.gz)
+- Download the latest binary version from FuSeq website: [FuSeq_v1.1.3_linux_x86-64](https://github.com/nghiavtr/FuSeq/releases/download/v1.1.3/FuSeq_v1.1.3_linux_x86-64.tar.gz)
```sh
-wget https://github.com/nghiavtr/FuSeq/releases/download/v1.1.2/FuSeq_v1.1.2_linux_x86-64.tar.gz -O FuSeq_v1.1.2_linux_x86-64.tar.gz
+wget https://github.com/nghiavtr/FuSeq/releases/download/v1.1.3/FuSeq_v1.1.3_linux_x86-64.tar.gz -O FuSeq_v1.1.3_linux_x86-64.tar.gz
```
- Uncompress to folder
```sh
-tar -xzvf FuSeq_v1.1.2_linux_x86-64.tar.gz
+tar -xzvf FuSeq_v1.1.3_linux_x86-64.tar.gz
```
- Move to the *FuSeq_home* directory and do configuration for FuSeq
```sh
-cd FuSeq_v1.1.2_linux_x86-64
+cd FuSeq_v1.1.3_linux_x86-64
bash configure.sh
```
- Add paths of lib folder and bin folder to LD_LIBRARY_PATH and PATH
```sh
-export LD_LIBRARY_PATH=/path/to/FuSeq_v1.1.2_linux_x86-64/linux/lib:$LD_LIBRARY_PATH
-export PATH=/path/to/FuSeq_v1.1.2_linux_x86-64/linux/bin:$PATH
+export LD_LIBRARY_PATH=/path/to/FuSeq_v1.1.3_linux_x86-64/linux/lib:$LD_LIBRARY_PATH
+export PATH=/path/to/FuSeq_v1.1.3_linux_x86-64/linux/bin:$PATH
```
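The two exports above prepend to `LD_LIBRARY_PATH` and `PATH` every time they are run, so sourcing them repeatedly (for example from a shell profile) duplicates the entries. A minimal sketch of one way to avoid that is given below. The helper name `prepend_once` and the variable `FUSEQ_HOME` are hypothetical and not part of FuSeq; the install location is a placeholder to be adjusted.

```sh
# Hypothetical helper (not part of FuSeq): prepend DIR to the
# colon-separated variable VAR unless DIR is already listed in it.
prepend_once() {
  var=$1
  dir=$2
  eval "cur=\${$var:-}"                  # read the current value of VAR
  case ":$cur:" in
    *":$dir:"*) ;;                       # already present, nothing to do
    *) eval "export $var=\"\$dir\${cur:+:\$cur}\"" ;;
  esac
}

# Assumed install location; replace with the real FuSeq_home directory.
FUSEQ_HOME=${FUSEQ_HOME:-/path/to/FuSeq_v1.1.3_linux_x86-64}

prepend_once LD_LIBRARY_PATH "$FUSEQ_HOME/linux/lib"
prepend_once PATH "$FUSEQ_HOME/linux/bin"
```

Running the last two lines again leaves both variables unchanged, which makes the snippet safe to keep in a file that is sourced more than once.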
If you want to build FuSeq from sources:
- Download FuSeq from [FuSeq website](https://github.com/nghiavtr/FuSeq) and move to *FuSeq_home* directory
```sh
-wget https://github.com/nghiavtr/FuSeq/archive/v1.1.2.tar.gz
-tar -xzvf v1.1.2.tar.gz
-cd FuSeq-1.1.2
+wget https://github.com/nghiavtr/FuSeq/archive/v1.1.3.tar.gz
+tar -xzvf v1.1.3.tar.gz
+cd FuSeq-1.1.3
```
- FuSeq requires information of flags from Sailfish including DFETCH_BOOST, DBOOST_ROOT, DTBB_INSTALL_DIR and DCMAKE_INSTALL_PREFIX. Please refer to the Sailfish website for more details of these flags.
- Do installation by the following command:
@@ -256,16 +263,16 @@ For simplicity, in this practice, the FuSeq software, the annotation, RNA-seq da
### 8.1. Download and install
#### Download and configure FuSeq
```sh
-wget https://github.com/nghiavtr/FuSeq/releases/download/v1.1.2/FuSeq_v1.1.2_linux_x86-64.tar.gz -O FuSeq_v1.1.2_linux_x86-64.tar.gz
-tar -xzvf FuSeq_v1.1.2_linux_x86-64.tar.gz
-cd FuSeq_v1.1.2_linux_x86-64
+wget https://github.com/nghiavtr/FuSeq/releases/download/v1.1.3/FuSeq_v1.1.3_linux_x86-64.tar.gz -O FuSeq_v1.1.3_linux_x86-64.tar.gz
+tar -xzvf FuSeq_v1.1.3_linux_x86-64.tar.gz
+cd FuSeq_v1.1.3_linux_x86-64
bash configure.sh
cd ..
```
#### Set paths to FuSeq
```sh
-export LD_LIBRARY_PATH=$PWD/FuSeq_v1.1.2_linux_x86-64/linux/lib:$LD_LIBRARY_PATH
-export PATH=$PWD/FuSeq_v1.1.2_linux_x86-64/linux/bin:$PATH
+export LD_LIBRARY_PATH=$PWD/FuSeq_v1.1.3_linux_x86-64/linux/lib:$LD_LIBRARY_PATH
+export PATH=$PWD/FuSeq_v1.1.3_linux_x86-64/linux/bin:$PATH
```
### 8.2. Download and prepare the reference files
#### Download the fasta and gtf of transcripts
@@ -277,14 +284,14 @@ gunzip Homo_sapiens.GRCh37.75.gtf.gz
```
#### Create sqlite
```sh
-Rscript FuSeq_v1.1.2_linux_x86-64/R/createSqlite.R Homo_sapiens.GRCh37.75.gtf Homo_sapiens.GRCh37.75.sqlite
+Rscript FuSeq_v1.1.3_linux_x86-64/R/createSqlite.R Homo_sapiens.GRCh37.75.gtf Homo_sapiens.GRCh37.75.sqlite
```
#### Download the extra transcript information and annotation from FuSeq
```sh
wget https://github.com/nghiavtr/FuSeq/releases/download/v0.1.0/Homo_sapiens.GRCh37.75.txAnno.RData
```
### 8.3. Parameter setting
-The default parameter settings are located at FuSeq_v1.1.2_linux_x86-64/R/params.txt, which we will use for the practical examples.
+The default parameter settings are located at FuSeq_v1.1.3_linux_x86-64/R/params.txt, which we will use for the practical examples.
For more advanced-level users, we suggest running FuSeq with the setting of keepRData=TRUE to keep the processed data of FuSeq, then FuSeq will save all data into file FuSeq_process.RData. This file contains the results of both mapped read pipeline and split read pipeline, and extra relevant information of fusion gene candidates such as supporting exons, read mapping positions, sequence reads, etc.

### 8.4. An example for a short read sample
@@ -306,7 +313,7 @@ FuSeq -i TxIndexer_idx_k21 -l IU -1 <(gunzip -c SRR064287_1.fastq.gz) -2 <(gunzi
```
#### Discover fusion genes
```sh
-Rscript FuSeq_v1.1.2_linux_x86-64/R/FuSeq.R in=SRR064287_feqDir txfasta=Homo_sapiens.GRCh37.75.cdna.all.fa sqlite=Homo_sapiens.GRCh37.75.sqlite txanno=Homo_sapiens.GRCh37.75.txAnno.RData out=SRR064287_FuseqOut params=FuSeq_v1.1.2_linux_x86-64/R/params.txt
+Rscript FuSeq_v1.1.3_linux_x86-64/R/FuSeq.R in=SRR064287_feqDir txfasta=Homo_sapiens.GRCh37.75.cdna.all.fa sqlite=Homo_sapiens.GRCh37.75.sqlite txanno=Homo_sapiens.GRCh37.75.txAnno.RData out=SRR064287_FuseqOut params=FuSeq_v1.1.3_linux_x86-64/R/params.txt
```
The result is a list of gene-fusion candidates stored in the file fusions.FuSeq in the output folder (SRR064287_FuseqOut).
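
Before moving on, it can be useful to confirm that the run actually produced a non-empty fusions.FuSeq file. The sketch below only checks for the file and prints its first lines; it makes no assumption about the column layout of fusions.FuSeq. The function name `preview_fusions` is hypothetical and not part of FuSeq.

```sh
# Hypothetical helper (not part of FuSeq): show the first N lines of
# fusions.FuSeq in a given output folder, or report that it is missing/empty.
preview_fusions() {
  out_dir=$1
  n=${2:-5}                              # number of lines to show, default 5
  result="$out_dir/fusions.FuSeq"
  if [ ! -s "$result" ]; then
    echo "No fusion candidates file (or an empty one) at $result" >&2
    return 1
  fi
  head -n "$n" "$result"
}

# Example call for the short-read sample; '|| true' keeps a script going
# even if the output folder does not exist yet.
preview_fusions SRR064287_FuseqOut 5 || true
```

The same call works for any other output folder passed as the first argument.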

@@ -333,7 +340,7 @@ FuSeq -i TxIndexer_idx_k31 -l IU -1 <(gunzip -c SRR934746_1.fastq.gz) -2 <(gunzi
```
#### Discover fusion genes
```sh
-Rscript FuSeq_v1.1.2_linux_x86-64/R/FuSeq.R in=SRR934746_feqDir txfasta=Homo_sapiens.GRCh37.75.cdna.all.fa sqlite=Homo_sapiens.GRCh37.75.sqlite txanno=Homo_sapiens.GRCh37.75.txAnno.RData out=SRR934746_FuseqOut params=FuSeq_v1.1.2_linux_x86-64/R/params.txt
+Rscript FuSeq_v1.1.3_linux_x86-64/R/FuSeq.R in=SRR934746_feqDir txfasta=Homo_sapiens.GRCh37.75.cdna.all.fa sqlite=Homo_sapiens.GRCh37.75.sqlite txanno=Homo_sapiens.GRCh37.75.txAnno.RData out=SRR934746_FuseqOut params=FuSeq_v1.1.3_linux_x86-64/R/params.txt
```
The result is a list of fusion gene candidates stored in the file fusions.FuSeq in the output folder (SRR934746_FuseqOut).
## 9. License