diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html
index 7536f0c..b925b4b 100644
--- a/docs/CONDUCT.html
+++ b/docs/CONDUCT.html
@@ -62,7 +62,7 @@
diff --git a/docs/CONTRIBUTING.html b/docs/CONTRIBUTING.html
index 394a5fa..5993705 100644
--- a/docs/CONTRIBUTING.html
+++ b/docs/CONTRIBUTING.html
@@ -62,7 +62,7 @@
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index 5b45e4e..afd9586 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -62,7 +62,7 @@
diff --git a/docs/articles/index.html b/docs/articles/index.html
index a240016..8065472 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -62,7 +62,7 @@
diff --git a/docs/articles/protr.html b/docs/articles/protr.html
index 09bd2be..38226f4 100644
--- a/docs/articles/protr.html
+++ b/docs/articles/protr.html
@@ -31,7 +31,7 @@
@@ -75,7 +75,7 @@
vignettes/protr.Rmd
protr.Rmd
The protr package offers a unique and comprehensive toolkit for generating various numerical representation schemes of protein sequences. The descriptors included are extensively utilized in bioinformatics and chemogenomics research. The commonly used descriptors listed in protr include amino acid composition, autocorrelation, CTD, conjoint traid, quasi-sequence order, pseudo amino acid composition, and profile-based descriptors derived by Position-Specific Scoring Matrix (PSSM). The descriptors for proteochemometric (PCM) modeling, includes the scales-based descriptors derived by principal components analysis, factor analysis, multidimensional scaling, amino acid properties (AAindex), 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.), and BLOSUM/PAM matrix-derived descriptors. The protr package also integrates the function of parallelized similarity computation derived by pairwise protein sequence alignment and Gene Ontology (GO) semantic similarity measures. ProtrWeb, the web application built on protr, can be accessed from http://protr.org.
-If you find protr is useful in your research, please feel free to cite our paper:
+The protr package offers a unique and comprehensive toolkit for generating various numerical representation schemes of protein sequences. The descriptors included are extensively utilized in bioinformatics and chemogenomics research.
+The commonly used descriptors listed in protr include amino acid composition, autocorrelation, CTD, conjoint triad, quasi-sequence order, pseudo amino acid composition, and profile-based descriptors derived from the Position-Specific Scoring Matrix (PSSM).
+The descriptors for proteochemometric (PCM) modeling include the scales-based descriptors derived by principal components analysis, factor analysis, multidimensional scaling, amino acid properties (AAindex), 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.), and BLOSUM/PAM matrix-derived descriptors.
+The protr package also implements parallelized similarity computation derived from pairwise protein sequence alignment and Gene Ontology (GO) semantic similarity measures. ProtrWeb, the web application built on protr, can be accessed from http://protr.org.
+If you find protr useful in your research, please feel free to cite our paper:
Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.
@@ -107,58 +110,62 @@
Here we use the subcellular localization dataset of human proteins presented in Chou and Shen (2008) to demonstrate the workflow of using protr.
+An Example Predictive Modeling Workflow
+Here we use the subcellular localization dataset of human proteins presented in Chou and Shen (2008) to demonstrate the basic usage of protr.
The complete dataset includes 3,134 protein sequences (2,750 different proteins), classified into 14 human subcellular locations. We selected two classes of proteins as our benchmark dataset. Class 1 contains 325 extracellular proteins, and class 2 includes 307 mitochondrial proteins. Here we aim to build a random forest classification model to classify these two types of proteins.
First, we load the protr package, then read the protein sequences stored in two separate FASTA files with readFASTA()
:
library("protr")
# load FASTA files
-extracell = readFASTA(system.file(
- "protseq/extracell.fasta", package = "protr"))
-mitonchon = readFASTA(system.file(
- "protseq/mitochondrion.fasta", package = "protr"))
To read protein sequences stored in PDB format files, use readPDB()
instead. The loaded sequences are stored as two lists in R, where each component is a character string representing one protein sequence. In this case, there are 325 extracellular protein sequences and 306 mitochondrial protein sequences:
To ensure that the protein sequences contain only the 20 standard amino acid types, which is usually required for descriptor computation, we use the protcheck()
function to check the amino acid types and remove the sequences containing non-standard types:
extracell = extracell[(sapply(extracell, protcheck))]
-mitonchon = mitonchon[(sapply(mitonchon, protcheck))]
extracell <- extracell[(sapply(extracell, protcheck))]
+mitonchon <- mitonchon[(sapply(mitonchon, protcheck))]
Two protein sequences were removed from each class. For the remaining sequences, we calculate the Type II PseAAC descriptor, i.e., the amphiphilic pseudo amino acid composition (APseAAC) descriptor (Chou, 2005), and make class labels for classification modeling.
# calculate APseAAC descriptors
-x1 = t(sapply(extracell, extractAPAAC))
-x2 = t(sapply(mitonchon, extractAPAAC))
-x = rbind(x1, x2)
+x1 <- t(sapply(extracell, extractAPAAC))
+x2 <- t(sapply(mitonchon, extractAPAAC))
+x <- rbind(x1, x2)
# make class labels
-labels = as.factor(c(rep(0, length(extracell)), rep(1, length(mitonchon))))
In protr, the functions that compute the commonly used protein sequence descriptors and the proteochemometric (PCM) modeling descriptors are named in the form extract...()
.
Next, we will split the data into a 75% training set and a 25% test set.
set.seed(1001)
# split training and test set
-tr.idx = c(
+tr.idx <- c(
sample(1:nrow(x1), round(nrow(x1) * 0.75)),
sample(nrow(x1) + 1:nrow(x2), round(nrow(x2) * 0.75))
)
-te.idx = setdiff(1:nrow(x), tr.idx)
+te.idx <- setdiff(1:nrow(x), tr.idx)
-x.tr = x[tr.idx, ]
-x.te = x[te.idx, ]
-y.tr = labels[tr.idx]
-y.te = labels[te.idx]
We will train a random forest classification model on the training set with 5-fold cross-validation, using the randomForest
package.
The training result is:
## Call:
## randomForest(x = x.tr, y = y.tr, cv.fold = 5)
@@ -173,7 +180,7 @@
## 1 72 156 0.3157895
With the model trained on the training set, we predict on the test set and plot the ROC curve with the pROC
package, as shown in Figure 1.
# predict on test set
-rf.pred = predict(rf.fit, newdata = x.te, type = "prob")[, 1]
+rf.pred <- predict(rf.fit, newdata = x.te, type = "prob")[, 1]
# plot ROC curve
library("pROC")
@@ -192,10 +199,9 @@
Package Overview
-The protr package (Xiao et al., 2015) implemented most of the state-of-the-art protein sequence descriptors with R. Generally, each type of the descriptors (features) can be calculated with a function named extractX()
in the protr package, where X
stands for the abbrevation of the descriptor name. The descriptors and the function names implemented are listed below:
+The protr package (Xiao et al., 2015) implements most of the state-of-the-art protein sequence descriptors in R. Generally, each type of descriptor (feature) can be calculated with a function named extractX()
in the protr package, where X
stands for the abbreviation of the descriptor name. The descriptors are:
--
-
Amino acid composition
+ - Amino acid composition
-
extractAAC()
- Amino acid composition
@@ -205,8 +211,7 @@
extractTC()
- Tripeptide composition
-
-Autocorrelation
+ Autocorrelation
-
extractMoreauBroto()
- Normalized Moreau-Broto autocorrelation
@@ -216,8 +221,7 @@
extractGeary()
- Geary autocorrelation
-
-CTD descriptors
+ CTD descriptors
-
extractCTDC()
- Composition
@@ -227,15 +231,13 @@
extractCTDD()
- Distribution
-
-Conjoint triad descriptors
+ Conjoint triad descriptors
-
extractCTriad()
- Conjoint triad descriptors
-
-Quasi-sequence-order descriptors
+ Quasi-sequence-order descriptors
-
extractSOCN()
- Sequence-order-coupling number
@@ -243,8 +245,7 @@
extractQSO()
- Quasi-sequence-order descriptors
-
-Pseudo-amino acid composition
+ Pseudo-amino acid composition
-
extractPAAC()
- Pseudo-amino acid composition (PseAAC)
@@ -252,8 +253,7 @@
extractAPAAC()
- Amphiphilic pseudo-amino acid composition (APseAAC)
-
-Profile-based descriptors
+ Profile-based descriptors
extractPSSM()
extractPSSMAcc()
@@ -264,7 +264,7 @@
The descriptors commonly used in Proteochemometric Modeling (PCM) implemented in protr include:
-
-
extractScales()
, extractScalesGap()
- Scales-based descriptors derived by Principal Components Analysis
+extractScales()
, extractScalesGap()
- Scales-based descriptors derived by Principal Components Analysis
-
extractProtFP()
, extractProtFPGap()
- Scales-based descriptors derived by amino acid properties from AAindex (a.k.a. Protein Fingerprint)
@@ -272,9 +272,12 @@
extractDescScales()
- Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)
-extractFAScales()
- Scales-based descriptors derived by Factor Analysis
-extractMDSScales()
- Scales-based descriptors derived by Multidimensional Scaling
-extractBLOSUM()
- BLOSUM and PAM matrix-derived descriptors
+
+extractFAScales()
- Scales-based descriptors derived by Factor Analysis
+
+extractMDSScales()
- Scales-based descriptors derived by Multidimensional Scaling
+
+extractBLOSUM()
- BLOSUM and PAM matrix-derived descriptors
The protr package also provides parallelized computation of similarity scores derived from local or global pairwise alignment of the protein sequences in a list. The sequence alignment itself is computed by Biostrings
; the corresponding functions in the protr
package include:
@@ -311,10 +314,12 @@
As described above, we can use the function extractAAC()
to extract the descriptors (features) from protein sequences:
library("protr")
-x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
-
-extractAAC(x)
+x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+))[[1]]
+
+extractAAC(x)
## A R N D C E
## 0.06405694 0.07117438 0.03914591 0.05160142 0.06761566 0.04804270
## Q G H I L K
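To make the composition vector concrete, here is a minimal base-R sketch of the amino acid composition \(f(r) = N_r / N\), where \(N_r\) is the count of amino acid type \(r\) and \(N\) is the sequence length. `aac_sketch()` and `aa20` are hypothetical names for illustration, not protr functions; use extractAAC() in practice. The element order follows the printed output above.

```r
# Hypothetical helper (not part of protr): amino acid composition f(r) = N_r / N.
aa20 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

aac_sketch <- function(seq) {
  chars <- strsplit(seq, "")[[1]]
  # Count each of the 20 standard amino acids; absent types get a count of 0
  counts <- table(factor(chars, levels = aa20))
  comp <- as.numeric(counts) / length(chars)
  names(comp) <- aa20
  comp
}

aac_sketch("ACDA")  # A = 0.5, C = 0.25, D = 0.25, all others 0
```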
@@ -333,7 +338,7 @@
f(r, s) = \frac{N_{rs}}{N - 1} \quad r, s = 1, 2, \ldots, 20.
\]
where \(N_{rs}\) is the number of dipeptides composed of amino acid types \(r\) and \(s\). Similar to extractAAC()
, here we use extractDC()
to compute the descriptors:
-dc = extractDC(x)
+
## AA RA NA DA CA EA
## 0.003565062 0.003565062 0.000000000 0.007130125 0.003565062 0.003565062
@@ -355,7 +360,7 @@
f(r, s, t) = \frac{N_{rst}}{N - 2} \quad r, s, t = 1, 2, \ldots, 20
\]
where \(N_{rst}\) is the number of tripeptides composed of amino acid types \(r\), \(s\), and \(t\). With the function extractTC()
, we can easily obtain the length-8000 descriptor; to save space, we omit most of the long output here:
-tc = extractTC(x)
+
## AAA RAA NAA DAA CAA EAA
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
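The dipeptide formula \(N_{rs}/(N-1)\) and the tripeptide formula \(N_{rst}/(N-2)\) are both overlapping k-mer compositions: count every length-\(k\) window and divide by the number of windows, \(N - k + 1\). The base-R sketch below illustrates that reading. `kmer_comp_sketch()` is a hypothetical helper, not part of protr, and the naming order (first residue varying fastest, e.g. `AA`, `RA`, `NA`) is inferred from the printed outputs rather than from protr's source.

```r
# Hypothetical helper (not part of protr): overlapping k-mer composition.
aa20 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

kmer_comp_sketch <- function(seq, k) {
  chars <- strsplit(seq, "")[[1]]
  n <- length(chars)
  stopifnot(n >= k)
  # All overlapping windows of length k
  kmers <- vapply(seq_len(n - k + 1), function(i) {
    paste(chars[i:(i + k - 1)], collapse = "")
  }, character(1))
  # All 20^k possible k-mers; expand.grid varies the first position fastest
  grid <- expand.grid(rep(list(aa20), k), stringsAsFactors = FALSE)
  all_kmers <- do.call(paste0, grid)
  counts <- table(factor(kmers, levels = all_kmers))
  # Divide by the number of windows: N - 1 for k = 2, N - 2 for k = 3
  comp <- as.numeric(counts) / (n - k + 1)
  names(comp) <- all_kmers
  comp
}

dc <- kmer_comp_sketch("ACDA", 2)  # 400 values
tc <- kmer_comp_sketch("ACDA", 3)  # 8000 values
```

With \(k = 1\) the same function reduces to the amino acid composition.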
@@ -398,7 +403,7 @@
ATS(d) = \frac{AC(d)}{N-d} \quad d = 1, 2, \ldots, \textrm{nlag}
\]
The corresponding function for this descriptor is extractMoreauBroto()
. A typical call would be:
-moreau = extractMoreauBroto(x)
+
## CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
## 0.081573213 -0.016064817 -0.015982990 -0.025739038
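As a rough illustration of the normalized Moreau-Broto formula, the base-R sketch below computes \(ATS(d)\) for a single property, assuming \(AC(d) = \sum_{i=1}^{N-d} P_i P_{i+d}\) and that the property values passed in are already standardized. `moreau_broto_sketch()` and the toy property values are hypothetical; protr's standardization of the AAindex values is not reproduced here.

```r
# Hypothetical helper (not part of protr): normalized Moreau-Broto autocorrelation
#   AC(d)  = sum_{i=1}^{N-d} P_i * P_{i+d}
#   ATS(d) = AC(d) / (N - d)
# `prop` is assumed to hold already-standardized values, one per amino acid.
moreau_broto_sketch <- function(seq, prop, nlag) {
  p <- unname(prop[strsplit(seq, "")[[1]]])  # property value of each residue
  n <- length(p)
  stopifnot(nlag < n)
  vapply(seq_len(nlag), function(d) {
    sum(p[1:(n - d)] * p[(1 + d):n]) / (n - d)
  }, numeric(1))
}

toy_prop <- c(A = 1, C = -1, D = 0.5)  # hypothetical toy values
moreau_broto_sketch("ACDA", toy_prop, nlag = 3)  # -1/3, -0.25, 1
```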
@@ -431,29 +436,32 @@
Users can choose which AAindex properties to use with the argument props
. The AAindex data shipped with protr can be loaded with data(AAindex)
, which contains detailed information on each property. With the arguments customprops
and nlag
, users can specify their own properties and the lag value. For example:
# Define 3 custom properties
-myprops = data.frame(
+myprops <- data.frame(
AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
)
# Use 4 properties in the AAindex database, and 3 customized properties
-moreau2 = extractMoreauBroto(
- x, customprops = myprops,
- props = c(
- "CIDH920105", "BHAR880101",
- "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3"))
-
-head(moreau2, n = 36L)
+moreau2 <- extractMoreauBroto(
+ x,
+ customprops = myprops,
+ props = c(
+ "CIDH920105", "BHAR880101",
+ "CHAM820101", "CHAM820102",
+ "MyProp1", "MyProp2", "MyProp3"
+ )
+)
+
+head(moreau2, n = 36L)
## CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
## 0.081573213 -0.016064817 -0.015982990 -0.025739038
## CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
@@ -489,14 +497,17 @@
With extractMoran()
(which takes the same parameters as extractMoreauBroto()
), we can compute the Moran autocorrelation descriptors (printing only the first 36 elements):
# Use the 3 custom properties defined before
# and 4 properties in the AAindex database
-moran = extractMoran(
- x, customprops = myprops,
- props = c(
- "CIDH920105", "BHAR880101",
- "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3"))
-
-head(moran, n = 36L)
+moran <- extractMoran(
+ x,
+ customprops = myprops,
+ props = c(
+ "CIDH920105", "BHAR880101",
+ "CHAM820101", "CHAM820102",
+ "MyProp1", "MyProp2", "MyProp3"
+ )
+)
+
+head(moran, n = 36L)
## CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
## 0.062895724 -0.044827681 -0.045065117 -0.055955678
## CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
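For comparison with the Moreau-Broto sketch, the base-R sketch below computes Moran's autocorrelation for a single property, assuming the usual definition \(I(d) = \frac{\frac{1}{N-d}\sum_{i=1}^{N-d}(P_i - \bar{P})(P_{i+d} - \bar{P})}{\frac{1}{N}\sum_{i=1}^{N}(P_i - \bar{P})^2}\), where \(\bar{P}\) is the mean property value over the sequence. `moran_sketch()` and the toy values are hypothetical, not protr code.

```r
# Hypothetical helper (not part of protr): Moran autocorrelation
#   I(d) = [ (1/(N-d)) * sum_{i=1}^{N-d} (P_i - Pbar) * (P_{i+d} - Pbar) ] /
#          [ (1/N)     * sum_{i=1}^{N}   (P_i - Pbar)^2 ]
moran_sketch <- function(seq, prop, nlag) {
  p <- unname(prop[strsplit(seq, "")[[1]]])  # property value of each residue
  n <- length(p)
  stopifnot(nlag < n)
  dev <- p - mean(p)            # deviations from the sequence mean
  denom <- sum(dev^2) / n
  vapply(seq_len(nlag), function(d) {
    (sum(dev[1:(n - d)] * dev[(1 + d):n]) / (n - d)) / denom
  }, numeric(1))
}

toy_prop <- c(A = 1, C = -1, D = 0.5)  # hypothetical toy values
moran_sketch("ACDA", toy_prop, nlag = 2)
```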
@@ -527,14 +538,17 @@
For each amino acid index, the three autocorrelation types together yield \(3 \times \textrm{nlag}\) descriptors. The usage of extractGeary()
is exactly the same as extractMoreauBroto()
and extractMoran()
:
# Use the 3 custom properties defined before
# and 4 properties in the AAindex database
-geary = extractGeary(
- x, customprops = myprops,
- props = c(
- "CIDH920105", "BHAR880101",
- "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3"))
-
-head(geary, n = 36L)
+geary <- extractGeary(
+ x,
+ customprops = myprops,
+ props = c(
+ "CIDH920105", "BHAR880101",
+ "CHAM820101", "CHAM820102",
+ "MyProp1", "MyProp2", "MyProp3"
+ )
+)
+
+head(geary, n = 36L)
## CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
## 0.9361830 1.0442920 1.0452843 1.0563467
## CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
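To round out the three autocorrelation types, the base-R sketch below computes Geary's autocorrelation for a single property, assuming the usual definition \(C(d) = \frac{\frac{N-1}{2(N-d)}\sum_{i=1}^{N-d}(P_i - P_{i+d})^2}{\sum_{i=1}^{N}(P_i - \bar{P})^2}\). `geary_sketch()` and the toy values are hypothetical, not protr code. Values near 1, as in the output above, indicate little autocorrelation at that lag.

```r
# Hypothetical helper (not part of protr): Geary autocorrelation
#   C(d) = [ (N-1) / (2(N-d)) * sum_{i=1}^{N-d} (P_i - P_{i+d})^2 ] /
#          [ sum_{i=1}^{N} (P_i - Pbar)^2 ]
geary_sketch <- function(seq, prop, nlag) {
  p <- unname(prop[strsplit(seq, "")[[1]]])  # property value of each residue
  n <- length(p)
  stopifnot(nlag < n)
  denom <- sum((p - mean(p))^2)
  vapply(seq_len(nlag), function(d) {
    (n - 1) / (2 * (n - d)) * sum((p[1:(n - d)] - p[(1 + d):n])^2) / denom
  }, numeric(1))
}

toy_prop <- c(A = 1, C = -1, D = 0.5)  # hypothetical toy values
geary_sketch("ACDA", toy_prop, nlag = 2)
```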
@@ -867,7 +881,7 @@
\]
The numerical value of \(d_i\) for each protein ranges from 0 to 1, which enables comparison between proteins. Accordingly, we obtain another vector space (designated \(\mathbf{D}\)) consisting of the \(d_i\) to represent each protein.
To compute conjoint triads of protein sequences, we can simply use:
-ctriad = extractCTriad(x)
+
## VS111 VS211 VS311 VS411 VS511 VS611 VS711 VS121 VS221 VS321 VS421 VS521
## 0.1 0.3 0.6 0.2 0.4 0.0 0.3 1.0 0.6 0.5 0.0 0.2
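The conjoint triad vector can be sketched in base R as follows. This assumes the seven-class amino acid grouping of Shen et al. (2007) and the normalization \(d_i = (f_i - \min f) / \max f\) over the 343 triad counts; both assumptions should be checked against protr's own implementation before relying on the numbers. `ctriad_sketch()` and `aa_class` are hypothetical names. The `VS` naming, with the first class index varying fastest, mirrors the printed output above.

```r
# Assumed seven-class grouping (Shen et al., 2007); not taken from protr's source.
aa_class <- c(A = 1, G = 1, V = 1,
              I = 2, L = 2, F = 2, P = 2,
              Y = 3, M = 3, T = 3, S = 3,
              H = 4, N = 4, Q = 4, W = 4,
              R = 5, K = 5,
              D = 6, E = 6,
              C = 7)

# Hypothetical helper (not part of protr); assumes only the 20 standard types
ctriad_sketch <- function(seq) {
  cls <- unname(aa_class[strsplit(seq, "")[[1]]])
  n <- length(cls)
  stopifnot(n >= 3)
  # Encode each triad (c1, c2, c3) as an index in 1..343, c1 varying fastest
  idx <- cls[1:(n - 2)] + 7 * (cls[2:(n - 1)] - 1) + 49 * (cls[3:n] - 1)
  f <- tabulate(idx, nbins = 343)
  d <- (f - min(f)) / max(f)  # d_i = (f_i - min f) / max f
  names(d) <- do.call(paste0, c(list("VS"), expand.grid(1:7, 1:7, 1:7)))
  d
}

ctriad_sketch("AGVC")  # triads (1,1,1) and (1,1,7): VS111 = 1, VS117 = 1
```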
@@ -1178,14 +1192,18 @@
Note that each scales-based descriptor function can be freely combined with any of the more than 20 classes of 2D and 3D molecular descriptors to construct highly customized scales-based descriptors. These functions are also flexible enough that users can supply their own property matrices to construct scales-based descriptors.
For example, to compute the “topological scales” derived by PCA (using the first 5 principal components), one can use extractDescScales()
:
-x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
-
-descscales = extractDescScales(
- x, propmat = "AATopo",
- index = c(37:41, 43:47),
- pc = 5, lag = 7, silent = FALSE)
-## Summary of the first 5 principal components:
+x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+))[[1]]
+
+descscales <- extractDescScales(
+ x,
+ propmat = "AATopo",
+ index = c(37:41, 43:47),
+ pc = 5, lag = 7, silent = FALSE
+)
+## Summary of the first 5 principal components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 2.581537 1.754133 0.4621854 0.1918666 0.08972087
## Proportion of Variance 0.666430 0.307700 0.0213600 0.0036800 0.00080000
@@ -1202,13 +1220,17 @@
## scl1.lag3 scl2.lag3 scl3.lag3 scl4.lag3 scl5.lag3
## 2.011431e-02 -9.211136e-02 -1.461755e-03 6.747801e-04 2.386782e-04
As another example, to compute the descriptors derived from the BLOSUM62 matrix using the first 5 scales, one can use:
-x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
-
-blosum = extractBLOSUM(
- x, submat = "AABLOSUM62",
- k = 5, lag = 7, scale = TRUE, silent = FALSE)
-## Relative importance of all the possible 20 scales:
+x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+))[[1]]
+
+blosum <- extractBLOSUM(
+ x,
+ submat = "AABLOSUM62",
+ k = 5, lag = 7, scale = TRUE, silent = FALSE
+)
+## Relative importance of all the possible 20 scales:
## [1] 1.204960e+01 7.982007e+00 6.254364e+00 4.533706e+00 4.326286e+00
## [6] 3.850579e+00 3.752197e+00 3.538207e+00 3.139155e+00 2.546405e+00
## [11] 2.373286e+00 1.666259e+00 1.553126e+00 1.263685e+00 1.024699e+00
@@ -1230,14 +1252,14 @@
Similarity Calculation by Sequence Alignment
Similarity computation based on local or global alignment between the protein sequences in a list is in great demand in protein research. However, this type of pairwise similarity computation is often computationally intensive, especially when there are many protein sequences. Fortunately, the process is highly parallelizable, and the protr package provides parallelized similarity computation based on local or global protein sequence alignment for a list of protein sequences.
The function twoSeqSim()
calculates the alignment result between two protein sequences. The function parSeqSim()
calculates the pairwise similarities for a list of protein sequences in parallel:
-s1 = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
-s2 = readFASTA(system.file("protseq/P08218.fasta", package = "protr"))[[1]]
-s3 = readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
-s4 = readFASTA(system.file("protseq/P20160.fasta", package = "protr"))[[1]]
-s5 = readFASTA(system.file("protseq/Q9NZP8.fasta", package = "protr"))[[1]]
-plist = list(s1, s2, s3, s4, s5)
-psimmat = parSeqSim(plist, cores = 4, type = "local", submat = "BLOSUM62")
-print(psimmat)
+s1 <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+s2 <- readFASTA(system.file("protseq/P08218.fasta", package = "protr"))[[1]]
+s3 <- readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
+s4 <- readFASTA(system.file("protseq/P20160.fasta", package = "protr"))[[1]]
+s5 <- readFASTA(system.file("protseq/Q9NZP8.fasta", package = "protr"))[[1]]
+plist <- list(s1, s2, s3, s4, s5)
+psimmat <- parSeqSim(plist, cores = 4, type = "local", submat = "BLOSUM62")
+psimmat
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00000000 0.11825938 0.10236985 0.04921696 0.03943488
## [2,] 0.11825938 1.00000000 0.18858241 0.12124217 0.06391103
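The similarity matrix printed above has ones on its diagonal. One common way to obtain this from raw alignment scores, and our assumption about what parSeqSim() does internally, is to divide each pairwise score \(S_{ij}\) by \(\sqrt{S_{ii} S_{jj}}\). The base-R sketch below shows only that normalization step on a hypothetical raw score matrix; the alignment scoring itself is left to Biostrings, and `normalize_scores_sketch()` is not a protr function.

```r
# Hypothetical helper (not part of protr): normalize raw alignment scores so that
# sim_ij = S_ij / sqrt(S_ii * S_jj), which puts 1 on the diagonal.
normalize_scores_sketch <- function(s) {
  d <- sqrt(diag(s))       # square roots of the self-alignment scores
  s / outer(d, d)          # divide each entry by sqrt(S_ii * S_jj)
}

raw <- matrix(c(4, 2,
                2, 9), nrow = 2, byrow = TRUE)  # hypothetical raw scores
normalize_scores_sketch(raw)  # diagonal 1, off-diagonal 2 / (2 * 3) = 1/3
```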
@@ -1259,26 +1281,26 @@
The protr package also computes similarity scores derived from Gene Ontology (GO) semantic similarity measures for a list of GO terms or Entrez Gene IDs.
The function twoGOSim()
calculates the GO semantic similarity between two GO terms / Entrez Gene IDs, and the function parGOSim()
calculates the pairwise similarities for a list of GO terms / Entrez Gene IDs:
# By GO Terms
-go1 = c(
+go1 <- c(
"GO:0005215", "GO:0005488", "GO:0005515",
"GO:0005625", "GO:0005802", "GO:0005905"
) # AP4B1
-go2 = c(
+go2 <- c(
"GO:0005515", "GO:0005634", "GO:0005681",
"GO:0008380", "GO:0031202"
) # BCAS2
-go3 = c(
+go3 <- c(
"GO:0003735", "GO:0005622", "GO:0005840",
"GO:0006412"
) # PDE4DIP
-golist = list(go1, go2, go3)
+golist <- list(go1, go2, go3)
parGOSim(golist, type = "go", ont = "CC", measure = "Wang")
# By Entrez gene id
-genelist = list(c("150", "151", "152", "1814", "1815", "1816"))
+genelist <- list(c("150", "151", "152", "1814", "1815", "1816"))
parGOSim(genelist, type = "gene", ont = "BP", measure = "Wang")
## 150 151 152 1814 1815 1816
## 150 1.000 0.702 0.725 0.496 0.570 0.455
@@ -1296,9 +1318,9 @@
Retrieve Protein Sequences from UniProt
The function getUniProt()
retrieves protein sequences from uniprot.org by protein ID(s). The input ID
is a character vector specifying the protein ID(s). The returned sequences are stored in a list:
-
+
## [[1]]
## [1] "MDAMKRGLCCVLLLCGAVFVSPSQEIHARFRRGARSYQVICRDEKTQMIYQQHQSWLRPVLRSNRVEYCWCN
## SGRAQCHSVPVKSCSEPRCFNGGTCQQALYFSDFVCQCPEGFAGKCCEIDTRATCYEDQGISYRGTWSTAESGAECT
@@ -1308,7 +1330,7 @@
## GEEEQKFEVEKYIVHKEFDDDTYDNDIALLQLKSDSSRCAQESSVVRTVCLPPADLQLPDWTECELSGYGKHEALSP
## FYSERLKEAHVRLYPSSRCTSQHLLNRTVTDNMLCAGDTRSGGPQANLHDACQGDSGGPLVCLNDGRMTLVGIISWG
## LGCGQKDVPGVYTKVTNYLDWIRDNMRP"
-##
+##
## [[2]]
## [1] "MGSNLSPQLCLMPFILGLLSGGVTTTPWSLARPQGSCSLEGVEIKGGSFRLLQEGQALEYVCPSGFYPYPVQ
## TRTCRSTGSWSTLKTQDQKTVRKAECRAIHCPRPHDFENGEYWPRSPYYNVSDEISFHCYDGYTLRGSANRTCQVNG
@@ -1320,7 +1342,7 @@
## HSIKVSVGGEKRDLEIEVVLFHPNYNINGKKEAGIPEFYDYDVALIKLKNKLKYGQTIRPICLPCTEGTTRALRLPP
## TTTCQQQKEELLPAQDIKALFVSEEEKKLTRKEVYIKNGDKKGSCERDAQYAPGYDKVKDISEVVTPRFLCTGGVSP
## YADPNTCRGDSGGPLIVHKRSRFIQVGVISWGVVDVCKNQKRQKQVPAHARDFHINLFQVLPWLKEKLQDEDLGFL"
-##
+##
## [[3]]
## [1] "APPIQSRIIGGRECEKNSHPWQVAIYHYSSFQCGGVLVNPKWVLTAAHCKNDNYEVWLGRHNLFENENTAQF
## FGVTADFPHPGFNLSLLKXHTKADGKDYSHDLMLLRLQSPAKITDAVKVLELPTQEPELGSTCEASGWGSIEPGPDB
@@ -1341,7 +1363,7 @@
Sanity Check for Amino Acid Types
The protcheck()
function checks whether all amino acids in a protein sequence belong to the 20 default types, and returns TRUE
if they do:
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
# a real sequence
protcheck(x)
## [1] TRUE
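The check itself is simple enough to sketch in base R. `protcheck_sketch()` and `aa20` are hypothetical names, and the sketch only mirrors the behavior described here (TRUE when every residue is one of the 20 default types); it is not protr's source.

```r
# Hypothetical helper (not part of protr): TRUE if every residue is standard.
aa20 <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

protcheck_sketch <- function(seq) {
  all(strsplit(seq, "")[[1]] %in% aa20)
}

protcheck_sketch("MKTAYIAK")  # TRUE
protcheck_sketch("MKTXYIAK")  # FALSE: X is not one of the 20 types
```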
@@ -1761,7 +1783,7 @@
Schneider,G. and Wrede,P. (1994) The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: De novo design of an idealized leader cleavage site. Biophysical Journal, 66, 335–344.
-Shen,J. et al. (2007) Predicting protein-protein interactions based only on sequences information. Proceedings of the National Academy of Sciences, 104, 4337–4341.
+Shen,J.W. et al. (2007) Predicting protein-protein interactions based only on sequences information. Proceedings of the National Academy of Sciences, 104, 4337–4341.
Xiao,N. et al. (2015) protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics, 31, 1857–1859.
@@ -1779,7 +1801,7 @@
Contents
@@ -111,7 +111,7 @@ Citation
Xiao N, Cao D, Zhu M, Xu Q (2015).
“protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences.”
Bioinformatics, 31(11), 1857–1859.
-doi: 10.1093/bioinformatics/btv042.
+doi: 10.1093/bioinformatics/btv042.
@Article{,
title = {protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences},
diff --git a/docs/index.html b/docs/index.html
index 837fab5..4e07776 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -36,7 +36,7 @@
@@ -111,15 +111,15 @@
devtools::install_github("nanxstats/protr")
Browse the package vignette for a quick-start.
-
+
-Shiny Web Application
+Shiny App
ProtrWeb, the Shiny web application built on protr, can be accessed from http://protr.org.
ProtrWeb is a user-friendly web application for computing the protein sequence descriptors (features) presented in the protr package.
-
+
-
-
-Links
-
-- Website: https://nanx.me/protr/
-
-- CRAN: https://cran.r-project.org/package=protr
-
-- GitHub: https://github.com/nanxstats/protr
-
-
-
Contribute
@@ -275,7 +263,6 @@ Developers
Dev status
diff --git a/docs/news/index.html b/docs/news/index.html
index 3e367be..764c5ab 100644
--- a/docs/news/index.html
+++ b/docs/news/index.html
@@ -62,7 +62,7 @@
@@ -108,14 +108,28 @@ Changelog
Source: NEWS.md
-
+
+
+
+protr 1.6-1 (2019-02-24) 2019-02-24
+
+
+
+Improvements
+
- Added a new argument
batches
to parSeqSim()
. The new argument supports breaking down the pairwise similarity computation into smaller batches. This is useful when you have a large number of protein sequences and enough CPU cores, but not enough RAM to compute and hold all the pairwise similarities in a single batch. Also, use the other new argument verbose
to track the computation progress.
@@ -157,9 +171,9 @@
Fixed the API endpoint issue (from HTTP to HTTPS) in getUniProt()
.
-
+
-Improvements
+Improvements
- Added two new parameters
gap.opening
and gap.extension
to parSeqSim()
, allowing more flexible tuning of the sequence alignment for more types of amino acid sequence data. We thank Dr. Maisa Pinheiro for the feedback.
- Added floating TOC and new CSS style in the vignette to improve navigation and readability.
@@ -189,9 +203,9 @@
- Resolved a critical bug due to improper
ifelse
conditioning (3f6e106) for the distribution descriptor in CTD. We thank Jielu Yan from the University of Macau for kindly reporting this issue.
-
+
-Improvements
+Improvements
- General fixes and improvements for the package vignette.
@@ -201,9 +215,9 @@
protr 1.4-2 (2017-09-28) 2017-09-29
-
+
-Improvements
+Improvements
- The function list is now organized into sections on the package website (https://nanx.me/protr/reference/).
- Use system font stack instead of Google Fonts in vignettes to avoid pandoc SSL issue.
@@ -214,9 +228,9 @@
protr 1.4-1 (2017-07-08) 2017-07-09
-
+
-Improvements
+Improvements
- Converted table images to markdown tables in the vignette
- Updated the screenshot of protrweb in the vignette
@@ -227,9 +241,9 @@
protr 1.4-0 (2017-06-06) 2017-06-06
-
+
-Improvements
+Improvements
- Migrated from Sweave-based PDF vignette to knitr-based HTML vignette
@@ -239,9 +253,9 @@
protr 1.3-0 (2017-05-07) 2017-05-08
-
+
-Improvements
+Improvements
- Fix obsolete URLs
- Better R code formatting
@@ -253,9 +267,9 @@
protr 1.2-1 (2016-12-29) 2016-12-30
-
+
-Improvements
+Improvements
- New website: https://nanx.me/protr/
@@ -268,9 +282,9 @@
protr 1.2-0 (2016-11-12) 2016-11-12
-
+
-Improvements
+Improvements
- Added continuous integration
- Code style improvements
@@ -320,9 +334,9 @@
- Improvements for dealing with boundary cases in several functions (thanks to @koefoed’s patches)
-
+
-Improvements
+Improvements
- Added citation information
@@ -332,9 +346,9 @@
protr 0.5-1 (2014-12-22) 2014-12-24
-
+
-Improvements
+Improvements
- Minor improvements and fixes for documentation
@@ -344,9 +358,9 @@
protr 0.5-0 (2014-12-18) Unreleased
-
+
-Improvements
+Improvements
- Added functions allowing users to specify their own classification of the amino acid
- Documentation improvements
@@ -358,9 +372,9 @@
protr 0.4-1 (2014-10-10) 2014-10-10
-
+
-Improvements
+Improvements
- General documentation improvements
@@ -382,9 +396,9 @@
protr 0.3-0 (2014-06-20) Unreleased
-
+
-Improvements
+Improvements
- Added example workflow using protr in the vignette
@@ -394,9 +408,9 @@
protr 0.2-1 (2014-01-25) 2014-01-25
-
+
diff --git a/docs/reference/AAACF.html b/docs/reference/AAACF.html
index 3db891c..49a6070 100644
--- a/docs/reference/AAACF.html
+++ b/docs/reference/AAACF.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AABLOSUM100.html b/docs/reference/AABLOSUM100.html
index 1e259b8..b9a1992 100644
--- a/docs/reference/AABLOSUM100.html
+++ b/docs/reference/AABLOSUM100.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AABLOSUM45.html b/docs/reference/AABLOSUM45.html
index b0ff3af..b887aee 100644
--- a/docs/reference/AABLOSUM45.html
+++ b/docs/reference/AABLOSUM45.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AABLOSUM50.html b/docs/reference/AABLOSUM50.html
index 377843d..b9a3177 100644
--- a/docs/reference/AABLOSUM50.html
+++ b/docs/reference/AABLOSUM50.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AABLOSUM62.html b/docs/reference/AABLOSUM62.html
index 2ddd20f..d439daa 100644
--- a/docs/reference/AABLOSUM62.html
+++ b/docs/reference/AABLOSUM62.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AABLOSUM80.html b/docs/reference/AABLOSUM80.html
index 31f09ac..b21c136 100644
--- a/docs/reference/AABLOSUM80.html
+++ b/docs/reference/AABLOSUM80.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AABurden.html b/docs/reference/AABurden.html
index 6fac997..0f50582 100644
--- a/docs/reference/AABurden.html
+++ b/docs/reference/AABurden.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AACPSA.html b/docs/reference/AACPSA.html
index a661744..c90a538 100644
--- a/docs/reference/AACPSA.html
+++ b/docs/reference/AACPSA.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAConn.html b/docs/reference/AAConn.html
index 49ac001..b3700a4 100644
--- a/docs/reference/AAConn.html
+++ b/docs/reference/AAConn.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAConst.html b/docs/reference/AAConst.html
index 3a63e47..6826a2c 100644
--- a/docs/reference/AAConst.html
+++ b/docs/reference/AAConst.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AADescAll.html b/docs/reference/AADescAll.html
index f45332e..99d30bd 100644
--- a/docs/reference/AADescAll.html
+++ b/docs/reference/AADescAll.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAEdgeAdj.html b/docs/reference/AAEdgeAdj.html
index 79f5dca..70f9378 100644
--- a/docs/reference/AAEdgeAdj.html
+++ b/docs/reference/AAEdgeAdj.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAEigIdx.html b/docs/reference/AAEigIdx.html
index c622cfc..e16d3dd 100644
--- a/docs/reference/AAEigIdx.html
+++ b/docs/reference/AAEigIdx.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAFGC.html b/docs/reference/AAFGC.html
index fbcd4d8..2d3947a 100644
--- a/docs/reference/AAFGC.html
+++ b/docs/reference/AAFGC.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAGETAWAY.html b/docs/reference/AAGETAWAY.html
index e2f07a2..6eae269 100644
--- a/docs/reference/AAGETAWAY.html
+++ b/docs/reference/AAGETAWAY.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAGeom.html b/docs/reference/AAGeom.html
index 82fb592..ed095cd 100644
--- a/docs/reference/AAGeom.html
+++ b/docs/reference/AAGeom.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAInfo.html b/docs/reference/AAInfo.html
index a2810f3..03aecd8 100644
--- a/docs/reference/AAInfo.html
+++ b/docs/reference/AAInfo.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAMOE2D.html b/docs/reference/AAMOE2D.html
index 1c87d82..307324c 100644
--- a/docs/reference/AAMOE2D.html
+++ b/docs/reference/AAMOE2D.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAMOE3D.html b/docs/reference/AAMOE3D.html
index 8af3383..91c93fc 100644
--- a/docs/reference/AAMOE3D.html
+++ b/docs/reference/AAMOE3D.html
@@ -70,7 +70,7 @@
diff --git a/docs/reference/AAMetaInfo.html b/docs/reference/AAMetaInfo.html
index 20adf48..692c7af 100644
--- a/docs/reference/AAMetaInfo.html
+++ b/docs/reference/AAMetaInfo.html
@@ -74,7 +74,7 @@
diff --git a/docs/reference/AAMolProp.html b/docs/reference/AAMolProp.html
index 6fdbcc1..14c5f93 100644
--- a/docs/reference/AAMolProp.html
+++ b/docs/reference/AAMolProp.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AAPAM120.html b/docs/reference/AAPAM120.html
index 3594c73..fbf1e4a 100644
--- a/docs/reference/AAPAM120.html
+++ b/docs/reference/AAPAM120.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AAPAM250.html b/docs/reference/AAPAM250.html
index 3080d58..7cb528f 100644
--- a/docs/reference/AAPAM250.html
+++ b/docs/reference/AAPAM250.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AAPAM30.html b/docs/reference/AAPAM30.html
index db4c41a..35d5311 100644
--- a/docs/reference/AAPAM30.html
+++ b/docs/reference/AAPAM30.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AAPAM40.html b/docs/reference/AAPAM40.html
index f1845ce..9248b70 100644
--- a/docs/reference/AAPAM40.html
+++ b/docs/reference/AAPAM40.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AAPAM70.html b/docs/reference/AAPAM70.html
index a15d49d..af4c3d3 100644
--- a/docs/reference/AAPAM70.html
+++ b/docs/reference/AAPAM70.html
@@ -66,7 +66,7 @@
diff --git a/docs/reference/AARDF.html b/docs/reference/AARDF.html
index 0df6fe2..ed1c446 100644
--- a/docs/reference/AARDF.html
+++ b/docs/reference/AARDF.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AARandic.html b/docs/reference/AARandic.html
index fc9181d..e09475b 100644
--- a/docs/reference/AARandic.html
+++ b/docs/reference/AARandic.html
@@ -67,7 +67,7 @@
diff --git a/docs/reference/AATopo.html b/docs/reference/AATopo.html
index b5b3a8f..63986f7 100644
--- a/docs/reference/AATopo.html
+++ b/docs/reference/AATopo.html
@@ -67,7 +67,7 @@
# This operation requires the rcdk package
+# this operation requires the rcdk package
 # require(rcdk)
 # optaa3d = load.molecules(system.file("sysdata/OptAA3d.sdf", package = "protr"))
 # view.molecule.2d(optaa3d[[1]]) # view the first AA
diff --git a/docs/reference/acc.html b/docs/reference/acc.html
index 5047d8e..ad621c1 100644
--- a/docs/reference/acc.html
+++ b/docs/reference/acc.html
@@ -68,7 +68,7 @@
@@ -169,10 +169,10 @@See a
Examples
-p = 8 # p is the scales number
-n = 200 # n is the amino acid number
-lag = 7 # the lag paramter
-mat = matrix(rnorm(p * n), nrow = p, ncol = n)
+p <- 8 # p is the scales number
+n <- 200 # n is the amino acid number
+lag <- 7 # the lag paramter
+mat <- matrix(rnorm(p * n), nrow = p, ncol = n)
 acc(mat, lag)#> scl1.lag1 scl2.lag1 scl3.lag1 scl4.lag1 scl5.lag1
 #> -1.014626e-02 -4.689789e-02 -2.692515e-02 1.647534e-02 9.724370e-02
 #> scl6.lag1 scl7.lag1 scl8.lag1 scl1.lag2 scl2.lag2
diff --git a/docs/reference/extractAAC.html b/docs/reference/extractAAC.html
index 149e45a..adaf79a 100644
--- a/docs/reference/extractAAC.html
+++ b/docs/reference/extractAAC.html
@@ -65,7 +65,7 @@
@@ -147,7 +147,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> A R N D C E Q
 #> 0.06405694 0.07117438 0.03914591 0.05160142 0.06761566 0.04804270 0.04804270
 #> G H I L K M F
diff --git a/docs/reference/extractAPAAC.html b/docs/reference/extractAPAAC.html
index 8289be2..95d4e9a 100644
--- a/docs/reference/extractAPAAC.html
+++ b/docs/reference/extractAPAAC.html
@@ -67,7 +67,7 @@
@@ -202,7 +202,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> Pc1.A Pc1.R Pc1.N
 #> 3.537412e+01 3.930458e+01 2.161752e+01
 #> Pc1.D Pc1.C Pc1.E
@@ -257,28 +257,31 @@Examp
 #> -2.512263e-03 1.387641e-03 2.060890e-03
 #> Pc2.Hydrophobicity.30 Pc2.Hydrophilicity.30
 #> 3.177340e-04 1.451909e-03
-myprops = data.frame(
+myprops <- data.frame(
 AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43))
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+)
 # use 2 default properties, 4 properties from the
 # AAindex database, and 3 cutomized properties
 extractAPAAC(
- x, customprops = myprops,
+ x,
+ customprops = myprops,
 props = c(
 "Hydrophobicity", "Hydrophilicity",
 "CIDH920105", "BHAR880101",
 "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3")
+ "MyProp1", "MyProp2", "MyProp3"
+ )
 )#> Pc1.A Pc1.R Pc1.N
 #> 2.726537e+01 3.029486e+01 1.666217e+01
 #> Pc1.D Pc1.C Pc1.E
diff --git a/docs/reference/extractBLOSUM.html b/docs/reference/extractBLOSUM.html
index 06c5efb..ef7e86a 100644
--- a/docs/reference/extractBLOSUM.html
+++ b/docs/reference/extractBLOSUM.html
@@ -68,7 +68,7 @@
@@ -178,9 +178,8 @@R
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
-blosum = extractBLOSUM(
- x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = FALSE)#> Relative importance of all the possible 20 scales:
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+blosum <- extractBLOSUM(x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = FALSE)#> Relative importance of all the possible 20 scales:
 #> [1] 1.204960e+01 7.982007e+00 6.254364e+00 4.533706e+00 4.326286e+00
 #> [6] 3.850579e+00 3.752197e+00 3.538207e+00 3.139155e+00 2.546405e+00
 #> [11] 2.373286e+00 1.666259e+00 1.553126e+00 1.263685e+00 1.024699e+00
diff --git a/docs/reference/extractCTDC.html b/docs/reference/extractCTDC.html
index 0bfe553..90ad1d0 100644
--- a/docs/reference/extractCTDC.html
+++ b/docs/reference/extractCTDC.html
@@ -66,7 +66,7 @@
@@ -160,7 +160,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> hydrophobicity.Group1 hydrophobicity.Group2 hydrophobicity.Group3
 #> 0.29715302 0.40569395 0.29715302
 #> normwaalsvolume.Group1 normwaalsvolume.Group2 normwaalsvolume.Group3
diff --git a/docs/reference/extractCTDCClass.html b/docs/reference/extractCTDCClass.html
index c913d06..5f221ed 100644
--- a/docs/reference/extractCTDCClass.html
+++ b/docs/reference/extractCTDCClass.html
@@ -68,7 +68,7 @@
@@ -180,29 +180,32 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 # using five customized amino acid property classification
-group1 = list(
- "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
+group1 <- list(
+ "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
 "normwaalsvolume" = c("G", "A", "S", "T", "P", "D", "C"),
- "polarizability" = c("G", "A", "S", "D", "T"),
+ "polarizability" = c("G", "A", "S", "D", "T"),
 "secondarystruct" = c("E", "A", "L", "M", "Q", "K", "R", "H"),
- "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W"))
+ "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W")
+)
-group2 = list(
- "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
+group2 <- list(
+ "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
 "normwaalsvolume" = c("N", "V", "E", "Q", "I", "L"),
- "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
+ "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
 "secondarystruct" = c("V", "I", "Y", "C", "W", "F", "T"),
- "solventaccess" = c("R", "K", "Q", "E", "N", "D"))
+ "solventaccess" = c("R", "K", "Q", "E", "N", "D")
+)
-group3 = list(
- "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
+group3 <- list(
+ "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
 "normwaalsvolume" = c("M", "H", "K", "F", "R", "Y", "W"),
- "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
+ "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
 "secondarystruct" = c("G", "N", "P", "S", "D"),
- "solventaccess" = c("M", "S", "P", "T", "H", "Y"))
+ "solventaccess" = c("M", "S", "P", "T", "H", "Y")
+)
 extractCTDCClass(x, aagroup1 = group1, aagroup2 = group2, aagroup3 = group3)#> prop1.G1 prop1.G2 prop1.G3 prop2.G1 prop2.G2 prop2.G3 prop3.G1 prop3.G2
 #> 0.2971530 0.4056940 0.2971530 0.4519573 0.2971530 0.2508897 0.3309609 0.4181495
diff --git a/docs/reference/extractCTDD.html b/docs/reference/extractCTDD.html
index 502c0e4..a0ee3a1 100644
--- a/docs/reference/extractCTDD.html
+++ b/docs/reference/extractCTDD.html
@@ -66,7 +66,7 @@
@@ -160,7 +160,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> prop1.G1.residue0 prop1.G1.residue25 prop1.G1.residue50 prop1.G1.residue75
 #> 0.3558719 23.1316726 50.1779359 73.8434164
 #> prop1.G1.residue100 prop1.G2.residue0 prop1.G2.residue25 prop1.G2.residue50
diff --git a/docs/reference/extractCTDDClass.html b/docs/reference/extractCTDDClass.html
index 4ff8412..8e59d7d 100644
--- a/docs/reference/extractCTDDClass.html
+++ b/docs/reference/extractCTDDClass.html
@@ -68,7 +68,7 @@
@@ -180,29 +180,32 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 # using five customized amino acid property classification
-group1 = list(
- "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
+group1 <- list(
+ "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
 "normwaalsvolume" = c("G", "A", "S", "T", "P", "D", "C"),
- "polarizability" = c("G", "A", "S", "D", "T"),
+ "polarizability" = c("G", "A", "S", "D", "T"),
 "secondarystruct" = c("E", "A", "L", "M", "Q", "K", "R", "H"),
- "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W"))
+ "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W")
+)
-group2 = list(
- "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
+group2 <- list(
+ "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
 "normwaalsvolume" = c("N", "V", "E", "Q", "I", "L"),
- "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
+ "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
 "secondarystruct" = c("V", "I", "Y", "C", "W", "F", "T"),
- "solventaccess" = c("R", "K", "Q", "E", "N", "D"))
+ "solventaccess" = c("R", "K", "Q", "E", "N", "D")
+)
-group3 = list(
- "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
+group3 <- list(
+ "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
 "normwaalsvolume" = c("M", "H", "K", "F", "R", "Y", "W"),
- "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
+ "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
 "secondarystruct" = c("G", "N", "P", "S", "D"),
- "solventaccess" = c("M", "S", "P", "T", "H", "Y"))
+ "solventaccess" = c("M", "S", "P", "T", "H", "Y")
+)
 extractCTDDClass(x, aagroup1 = group1, aagroup2 = group2, aagroup3 = group3)#> prop1.G1.residue0 prop1.G1.residue25 prop1.G1.residue50 prop1.G1.residue75
 #> 0.3558719 23.1316726 50.1779359 73.8434164
diff --git a/docs/reference/extractCTDT.html b/docs/reference/extractCTDT.html
index 8503097..48f6a63 100644
--- a/docs/reference/extractCTDT.html
+++ b/docs/reference/extractCTDT.html
@@ -66,7 +66,7 @@
@@ -160,7 +160,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> prop1.Tr1221 prop1.Tr1331 prop1.Tr2332 prop2.Tr1221 prop2.Tr1331 prop2.Tr2332
 #> 0.27094474 0.16042781 0.23351159 0.26737968 0.22638146 0.17112299
 #> prop3.Tr1221 prop3.Tr1331 prop3.Tr2332 prop4.Tr1221 prop4.Tr1331 prop4.Tr2332
diff --git a/docs/reference/extractCTDTClass.html b/docs/reference/extractCTDTClass.html
index fde9d2e..5d51549 100644
--- a/docs/reference/extractCTDTClass.html
+++ b/docs/reference/extractCTDTClass.html
@@ -68,7 +68,7 @@
@@ -180,29 +180,32 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 # using five customized amino acid property classification
-group1 = list(
- "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
+group1 <- list(
+ "hydrophobicity" = c("R", "K", "E", "D", "Q", "N"),
 "normwaalsvolume" = c("G", "A", "S", "T", "P", "D", "C"),
- "polarizability" = c("G", "A", "S", "D", "T"),
+ "polarizability" = c("G", "A", "S", "D", "T"),
 "secondarystruct" = c("E", "A", "L", "M", "Q", "K", "R", "H"),
- "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W"))
+ "solventaccess" = c("A", "L", "F", "C", "G", "I", "V", "W")
+)
-group2 = list(
- "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
+group2 <- list(
+ "hydrophobicity" = c("G", "A", "S", "T", "P", "H", "Y"),
 "normwaalsvolume" = c("N", "V", "E", "Q", "I", "L"),
- "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
+ "polarizability" = c("C", "P", "N", "V", "E", "Q", "I", "L"),
 "secondarystruct" = c("V", "I", "Y", "C", "W", "F", "T"),
- "solventaccess" = c("R", "K", "Q", "E", "N", "D"))
+ "solventaccess" = c("R", "K", "Q", "E", "N", "D")
+)
-group3 = list(
- "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
+group3 <- list(
+ "hydrophobicity" = c("C", "L", "V", "I", "M", "F", "W"),
 "normwaalsvolume" = c("M", "H", "K", "F", "R", "Y", "W"),
- "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
+ "polarizability" = c("K", "M", "H", "F", "R", "Y", "W"),
 "secondarystruct" = c("G", "N", "P", "S", "D"),
- "solventaccess" = c("M", "S", "P", "T", "H", "Y"))
+ "solventaccess" = c("M", "S", "P", "T", "H", "Y")
+)
 extractCTDTClass(x, aagroup1 = group1, aagroup2 = group2, aagroup3 = group3)#> prop1.Tr1221 prop1.Tr1331 prop1.Tr2332 prop2.Tr1221 prop2.Tr1331 prop2.Tr2332
 #> 0.2709447 0.1604278 0.2335116 0.2673797 0.2263815 0.1711230
diff --git a/docs/reference/extractCTriad.html b/docs/reference/extractCTriad.html
index 76b3d11..2234079 100644
--- a/docs/reference/extractCTriad.html
+++ b/docs/reference/extractCTriad.html
@@ -65,7 +65,7 @@
@@ -150,7 +150,7 @@R
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> VS111 VS211 VS311 VS411 VS511 VS611 VS711 VS121 VS221 VS321 VS421 VS521 VS621
 #> 0.1 0.3 0.6 0.2 0.4 0.0 0.3 1.0 0.6 0.5 0.0 0.2 0.3
 #> VS721 VS131 VS231 VS331 VS431 VS531 VS631 VS731 VS141 VS241 VS341 VS441 VS541
diff --git a/docs/reference/extractCTriadClass.html b/docs/reference/extractCTriadClass.html
index 8bc43e9..85bc0d3 100644
--- a/docs/reference/extractCTriadClass.html
+++ b/docs/reference/extractCTriadClass.html
@@ -68,7 +68,7 @@
@@ -161,13 +161,14 @@R
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 # use customized amino acid classification (normalized van der Waals volume)
-newclass = list(
+newclass <- list(
 c("G", "A", "S", "T", "P", "D", "C"),
 c("N", "V", "E", "Q", "I", "L"),
- c("M", "H", "K", "F", "R", "Y", "W"))
+ c("M", "H", "K", "F", "R", "Y", "W")
+)
 extractCTriadClass(x, aaclass = newclass)#> VS111 VS211 VS311 VS121 VS221 VS321 VS131
 #> 0.90384615 0.55769231 0.46153846 0.59615385 0.23076923 0.26923077 0.42307692
diff --git a/docs/reference/extractDC.html b/docs/reference/extractDC.html
index 31f2732..d89e641 100644
--- a/docs/reference/extractDC.html
+++ b/docs/reference/extractDC.html
@@ -65,7 +65,7 @@
@@ -147,7 +147,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> AA RA NA DA CA EA
 #> 0.003565062 0.003565062 0.000000000 0.007130125 0.003565062 0.003565062
 #> QA GA HA IA LA KA
diff --git a/docs/reference/extractDescScales.html b/docs/reference/extractDescScales.html
index ea4aacf..c2c1734 100644
--- a/docs/reference/extractDescScales.html
+++ b/docs/reference/extractDescScales.html
@@ -69,7 +69,7 @@
@@ -186,10 +186,12 @@Value
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
-descscales = extractDescScales(
- x, propmat = "AATopo", index = c(37:41, 43:47),
- pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+descscales <- extractDescScales(
+ x,
+ propmat = "AATopo", index = c(37:41, 43:47),
+ pc = 5, lag = 7, silent = FALSE
+)#> Summary of the first 5 principal components:
 #> PC1 PC2 PC3 PC4 PC5
 #> Standard deviation 2.581537 1.754133 0.4621854 0.1918666 0.08972087
 #> Proportion of Variance 0.666430 0.307700 0.0213600 0.0036800 0.00080000
diff --git a/docs/reference/extractFAScales.html b/docs/reference/extractFAScales.html
index d4ca59f..235e2af 100644
--- a/docs/reference/extractFAScales.html
+++ b/docs/reference/extractFAScales.html
@@ -67,7 +67,7 @@
@@ -182,11 +182,10 @@R
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 data(AATopo)
-tprops = AATopo[, c(37:41, 43:47)] # select a set of topological descriptors
-fa = extractFAScales(
- x, propmat = tprops, factors = 5, lag = 7, silent = FALSE)#> Summary of the factor analysis result:
+tprops <- AATopo[, c(37:41, 43:47)] # select a set of topological descriptors
+fa <- extractFAScales(x, propmat = tprops, factors = 5, lag = 7, silent = FALSE)#> Summary of the factor analysis result:
 #>
 #> Call:
 #> factanal(x = propmat, factors = factors, scores = scores)
diff --git a/docs/reference/extractGeary.html b/docs/reference/extractGeary.html
index a8c114f..4d690df 100644
--- a/docs/reference/extractGeary.html
+++ b/docs/reference/extractGeary.html
@@ -66,7 +66,7 @@
@@ -205,7 +205,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.9361830 1.0442920 1.0452843 1.0563467
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
@@ -326,26 +326,29 @@Examp
 #> 0.9977120 0.9509454 1.0878960 1.0429411
 #> DAYM780201.lag27 DAYM780201.lag28 DAYM780201.lag29 DAYM780201.lag30
 #> 0.9938437 0.9506562 0.9532393 1.0463685
-myprops = data.frame(
+myprops <- data.frame(
 AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43))
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+)
 # Use 4 properties in the AAindex database, and 3 cutomized properties
 extractGeary(
- x, customprops = myprops,
+ x,
+ customprops = myprops,
 props = c(
 "CIDH920105", "BHAR880101",
 "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3")
+ "MyProp1", "MyProp2", "MyProp3"
+ )
 )#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.9361830 1.0442920 1.0452843 1.0563467
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
diff --git a/docs/reference/extractMDSScales.html b/docs/reference/extractMDSScales.html
index feed6ae..94e1c4c 100644
--- a/docs/reference/extractMDSScales.html
+++ b/docs/reference/extractMDSScales.html
@@ -67,7 +67,7 @@
@@ -179,10 +179,10 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
 data(AATopo)
-tprops = AATopo[, c(37:41, 43:47)] # select a set of topological descriptors
-mds = extractMDSScales(x, propmat = tprops, k = 5, lag = 7, silent = FALSE)#> Eigenvalues computed during the scaling process:
+tprops <- AATopo[, c(37:41, 43:47)] # select a set of topological descriptors
+mds <- extractMDSScales(x, propmat = tprops, k = 5, lag = 7, silent = FALSE)#> Eigenvalues computed during the scaling process:
 #> [1] 1.266223e+02 5.846270e+01 4.058692e+00 6.994430e-01 1.529469e-01
 #> [6] 3.434787e-03 4.284842e-04 4.918500e-05 3.626185e-06 3.042621e-10
 #> [11] 1.694106e-15 1.138489e-15 1.016035e-15 3.315250e-16 -2.060931e-16
diff --git a/docs/reference/extractMoran.html b/docs/reference/extractMoran.html
index 7538087..4c6246b 100644
--- a/docs/reference/extractMoran.html
+++ b/docs/reference/extractMoran.html
@@ -66,7 +66,7 @@
@@ -205,7 +205,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+@@ -170,9 +170,8 @@#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.0628957240 -0.0448276812 -0.0450651172 -0.0559556782
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
@@ -326,26 +326,29 @@Examp
 #> 0.0040764169 0.0521186819 -0.0844529172 -0.0405209150
 #> DAYM780201.lag27 DAYM780201.lag28 DAYM780201.lag29 DAYM780201.lag30
 #> 0.0099613021 0.0534887247 0.0521919187 -0.0428922743
-myprops = data.frame(
+myprops <- data.frame(
 AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43))
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+)
 # Use 4 properties in the AAindex database, and 3 cutomized properties
 extractMoran(
- x, customprops = myprops,
+ x,
+ customprops = myprops,
 props = c(
 "CIDH920105", "BHAR880101",
 "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3")
+ "MyProp1", "MyProp2", "MyProp3"
+ )
 )#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.0628957240 -0.0448276812 -0.0450651172 -0.0559556782
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
diff --git a/docs/reference/extractMoreauBroto.html b/docs/reference/extractMoreauBroto.html
index 94bc7ac..f5fd0fb 100644
--- a/docs/reference/extractMoreauBroto.html
+++ b/docs/reference/extractMoreauBroto.html
@@ -66,7 +66,7 @@
@@ -204,7 +204,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+@@ -166,21 +166,21 @@#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.0815732133 -0.0160648174 -0.0159829904 -0.0257390382
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
@@ -325,26 +325,29 @@Examp
 #> 0.0041585424 0.0529190142 -0.0857070549 -0.0411108131
 #> DAYM780201.lag27 DAYM780201.lag28 DAYM780201.lag29 DAYM780201.lag30
 #> 0.0101342845 0.0543171778 0.0530046312 -0.0434996151
-myprops = data.frame(
+myprops <- data.frame(
 AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43))
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+)
 # Use 4 properties in the AAindex database, and 3 cutomized properties
 extractMoreauBroto(
- x, customprops = myprops,
+ x,
+ customprops = myprops,
 props = c(
 "CIDH920105", "BHAR880101",
 "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3")
+ "MyProp1", "MyProp2", "MyProp3"
+ )
 )#> CIDH920105.lag1 CIDH920105.lag2 CIDH920105.lag3 CIDH920105.lag4
 #> 0.0815732133 -0.0160648174 -0.0159829904 -0.0257390382
 #> CIDH920105.lag5 CIDH920105.lag6 CIDH920105.lag7 CIDH920105.lag8
diff --git a/docs/reference/extractPAAC.html b/docs/reference/extractPAAC.html
index 94f6d1c..3114d83 100644
--- a/docs/reference/extractPAAC.html
+++ b/docs/reference/extractPAAC.html
@@ -66,7 +66,7 @@
@@ -204,7 +204,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+@@ -161,21 +161,21 @@#> Xc1.A Xc1.R Xc1.N Xc1.D Xc1.C
 #> 9.07025432 10.07806035 5.54293319 7.30659376 9.57415734
 #> Xc1.E Xc1.Q Xc1.G Xc1.H Xc1.I
@@ -225,28 +225,31 @@Examp
 #> 0.02476967 0.02342389 0.02431684 0.02610300 0.02626722
 #> Xc2.lambda.26 Xc2.lambda.27 Xc2.lambda.28 Xc2.lambda.29 Xc2.lambda.30
 #> 0.02457082 0.02343049 0.02588823 0.02490463 0.02451951
-myprops = data.frame(
+myprops <- data.frame(
 AccNo = c("MyProp1", "MyProp2", "MyProp3"),
- A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
- N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
- C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
- Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
- H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
- L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
- M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
- P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
- T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
- Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43))
+ A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
+ N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
+ C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
+ Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
+ H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
+ L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
+ M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
+ P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
+ T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
+ Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
+)
 # use 3 default properties, 4 properties from the
 # AAindex database, and 3 cutomized properties
 extractPAAC(
- x, customprops = myprops,
+ x,
+ customprops = myprops,
 props = c(
 "Hydrophobicity", "Hydrophilicity", "SideChainMass",
 "CIDH920105", "BHAR880101",
 "CHAM820101", "CHAM820102",
- "MyProp1", "MyProp2", "MyProp3")
+ "MyProp1", "MyProp2", "MyProp3"
+ )
 )#> Xc1.A Xc1.R Xc1.N Xc1.D Xc1.C
 #> 9.12536927 10.13929919 5.57661456 7.35099191 9.63233423
 #> Xc1.E Xc1.Q Xc1.G Xc1.H Xc1.I
diff --git a/docs/reference/extractPSSM.html b/docs/reference/extractPSSM.html
index 28e9200..3fd260a 100644
--- a/docs/reference/extractPSSM.html
+++ b/docs/reference/extractPSSM.html
@@ -66,7 +66,7 @@
@@ -318,21 +318,21 @@See a
Examples
 if (Sys.which("makeblastdb") == "" | Sys.which("psiblast") == "") {
- cat("Cannot find makeblastdb or psiblast. Please install NCBI Blast+ first")
- } else {
-
- x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
- dbpath = tempfile("tempdb", fileext = ".fasta")
+ x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+ ))[[1]]
+ dbpath <- tempfile("tempdb", fileext = ".fasta")
 invisible(file.copy(from = system.file(
- "protseq/Plasminogen.fasta", package = "protr"), to = dbpath))
-
- pssmmat = extractPSSM(seq = x, database.path = dbpath)
+ "protseq/Plasminogen.fasta",
+ package = "protr"
+ ), to = dbpath))
- dim(pssmmat) # 20 x 562 (P00750: length 562, 20 Amino Acids)
+ pssmmat <- extractPSSM(seq = x, database.path = dbpath)
+ dim(pssmmat) # 20 x 562 (P00750: length 562, 20 Amino Acids)
 }#> Cannot find makeblastdb or psiblast. Please install NCBI Blast+ first
See a
Examples
 if (Sys.which("makeblastdb") == "" | Sys.which("psiblast") == "") {
- cat("Cannot find makeblastdb or psiblast. Please install NCBI Blast+")
- } else {
-
- x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
- dbpath = tempfile("tempdb", fileext = ".fasta")
+ x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+ ))[[1]]
+ dbpath <- tempfile("tempdb", fileext = ".fasta")
 invisible(file.copy(from = system.file(
- "protseq/Plasminogen.fasta", package = "protr"), to = dbpath))
+ "protseq/Plasminogen.fasta",
+ package = "protr"
+ ), to = dbpath))
- pssmmat = extractPSSM(seq = x, database.path = dbpath)
- pssmacc = extractPSSMAcc(pssmmat, lag = 3)
+ pssmmat <- extractPSSM(seq = x, database.path = dbpath)
+ pssmacc <- extractPSSMAcc(pssmmat, lag = 3)
 tail(pssmacc)
- }#> Cannot find makeblastdb or psiblast. Please install NCBI Blast+
See a
Examples
 if (Sys.which("makeblastdb") == "" | Sys.which("psiblast") == "") {
- cat("Cannot find makeblastdb or psiblast. Please install NCBI Blast+")
- } else {
-
- x = readFASTA(system.file(
- "protseq/P00750.fasta", package = "protr"))[[1]]
- dbpath = tempfile("tempdb", fileext = ".fasta")
+ x <- readFASTA(system.file(
+ "protseq/P00750.fasta",
+ package = "protr"
+ ))[[1]]
+ dbpath <- tempfile("tempdb", fileext = ".fasta")
 invisible(file.copy(from = system.file(
- "protseq/Plasminogen.fasta", package = "protr"), to = dbpath))
+ "protseq/Plasminogen.fasta",
+ package = "protr"
+ ), to = dbpath))
- pssmmat = extractPSSM(seq = x, database.path = dbpath)
- pssmfeature = extractPSSMFeature(pssmmat)
+ pssmmat <- extractPSSM(seq = x, database.path = dbpath)
+ pssmfeature <- extractPSSMFeature(pssmmat)
 head(pssmfeature)
- }#> Cannot find makeblastdb or psiblast. Please install NCBI Blast+
Value
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
-fp = extractProtFP(
- x, index = c(160:165, 258:296), pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
+x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
+fp <- extractProtFP(x, index = c(160:165, 258:296), pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
 #> PC1 PC2 PC3 PC4 PC5
 #> Standard deviation 4.51689 2.786022 2.27000 1.757295 1.419412
 #> Proportion of Variance 0.45338 0.172490 0.11451 0.068620 0.044770
diff --git a/docs/reference/extractProtFPGap.html b/docs/reference/extractProtFPGap.html
index f2ea625..bb4dd11 100644
--- a/docs/reference/extractProtFPGap.html
+++ b/docs/reference/extractProtFPGap.html
@@ -70,7 +70,7 @@
@@ -176,9 +176,8 @@Value
Examples
 # amino acid sequence with gaps
-x = readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
-fp = extractProtFPGap(
- x, index = c(160:165, 258:296), pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
+x <- readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
+fp <- extractProtFPGap(x, index = c(160:165, 258:296), pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
 #> PC1 PC2 PC3 PC4 PC5
 #> Standard deviation 4.398253 2.620509 2.267688 1.756102 1.52816
 #> Proportion of Variance 0.429880 0.152600 0.114280 0.068530 0.05189
diff --git a/docs/reference/extractQSO.html b/docs/reference/extractQSO.html
index 7d61a19..74850a3 100644
--- a/docs/reference/extractQSO.html
+++ b/docs/reference/extractQSO.html
@@ -66,7 +66,7 @@
@@ -164,7 +164,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] +#> Schneider.Xr.A Schneider.Xr.R Schneider.Xr.N Schneider.Xr.D Schneider.Xr.C #> 6.096218e-02 6.773576e-02 3.725467e-02 4.910842e-02 6.434897e-02 #> Schneider.Xr.E Schneider.Xr.Q Schneider.Xr.G Schneider.Xr.H Schneider.Xr.I diff --git a/docs/reference/extractSOCN.html b/docs/reference/extractSOCN.html index 3cfffc7..a669eef 100644 --- a/docs/reference/extractSOCN.html +++ b/docs/reference/extractSOCN.html @@ -66,7 +66,7 @@@@ -160,7 +160,7 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] +#> Schneider.lag1 Schneider.lag2 Schneider.lag3 Schneider.lag4 Schneider.lag5 #> 204.2036 199.8708 206.8102 197.4828 193.3366 #> Schneider.lag6 Schneider.lag7 Schneider.lag8 Schneider.lag9 Schneider.lag10 diff --git a/docs/reference/extractScales.html b/docs/reference/extractScales.html index e2def50..a6fff2a 100644 --- a/docs/reference/extractScales.html +++ b/docs/reference/extractScales.html @@ -71,7 +71,7 @@@@ -181,10 +181,10 @@See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] +@@ -417,7 +417,7 @@x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] data(AAindex) -AAidxmat = t(na.omit(as.matrix(AAindex[, 7:26]))) -scales = extractScales(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components: +AAidxmat <- t(na.omit(as.matrix(AAindex[, 7:26]))) +scales <- extractScales(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components: #> PC1 PC2 PC3 PC4 PC5 #> Standard deviation 13.71695 8.924017 7.698803 6.110576 5.413655 #> Proportion of Variance 0.35434 0.149980 0.111620 0.070320 0.055190 diff --git a/docs/reference/extractScalesGap.html b/docs/reference/extractScalesGap.html index 1744e53..a39a34a 100644 --- a/docs/reference/extractScalesGap.html +++ b/docs/reference/extractScalesGap.html @@ -73,7 +73,7 @@@@ -185,10 +185,10 @@See a
Examples
 # amino acid sequence with gaps
-x = readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
+x <- readFASTA(system.file("protseq/align.fasta", package = "protr"))$`IXI_235`
 data(AAindex)
-AAidxmat = t(na.omit(as.matrix(AAindex[, 7:26])))
-scales = extractScalesGap(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
+AAidxmat <- t(na.omit(as.matrix(AAindex[, 7:26])))
+scales <- extractScalesGap(x, propmat = AAidxmat, pc = 5, lag = 7, silent = FALSE)#> Summary of the first 5 principal components:
 #>                             PC1      PC2      PC3      PC4     PC5
 #> Standard deviation     12.38381 10.73268 7.742507 6.802462 5.22316
 #> Proportion of Variance  0.28881  0.21693 0.112890 0.087140 0.05138
diff --git a/docs/reference/extractTC.html b/docs/reference/extractTC.html
index 3852a0a..9a14347 100644
--- a/docs/reference/extractTC.html
+++ b/docs/reference/extractTC.html
@@ -65,7 +65,7 @@
@@ -147,7 +147,7 @@ See a
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] +diff --git a/docs/reference/index.html b/docs/reference/index.html index 8ec4b95..ada0b8b 100644 --- a/docs/reference/index.html +++ b/docs/reference/index.html @@ -62,7 +62,7 @@#> AAA RAA NAA DAA CAA EAA #> 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 #> QAA GAA HAA IAA LAA KAA diff --git a/docs/reference/getUniProt.html b/docs/reference/getUniProt.html index 59ca019..a7a692f 100644 --- a/docs/reference/getUniProt.html +++ b/docs/reference/getUniProt.html @@ -65,7 +65,7 @@@@ -142,7 +142,7 @@Examp
 # NOT RUN {
 # Network latency may slow down this example
 # Only test this when your connection is fast enough
-ids = c("P00750", "P00751", "P00752")
+ids <- c("P00750", "P00751", "P00752")
 getUniProt(ids)
 # }
protr-package -
+ Generating Various Numerical Representation Schemes for Protein Sequences
protr: Generating Various Numerical Representation Schemes for Protein Sequences
diff --git a/docs/reference/parGOSim.html b/docs/reference/parGOSim.html index 5662a24..d8b8519 100644 --- a/docs/reference/parGOSim.html +++ b/docs/reference/parGOSim.html @@ -68,7 +68,7 @@ diff --git a/docs/reference/parSeqSim.html b/docs/reference/parSeqSim.html index 89d71b1..9896069 100644 --- a/docs/reference/parSeqSim.html +++ b/docs/reference/parSeqSim.html @@ -68,7 +68,7 @@ diff --git a/docs/reference/parSeqSimDisk.html b/docs/reference/parSeqSimDisk.html index 4a33e1f..2d4e1f0 100644 --- a/docs/reference/parSeqSimDisk.html +++ b/docs/reference/parSeqSimDisk.html @@ -71,7 +71,7 @@ diff --git a/docs/reference/protcheck.html b/docs/reference/protcheck.html index 3b37d51..03da867 100644 --- a/docs/reference/protcheck.html +++ b/docs/reference/protcheck.html @@ -66,7 +66,7 @@ @@ -138,8 +138,8 @@Value
Examples
-+#> [1] TRUE#> [1] FALSE@@ -121,46 +112,31 @@#> [1] TRUE#> [1] FALSEdiff --git a/docs/reference/protseg.html b/docs/reference/protseg.html index 0300ef8..8fc5f05 100644 --- a/docs/reference/protseg.html +++ b/docs/reference/protseg.html @@ -65,7 +65,7 @@ @@ -152,7 +152,7 @@@@ -168,11 +144,17 @@-Generating Various Numerical Representation Schemes for Protein Sequences
+protr: Generating Various Numerical Representation Schemes for Protein Sequences
Source:R/protr-package.R
protr-package.Rd
-
-The protr package is a comprehensive toolkit for generating various
-numerical representation schemes of protein sequence.
-The descriptors are extensively utilized in bioinformatics and
-chemogenomics research. The commonly used descriptors include
-amino acid composition, autocorrelation, CTD, conjoint traid,
-quasi-sequence order, pseudo amino acid composition, and
-profile-based descriptors derived by Position-Specific Scoring Matrix (PSSM).
-The descriptors for proteochemometric (PCM) modeling include the scales-based
-descriptors derived by principal components analysis, factor analysis,
-multidimensional scaling, amino acid properties (AAindex), 20+ classes
-of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.),
-and BLOSUM/PAM matrix-derived descriptors. The protr package also
-integrates the function of parallelized similarity computation derived
-by pairwise protein sequence alignment and Gene Ontology (GO) semantic
-similarity measures.
+Comprehensive toolkit for generating various numerical
+    features of protein sequences described in Xiao et al. (2015)
+    <DOI:10.1093/bioinformatics/btv042>. For full functionality,
+    the software 'ncbi-blast+' is needed, see
+    <https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download>
+    for more information.
Note
- -The package vignette can be opened with
-vignette("protr")
.The web server for this package,
-ProtrWeb
can be accessed from: -http://protr.org.Bug reports and feature requests should be sent to -https://github.com/nanxstats/protr/issues.
- -References
Xiao, N., Cao, D.-S., Zhu, M.-F., and Xu, Q.-S. (2015).
-protr/ProtrWeb: R package and web server for generating various
-numerical representation schemes of protein sequences.
-Bioinformatics 31 (11), 1857--1859.
+See also
+ +Useful links:
R
Contents
+Author
+Maintainer: Nan Xiao me@nanx.me (0000-0002-0250-5673)
+Authors:
+
+ +- +
Qing-Song Xu qsxu@csu.edu.cn
- +
Dong-Sheng Cao
Value
Examples
-x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] +@@ -153,7 +153,7 @@x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]] protseg(x, aa = "R", k = 5)#> $`6` #> [1] "MDAMKRGLCCV" #> diff --git a/docs/reference/readFASTA.html b/docs/reference/readFASTA.html index e9c26d6..71594d7 100644 --- a/docs/reference/readFASTA.html +++ b/docs/reference/readFASTA.html @@ -65,7 +65,7 @@@@ -171,7 +171,7 @@See a
Examples
-+See a
Examples
-+diff --git a/docs/reference/twoGOSim.html b/docs/reference/twoGOSim.html index 4911dae..4d4ef21 100644 --- a/docs/reference/twoGOSim.html +++ b/docs/reference/twoGOSim.html @@ -66,7 +66,7 @@ diff --git a/docs/reference/twoSeqSim.html b/docs/reference/twoSeqSim.html index 29e64e4..10a90ba 100644 --- a/docs/reference/twoSeqSim.html +++ b/docs/reference/twoSeqSim.html @@ -65,7 +65,7 @@