Date: 1/25/2017
Authors: Joshua Mayer, Raziur Rahman, Souparno Ghosh, Randip Pal
Platform: R Version 3.3.2
Required packages: partykit, Formula, strucchange, matrixStats, coin
Maintainer: Joshua Mayer joshua.mayer@ttu.edu
Description: Sequential removal of insignificant features.
SMuRFS(formula, data, ntree = 500, mtry, alpha = 0.05, prop.test = .632, response.position)
Inputs
formula: An object of class formula. This formula gives the underlying regression equation.
data: An object of class data frame. Names in the data frame must match the names in the formula. Missing data are removed.
ntree: An integer greater than or equal to 1. The number of trees grown for the SMuRFS algorithm. Default is 500.
mtry: An integer greater than or equal to 1. The number of variables sampled for each tree.
alpha: A number between 0 and 1. The significance level used for feature removal. Default is 0.05.
prop.test: A number between 0 and 1. The size of the test set for the secondary test, as a proportion of the data. Default is 0.632.
response.position: The column positions of the responses in data. These could be determined automatically with the Formula package, but that approach breaks down in high dimensions.
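Because response.position must be supplied by hand, it can help to derive it from the response names rather than hard-coding column numbers. The following is a minimal base-R sketch of that idea; the toy data frame `dat.toy` and the names `response.names` and `toy.formula` are illustrative assumptions, not part of SMuRFS.

```r
# Toy data frame; the column names are illustrative only
set.seed(1)
dat.toy <- data.frame(x1 = rnorm(5), x2 = rnorm(5), y1 = rnorm(5), y2 = rnorm(5))

# Names of the response columns (assumed known to the user)
response.names <- c("y1", "y2")

# Look up the column positions by name
response.position <- match(response.names, names(dat.toy))

# Fail early if any response name is missing from the data
stopifnot(!anyNA(response.position))

# Build the matching multi-response formula: y1 + y2 ~ .
toy.formula <- as.formula(paste(paste(response.names, collapse = " + "), "~ ."))

response.position  # 3 4
```

The resulting vector can then be passed as the response.position argument alongside the formula.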
The following function runs Sequential Multi-Response Feature Selection (SMuRFS). At each iteration, the function selects a subset of features of size mtry and a bootstrap sample of size n, grows a tree from those features and that bootstrap sample using the conditional inference framework (Hothorn et al., 2006), then selects the features that are significant at any node of the tree. Features that are not selected are tested on a test set that is a subset of the data. Features that fail the second test are removed from consideration. After ntree iterations, the features that survive are the selected features.
Outputs
A list of the surviving covariates.
Example
library(MASS)
library(Matrix)
set.seed(100)
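# True coefficients: the first 50 features are active, the remaining 950 are noise
beta <- c(runif(50,1,3), rep(0,950))
# Covariance of the three responses: unit variance, pairwise covariance 0.7
sigma.y <- matrix(c(1,0.7,0.7,0.7,1,0.7,0.7,0.7,1), nrow = 3, byrow = FALSE)
# Builds an n x n matrix with 1 on the diagonal and 0.7 elsewhere
omega <- function(n)
{
  my.mat <- matrix(0.7, n, n)
  diag(my.mat) <- rep(1,n)
  return(my.mat)
}
# Covariance of the features: the 50 active features are correlated, the noise features are independent
sigma.x <- bdiag(omega(50), diag(1,950))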
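set.seed(100)
# 200 observations of 1000 features
xx <- mvrnorm(200, rep(0,1000), sigma.x)
# For a given observation, all three responses share the same mean
means <- xx %*% beta
set.seed(100)
yy <- t(sapply(1:200, function(i) mvrnorm(n=1, mu = rep(means[i,],3), Sigma = sigma.y)))
# Features occupy columns V1 to V1000; responses occupy V1001 to V1003
dat <- as.data.frame(cbind(xx,yy))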
set.seed(100)
var.select <- SMuRFS(formula = V1001 + V1002 + V1003 ~ ., data = dat, ntree = 500, mtry = 8,
                     alpha = 0.05, prop.test = .632, response.position = c(1001,1002,1003))
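The sequential removal loop described above (random feature subset, bootstrap sample, first test, secondary test on a subset of the data, removal) can be illustrated with a small base-R sketch. This is not the SMuRFS implementation: the conditional inference tree is replaced by a marginal correlation test (`cor.test`) as a stand-in, the response is univariate, and the function name `sequential_removal_sketch` and its argument `niter` are hypothetical.

```r
# Hypothetical helper, NOT the SMuRFS implementation.
# Stand-in: a marginal correlation test replaces the conditional inference tree.
sequential_removal_sketch <- function(x, y, niter = 200, mtry = 5,
                                      alpha = 0.05, prop.test = 0.632) {
  n <- nrow(x)
  alive <- colnames(x)  # features still under consideration

  # p-value of the correlation test between feature f and y on the given rows
  p.value <- function(f, rows) cor.test(x[rows, f], y[rows])$p.value

  for (b in seq_len(niter)) {
    if (length(alive) == 0) break

    # Random feature subset of size mtry and a bootstrap sample of size n
    feats <- sample(alive, min(mtry, length(alive)))
    boot <- sample(n, n, replace = TRUE)

    # First test (stand-in for "significant at any node of the tree")
    p.first <- vapply(feats, p.value, numeric(1), rows = boot)
    not.selected <- feats[p.first >= alpha]

    # Secondary test on a random subset of the data
    test.rows <- sample(n, ceiling(prop.test * n))
    p.second <- vapply(not.selected, p.value, numeric(1), rows = test.rows)

    # Features that fail the second test are removed from consideration
    alive <- setdiff(alive, not.selected[p.second >= alpha])
  }
  alive
}

# Toy usage: only V1 and V2 drive the response
set.seed(1)
x <- matrix(rnorm(200 * 20), 200, 20, dimnames = list(NULL, paste0("V", 1:20)))
y <- 2 * x[, "V1"] - 2 * x[, "V2"] + rnorm(200)
kept <- sequential_removal_sketch(x, y)
```

A feature is dropped only when it fails both tests in the same iteration, so strongly associated features such as V1 and V2 in the toy data remain in `kept`, while noise features tend to be eliminated as the iterations accumulate.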
################################################################ ################################################################