Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest Sciome update: likelihood bug fix and additional testing #36

Merged
merged 36 commits into from
Nov 15, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
095c5cd
Created the PrestoGP repository
ericbair-sciome Jul 17, 2023
e71fea9
New file structure for generalized locs
ericbair-sciome Jul 18, 2023
8a10471
Modified package to allow for generalized locs
ericbair-sciome Jul 21, 2023
5eafef7
Added documentation file for generalized locs
ericbair-sciome Jul 21, 2023
3da5580
Fixed a bug in the code to generate initial values of theta
ericbair-sciome Jul 21, 2023
fca8a3e
Parallel support for glmnet and minor bug fixes
ericbair-sciome Jul 28, 2023
4e26172
Fixed a bug where prestogp_fit crashed when beta.hat was specified
ericbair-sciome Aug 1, 2023
57fcd00
Changed the initial beta estimates to decrease computation time
ericbair-sciome Aug 4, 2023
9b42a60
Added multivariate prediction functionality
ericbair-sciome Oct 24, 2023
1d6da8a
Removed temporary files
ericbair-sciome Oct 24, 2023
7e4d279
Allow for generalized locs, bug fixes and optimizations
sciome-bot Nov 1, 2023
140e623
foldid parameter for cv.glmnet and minor bug fixes
ericbair-sciome Nov 1, 2023
544a290
Pull request #2: foldid parameter for cv.glmnet and minor bug fixes
ericbair-sciome Nov 1, 2023
b8d6f84
Update version number. foldid parameter for cv.glmnet and minor bugfixes
shail-choksi Nov 1, 2023
747f280
Fix merge conflict remnant
shail-choksi Nov 1, 2023
8c1ceda
Fix merge conflict remnant
sciome-bot Nov 1, 2023
2ade51c
Pull request #4: Fix merge conflict remnant in README
shail-choksi Nov 1, 2023
c91209d
Merge branch 'main-sciome' of github-sciome:Spatiotemporal-Exposures-…
sciome-bot Nov 1, 2023
08c3d1a
Merge branch 'main' of github.com:Spatiotemporal-Exposures-and-Toxico…
shail-choksi Nov 2, 2023
a3863f8
Merge branch 'main' of github.com:Spatiotemporal-Exposures-and-Toxico…
sciome-bot Nov 2, 2023
b02cc47
Merge branch 'to-git' of ssh://192.168.167.103:7999/stat/prestogp int…
shail-choksi Nov 2, 2023
19094a4
Merge branch 'main-sciome' of github-sciome:Spatiotemporal-Exposures-…
sciome-bot Nov 2, 2023
0d13ba9
Clean up merge conflict remnants
shail-choksi Nov 2, 2023
57a4177
Clean up merge conflict remnants
sciome-bot Nov 2, 2023
514c6b8
Merge branch 'master' of ssh://192.168.167.103:7999/stat/prestogp int…
sciome-bot Nov 2, 2023
71469f1
Merge branch 'main-sciome' of github-sciome:Spatiotemporal-Exposures-…
sciome-bot Nov 2, 2023
62f110f
Added new testthat tests
ericbair-sciome Nov 7, 2023
a6d1d1b
Added new testing files
ericbair-sciome Nov 7, 2023
31f7be3
Pull request #8: Testing
ericbair-sciome Nov 7, 2023
615e30a
Merge branch 'main-sciome' of github-sciome:Spatiotemporal-Exposures-…
sciome-bot Nov 7, 2023
9094abe
Merge branch 'main' of sciome-bot-git:Spatiotemporal-Exposures-and-To…
shail-choksi Nov 14, 2023
b613143
Merge branch 'main' of sciome-bot-git:Spatiotemporal-Exposures-and-To…
sciome-bot Nov 14, 2023
79011bd
Merge branch 'main-sciome' of sciome-bot-git:Spatiotemporal-Exposures…
sciome-bot Nov 14, 2023
84fd72f
Pull request #9: Testing
sciome-bot Nov 14, 2023
fb060ff
Testing
ericbair-sciome Nov 14, 2023
e7fd7ea
Merge branch 'main-sciome' of github-sciome:Spatiotemporal-Exposures-…
sciome-bot Nov 14, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
^examples$
^Makefile$
^R/scratch.R$
^.*\.Rproj$
^\.Rproj\.user$
^doc$
^Meta$
15 changes: 14 additions & 1 deletion .gitignore
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ manuscript/
**/._.DS_Store
../._create-prediction-grid.Rmd
../._map-nc-land-covariates.Rmd

# Hidden folders
.DS_Store/
._.DS_Store/
Expand Down Expand Up @@ -73,4 +74,16 @@ code/mitchell_tests/
._*

# Insang's negative value exploration script
tools/negative_exploration
tools/negative_exploration

# data files
data

# Automatic Emacs backup files
**/*~

# Rcpp compiled binary files
src/*.so
src/*.o
/doc/
/Meta/
60 changes: 37 additions & 23 deletions DESCRIPTION
100644 → 100755
Original file line number Diff line number Diff line change
@@ -1,8 +1,7 @@
Package: PrestoGP
Type: Package
Title: Penalized Regression for Spatio-Temporal Outcomes via Gaussian
Processes
Version: 0.2.0.9015
Title: Penalized Regression for Spatio-Temporal Outcomes via Gaussian Processes
Version: 0.2.0.9018
Authors@R: c(
person(given = "Eric",
family = "Bair",
Expand All @@ -21,26 +20,41 @@ Authors@R: c(
family = "Messier",
role = "aut"))
Description: Simultaneous variable seletion and estimation of LUR models with spatiotemporally correlated errors that is scalable for big data.
Depends: R (>= 3.5.0)
LinkingTo: Rcpp, RcppArmadillo
Imports: GPvecchia, Matrix, fields, ncvreg, readxl, scoringRules, MASS,
aod, knitr, dplyr, glmnet, rmarkdown, markdown, gtools, geoR,
doParallel
Depends:
R (>= 3.5.0)
LinkingTo:
Rcpp, RcppArmadillo
Imports:
GPvecchia,
Matrix,
fields,
ncvreg,
readxl,
scoringRules,
MASS,
aod,
knitr,
dplyr,
glmnet,
rmarkdown,
markdown,
gtools,
geoR,
doParallel
License: GPL-3
Encoding: UTF-8
VignetteBuilder: knitr
VignetteBuilder: knitr
RoxygenNote: 7.2.3
Collate: 'Log_Likelihood.R' 'PrestoGP_CreateU_Multivariate.R'
'PrestoGP_Model.R' 'PrestoGP_Vecchia_Spatiotemporal.R'
'PrestoGP_Full.R' 'PrestoGP_Vecchia_Spatial.R'
'PrestoGP_Full_Spatial.R' 'PrestoGP_Multivariate_Vecchia.R'
'PrestoGP_Util_Functions.R' 'RcppExports.R' 'Visualization.R'
'package.R'
NeedsCompilation: yes
Packaged: 2023-10-23 09:07:48 UTC; root
Author: Eric Bair [aut, cre],
Brian Kidd [aut],
Eric Wimberley [aut],
Deepak Mav [aut],
Kyle Messier [aut]
Maintainer: Eric Bair <eric.bair@sciome.com>
Collate:
'Log_Likelihood.R'
'PrestoGP_CreateU_Multivariate.R'
'PrestoGP_Model.R'
'PrestoGP_Vecchia_Spatiotemporal.R'
'PrestoGP_Full.R'
'PrestoGP_Vecchia_Spatial.R'
'PrestoGP_Full_Spatial.R'
'PrestoGP_Multivariate_Vecchia.R'
'PrestoGP_Util_Functions.R'
'RcppExports.R'
'Visualization.R'
'package.R'
21 changes: 21 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
PKGNAME := $(shell sed -n "s/Package: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGVERS := $(shell sed -n "s/Version: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGSRC := $(shell basename `pwd`)

all: clean build

clean:
echo "Clean"

build: doc
echo $(PKGNAME) $(PKGVERS)
cd ..;\
R CMD build $(PKGSRC)

doc:
R --slave -e 'library(roxygen2); roxygenise()'
R --slave -e 'library(devtools); build_manual()'
#-git add --all man/*.Rd

check:
R --slave -e 'library(devtools); check(error_on="error")'
2 changes: 1 addition & 1 deletion NAMESPACE
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ export(SpatialModel)
export(SpatiotemporalFullModel)
export(SpatiotemporalModel)
export(createUMultivariate)
export(negloglik.full)
export(negloglik_full_ST)
export(negloglik_full_spatial)
export(negloglik_vecchia)
export(negloglik_vecchia_ST)
export(vecchia_Mprediction)
Expand Down
38 changes: 24 additions & 14 deletions R/Log_Likelihood.R
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ negloglik_full_ST=function(logparms,locs,y,N){
-mvtnorm::dmvnorm(y, rep(0,N), cov.mat, log=TRUE)
}

#' negloglik_full_spatial
#' negloglik.full
#'
#' Spatial Full Kriging negative loglikelihood
#'
Expand All @@ -74,12 +74,15 @@ negloglik_full_ST=function(logparms,locs,y,N){
#'
#' @examples
#' @noRd
negloglik_full_spatial=function(logparms,locs,y,N){
parms = exp(logparms)
d <- fields::rdist(locs)
cov.mat=parms[1]*fields::Exponential(d,range=parms[2])+
parms[3]*diag(N)
-mvtnorm::dmvnorm(y,rep(0,N),cov.mat,log=TRUE)
negloglik.full=function(logparams,locs,y){
params <- c(exp(logparams[1:2]),
gtools::inv.logit(logparams[3], 0, 2.5),
exp(logparams[4]))
d <- fields::rdist(locs)
N <- nrow(d)
cov.mat=params[1]*fields::Matern(d,range=params[2], smoothness=params[3])+
params[4]*diag(N)
return(-1*mvtnorm::dmvnorm(y,rep(0,N),cov.mat,log=TRUE))
}


Expand Down Expand Up @@ -157,7 +160,7 @@ mvnegloglik_ST =function(logparams,vecchia.approx,y,param.seq,P,scaling,nscale){
for (j in 1:nscale) {
locs.scaled[vecchia.approx$ondx==i,scaling==j] <-
locs.scaled[vecchia.approx$ondx==i,scaling==j] /
params[param.seq[2,1]+nscale*(i-1)+j-1]
params[param.seq[2,1]+nscale*(i-1)+j-1]
}
}
vecchia.approx$locsord <- locs.scaled
Expand All @@ -171,7 +174,7 @@ mvnegloglik_ST =function(logparams,vecchia.approx,y,param.seq,P,scaling,nscale){
##############################################################################
### Full Multivariate Matern Negative Loglikelihood Function ###########

mvnegloglik.full=function(logparams,locs,y,param.seq,P){
mvnegloglik.full=function(logparams,locs,y,param.seq){
# Input-
# logparams: A numeric vector of length (4*P)+(4*choose(P,2)).
# To construct these parameters we unlist a list of the 7 covariance
Expand All @@ -191,6 +194,7 @@ mvnegloglik.full=function(logparams,locs,y,param.seq,P){

#P <- length(y)
# transform the postively constrained parameters from log-space to normal-space
P <- length(locs)
params <- c(exp(logparams[1:param.seq[2,2]]),
gtools::inv.logit(logparams[param.seq[3,1]:param.seq[3,2]], 0, 2.5),
exp(logparams[param.seq[4,1]:param.seq[4,2]]))
Expand Down Expand Up @@ -279,12 +283,17 @@ cat.covariances <- function(locs.list,sig2,range,smoothness,nugget){
for (iter in 1:nrow(combs)){
i <- combs[iter,1]
j <- combs[iter,2]
d <- fields::rdist.earth(locs.list[[i]],locs.list[[j]],miles = FALSE)

# d <- fields::rdist.earth(locs.list[[i]],locs.list[[j]],miles = FALSE)
d <- fields::rdist(locs.list[[i]],locs.list[[j]])
# Calculate the covariance matrix - if/then based on its location in the super-matrix
N <- nrow(d)
cov.mat.ij <- sig2[i,j]*geoR::matern(d,phi = range[i,j],kappa = smoothness[i,j])+
nugget[i,j]*diag(N)
N <- nrow(d)
if (i==j){ # To accomodate varying size outcomes- the nugget is not included on cross-covariances
cov.mat.ij <- sig2[i,j]*geoR::matern(d,phi = range[i,j],kappa = smoothness[i,j])+
nugget[i,j]*diag(N)
}else{
cov.mat.ij <- sig2[i,j]*geoR::matern(d,phi = range[i,j],kappa = smoothness[i,j])
}


if (combs[iter,1]==1){
row.idx <- 1:dims[1]
Expand All @@ -307,6 +316,7 @@ cat.covariances <- function(locs.list,sig2,range,smoothness,nugget){

}


return(cov.mat.out)
}

Expand Down
5 changes: 5 additions & 0 deletions R/PrestoGP_CreateU_Multivariate.R
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,11 @@ knn_indices <- function(ordered_locs, query, n_neighbors, dist_func){
#'
#' @return A list containing two matrices, each with one row per location: an indices matrix with the indices of nearest neighbors for each location, and a distance matrix with the associated distances
sparseNN <- function(ordered_locs, n_neighbors, dist_func, ordered_locs_pred=NULL){
ee <- min(apply(ordered_locs, 2, stats::sd))
n <- nrow(ordered_locs)
ordered_locs <- ordered_locs + matrix(ee*1e-04*
stats::rnorm(n*ncol(ordered_locs)),
n, ncol(ordered_locs))
indices_matrix = matrix(data=NA, nrow=nrow(ordered_locs), ncol=n_neighbors)
distances_matrix = matrix(data=NA, nrow=nrow(ordered_locs), ncol=n_neighbors)
for(row in 1:n_neighbors){
Expand Down
20 changes: 11 additions & 9 deletions R/PrestoGP_Model.R
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -60,14 +60,14 @@ setMethod("initialize", "PrestoGPModel", function(.Object, ...) {
})

setGeneric("show_theta", function(object, Y_names)standardGeneric("show_theta") )
setGeneric("prestogp_fit", function(model, Y, X, locs, scaling=NULL, apanasovich=FALSE, covparams = NULL, beta.hat = NULL, tol = 0.999999, max_iters = 100, verbose=FALSE, optim.method="Nelder-Mead", optim.control=list(trace=0, reltol=1e-4, maxit=5000), parallel=FALSE) standardGeneric("prestogp_fit") )
setGeneric("prestogp_fit", function(model, Y, X, locs, scaling=NULL, apanasovich=FALSE, covparams = NULL, beta.hat = NULL, tol = 0.999999, max_iters = 100, verbose=FALSE, optim.method="Nelder-Mead", optim.control=list(trace=0, reltol=1e-3, maxit=5000), parallel=FALSE, foldid=NULL) standardGeneric("prestogp_fit") )
setGeneric("prestogp_predict", function(model, X="matrix", locs="matrix", m="numeric", ordering.pred=c("obspred", "general"), pred.cond=c("independent", "general"), return.values=c("mean", "meanvar")) standardGeneric("prestogp_predict") )
setGeneric("calc_covparams", function(model, locs, Y) standardGeneric("calc_covparams") )
setGeneric("specify", function(model, locs, m)standardGeneric("specify") )
setGeneric("compute_residuals", function(model, Y, Y.hat) standardGeneric("compute_residuals") )
setGeneric("transform_data", function(model, Y, X) standardGeneric("transform_data") )
setGeneric("estimate_theta", function(model, locs, optim.control, method) standardGeneric("estimate_theta") )
setGeneric("estimate_betas", function(model, parallel) standardGeneric("estimate_betas") )
setGeneric("estimate_betas", function(model, parallel, foldid) standardGeneric("estimate_betas") )
setGeneric("compute_error", function(model, y, X) standardGeneric("compute_error") )
setGeneric("scale_locs", function(model, locs) standardGeneric("scale_locs") )
setGeneric("theta_names", function(model) standardGeneric("theta_names") )
Expand Down Expand Up @@ -181,7 +181,7 @@ setMethod("show_theta", "PrestoGPModel",
#' model <- prestogp_fit(model, logNO2, X, locs)
#' ...
setMethod("prestogp_fit", "PrestoGPModel",
function(model, Y, X, locs, scaling=NULL, apanasovich=FALSE, covparams = NULL, beta.hat = NULL, tol = 0.999999, max_iters=100, verbose=FALSE, optim.method="Nelder-Mead", optim.control=list(trace=0, reltol=1e-4, maxit=5000), parallel=FALSE) {
function(model, Y, X, locs, scaling=NULL, apanasovich=FALSE, covparams = NULL, beta.hat = NULL, tol = 0.999999, max_iters=100, verbose=FALSE, optim.method="Nelder-Mead", optim.control=list(trace=0, reltol=1e-3, maxit=5000), parallel=FALSE, foldid=NULL) {
#parameter validation
#TODO: This method should check for input errors in the
#multivariate case (where Y, X, and locs are lists)
Expand All @@ -193,7 +193,7 @@ setMethod("prestogp_fit", "PrestoGPModel",
if(!is.double(tol)){ stop("The tol parameter must be floating point number.") }
if (is.matrix(Y)) {
if(nrow(Y) != nrow(X)){ stop("Y must have the same number of rows as X.") }
if(ncol(Y) < 1){ stop("Y must have at least 1 column.") }
if(ncol(Y) != 1){ stop("Y must have only 1 column.") }
if(nrow(Y) != nrow(locs)){ stop("Y must have the same number of rows as locs.") }
}
if (is.null(scaling)) {
Expand Down Expand Up @@ -265,7 +265,9 @@ setMethod("prestogp_fit", "PrestoGPModel",
model <- specify(model, locs, m)

if (is.null(beta.hat)) {
beta0.glmnet <- cv.glmnet(model@X_train, model@Y_train)
beta0.glmnet <- cv.glmnet(model@X_train, model@Y_train,
parallel=parallel,
foldid=foldid)
beta.hat <- as.matrix(predict(beta0.glmnet,
type="coefficients",
s=beta0.glmnet$lambda.1se))
Expand All @@ -288,7 +290,7 @@ setMethod("prestogp_fit", "PrestoGPModel",
model <- specify(model, locs, m)
}
model <- transform_data(model, model@Y_train, model@X_train)
model <- estimate_betas(model, parallel)
model <- estimate_betas(model, parallel, foldid)
min.error <- compute_error(model)
### Check min-error against the previous error and tolerance
if(min.error<prev.error*tol) {
Expand Down Expand Up @@ -325,11 +327,11 @@ setMethod("prestogp_fit", "PrestoGPModel",
#' @param model the model to estimate coeffients for
#'
#' @return A model with updated coefficients
setMethod("estimate_betas", "PrestoGPModel", function(model, parallel) {
setMethod("estimate_betas", "PrestoGPModel", function(model, parallel, foldid) {
if(ncol(model@Y_train) > 1){
model@linear_model <- cv.glmnet(as.matrix(model@X_tilde), as.matrix(model@y_tilde), family="mgaussian", alpha = model@alpha, parallel=parallel)
model@linear_model <- cv.glmnet(as.matrix(model@X_tilde), as.matrix(model@y_tilde), family="mgaussian", alpha = model@alpha, parallel=parallel, foldid=foldid)
} else {
model@linear_model <- cv.glmnet(as.matrix(model@X_tilde), as.matrix(model@y_tilde), alpha = model@alpha, parallel=parallel)
model@linear_model <- cv.glmnet(as.matrix(model@X_tilde), as.matrix(model@y_tilde), alpha = model@alpha, parallel=parallel, foldid=foldid)
}
idmin <- which(model@linear_model$lambda == model@linear_model$lambda.min)
semin <- model@linear_model$cvm[idmin] + model@linear_model$cvsd[idmin]
Expand Down
Loading
Loading