Merge pull request #131 from TGuillerme/RAM.help

Merging pre 1.9 into master
TGuillerme · Nov 12, 2024 · b44bc36 · b44bc36
2 parents e5711ad + 7054bdd
commit b44bc36
Show file tree

Hide file tree

Showing 60 changed files with 1,848 additions and 651 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -8,7 +8,7 @@ Authors@R: c(person("Thomas", "Guillerme", role = c("aut", "cre", "cph"),
   			 )
 Maintainer: Thomas Guillerme <guillert@tcd.ie>
 Version: 1.8.11
-Date: 2024-06-06
+Date: 2024-10-25
 Description: A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.
 Depends:
     R (>= 3.6.0),

diff --git a/NEWS.md b/NEWS.md
@@ -1,15 +1,34 @@
-dispRity v1.8.11 (2024-06-06) 
+dispRity v1.9 (2024-06-06)  *distant update*
 =========================
 
+
+<!-- Blurb: 
+ 
+ * `dispRity` has been not greatly optimised for using distance matrices: 1) it's now much faster thanks to the `dist.helper` new optional argument (storing the distance matrix in the cache) and 2) it now allows direct analyses of distance matrices in a dispRity pipeline.
+ * `boot.matrix` function has now been generalised to be able to bootstrap any elements of a matrix. Previously it only allowed to bootstrap elements (rows) but now can work on dimensions (columns) or both (distances).
+ * Redesigned `multi.ace` to be more modular and better handle both continuous and/or discrete characters. This is secretly a pre-release for a future version that will greatly improve pipelines with ancestral state estimations ;).
+ * Redesigned `dispRity.dtt` to be closely a 1 to 1 interface of `geiger::dtt` with default options.
+ * New utility functions (`set.root.time` to add root times to trees; `remove.dispRity` to cleanly remove parts of dispRity objects) and metrics (`count.neigbhours`).
+ * Loads of minor improvements and couple of bug fixes! Yay!
+
+ -->
+
 ### NEW FEATURES
  * Redesigned `multi.ace` to be more modular and handle both continuous and/or discrete characters. Changes include a **change in argument name** from `castor.options` to the generic `options.args` (the options can be provided the same way as before though); and a **change in default arguments** for `models` which can now be left missing (previously was `"ER"`) and applies `"ER"` and `"BM"` for respectively discrete and continuous characters by default.
+ * New design when using distance matrices: `dist.helper` now allows to save distance matrices in the cache, saving a lot of RAM and speeding up calculations. You can use the helper using `dispRity(..., dist.helper = my_distance_function)` or `dispRity(..., dist.helper = my_distance_matrix)`.
+ * *New dispRity, custom.subsets and chrono.subsets option*: these three functions can now use `dist.data = TRUE` to specify that the input data is a distance matrix (and handle it accordingly).
+ * *New bootstrap options*: you can now use `boot.by` to specify whether bootstrap the rows (previous behaviour), the columns or both (for distance matrices).
+
+ * Redesigned `dispRity.dtt` to be closely a 1 to 1 interface of `geiger::dtt` with default options.
+
  * *New utility function* `set.root.time` to add a root time to a tree (`"phylo"`), list of trees (`"multiPhylo"`) or `dispRity` object with trees.
  * *New utility function* `remove.dispRity` to cleanly remove specific parts of a `"dispRity"` object.
  * *New metric*: `count.neighbours` to count the number of neighbours for each elements within a certain radius (thanks to Rob MacDonald for the suggestion).
 
 ### MINOR IMPROVEMENTS
 
  * `custom.subsets` can now take a logical vector for the `group` argument.
+ * `custom.subsets` now recycles node names when using a tree to create clade groups.
  * `plot` functions doing scatter plot now centers them without changing the scale of both axes.
  * **changed default argument** for `tree.age`: the number of digits output by `tree.age` is now changed from 3 to 4 by default.
  * the random starting parameters in `reduce.space` are now drawn from the input data distribution which speeds up the function significantly.
@@ -23,6 +42,11 @@ dispRity v1.8.11 (2024-06-06)
  * `multi.ace` now correctly handles invariant characters when looking for NAs.
  * `dispRity` objects with a `$covar` component are not interpreted as bootstrapped by `boot.matrix` anymore.
 
+### DEPRECATED
+
+ * The `dimensions` argument from `boot.matrix` is now removed: it has been redundant with the `dimensions` argument in the `dispRity` since v0.3!
+
+
 dispRity v1.8 (2023-12-11) *dispRity.multi*
 =========================
 

diff --git a/R/MCMCglmm.subsets.R b/R/MCMCglmm.subsets.R
@@ -62,13 +62,19 @@ MCMCglmm.subsets <- function(data, posteriors, group, tree, rename.groups, set.l
         if(any(classifier)) {
             group_classifier <- data[,which(!numerics)[which(classifier)], drop = FALSE]
         }
+    } else {
+        cleaned_data <- data
+        group_classifier <- matrix(1, nrow = nrow(data), ncol = 1, dimnames = list(rownames(data)))
     }
 
     ## Checking the posteriors
     check.class(posteriors, "MCMCglmm")
 
     ## Check which dimensions where used
     dimensions <- match(MCMCglmm.traits(posteriors), colnames(cleaned_data))
+    if(all(is.na(dimensions))) {
+        stop.call(msg = "Could not match any column in the data with the posterior samples. Make sure the data column names are the same as the one used in the MCMCglmm.", call = "")
+    }
 
     ## Extracting the residuals and randoms
     posterior_levels <- MCMCglmm.levels(posteriors)

diff --git a/R/adonis.dispRity.R b/R/adonis.dispRity.R
@@ -72,6 +72,8 @@
 #' 
 #' 
 #' @author Thomas Guillerme
+#' @references Oksanen J, Simpson G, Blanchet F, Kindt R, Legendre P, Minchin P, O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M, Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M, Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T, Stier A, Ter Braak C, Weedon J (2024). vegan: Community Ecology Package_. R package version 2.6-8,
+
 # @export
 
 # source("sanitizing.R")
@@ -218,8 +220,8 @@ adonis.dispRity <- function(data, formula = matrix ~ group, method = "euclidean"
 
     ## Run adonis
     ## Modifying adonis2 to only check the parent environment (not the global one: matrix input here should be present in the environment
-    adonis2.modif <- vegan::adonis2
-    formals(adonis2.modif) <-c(formals(vegan::adonis2), "matrix_input" = NA)
+    adonis2.modif <- adonis2
+    formals(adonis2.modif) <-c(formals(adonis2), "matrix_input" = NA)
     body(adonis2.modif)[[5]] <- substitute(lhs <- matrix_input)
     adonis_out <- adonis2.modif(formula, predictors, method = method, matrix_input = matrix, ...)
     # adonis_out <- adonis2.modif(formula, predictors, method = method, matrix_input = matrix) ; warning("DEBUG adonis.dispRity")

diff --git a/R/boot.matrix.R b/R/boot.matrix.R
@@ -5,9 +5,9 @@
 #' @param data A \code{matrix} or a list of matrices (typically output from \link{chrono.subsets} or \link{custom.subsets} - see details).
 #' @param bootstraps The number of bootstrap pseudoreplicates (\code{default = 100}).
 #' @param rarefaction Either a \code{logical} value whether to fully rarefy the data, a set of \code{numeric} values used to rarefy the data or \code{"min"} to rarefy at the minimum level (see details).
-#' @param dimensions Optional, a vector of \code{numeric} value(s) or the proportion of the dimensions to keep.
-#' @param verbose A \code{logical} value indicating whether to be verbose or not.
 #' @param boot.type The bootstrap algorithm to use (\code{default = "full"}; see details).
+#' @param boot.by Which dimension of the data to bootstrap: either \code{"rows"} to bootstrap the elements (default), \code{"columns"} for the dimensions or \code{"dist"} for bootstrapping both equally (e.g. for distance matrices).
+#' @param verbose A \code{logical} value indicating whether to be verbose or not.
 #' @param prob Optional, a \code{matrix} or a \code{vector} of probabilities for each element to be selected during the bootstrap procedure. The \code{matrix} or the \code{vector} must have a row names or names attribute that corresponds to the elements in \code{data}.
 #' 
 #' @return
@@ -55,8 +55,6 @@
 #' boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = TRUE)
 #' ## Bootstrapping an ordinated matrix with only elements 7, 10 and 11 sampled
 #' boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = c(7, 10, 11))
-#' ## Bootstrapping an ordinated matrix with only 3 dimensions
-#' boot.matrix(BeckLee_mat50, bootstraps = 20, dimensions = 3)
 #' ## Bootstrapping an the matrix but without sampling Cimolestes and sampling Maelestes 10x more
 #' boot.matrix(BeckLee_mat50, bootstraps = 20, prob = c("Cimolestes" = 0, "Maelestes" = 10))
 #' 
@@ -87,7 +85,7 @@
 # bootstraps <- 3
 # rarefaction <- TRUE
 
-boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions = NULL, verbose = FALSE, boot.type = "full", prob = NULL) {
+boot.matrix <- function(data, bootstraps = 100, boot.type = "full", boot.by = "rows", rarefaction = FALSE, verbose = FALSE, prob = NULL) {
 
     match_call <- match.call()
     ## ----------------------
@@ -96,16 +94,25 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
     is_multi <- FALSE
 
     ## DATA
+
+    ## Check boot.by
+    check.length(boot.by, 1, " must be one of the following: rows, columns, dist.")
+    check.method(boot.by, c("rows", "columns", "dist"), "boot.by")
+
     ## If class is dispRity, data is serial
     if(!is(data, "dispRity")) {
         ## Data must be a matrix
         data <- check.dispRity.data(data, returns = c("matrix", "multi"))
         is_multi <- any(is_multi, data$multi)
         data <- data$matrix
 
-        ## Check whether it is a distance matrix
-        if(check.dist.matrix(data[[1]], just.check = TRUE)) {
-            warning("boot.matrix is applied on what seems to be a distance matrix.\nThe resulting matrices won't be distance matrices anymore!", call. = FALSE)
+        ## Check whether it is a distance matrix (and the boot.by is set to both)
+        dist_check <- check.dist.matrix(data[[1]], just.check = TRUE)
+        if(dist_check && boot.by != "dist") {
+            warning("boot.matrix is applied on what seems to be a distance matrix.\nThe resulting matrices won't be distance matrices anymore!\nIf this isn't the desired behavior, you can use the argument:\nboot.by = \"dist\"", call. = FALSE)
+        }
+        if(!dist_check && boot.by == "dist") {
+            warning("boot.matrix is applied to both rows and columns but the input data seems to not be a distance matrix.\nThe resulting bootstraps might not resample it correctly.", call. = FALSE)
         }
 
         ## Creating the dispRity object
@@ -115,12 +122,6 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
         if(!is.null(data$call$bootstrap) && data$call$bootstrap[[2]] != "covar") {
             stop.call(msg.pre = "", match_call$data, msg = " was already bootstrapped.")
         }
-
-        ## Must be correct format
-        # check.length(data, 4, " must be either a matrix or an output from the chrono.subsets or custom.subsets functions.")
-        # if(!all(names(data) %in% c("matrix", "call", "subsets")) {
-        #     stop.call(match_call$data, " must be either a matrix or an output from the chrono.subsets or custom.subsets functions.")
-        # }
 
         ## With the correct names
         data_names <- names(data)
@@ -142,6 +143,14 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
         }
     }
 
+    ## Check boot.by and data
+    if(!is.null(data$call$dist.data) && data$call$dist.data && boot.by != "dist") {
+        warning(paste0("boot.by not set to \"dist\" (the data will not be treated as a distance matrix) even though ", match_call$data, " contains distance treated data."))
+        ## Toggling data dist
+        data$call$dist.data <- FALSE
+    }
+
+    ## Check verbose
     check.class(verbose, "logical")
 
     ## If is multi lapply the stuff
@@ -162,7 +171,7 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
         if(verbose) message("Bootstrapping", appendLF = FALSE)
 
         ## Apply the custom.subsets
-        output <- dispRity.multi.apply(split_data, fun = boot.matrix.call, bootstraps = bootstraps, rarefaction = rarefaction, dimensions = dimensions, verbose = verbose, boot.type = boot.type, prob = prob)
+        output <- dispRity.multi.apply(split_data, fun = boot.matrix.call, bootstraps = bootstraps, rarefaction = rarefaction, verbose = verbose, boot.type = boot.type, boot.by = boot.by, prob = prob)
 
         if(verbose) message("Done.", appendLF = FALSE)
         return(output)
@@ -201,8 +210,11 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
 
             ## Check if it has attributes
             prob_names <- attributes(prob)
+
             if(is.null(prob_names)) {
-                stop.call("", "prob argument must have names (vector) or dimnames (matrix) attributes.")
+                if(boot.by != "columns") {
+                    prob_names <- names(prob) <- rownames(data$matrix[[1]])
+                }
             } else {
                 if(is.null(prob_names$names)) {
                     prob_names <- prob_names$dimnames[[1]]
@@ -222,15 +234,17 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
             }
 
             ## Check the names
-            if(!all(prob_names %in% rownames(data$matrix[[1]]))) {
-                stop.call(msg.pre = "prob argument contains elements not present in ", call =match_call$data, msg = ".")
-            } else {
-                ## Check if they are any names missing
-                missing_rows <- rownames(data$matrix[[1]]) %in% prob_names
-                if(any(missing_rows)) {
-                    extra_prob <- rep(1, length(which(!missing_rows)))
-                    names(extra_prob) <- rownames(data$matrix[[1]])[!missing_rows]
-                    prob <- c(extra_prob, prob)
+            if(boot.by != "columns") {
+                if(!all(prob_names %in% rownames(data$matrix[[1]]))) {
+                    stop.call(msg.pre = "prob argument contains elements not present in ", call =match_call$data, msg = ".")
+                } else {
+                    ## Check if they are any names missing
+                    missing_rows <- rownames(data$matrix[[1]]) %in% prob_names
+                    if(any(missing_rows)) {
+                        extra_prob <- rep(1, length(which(!missing_rows)))
+                        names(extra_prob) <- rownames(data$matrix[[1]])[!missing_rows]
+                        prob <- c(extra_prob, prob)
+                    }
                 }
             }
 
@@ -245,11 +259,12 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
             }
 
             ## Renaming the elements to match the numbers in subsets
-            names(prob) <- match(names(prob), rownames(data$matrix[[1]]))
+            if(boot.by != "columns") {
+                names(prob) <- match(names(prob), rownames(data$matrix[[1]]))
+            }
 
             ## Update the dispRity object
             add.prob <- function(one_subset, prob) {
-
                 col1 <- one_subset$elements
                 col2 <- rep(NA, nrow(one_subset$elements))
                 col3 <- prob[match(one_subset$elements[,1], names(prob))]
@@ -261,7 +276,9 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
                 return(one_subset)
             }
 
-            data$subsets <- lapply(data$subsets, add.prob, prob)
+            if(boot.by != "columns") {
+                data$subsets <- lapply(data$subsets, add.prob, prob)
+            }
         }
     }
 
@@ -297,7 +314,7 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
     ## BOOT.TYPE
     check.class(boot.type, "character")
     boot.type <- tolower(boot.type)
-    check.length(boot.type, 1, " must be a single character string")
+    check.length(boot.type, 1, " must be one of the following: full, single, null.")
 
     ## Must be one of these methods
     check.method(boot.type, c("full", "single", "null"), "boot.type")
@@ -334,27 +351,26 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
         }
     )
 
-    ##  ~~~
-    ##  Add some extra method i.e. proportion of bootstrap shifts?
-    ##  ~~~
-
-    ## RM.LAST.AXIS
-    ## If TRUE, set automatic threshold at 0.95
-    if(!is.null(dimensions)) {
-        ## Else must be a single numeric value (proportional)
-        check.class(dimensions, c("numeric", "integer"), " must be a proportional threshold value.")
-        if(length(dimensions == 1)) {
-            if(dimensions < 0) {
-                stop.call("", "Number of dimensions to remove cannot be less than 0.")
+    ## Add the dimensions to the call
+    if(is.null(data$call$dimensions)) {
+        data$call$dimensions <- 1:ncol(data$matrix[[1]])    
+    }
+
+    ## Switch all the elements
+    if(boot.by != "columns") {
+        ## elements are rows (or both)
+        all_elements <- matrix(1:dim(data$matrix[[1]])[1], ncol = 1)
+    } else {
+        ## elements are columns
+        if(!probabilistic_subsets) {
+            all_elements <- matrix(data$call$dimension, ncol = 1)
+        } else {
+            if(!is.null(prob)) {
+                all_elements <- cbind(data$call$dimension, NA, prob)
+            } else {
+                all_elements <- cbind(data$call$dimension, NA, rep(1, length(data$call$dimension)))
             }
-            if(dimensions < 1) dimensions <- 1:round(dimensions * ncol(data$matrix[[1]]))
-        } 
-        if(any(dimensions > ncol(data$matrix[[1]]))) {
-            stop.call("", "Number of dimensions to remove cannot be more than the number of columns in the matrix.")
         }
-        data$call$dimensions <- dimensions
-    } else {
-        data$call$dimensions <- 1:ncol(data$matrix[[1]])
     }
 
     ## Return object if BS = 0
@@ -384,21 +400,21 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
                                         ## Fun 3: Split the data per tree
                                         do.split.subsets, n_trees = n_trees),
                                     ## Fun 2: Apply the bootstraps
-                                    lapply, bootstrap.wrapper, bootstraps_per_tree, rarefaction, boot.type.fun, verbose),
+                                    lapply, bootstrap.wrapper, bootstraps = bootstraps_per_tree, rarefaction = rarefaction, boot.type.fun = boot.type.fun, verbose = verbose, all.elements = all_elements, boot.by = boot.by),
                                 ## Fun 1: Merge into one normal bootstrap table
                                 merge.to.list
                             )
     } else {
         ## Bootstrap the data set 
-        bootstrap_results <- lapply(data$subsets, bootstrap.wrapper, bootstraps, rarefaction, boot.type.fun, verbose, all.elements = 1:dim(data$matrix[[1]])[1])
+        bootstrap_results <- lapply(data$subsets, bootstrap.wrapper, bootstraps = bootstraps, rarefaction = rarefaction, boot.type.fun = boot.type.fun, verbose = verbose, all.elements = all_elements, boot.by = boot.by)
     }
     if(verbose) message("Done.", appendLF = FALSE)
 
     ## Combining and storing the results back in the dispRity object
     data$subsets <- mapply(combine.bootstraps, bootstrap_results, data$subsets, SIMPLIFY = FALSE)
 
     ## Adding the call information about the bootstrap
-    data$call$bootstrap <- c(bootstraps, boot.type, list(rare_out))
+    data$call$bootstrap <- c(bootstraps, boot.type, list(rare_out), boot.by)
 
     return(data)
 }