Skip to content

Commit

Permalink
Merge pull request #131 from TGuillerme/RAM.help
Browse files Browse the repository at this point in the history
Merging pre 1.9 into master
  • Loading branch information
TGuillerme authored Nov 12, 2024
2 parents e5711ad + 7054bdd commit b44bc36
Show file tree
Hide file tree
Showing 60 changed files with 1,848 additions and 651 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Authors@R: c(person("Thomas", "Guillerme", role = c("aut", "cre", "cph"),
)
Maintainer: Thomas Guillerme <guillert@tcd.ie>
Version: 1.8.11
Date: 2024-06-06
Date: 2024-10-25
Description: A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.
Depends:
R (>= 3.6.0),
Expand Down
26 changes: 25 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,34 @@
dispRity v1.8.11 (2024-06-06)
dispRity v1.9 (2024-06-06) *distant update*
=========================


<!-- Blurb:
* `dispRity` has been not greatly optimised for using distance matrices: 1) it's now much faster thanks to the `dist.helper` new optional argument (storing the distance matrix in the cache) and 2) it now allows direct analyses of distance matrices in a dispRity pipeline.
* `boot.matrix` function has now been generalised to be able to bootstrap any elements of a matrix. Previously it only allowed to bootstrap elements (rows) but now can work on dimensions (columns) or both (distances).
* Redesigned `multi.ace` to be more modular and better handle both continuous and/or discrete characters. This is secretly a pre-release for a future version that will greatly improve pipelines with ancestral state estimations ;).
* Redesigned `dispRity.dtt` to be closely a 1 to 1 interface of `geiger::dtt` with default options.
* New utility functions (`set.root.time` to add root times to trees; `remove.dispRity` to cleanly remove parts of dispRity objects) and metrics (`count.neigbhours`).
* Loads of minor improvements and couple of bug fixes! Yay!
-->

### NEW FEATURES
* Redesigned `multi.ace` to be more modular and handle both continuous and/or discrete characters. Changes include a **change in argument name** from `castor.options` to the generic `options.args` (the options can be provided the same way as before though); and a **change in default arguments** for `models` which can now be left missing (previously was `"ER"`) and applies `"ER"` and `"BM"` for respectively discrete and continuous characters by default.
* New design when using distance matrices: `dist.helper` now allows to save distance matrices in the cache, saving a lot of RAM and speeding up calculations. You can use the helper using `dispRity(..., dist.helper = my_distance_function)` or `dispRity(..., dist.helper = my_distance_matrix)`.
* *New dispRity, custom.subsets and chrono.subsets option*: these three functions can now use `dist.data = TRUE` to specify that the input data is a distance matrix (and handle it accordingly).
* *New bootstrap options*: you can now use `boot.by` to specify whether bootstrap the rows (previous behaviour), the columns or both (for distance matrices).

* Redesigned `dispRity.dtt` to be closely a 1 to 1 interface of `geiger::dtt` with default options.

* *New utility function* `set.root.time` to add a root time to a tree (`"phylo"`), list of trees (`"multiPhylo"`) or `dispRity` object with trees.
* *New utility function* `remove.dispRity` to cleanly remove specific parts of a `"dispRity"` object.
* *New metric*: `count.neighbours` to count the number of neighbours for each elements within a certain radius (thanks to Rob MacDonald for the suggestion).

### MINOR IMPROVEMENTS

* `custom.subsets` can now take a logical vector for the `group` argument.
* `custom.subsets` now recycles node names when using a tree to create clade groups.
* `plot` functions doing scatter plot now centers them without changing the scale of both axes.
* **changed default argument** for `tree.age`: the number of digits output by `tree.age` is now changed from 3 to 4 by default.
* the random starting parameters in `reduce.space` are now drawn from the input data distribution which speeds up the function significantly.
Expand All @@ -23,6 +42,11 @@ dispRity v1.8.11 (2024-06-06)
* `multi.ace` now correctly handles invariant characters when looking for NAs.
* `dispRity` objects with a `$covar` component are not interpreted as bootstrapped by `boot.matrix` anymore.

### DEPRECATED

* The `dimensions` argument from `boot.matrix` is now removed: it has been redundant with the `dimensions` argument in the `dispRity` since v0.3!


dispRity v1.8 (2023-12-11) *dispRity.multi*
=========================

Expand Down
6 changes: 6 additions & 0 deletions R/MCMCglmm.subsets.R
Original file line number Diff line number Diff line change
Expand Up @@ -62,13 +62,19 @@ MCMCglmm.subsets <- function(data, posteriors, group, tree, rename.groups, set.l
if(any(classifier)) {
group_classifier <- data[,which(!numerics)[which(classifier)], drop = FALSE]
}
} else {
cleaned_data <- data
group_classifier <- matrix(1, nrow = nrow(data), ncol = 1, dimnames = list(rownames(data)))
}

## Checking the posteriors
check.class(posteriors, "MCMCglmm")

## Check which dimensions where used
dimensions <- match(MCMCglmm.traits(posteriors), colnames(cleaned_data))
if(all(is.na(dimensions))) {
stop.call(msg = "Could not match any column in the data with the posterior samples. Make sure the data column names are the same as the one used in the MCMCglmm.", call = "")
}

## Extracting the residuals and randoms
posterior_levels <- MCMCglmm.levels(posteriors)
Expand Down
6 changes: 4 additions & 2 deletions R/adonis.dispRity.R
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,8 @@
#'
#'
#' @author Thomas Guillerme
#' @references Oksanen J, Simpson G, Blanchet F, Kindt R, Legendre P, Minchin P, O'Hara R, Solymos P, Stevens M, Szoecs E, Wagner H, Barbour M, Bedward M, Bolker B, Borcard D, Carvalho G, Chirico M, De Caceres M, Durand S, Evangelista H, FitzJohn R, Friendly M, Furneaux B, Hannigan G, Hill M, Lahti L, McGlinn D, Ouellette M, Ribeiro Cunha E, Smith T, Stier A, Ter Braak C, Weedon J (2024). vegan: Community Ecology Package_. R package version 2.6-8,

# @export

# source("sanitizing.R")
Expand Down Expand Up @@ -218,8 +220,8 @@ adonis.dispRity <- function(data, formula = matrix ~ group, method = "euclidean"

## Run adonis
## Modifying adonis2 to only check the parent environment (not the global one: matrix input here should be present in the environment
adonis2.modif <- vegan::adonis2
formals(adonis2.modif) <-c(formals(vegan::adonis2), "matrix_input" = NA)
adonis2.modif <- adonis2
formals(adonis2.modif) <-c(formals(adonis2), "matrix_input" = NA)
body(adonis2.modif)[[5]] <- substitute(lhs <- matrix_input)
adonis_out <- adonis2.modif(formula, predictors, method = method, matrix_input = matrix, ...)
# adonis_out <- adonis2.modif(formula, predictors, method = method, matrix_input = matrix) ; warning("DEBUG adonis.dispRity")
Expand Down
118 changes: 67 additions & 51 deletions R/boot.matrix.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@
#' @param data A \code{matrix} or a list of matrices (typically output from \link{chrono.subsets} or \link{custom.subsets} - see details).
#' @param bootstraps The number of bootstrap pseudoreplicates (\code{default = 100}).
#' @param rarefaction Either a \code{logical} value whether to fully rarefy the data, a set of \code{numeric} values used to rarefy the data or \code{"min"} to rarefy at the minimum level (see details).
#' @param dimensions Optional, a vector of \code{numeric} value(s) or the proportion of the dimensions to keep.
#' @param verbose A \code{logical} value indicating whether to be verbose or not.
#' @param boot.type The bootstrap algorithm to use (\code{default = "full"}; see details).
#' @param boot.by Which dimension of the data to bootstrap: either \code{"rows"} to bootstrap the elements (default), \code{"columns"} for the dimensions or \code{"dist"} for bootstrapping both equally (e.g. for distance matrices).
#' @param verbose A \code{logical} value indicating whether to be verbose or not.
#' @param prob Optional, a \code{matrix} or a \code{vector} of probabilities for each element to be selected during the bootstrap procedure. The \code{matrix} or the \code{vector} must have a row names or names attribute that corresponds to the elements in \code{data}.
#'
#' @return
Expand Down Expand Up @@ -55,8 +55,6 @@
#' boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = TRUE)
#' ## Bootstrapping an ordinated matrix with only elements 7, 10 and 11 sampled
#' boot.matrix(BeckLee_mat50, bootstraps = 20, rarefaction = c(7, 10, 11))
#' ## Bootstrapping an ordinated matrix with only 3 dimensions
#' boot.matrix(BeckLee_mat50, bootstraps = 20, dimensions = 3)
#' ## Bootstrapping an the matrix but without sampling Cimolestes and sampling Maelestes 10x more
#' boot.matrix(BeckLee_mat50, bootstraps = 20, prob = c("Cimolestes" = 0, "Maelestes" = 10))
#'
Expand Down Expand Up @@ -87,7 +85,7 @@
# bootstraps <- 3
# rarefaction <- TRUE

boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions = NULL, verbose = FALSE, boot.type = "full", prob = NULL) {
boot.matrix <- function(data, bootstraps = 100, boot.type = "full", boot.by = "rows", rarefaction = FALSE, verbose = FALSE, prob = NULL) {

match_call <- match.call()
## ----------------------
Expand All @@ -96,16 +94,25 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
is_multi <- FALSE

## DATA

## Check boot.by
check.length(boot.by, 1, " must be one of the following: rows, columns, dist.")
check.method(boot.by, c("rows", "columns", "dist"), "boot.by")

## If class is dispRity, data is serial
if(!is(data, "dispRity")) {
## Data must be a matrix
data <- check.dispRity.data(data, returns = c("matrix", "multi"))
is_multi <- any(is_multi, data$multi)
data <- data$matrix

## Check whether it is a distance matrix
if(check.dist.matrix(data[[1]], just.check = TRUE)) {
warning("boot.matrix is applied on what seems to be a distance matrix.\nThe resulting matrices won't be distance matrices anymore!", call. = FALSE)
## Check whether it is a distance matrix (and the boot.by is set to both)
dist_check <- check.dist.matrix(data[[1]], just.check = TRUE)
if(dist_check && boot.by != "dist") {
warning("boot.matrix is applied on what seems to be a distance matrix.\nThe resulting matrices won't be distance matrices anymore!\nIf this isn't the desired behavior, you can use the argument:\nboot.by = \"dist\"", call. = FALSE)
}
if(!dist_check && boot.by == "dist") {
warning("boot.matrix is applied to both rows and columns but the input data seems to not be a distance matrix.\nThe resulting bootstraps might not resample it correctly.", call. = FALSE)
}

## Creating the dispRity object
Expand All @@ -115,12 +122,6 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
if(!is.null(data$call$bootstrap) && data$call$bootstrap[[2]] != "covar") {
stop.call(msg.pre = "", match_call$data, msg = " was already bootstrapped.")
}

## Must be correct format
# check.length(data, 4, " must be either a matrix or an output from the chrono.subsets or custom.subsets functions.")
# if(!all(names(data) %in% c("matrix", "call", "subsets")) {
# stop.call(match_call$data, " must be either a matrix or an output from the chrono.subsets or custom.subsets functions.")
# }

## With the correct names
data_names <- names(data)
Expand All @@ -142,6 +143,14 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
}
}

## Check boot.by and data
if(!is.null(data$call$dist.data) && data$call$dist.data && boot.by != "dist") {
warning(paste0("boot.by not set to \"dist\" (the data will not be treated as a distance matrix) even though ", match_call$data, " contains distance treated data."))
## Toggling data dist
data$call$dist.data <- FALSE
}

## Check verbose
check.class(verbose, "logical")

## If is multi lapply the stuff
Expand All @@ -162,7 +171,7 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
if(verbose) message("Bootstrapping", appendLF = FALSE)

## Apply the custom.subsets
output <- dispRity.multi.apply(split_data, fun = boot.matrix.call, bootstraps = bootstraps, rarefaction = rarefaction, dimensions = dimensions, verbose = verbose, boot.type = boot.type, prob = prob)
output <- dispRity.multi.apply(split_data, fun = boot.matrix.call, bootstraps = bootstraps, rarefaction = rarefaction, verbose = verbose, boot.type = boot.type, boot.by = boot.by, prob = prob)

if(verbose) message("Done.", appendLF = FALSE)
return(output)
Expand Down Expand Up @@ -201,8 +210,11 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions

## Check if it has attributes
prob_names <- attributes(prob)

if(is.null(prob_names)) {
stop.call("", "prob argument must have names (vector) or dimnames (matrix) attributes.")
if(boot.by != "columns") {
prob_names <- names(prob) <- rownames(data$matrix[[1]])
}
} else {
if(is.null(prob_names$names)) {
prob_names <- prob_names$dimnames[[1]]
Expand All @@ -222,15 +234,17 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
}

## Check the names
if(!all(prob_names %in% rownames(data$matrix[[1]]))) {
stop.call(msg.pre = "prob argument contains elements not present in ", call =match_call$data, msg = ".")
} else {
## Check if they are any names missing
missing_rows <- rownames(data$matrix[[1]]) %in% prob_names
if(any(missing_rows)) {
extra_prob <- rep(1, length(which(!missing_rows)))
names(extra_prob) <- rownames(data$matrix[[1]])[!missing_rows]
prob <- c(extra_prob, prob)
if(boot.by != "columns") {
if(!all(prob_names %in% rownames(data$matrix[[1]]))) {
stop.call(msg.pre = "prob argument contains elements not present in ", call =match_call$data, msg = ".")
} else {
## Check if they are any names missing
missing_rows <- rownames(data$matrix[[1]]) %in% prob_names
if(any(missing_rows)) {
extra_prob <- rep(1, length(which(!missing_rows)))
names(extra_prob) <- rownames(data$matrix[[1]])[!missing_rows]
prob <- c(extra_prob, prob)
}
}
}

Expand All @@ -245,11 +259,12 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
}

## Renaming the elements to match the numbers in subsets
names(prob) <- match(names(prob), rownames(data$matrix[[1]]))
if(boot.by != "columns") {
names(prob) <- match(names(prob), rownames(data$matrix[[1]]))
}

## Update the dispRity object
add.prob <- function(one_subset, prob) {

col1 <- one_subset$elements
col2 <- rep(NA, nrow(one_subset$elements))
col3 <- prob[match(one_subset$elements[,1], names(prob))]
Expand All @@ -261,7 +276,9 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
return(one_subset)
}

data$subsets <- lapply(data$subsets, add.prob, prob)
if(boot.by != "columns") {
data$subsets <- lapply(data$subsets, add.prob, prob)
}
}
}

Expand Down Expand Up @@ -297,7 +314,7 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
## BOOT.TYPE
check.class(boot.type, "character")
boot.type <- tolower(boot.type)
check.length(boot.type, 1, " must be a single character string")
check.length(boot.type, 1, " must be one of the following: full, single, null.")

## Must be one of these methods
check.method(boot.type, c("full", "single", "null"), "boot.type")
Expand Down Expand Up @@ -334,27 +351,26 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
}
)

## ~~~
## Add some extra method i.e. proportion of bootstrap shifts?
## ~~~

## RM.LAST.AXIS
## If TRUE, set automatic threshold at 0.95
if(!is.null(dimensions)) {
## Else must be a single numeric value (proportional)
check.class(dimensions, c("numeric", "integer"), " must be a proportional threshold value.")
if(length(dimensions == 1)) {
if(dimensions < 0) {
stop.call("", "Number of dimensions to remove cannot be less than 0.")
## Add the dimensions to the call
if(is.null(data$call$dimensions)) {
data$call$dimensions <- 1:ncol(data$matrix[[1]])
}

## Switch all the elements
if(boot.by != "columns") {
## elements are rows (or both)
all_elements <- matrix(1:dim(data$matrix[[1]])[1], ncol = 1)
} else {
## elements are columns
if(!probabilistic_subsets) {
all_elements <- matrix(data$call$dimension, ncol = 1)
} else {
if(!is.null(prob)) {
all_elements <- cbind(data$call$dimension, NA, prob)
} else {
all_elements <- cbind(data$call$dimension, NA, rep(1, length(data$call$dimension)))
}
if(dimensions < 1) dimensions <- 1:round(dimensions * ncol(data$matrix[[1]]))
}
if(any(dimensions > ncol(data$matrix[[1]]))) {
stop.call("", "Number of dimensions to remove cannot be more than the number of columns in the matrix.")
}
data$call$dimensions <- dimensions
} else {
data$call$dimensions <- 1:ncol(data$matrix[[1]])
}

## Return object if BS = 0
Expand Down Expand Up @@ -384,21 +400,21 @@ boot.matrix <- function(data, bootstraps = 100, rarefaction = FALSE, dimensions
## Fun 3: Split the data per tree
do.split.subsets, n_trees = n_trees),
## Fun 2: Apply the bootstraps
lapply, bootstrap.wrapper, bootstraps_per_tree, rarefaction, boot.type.fun, verbose),
lapply, bootstrap.wrapper, bootstraps = bootstraps_per_tree, rarefaction = rarefaction, boot.type.fun = boot.type.fun, verbose = verbose, all.elements = all_elements, boot.by = boot.by),
## Fun 1: Merge into one normal bootstrap table
merge.to.list
)
} else {
## Bootstrap the data set
bootstrap_results <- lapply(data$subsets, bootstrap.wrapper, bootstraps, rarefaction, boot.type.fun, verbose, all.elements = 1:dim(data$matrix[[1]])[1])
bootstrap_results <- lapply(data$subsets, bootstrap.wrapper, bootstraps = bootstraps, rarefaction = rarefaction, boot.type.fun = boot.type.fun, verbose = verbose, all.elements = all_elements, boot.by = boot.by)
}
if(verbose) message("Done.", appendLF = FALSE)

## Combining and storing the results back in the dispRity object
data$subsets <- mapply(combine.bootstraps, bootstrap_results, data$subsets, SIMPLIFY = FALSE)

## Adding the call information about the bootstrap
data$call$bootstrap <- c(bootstraps, boot.type, list(rare_out))
data$call$bootstrap <- c(bootstraps, boot.type, list(rare_out), boot.by)

return(data)
}
Loading

0 comments on commit b44bc36

Please sign in to comment.