feat: functional principal component analysis #45

Merged
merged 21 commits into from
Mar 28, 2024
7 changes: 3 additions & 4 deletions .lintr
@@ -1,11 +1,10 @@
linters: linters_with_defaults(
# lintr defaults: https://github.com/jimhester/lintr#available-linters
# lintr defaults: https://lintr.r-lib.org/reference/default_linters.html
# the following setup changes/removes certain linters
assignment_linter = NULL, # do not force using <- for assignments
object_name_linter = object_name_linter(c("snake_case", "CamelCase")), # only allow snake case and camel case object names
cyclocomp_linter = NULL, # do not check function complexity
commented_code_linter = NULL, # allow code in comments
todo_comment_linter = NULL, # allow todo in comments
line_length_linter = line_length_linter(120),
object_length_linter = object_length_linter(40)
line_length_linter = line_length_linter(120L),
object_length_linter = object_length_linter(40L)
)
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -39,6 +39,7 @@ Collate:
'PipeOpFDAFlatten.R'
'PipeOpFDAInterpol.R'
'PipeOpFDASmooth.R'
'PipeOpFPCA.R'
'TaskClassif_phoneme.R'
'TaskRegr_dti.R'
'TaskRegr_fuel.R'
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(PipeOpFDAExtract)
export(PipeOpFDAFlatten)
export(PipeOpFDAInterpol)
export(PipeOpFDASmooth)
export(PipeOpFPCA)
import(R6)
import(checkmate)
import(data.table)
23 changes: 6 additions & 17 deletions R/PipeOpFDAExtract.R
@@ -36,6 +36,7 @@
#' @export
#' @examples
#' library(mlr3pipelines)
#'
#' task = tsk("fuel")
#' po_fmean = po("fda.extract", features = "mean")
#' task_fmean = po_fmean$train(list(task))[[1L]]
@@ -57,6 +58,7 @@ PipeOpFDAExtract = R6Class("PipeOpFDAExtract",
#' Identifier of resulting object, default is `"fda.extract"`.
#' @param param_vals (named `list`)\cr
#' List of hyperparameter settings, overwriting the hyperparameter settings that would
#' otherwise be set during construction. Default `list()`.
initialize = function(id = "fda.extract", param_vals = list()) {
param_set = ps(
drop = p_lgl(tags = c("train", "predict", "required")),
@@ -156,14 +158,7 @@ PipeOpFDAExtract = R6Class("PipeOpFDAExtract",
})
fextractor = make_fextractor(features)

- features = map(
- cols,
- function(col) {
- x = dt[[col]]
- invoke(fextractor, x = x, left = left, right = right)
- }
- )
-
+ features = map(cols, function(col) invoke(fextractor, x = dt[[col]], left = left, right = right))
features = unlist(features, recursive = FALSE)
features = set_names(features, feature_names)
features = as.data.table(features)
@@ -188,19 +183,15 @@ make_fextractor = function(features) {
upper = interval[[2L]]

if (is.na(lower) || is.na(upper)) {
- res = map(features, function(f) {
- rep(NA_real_, length(x)) # no observation in the given interval [left, right]
- })
+ res = map(features, function(f) rep(NA_real_, length(x))) # no observation in the given interval [left, right]
return(res)
}

values = tf::tf_evaluations(x)
arg = args[lower:upper]
res = map(seq_along(x), function(i) {
value = values[[i]]
- map(features, function(f) {
- f(arg = arg, value = value[lower:upper])
- })
+ map(features, function(f) f(arg = arg, value = value[lower:upper]))
})
return(transform_list(res))
}
@@ -217,9 +208,7 @@ make_fextractor = function(features) {
if (is.na(lower) || is.na(upper)) {
rep(NA_real_, length(features)) # no observation in the given interval [left, right]
} else {
- map(features, function(f) {
- f(arg = arg[lower:upper], value = value[lower:upper])
- })
+ map(features, function(f) f(arg = arg[lower:upper], value = value[lower:upper]))
}
})
transform_list(res)
1 change: 1 addition & 0 deletions R/PipeOpFDAFlatten.R
@@ -19,6 +19,7 @@
#' @export
#' @examples
#' library(mlr3pipelines)
#'
#' task = tsk("fuel")
#' pop = po("fda.flatten")
#' task_flat = pop$train(list(task))
1 change: 1 addition & 0 deletions R/PipeOpFDAInterpol.R
@@ -42,6 +42,7 @@
#' @export
#' @examples
#' library(mlr3pipelines)
#'
#' task = tsk("fuel")
#' pop = po("fda.interpol")
#' task_interpol = pop$train(list(task))[[1]]
1 change: 1 addition & 0 deletions R/PipeOpFDASmooth.R
@@ -27,6 +27,7 @@
#' @export
#' @examples
#' library(mlr3pipelines)
#'
#' task = tsk("fuel")
#' po_smooth = po("fda.smooth", method = "rollmean", args = list(k = 5))
#' task_smooth = po_smooth$train(list(task))[[1L]]
95 changes: 95 additions & 0 deletions R/PipeOpFPCA.R
@@ -0,0 +1,95 @@
#' @title Functional Principal Component Analysis
#' @name mlr_pipeops_fda.fpca
#'
#' @format [`R6Class`] object inheriting from
#' [`PipeOpTaskPreproc`][mlr3pipelines::PipeOpTaskPreproc]
#'
#' @description
#' This is the class that extracts principal components from functional columns.
#' See [`tfb_fpc()`][tf::tfb_fpc] for details.
#'
#' @section Parameters:
#' The parameters are the parameters inherited from [`PipeOpTaskPreproc`], as well as the following parameters:
#' * `pve` :: `numeric(1)` \cr
#' The proportion of variance explained that should be retained. Default is `0.995`.
#' * `n_components` :: `integer(1)` \cr
#' The maximum number of principal components to extract. Initialized to `Inf`, which keeps all available components.
#'
#' @section Naming:
#' The new names generally append a `_pc_{number}` to the corresponding column name.
#' If a column is called `"x"` and there are three principal components, the corresponding
#' new columns will be called `"x_pc_1", "x_pc_2", "x_pc_3"`.
#'
#' @section Internals:
#' Uses the [`tfb_fpc()`][tf::tfb_fpc] function.
#'
#' @section Methods:
#' Only methods inherited from [`PipeOpTaskPreproc`][mlr3pipelines::PipeOpTaskPreproc]/
#' [`PipeOp`][mlr3pipelines::PipeOp]
#'
#' @export
#' @examples
#' library(mlr3pipelines)
#'
#' task = tsk("fuel")
#' po_fpca = po("fda.fpca")
#' task_fpca = po_fpca$train(list(task))[[1L]]
PipeOpFPCA = R6Class("PipeOpFPCA",
inherit = mlr3pipelines::PipeOpTaskPreproc,
public = list(
#' @description Initializes a new instance of this Class.
#' @param id (`character(1)`)\cr
#' Identifier of resulting object, default is `"fda.fpca"`.
#' @param param_vals (named `list`)\cr
#' List of hyperparameter settings, overwriting the hyperparameter settings that would
#' otherwise be set during construction. Default `list()`.
initialize = function(id = "fda.fpca", param_vals = list()) {
param_set = ps(
pve = p_dbl(default = 0.995, lower = 0, upper = 1, tags = "train"),
n_components = p_int(1L, special_vals = list(Inf), tags = c("train", "required"))
)
param_set$set_values(n_components = Inf)

super$initialize(
id = id,
param_set = param_set,
param_vals = param_vals,
packages = c("mlr3fda", "mlr3pipelines", "tf"),
feature_types = "tfd_reg",
tags = "fda"
)
}
),
private = list(
.train_dt = function(dt, levels, target) {
pars = self$param_set$get_values()

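# forward only the tfb_fpc arguments (currently pve) as a named list; n_components is applied below
dt = map_dtc(dt, function(x) invoke(tf::tfb_fpc, data = x, .args = remove_named(pars, "n_components")))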
self$state = list(fpc = dt)

dt = imap_dtc(dt, function(col, nm) {
map(col, function(x) {
pc = as.list(x[2:min(pars$n_components + 1L, length(x))])
set_names(pc, sprintf("%s_pc_%d", nm, seq_along(pc)))
})
})
unnest(dt, colnames(dt))
},

.predict_dt = function(dt, levels) {
pars = self$param_set$get_values()

dt = imap_dtc(dt, function(col, nm) {
fpc = tf::tf_rebase(col, self$state$fpc[[nm]], arg = tf::tf_arg(col))
map(fpc, function(x) {
pc = as.list(x[2:min(pars$n_components + 1L, length(x))])
set_names(pc, sprintf("%s_pc_%d", nm, seq_along(pc)))
})
})
unnest(dt, colnames(dt))
}
)
)

#' @include zzz.R
register_po("fda.fpca", PipeOpFPCA)
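
The component selection and naming used in `.train_dt` and `.predict_dt` can be tried in isolation. The following is a base-R sketch, not package code: `name_components()` and the example coefficient vector are hypothetical, and it assumes that the first coefficient belongs to the mean function, which is what the index starting at 2 suggests.

```r
# Hypothetical helper mirroring the selection and naming logic of the diff.
# `coefs` stands in for the coefficient vector of one curve (assumption:
# position 1 is the mean-function coefficient, positions 2.. are the scores).
name_components = function(coefs, nm, n_components = Inf) {
  # Skip the first coefficient and keep at most `n_components` of the rest.
  last = min(n_components + 1L, length(coefs))
  pc = as.list(coefs[2:last])
  # Name the retained scores "<column>_pc_<index>".
  stats::setNames(pc, sprintf("%s_pc_%d", nm, seq_along(pc)))
}

coefs = c(1, 0.8, -0.3, 0.1)

# n_components = Inf keeps every score after the first coefficient.
all_pcs = name_components(coefs, "x")
# A finite n_components caps the number of returned columns.
two_pcs = name_components(coefs, "x", n_components = 2L)

print(names(all_pcs))
print(names(two_pcs))
```

Because `Inf + 1L` is still `Inf`, the `min()` call falls back to `length(coefs)`, so `Inf` is a convenient way to say "no cap" without a separate branch.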
2 changes: 1 addition & 1 deletion R/TaskRegr_dti.R
@@ -36,7 +36,7 @@ load_task_dti = function(id = "dti") {
rcst = tf::tfd(dti$rcst, arg = seq(0L, 1L, length.out = 55L)),
sex = dti$sex
)
- dti = na.omit(dti)
+ dti = stats::na.omit(dti)
b = as_data_backend(dti)

task = TaskRegr$new(
1 change: 1 addition & 0 deletions README.md
@@ -127,6 +127,7 @@ glrn$predict(task, row_ids = ids$test)
|:-------------------------------------------------------------------------------|:-------------------------------------------------|:---------------------------------------------------|:--------------------|
| [fda.extract](https://mlr3fda.mlr-org.com/reference/mlr_pipeops_fda.extract) | Extracts Simple Features from Functional Columns | [tf](https://cran.r-project.org/package=tf) | fda, data transform |
| [fda.flatten](https://mlr3fda.mlr-org.com/reference/mlr_pipeops_fda.flatten) | Flattens Functional Columns | [tf](https://cran.r-project.org/package=tf) | fda, data transform |
| [fda.fpca](https://mlr3fda.mlr-org.com/reference/mlr_pipeops_fda.fpca) | Functional Principal Component Analysis | [tf](https://cran.r-project.org/package=tf) | fda, data transform |
| [fda.interpol](https://mlr3fda.mlr-org.com/reference/mlr_pipeops_fda.interpol) | Interpolate Functional Columns | [tf](https://cran.r-project.org/package=tf) | fda, data transform |
| [fda.smooth](https://mlr3fda.mlr-org.com/reference/mlr_pipeops_fda.smooth) | Smoothing Functional Columns | [tf](https://cran.r-project.org/package=tf), stats | fda, data transform |
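
To make the `pve` parameter of `fda.fpca` more concrete, here is a simplified base-R sketch of how a variance-explained threshold can determine the number of components. It is an illustration under stated assumptions, not the `tf::tfb_fpc` implementation: the curves are simulated, sampled on a common regular grid, and decomposed with a plain unweighted SVD; all variable names are made up for this example.

```r
# Simulate 20 noisy curves on a regular grid (rows = curves, columns = grid points).
set.seed(1)
grid = seq(0, 1, length.out = 50)
curves = t(replicate(20, {
  rnorm(1) * sin(2 * pi * grid) +
    rnorm(1, sd = 0.5) * cos(2 * pi * grid) +
    rnorm(length(grid), sd = 0.05)
}))

# Center the curves around the pointwise mean function.
mu = colMeans(curves)
centered = sweep(curves, 2, mu)

# Plain SVD of the centered data; right singular vectors act as eigenfunctions.
dec = svd(centered)

# Share of total variance carried by each component.
var_explained = dec$d^2 / sum(dec$d^2)

# Smallest number of components whose cumulative share reaches pve.
pve = 0.995
n_keep = which(cumsum(var_explained) >= pve)[1L]

# Scores of each curve on the retained components.
keep = seq_len(n_keep)
scores = dec$u[, keep, drop = FALSE] %*% diag(dec$d[keep], n_keep)

print(n_keep)
print(dim(scores))
```

In this picture, `pve` picks how many components are estimated at all, while `n_components` in the PipeOp only caps how many of those are returned as columns.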

4 changes: 3 additions & 1 deletion man/mlr_pipeops_fda.extract.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/mlr_pipeops_fda.flatten.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

110 changes: 110 additions & 0 deletions man/mlr_pipeops_fda.fpca.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/mlr_pipeops_fda.interpol.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/mlr_pipeops_fda.smooth.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
