
v1.4.4: blitzen

@nhejazi nhejazi released this 23 Dec 19:26
· 243 commits to master since this release
2bc71d8

v1.4.4 ("Blitzen") is a major release featuring numerous updates and bugfixes
(400+ commits spread across ~8 months), including:

  • Updates to Lrnr_nnls to support binary outcomes, including support for convexity of the resultant model fit and warnings on prediction quality.
  • Changes to Lrnr_cv_selector to support improved computation of the CV-risk, averaging the risk strictly across validation/holdout sets.
  • Update Lrnr_sl by adding a new private slot .cv_risk to store the risk estimates, using this to avoid unnecessary re-computation in the print method (the .cv_risk slot is populated on the first print call, and only ever re-printed thereafter).
  • Fix Lrnr_screener_importance's pairing of (a) covariates returned by the importance function with (b) covariates as they are defined in the task. This issue only arose when discrete covariates were automatically one-hot encoded upon task initiation (i.e., when colnames(task$X) != task$nodes$covariates).
  • Enhanced functionality in sl3 task's add_interactions method to support interactions that involve factors. This method is most commonly used by Lrnr_define_interactions, which is intended for use with another learner (e.g., Lrnr_glmnet or Lrnr_glm) in a Pipeline.
  • Modified the default Lrnr_gam formula (used when the user does not specify one) so that it no longer uses mgcv's default k=10 degrees of freedom for each smooth s term when fewer than k=10 degrees of freedom are available. This bypasses an mgcv::gam error, and tends to be relevant only for small n.
  • Incorporated a min_screen argument into Lrnr_screener_coefs, which tries to ensure that at least min_screen covariates are selected. If this argument is specified and the learner argument in Lrnr_screener_coefs is a Lrnr_glmnet, then lambda is increased until min_screen covariates are selected, and a warning is produced. If min_screen is specified and the learner argument in Lrnr_screener_coefs is not a Lrnr_glmnet, then it will error.
  • Added a formula parameter and a process_formula function to the base learner, Lrnr_base, whose methods carry over to all other learners. When a formula is supplied as a learner parameter, the process_formula function constructs a design matrix by supplying the formula to model.matrix. This implementation allows a formula to be supplied to all learners, even those without native formula support. The formula should be an object of class "formula", or a character string that can be coerced to that class.
  • Added custom_ROCR_risk, a factory function for performance-based risks for binary outcomes using ROCR performance measures. It supports cutoff-dependent and scalar ROCR performance measures. The risk is defined as 1 - performance, and is transformed back to the performance measure in the cv_risk and importance functions. This change prompted renaming the arguments loss_fun and loss_function to eval_fun and eval_function, respectively, since the evaluation of predictions relative to the observations can be either a risk or a loss function. This argument name change impacted the following: Lrnr_solnp, Lrnr_optim, Lrnr_cv_selector, cv_risk, importance, and CV_Lrnr_sl.
  • Incorporated stratified cross-validation when folds are not supplied to the sl3_Task and the outcome is a discrete (i.e., binary or categorical) variable.
  • Added to the importance method the option to evaluate importance over covariate_groups, by removing/permuting all covariates in the same group together.
  • Added Lrnr_ga as another metalearner.
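
The "risk = 1 - performance" convention described for custom_ROCR_risk can be illustrated with a small, language-neutral sketch. The Python below is not sl3 or ROCR code: the names accuracy_at_cutoff, make_performance_risk, and risk_to_performance are hypothetical, and plain classification accuracy stands in for a ROCR performance measure. It only shows how a performance measure might be wrapped as a risk to be minimized, and how the reported value could be mapped back afterwards.

```python
from typing import Callable, Sequence

# A performance or risk function maps (predictions, observations) to a number.
EvalFn = Callable[[Sequence[float], Sequence[int]], float]


def accuracy_at_cutoff(cutoff: float) -> EvalFn:
    """Hypothetical cutoff-dependent performance measure (higher is better)."""
    def perf(pred: Sequence[float], obs: Sequence[int]) -> float:
        # Classify as 1 when the predicted probability reaches the cutoff.
        labels = [1 if p >= cutoff else 0 for p in pred]
        return sum(l == o for l, o in zip(labels, obs)) / len(obs)
    return perf


def make_performance_risk(perf: EvalFn) -> EvalFn:
    """Factory: wrap a performance measure as a risk, defined as 1 - performance."""
    def risk(pred: Sequence[float], obs: Sequence[int]) -> float:
        return 1.0 - perf(pred, obs)
    return risk


def risk_to_performance(risk_value: float) -> float:
    """Map a risk back to the performance scale for reporting."""
    return 1.0 - risk_value


# Toy binary-outcome data (made up for illustration only).
pred = [0.9, 0.2, 0.7, 0.4]
obs = [1, 0, 1, 1]

risk_fn = make_performance_risk(accuracy_at_cutoff(0.5))
risk_value = risk_fn(pred, obs)
performance = risk_to_performance(risk_value)
print(risk_value, performance)
```

Framing the measure as a risk lets the same minimizing machinery handle both losses and performance measures, which is consistent with the stated motivation for the more general eval_fun / eval_function argument names.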

See the NEWS file for complete details.