diff --git a/NEWS.md b/NEWS.md index 28d516368..5fa518067 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,18 @@ +# parsnip 0.0.5 + +## Fixes + +* A bug ([#206](https://github.com/tidymodels/parsnip/issues/206) and [#234](https://github.com/tidymodels/parsnip/issues/234)) was fixed that caused an error when predicting with a multinomial `glmnet` model. + +## Other Changes + + * `glmnet` was removed as a dependency since the new version depends on 3.6.0 or greater. Keeping it would constrain `parsnip` to that same requirement. All `glmnet` tests are run locally. + +## New Features + + * `nnet` was added as an engine to `multinom_reg()` [#209](https://github.com/tidymodels/parsnip/issues/209) + + # parsnip 0.0.4 ## New Features diff --git a/R/misc.R b/R/misc.R index 754bcebc3..b26ee20b0 100644 --- a/R/misc.R +++ b/R/misc.R @@ -276,7 +276,8 @@ update_main_parameters <- function(args, param) { } param <- param[!has_extra_args] - - args <- utils::modifyList(args, param) } + + + diff --git a/R/multinom_reg.R b/R/multinom_reg.R index 532489eb9..abbf52cd9 100644 --- a/R/multinom_reg.R +++ b/R/multinom_reg.R @@ -34,7 +34,7 @@ #' The model can be created using the `fit()` function using the #' following _engines_: #' \itemize{ -#' \item \pkg{R}: `"glmnet"` (the default) +#' \item \pkg{R}: `"glmnet"` (the default), `"nnet"` #' \item \pkg{Stan}: `"stan"` #' \item \pkg{keras}: `"keras"` #' } @@ -49,6 +49,10 @@ #' #' \Sexpr[results=rd]{parsnip:::show_fit(parsnip:::multinom_reg(), "glmnet")} #' +#' \pkg{nnet} +#' +#' \Sexpr[results=rd]{parsnip:::show_fit(parsnip:::multinom_reg(), "nnet")} +#' #' \pkg{spark} #' #' \Sexpr[results=rd]{parsnip:::show_fit(parsnip:::multinom_reg(), "spark")} @@ -197,6 +201,13 @@ organize_multnet_prob <- function(x, object) { as_tibble(x) } +organize_nnet_prob <- function(x, object) { + format_classprobs(x) +} + + + + # ------------------------------------------------------------------------------ # glmnet call stack for multinomial regression using 
`predict` when object has # classes "_multnet" and "model_fit" (for class predictions): diff --git a/R/multinom_reg_data.R b/R/multinom_reg_data.R index 1ae95896c..b56f9c2cf 100644 --- a/R/multinom_reg_data.R +++ b/R/multinom_reg_data.R @@ -226,3 +226,85 @@ set_pred( x = quote(as.matrix(new_data))) ) ) + + +# ------------------------------------------------------------------------------ + +set_model_engine("multinom_reg", "classification", "nnet") +set_dependency("multinom_reg", "nnet", "nnet") + +set_model_arg( + model = "multinom_reg", + eng = "nnet", + parsnip = "penalty", + original = "decay", + func = list(pkg = "dials", fun = "penalty"), + has_submodel = FALSE +) + +set_fit( + model = "multinom_reg", + eng = "nnet", + mode = "classification", + value = list( + interface = "formula", + protect = c("formula", "data", "weights"), + func = c(pkg = "nnet", fun = "multinom"), + defaults = list(trace = FALSE) + ) +) + + +set_pred( + model = "multinom_reg", + eng = "nnet", + mode = "classification", + type = "class", + value = list( + pre = NULL, + post = NULL, + func = c(fun = "predict"), + args = + list( + object = quote(object$fit), + newdata = quote(new_data), + type = "class" + ) + ) +) + +set_pred( + model = "multinom_reg", + eng = "nnet", + mode = "classification", + type = "prob", + value = list( + pre = NULL, + post = organize_nnet_prob, + func = c(fun = "predict"), + args = + list( + object = quote(object$fit), + newdata = quote(new_data), + type = "prob" + ) + ) +) + +set_pred( + model = "multinom_reg", + eng = "nnet", + mode = "classification", + type = "raw", + value = list( + pre = NULL, + post = NULL, + func = c(fun = "predict"), + args = + list( + object = quote(object$fit), + newdata = quote(new_data) + ) + ) +) + diff --git a/docs/dev/articles/articles/Classification.html b/docs/dev/articles/articles/Classification.html index 22e2b5a86..b0c493db9 100644 --- a/docs/dev/articles/articles/Classification.html +++ 
b/docs/dev/articles/articles/Classification.html @@ -109,12 +109,12 @@
In parsnip
, the predict
function can be used:
test_results <-
credit_test %>%
@@ -190,15 +190,15 @@ Classification Example
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
-#> 1 accuracy binary 0.801
+#> 1 accuracy binary 0.807
test_results %>% conf_mat(truth = Status, nnet_class)
#> Truth
#> Prediction bad good
-#> bad 184 93
-#> good 129 707
multinom_reg()
+Models can be added by the user too. See the “Making a parsnip
model from scratch” vignette.
Look at the formula code that was printed out: one function uses the argument name ntree
and the other uses num.trees
. parsnip
doesn’t require you to know the specific names of the main arguments.
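The idea of one unified argument name mapping to different engine-specific names can be sketched with a small lookup table. This is a toy illustration only, not parsnip's internal implementation; `arg_map` and `translate_args()` are hypothetical names introduced here.

```r
# Toy lookup table: unified argument name -> engine-specific name.
# Illustrative sketch only; this is not how parsnip stores its mappings.
arg_map <- list(
  randomForest = c(trees = "ntree"),
  ranger       = c(trees = "num.trees")
)

# Rename any unified arguments to the names the chosen engine expects.
translate_args <- function(engine, args) {
  map <- arg_map[[engine]]
  known <- names(args) %in% names(map)
  names(args)[known] <- map[names(args)[known]]
  args
}

unified <- list(trees = 2000)
rf_args <- translate_args("randomForest", unified)
ranger_args <- translate_args("ranger", unified)

names(rf_args)
names(ranger_args)
```

The user supplies `trees` once, and the engine-specific name is filled in only when a particular engine is chosen.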
Now suppose that we want to modify the value of mtry
based on the number of predictors in the data. Usually, the default value would be floor(sqrt(num_predictors))
. A pure bagging model requires an mtry
value equal to the total number of predictors. You may not know in advance how many predictors will be present (perhaps because indicator variables are generated or a variable filter is applied), so the right value can be difficult to specify exactly.
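As a rough illustration of why the predictor count can be hard to know up front, the sketch below counts predictors before and after indicator variables are generated. The built-in `iris` data and the variable names are assumptions chosen for convenience; they are not part of the original example.

```r
# Count predictors before and after indicator (dummy) variables are created.
# iris is used purely as a convenient built-in data set.
raw_predictors <- setdiff(names(iris), "Sepal.Length")
n_raw <- length(raw_predictors)

# model.matrix() expands the factor Species into indicator columns;
# subtract the intercept column to count predictors only.
design <- model.matrix(Sepal.Length ~ ., data = iris)
n_expanded <- ncol(design) - 1

default_mtry <- floor(sqrt(n_expanded))  # the usual default
bagging_mtry <- n_expanded               # pure bagging uses every predictor
```

Here the four original columns become five predictors once `Species` is expanded, so a value of `mtry` chosen from the raw column count would not give a pure bagging model.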
When the model is being fit by parsnip
, data descriptors are made available. These attempt to let you know what you will have available when the model is fit. When a model object is created (say using rand_forest
), the values of the arguments that you give it are immediately evaluated… unless you delay them. To delay the evaluation of any argument, you can use rlang::expr
to make an expression.
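The difference between immediate and delayed evaluation can be sketched as follows. `n_predictors()` is a hypothetical stand-in for a data descriptor, and the fit-time environment is simulated by hand; neither is taken from parsnip itself.

```r
# Evaluated immediately: the value is fixed when the line runs.
mtry_now <- floor(sqrt(16))

# Captured with rlang::expr(): nothing is evaluated yet, so it does not
# matter that n_predictors() (a hypothetical descriptor) is not defined.
mtry_later <- rlang::expr(floor(sqrt(n_predictors())))

# Simulated "fit time": the descriptor now exists, so the expression
# can be evaluated against it.
fit_env <- rlang::env(n_predictors = function() 25)
mtry_value <- rlang::eval_tidy(mtry_later, env = fit_env)
```

Because `mtry_later` is stored as a call rather than a number, the same specification could be reused on data sets with different numbers of predictors.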
Two relevant descriptors for what we are about to do are:
This vignette describes the process of creating a new model function. Before proceeding, take a minute and read our guidelines on creating modeling packages to get the general themes and conventions that we use.
As an example, we’ll create a function for mixture discriminant analysis. There are a few packages that do this but we’ll focus on mda::mda
:
str(mda::mda)
-#> function (formula = formula(data), data = sys.frame(sys.parent()),
-#> subclasses = 3, sub.df = NULL, tot.df = NULL, dimension = sum(subclasses) -
-#> 1, eps = .Machine$double.eps, iter = 5, weights = mda.start(x,
-#> g, subclasses, trace, ...), method = polyreg, keep.fitted = (n *
-#> dimension < 5000), trace = FALSE, ...)
The main hyper-parameter is the number of subclasses. We’ll name our function mixture_da
.
Before proceeding, it helps to review how parsnip
categorizes models:
The model type is related to the structural aspect of the model. For example, the model type linear_reg
represents linear models (slopes and intercepts) that model a numeric outcome. Other model types in the package are nearest_neighbor
, decision_tree
, and so on.
Within a model type is the mode. This relates to the modeling goal. Currently the two modes in the package are “regression” and “classification”. Some models have methods for both modes (e.g. nearest neighbors) while others are specific to a single mode (e.g. logistic regression).
The computation engine is a combination of the estimation method and the implementation. For example, for linear regression, one engine is "lm"
and this uses ordinary least squares analysis via the lm
function. Another engine is "stan"
which uses the Stan infrastructure to estimate parameters using Bayes' rule.
When adding a model into parsnip
, the user has to specify which modes and engines are used. The package also enables users to add a new mode or engine to an existing model.
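To make the three levels concrete, the sketch below picks a model type (`multinom_reg`), a mode (classification), and an engine (`"nnet"`, the engine this change adds). It assumes the parsnip and nnet packages are installed, and uses the built-in `iris` data only because it has a three-class outcome.

```r
library(parsnip)

# Model type: multinomial regression; mode: classification; engine: nnet.
spec <- set_engine(multinom_reg(mode = "classification"), "nnet")

# iris is used only as a convenient built-in three-class data set.
fitted <- fit(spec, Species ~ ., data = iris)

# Hard class predictions and class probabilities from the same fit.
class_preds <- predict(fitted, new_data = iris, type = "class")
prob_preds <- predict(fitted, new_data = iris, type = "prob")
```

Swapping the engine string is intended to be the only change needed to fit the same model type with a different implementation.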
The name that parsnip
uses for the argument. In general, we try to use non-jargony names for arguments (e.g. “penalty” instead of “lambda” for regularized regression). We recommend consulting this page to see if an existing argument name can be used before creating a new one.
The argument name that is used by the underlying modeling function.
A function reference for a constructor that will be used to generate tuning parameter values. This should be a character vector that has a named element called fun
that is the constructor function. There is an optional element pkg
that can be used to call the function using its namespace. If referencing functions from the dials
package, quantitative parameters can have additional arguments in the list for trans
and range
while qualitative parameters can pass values
via this list.
A logical value for whether the argument can be used to generate multiple predictions for a single R object. For example, for boosted trees, if a model is fit with 10 boosting iterations, many modeling packages allow the model object to make predictions for any number of iterations less than the number used to fit the model. In general this is not the case so one would use has_submodel = FALSE
.
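The `fun` and optional `pkg` elements described above can be read as a recipe for locating a function. The sketch below shows one way such a specification could be resolved; `resolve_func()` is a hypothetical helper written for this illustration and is not part of parsnip.

```r
# Sketch: turn a spec with named elements `pkg` (optional) and `fun`
# into an actual function. resolve_func() is hypothetical.
resolve_func <- function(spec) {
  if ("pkg" %in% names(spec)) {
    # Namespace-qualified lookup, equivalent to pkg::fun
    getExportedValue(spec[["pkg"]], spec[["fun"]])
  } else {
    # No package given: rely on the search path
    get(spec[["fun"]], mode = "function")
  }
}

with_pkg <- resolve_func(c(pkg = "stats", fun = "median"))
without_pkg <- resolve_func(c(fun = "max"))

with_pkg(c(1, 5, 9))
without_pkg(c(1, 5, 9))
```

Supplying `pkg` avoids depending on which packages happen to be attached when the function is eventually called.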
For mda::mda()
, the main tuning parameter is subclasses
which we will rewrite as sub_classes
.
set_model_arg(
@@ -209,7 +208,7 @@
}
# Capture the arguments in quosures
- args <- list(sub_classes = rlang::enquo(sub_classes))
+ args <- list(sub_classes = rlang::enquo(sub_classes))
# Save some empty slots for future parts of the specification
out <- list(args = args, eng_args = NULL,
@@ -270,7 +269,7 @@
-
func
is the prediction function (in the same format as above). In many cases, packages have a predict method for their model’s class but this is typically not exported. In this case (and the example below), it is simple enough to make a generic call to predict
with no associated package.
-
-
args
is a list of arguments to pass to the prediction function. These will most likely be wrapped in rlang::expr
so that they are not evaluated when defining the method. For mda
, the code would be predict(object, newdata, type = "class")
. What is actually given to the function is the parsnip
model fit object, which includes a sub-object called fit
and this houses the mda
model object. If the data need to be a matrix or data frame, you could also use newdata = quote(as.data.frame(newdata))
and so on.
+args
is a list of arguments to pass to the prediction function. These will most likely be wrapped in rlang::expr
so that they are not evaluated when defining the method. For mda
, the code would be predict(object, newdata, type = "class")
. What is actually given to the function is the parsnip
model fit object, which includes a sub-object called fit
and this houses the mda
model object. If the data need to be a matrix or data frame, you could also use newdata = quote(as.data.frame(newdata))
and so on.
The parsnip
prediction code will expect the result to be an unnamed character string or factor. This will be coerced to a factor with the same levels as the original data.
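The role of the quoted arguments can be sketched in base R: the arguments are stored unevaluated, assembled into a `predict()` call, and evaluated only once a fitted object and new data exist. The list standing in for the parsnip model fit and the use of `lm()` are assumptions made to keep the example self-contained.

```r
# A stand-in for a parsnip model fit: the engine's model lives in `$fit`.
# lm() is used only because it ships with R.
object <- list(fit = lm(mpg ~ wt, data = mtcars))
new_data <- data.frame(wt = c(2.5, 3.5))

# Arguments are quoted so they are not evaluated when the method is defined.
pred_args <- list(
  object = quote(object$fit),
  newdata = quote(new_data)
)

# Assemble the call predict(object = object$fit, newdata = new_data) ...
pred_call <- as.call(c(list(quote(predict)), pred_args))

# ... and evaluate it only now that `object` and `new_data` exist.
preds <- eval(pred_call)
```

Because the generic `predict()` is called without a package prefix, method dispatch finds the right method for whatever class `object$fit` has.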
To add this method to the model environment, a similar set
function is used:
@@ -379,7 +378,7 @@
mda_fit
#> parsnip model object
#>
-#> Fit in: 25msCall:
+#> Fit in: 21msCall:
#> mda::mda(formula = formula, data = data, subclasses = ~2)
#>
#> Dimension: 4
@@ -463,7 +462,7 @@
fit(mpg ~ ., data = mtcars)
#> parsnip model object
#>
-#> Fit in: 5msCall:
+#> Fit in: 3msCall:
#> rlm(formula = formula, data = data)
#> Converged in 8 iterations
#>
@@ -559,7 +558,7 @@
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
-## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────── tidymodels 0.0.3 ──
-## ✔ broom 0.5.2 ✔ recipes 0.1.7
-## ✔ dials 0.0.3 ✔ rsample 0.0.5
-## ✔ dplyr 0.8.3 ✔ tibble 2.1.3
-## ✔ infer 0.5.0 ✔ yardstick 0.0.4
+## ── Attaching packages ────────────────────────────────────────────────────────────────────────── tidymodels 0.0.3 ──
+## ✔ broom 0.5.2 ✔ recipes 0.1.7.9001
+## ✔ dials 0.0.3.9002 ✔ rsample 0.0.5
+## ✔ dplyr 0.8.3 ✔ tibble 2.99.99.9010
+## ✔ infer 0.5.0 ✔ yardstick 0.0.4
## ✔ purrr 0.3.3
-## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
+## ── Conflicts ───────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
@@ -229,7 +229,7 @@ Evaluating Submodels with the Same Model Object

name: Bug report or feature request about: Describe a bug you’ve seen or make a case for a new feature —
+Please follow the template below.
+If the question is related at all to a specific data analysis, please include a minimal reprex (reproducible example). If you’ve never heard of a reprex before, start by reading “What is a reprex”, and follow the advice further down that page.
+Tips:
+Here is a good example issue: #139
Issues without a reprex will have a lower priority than the others.
We don’t want you to use confidential data; you can blind the data or simulate other data to demonstrate the issue. The functions caret::twoClassSim()
or caret::SLC14_1()
might be good tools to simulate data for you.
Unless the problem is explicitly about parallel processing, please run sequentially.
+Please use set.seed()
to ensure any randomness in your code is reproducible.
Please check https://stackoverflow.com/ or https://community.rstudio.com/ to see if someone has already asked the same question (see: Yihui’s Rule).
You might need to install these:
When you are ready to file the issue, please delete the parts above this line: < – ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ –>
+ + +NEWS.md
glmnet
was removed as a dependency since its new version depends on R 3.6.0 or greater. Keeping it would constrain parsnip
to that same requirement. All glmnet
tests are run locally.
nnet
was added as an engine to multinom_reg()
#209
+The time elapsed during model fitting is stored in the $elapsed
slot of the parsnip model object, and is printed when the model object is printed.
Some default parameter ranges were updated for SVM, KNN, and MARS models.
The model update()
methods gained a parameters
argument for cases when the parameters are contained in a tibble or list.
fit_control()
is soft-deprecated in favor of control_parsnip()
.
A bug was fixed standardizing the output column types of multi_predict
and predict
for multinom_reg
.
A bug was fixed related to using data descriptors and fit_xy()
.
For glmnet
models, the full regularization path is always fit regardless of the value given to penalty
. Previously, the model was fit with passing penalty
to glmnet
’s lambda
argument and the model could only make predictions at those specific values. (#195)
add_rowindex()
can add a column called .row
to a data frame.
If a computational engine is not explicitly set, a default will be used. Each default is documented on the corresponding model page. A warning is issued at fit time unless verbosity is zero.
Small release driven by changes in sample()
in the current r-devel.
A “null model” is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
fit_xy()
can take a single column data frame or matrix for y
without error
varying_args()
now has a full
argument to control whether the full set of possible varying arguments is returned (as opposed to only the arguments that are actually varying).
fit_control()
now returns an S3 method.
-C5.0_train(x, y, weights = NULL, trials = 15, minCases = 2,
-  sample = 0, ...)
+C5.0_train(x, y, weights = NULL, trials = 15, minCases = 2, sample = 0, ...)