When training what appears to be the same model (same model specification, same formula, same data) with parsnip directly versus through workflows, the two interfaces give different results. I found this quite unexpected, especially the choice workflows makes about dummy variables.
One option to reduce user surprise 😮 would be to document this behavior more clearly, whether in parsnip, in workflows, or both.
lm(Sepal.Length ~ ., iris)
#>
#> Call:
#> lm(formula = Sepal.Length ~ ., data = iris)
#>
#> Coefficients:
#> (Intercept) Sepal.Width Petal.Length Petal.Width
#> 2.1713 0.4959 0.8292 -0.3152
#> Speciesversicolor Speciesvirginica
#> -0.7236 -1.0235
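For context, base R's formula interface uses treatment contrasts by default, so the first factor level (setosa) becomes the reference and gets no indicator column — which is why no Speciessetosa coefficient appears above. A minimal check:

# Treatment contrasts: setosa is the reference level, so it gets no column
contrasts(iris$Species)

# The design matrix lm() builds has only two Species indicators
colnames(model.matrix(Sepal.Length ~ Species, data = iris))
# -> "(Intercept)" "Speciesversicolor" "Speciesvirginica"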
library(parsnip)
lm_spec <- linear_reg() %>%
set_engine(engine = "lm")
## parsnip version looks the same as lm
lm_spec %>%
fit(Sepal.Length ~ ., data = iris)
#> parsnip model object
#>
#> Fit time: 2ms
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Width Petal.Length Petal.Width
#> 2.1713 0.4959 0.8292 -0.3152
#> Speciesversicolor Speciesvirginica
#> -0.7236 -1.0235
## workflows version has made a different choice about dummy variables
library(workflows)
workflow() %>%
add_model(lm_spec) %>%
add_formula(Sepal.Length ~ .) %>%
fit(data = iris)
#> ══ Workflow [trained] ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: linear_reg()
#>
#> ── Preprocessor ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> Sepal.Length ~ .
#>
#> ── Model ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Width Petal.Length Petal.Width
#> 1.1478 0.4959 0.8292 -0.3152
#> Speciessetosa Speciesversicolor Speciesvirginica
#> 1.0235 0.2999 NA
Created on 2020-02-06 by the reprex package (v0.3.0)
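The workflows fit above keeps indicator columns for all three Species levels alongside an intercept, which makes the design matrix rank deficient (hence the NA coefficient for Speciesvirginica). A possible workaround — a sketch based on hardhat's blueprint API, not something I found documented in workflows itself — is to pass a formula blueprint that asks for base R's traditional dummy coding:

library(workflows)
library(hardhat)

# Assumption: `indicators = "traditional"` follows the current hardhat
# default_formula_blueprint() signature; older hardhat versions used a
# logical flag instead, so check the version you have installed.
bp <- default_formula_blueprint(indicators = "traditional")

workflow() %>%
  add_model(lm_spec) %>%
  add_formula(Sepal.Length ~ ., blueprint = bp) %>%
  fit(data = iris)

With treatment contrasts restored, the coefficients should match the plain lm() and parsnip fits shown earlier.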