Feature Request: ability to pass dataframe to validation argument of xgboost #765

Open
@joeycouse

Related to #760

The current implementation of the validation parameter in boost_tree only lets you set the proportion of the training data to hold out as the validation set. It would be great to be able to pass a data frame as the argument to validation as well. This would be really helpful when there is a grouping structure within the data and you want to test whether the model generalizes to different groups, and it would bring parsnip's capabilities in line with xgb.train().
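For reference, a minimal sketch of the current behaviour described above, where validation is a proportion passed as an engine argument. The value 0.2 is only illustrative, and the spec is built but not fitted here.

```r
library(parsnip)

# Current behaviour: `validation` is a proportion of the training data that
# is held out internally and used for early stopping (via `stop_iter`).
# The 0.2 here is an arbitrary illustrative value.
spec <-
  boost_tree(mode = "regression", stop_iter = 5) |>
  set_engine("xgboost", validation = 0.2)

spec
```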

Not a great example, but just to demonstrate the proposed usage:

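library(parsnip)
library(dplyr)
library(modeldata)

data("penguins")

# Training data: two of the three species
train <- 
  penguins |> 
  filter(species %in% c("Gentoo", "Adelie"))

# Validation data: the remaining species
valid <-
  penguins |> 
  filter(!(species %in% c("Gentoo", "Adelie")))


# Proposed usage (not currently supported): pass a data frame to `validation`
boost_tree(mode = 'regression',
           mtry = 3,
           tree_depth = 2,
           stop_iter = 5) |> 
  set_engine("xgboost", validation = valid)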
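
For comparison, here is a rough sketch of what calling xgboost directly with a separate validation set could look like. It assumes the `watchlist` argument of xgb.train() as found in xgboost 1.x releases (newer releases call this argument `evals`), and it uses mtcars split by cylinder count as a stand-in for grouped data; the split, the parameters, and the object names are all illustrative rather than anything parsnip does today.

```r
library(xgboost)

# Illustrative group split on a built-in dataset: fit on 4- and 6-cylinder
# cars, monitor performance on 8-cylinder cars.
train <- mtcars[mtcars$cyl != 8, ]
valid <- mtcars[mtcars$cyl == 8, ]

predictors <- setdiff(names(mtcars), "mpg")
dtrain <- xgb.DMatrix(as.matrix(train[, predictors]), label = train$mpg)
dvalid <- xgb.DMatrix(as.matrix(valid[, predictors]), label = valid$mpg)

# Assumption: `watchlist` is the xgboost 1.x argument name (renamed to
# `evals` in newer releases). Early stopping monitors the last entry.
fit <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 2),
  data = dtrain,
  nrounds = 50,
  watchlist = list(train = dtrain, validation = dvalid),
  early_stopping_rounds = 5,
  verbose = 0
)
```

The point of the sketch is only that the validation rows come from a different group than the training rows, which a single proportion cannot express.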
