PipeEncodeImpact: Add CV #423

Open
pfistfl opened this issue May 8, 2020 · 3 comments · May be fixed by #471

Comments

@pfistfl
Member

pfistfl commented May 8, 2020

See vtreat Webinar for more info.

Concretely, this would mean replacing the standard learner with a cross-validated learner.

@sumny sumny added the "Status: Needs Discussion" label ("We still need to think about what the solution should look like") May 28, 2020
@sumny
Member

sumny commented May 28, 2020

Can we discuss how to implement this? The "training" of the encoding happens here, and imo it is not straightforward to swap that out for a resampled/cross-validated version.

@pfistfl
Member Author

pfistfl commented Jun 1, 2020

In principle, this is what we'd do:

  • get_state_dt stays as it is. It is used only during predict on new data. We should perhaps factor out the switch logic into a separate function.
  • During train, we estimate the state as-is, but instead of transforming the data using the state, we transform it using the CV strategy.

Two comments:

  • We should perhaps consider factoring out those CV methods if we see more of them popping up.
  • Before we do anything, we should perhaps check whether we would rather simply pull in vtreat as a dependency.
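To make the train/predict split above concrete, here is a minimal sketch in plain Python. It is not the mlr3pipelines code: the names `fit_state`, `apply_state` and `CVImpactEncoder` are hypothetical, the impact is simplified to "level mean minus overall mean" for a numeric target, and the fold assignment (row index modulo the number of folds) is an assumption made only for illustration.

```python
from collections import defaultdict


def fit_state(levels, targets):
    """Estimate the impact per level: mean(target | level) - mean(target)."""
    overall = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for lvl, y in zip(levels, targets):
        sums[lvl] += y
        counts[lvl] += 1
    return {lvl: sums[lvl] / counts[lvl] - overall for lvl in sums}


def apply_state(state, levels):
    """Encode levels with a fitted state; unseen levels get impact 0."""
    return [state.get(lvl, 0.0) for lvl in levels]


class CVImpactEncoder:
    """Hypothetical encoder: out-of-fold values in train, full state in predict."""

    def __init__(self, folds=3):
        if folds < 2:
            raise ValueError("folds must be at least 2")
        self.folds = folds
        self.state = None

    def train(self, levels, targets):
        n = len(levels)
        if n < 2:
            raise ValueError("need at least 2 rows")
        # The state is estimated on all rows, as before; it is only used by predict.
        self.state = fit_state(levels, targets)
        encoded = [0.0] * n
        for k in range(self.folds):
            held_out = [i for i in range(n) if i % self.folds == k]
            if not held_out:
                continue
            rest = [i for i in range(n) if i % self.folds != k]
            # Each held-out row is encoded by a state that never saw its own target.
            fold_state = fit_state([levels[i] for i in rest],
                                   [targets[i] for i in rest])
            for i in held_out:
                encoded[i] = fold_state.get(levels[i], 0.0)
        return encoded

    def predict(self, levels):
        return apply_state(self.state, levels)
```

The point of the design is that `train` returns values that differ from what `predict` would return on the same rows, because no row contributes its own target to its encoding.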

@sumny sumny self-assigned this Jun 5, 2020
@pfistfl
Member Author

pfistfl commented Jul 14, 2020

This thread alternatively advocates adding a small amount of noise to avoid overfitting, and it contains some additional interesting info.
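For comparison, the noise idea could look roughly like the sketch below. The helper name `add_noise`, the Gaussian distribution and the default `scale` are assumptions for illustration; the linked thread's exact recipe is not reproduced here. The noise would only be applied to the encoded training data, not at predict time.

```python
import random


def add_noise(encoded, scale=0.01, seed=None):
    """Return a copy of the encoded values with Gaussian noise added to each."""
    rng = random.Random(seed)  # local generator, so the global RNG state is untouched
    return [value + rng.gauss(0.0, scale) for value in encoded]
```

Passing a `seed` makes the perturbation reproducible, which helps when comparing this against the CV strategy.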
