Formula vs non-formula interface with train() #803
Comments
Hello, Review this presentation for the explanation (slide 16):
Regards,
Most[*] models require numeric representations of the data, so you would have to convert categorical predictors to dummy variables before using the non-formula method, or use the formula or recipe interface.
[*] 99.X% of them; trees, rule-based models, and a few others (naive Bayes) generally do not.
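A minimal sketch of that conversion, assuming a small made-up data frame (`df`, `size`, `color`, and `y` are hypothetical names, not from the original post). It uses base R's `model.matrix()` to build the indicator columns and then, only if caret is installed, passes the numeric matrix to the non-formula interface of `train()`. The choice of `method = "lm"` and 3-fold cross-validation is just for illustration.

```r
# Toy data (hypothetical): one numeric and one categorical predictor
set.seed(1)
n <- 30
df <- data.frame(
  size  = seq(1, 10, length.out = n),
  color = factor(rep(c("red", "green", "blue"), length.out = n))
)
df$y <- 2 * df$size + as.integer(df$color) + rnorm(n)

# Expand the factor into indicator columns; [, -1] drops the intercept column
x_num <- model.matrix(y ~ ., data = df)[, -1]
colnames(x_num)

# Non-formula interface: pass the numeric predictors and the outcome separately
if (requireNamespace("caret", quietly = TRUE)) {
  fit <- caret::train(
    x = x_num, y = df$y,
    method = "lm",
    trControl = caret::trainControl(method = "cv", number = 3)
  )
}
```

Because `model.matrix()` uses treatment contrasts by default, the first factor level (`blue` here) becomes the baseline and gets no column of its own.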
I figured much of this out by now. Thanks to both of you for comments.
For caret newbies like me, here is another caveat for using the non-formula method:
There is a workaround for people who want to make use of the flexibility of formulas when modelling with catboost. Define a formula as
where
Hello,
First, I cannot thank you enough for all your tremendous contributions with packages and books/seminars/webinars/courses. I am new to caret and learning a lot every day. I have the following issue, which I reported elsewhere; I can see that others have had similar problems too, but I haven't found a solution yet. The issue is that I can get glmnet, ranger and xgbTree working with the formula interface for both classification and regression problems, but they all fail with the non-formula interface. Judging by the consistency with which I am facing this issue, it could be either a feature of the models/methods that I am yet to understand, or perhaps something is wrong with my own setup. The only thing I know (?) is that the formula interface causes train() to convert each categorical variable into indicator variables, but I am not sure if that could be the source of this difference. All my datasets have multiple categorical predictors, so I could not test with one that does not.
I could use my own code/datasets here, but thought a better illustration would be one of your example codes pasted below (taken from a DataCamp course). The formula/non-formula interface and method can be chosen by uncommenting appropriately. The error I get with this code is different from what I get with my own code, but the pattern of failure seems similar. I am using RStudio v1.1.382, R v3.4.3_x64, caret v6.0-78 (devtools version).
Thanks again,
Manojit
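The question about indicator variables can be checked with a small base R sketch. This is not the course example referred to in the post; the data frame and its column names are made up for illustration. It compares the predictors as a non-formula call would receive them with the expanded version a formula produces, on the assumption that the model in question needs a numeric matrix.

```r
# Toy data (hypothetical) with one categorical predictor
df <- data.frame(
  y     = c(1.2, 2.3, 3.1, 4.8),
  size  = c(10, 12, 9, 15),
  color = factor(c("red", "green", "blue", "red"))
)

# Predictors as they would be passed to the non-formula interface
predictors <- df[, c("size", "color")]
sapply(predictors, is.factor)   # shows which columns are still factors

# Coercing mixed columns to a matrix gives a character matrix, not numbers
x_raw <- as.matrix(predictors)
is.numeric(x_raw)               # FALSE

# The formula route expands the factor into numeric indicator columns
x_expanded <- model.matrix(y ~ ., data = df)[, -1]
is.numeric(x_expanded)          # TRUE
```

If the second check is `FALSE` for your own predictors, unconverted factor columns are a likely cause of the difference between the two interfaces.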