Stratified K-Fold Cross Validation Python script #5129
emmabartholomeeusen
started this conversation in
General
Replies: 1 comment
There's a large discussion on why we do not support over/undersampling: #3269
I am using Orange to predict customer churn and to compare different learners on accuracy, F1, etc.
As my problem is imbalanced (10% churn, 90% no churn), I want to oversample. However, in Orange it is not possible to do the oversampling within the cross-validation (the Test & Score widget).
Therefore, I want to first generate 10 stratified folds from my input data, so that the 10% churn / 90% no churn distribution is preserved in each fold. Then I want to oversample within each fold to get a 50/50 distribution, and add the fold number to each instance as a feature. Lastly, in the Test & Score widget, I want to do cross validation by feature, using the fold number. I think I have to implement this myself with a Python script. Is there anyone who could help me do this?
Thank you! Emma
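The procedure described above could be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library and is not Orange's API: the function names `assign_stratified_folds` and `oversample_within_folds`, the toy data, and the `customer_id` field are all hypothetical. Oversampling here is plain random duplication of minority-class rows, which is an assumption about what "oversample" means in the question.

```python
import random
from collections import Counter, defaultdict


def assign_stratified_folds(labels, n_folds=10, seed=0):
    """Return one fold number (0..n_folds-1) per instance, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


def oversample_within_folds(rows, labels, folds, seed=0):
    """Duplicate minority-class rows inside each fold until classes are balanced.

    Returns a list of (row, label, fold) tuples; copies never leave their fold.
    """
    rng = random.Random(seed)
    grouped = defaultdict(lambda: defaultdict(list))
    for row, label, fold in zip(rows, labels, folds):
        grouped[fold][label].append(row)
    result = []
    for fold in sorted(grouped):
        per_class = grouped[fold]
        # Every class in this fold is brought up to the size of the largest one.
        target = max(len(members) for members in per_class.values())
        for label, members in per_class.items():
            extra = [rng.choice(members) for _ in range(target - len(members))]
            result.extend((row, label, fold) for row in members + extra)
    return result


# Toy data: 1000 customers, 10% churn (hypothetical, for illustration only).
labels = ["churn"] * 100 + ["no_churn"] * 900
rows = [{"customer_id": i} for i in range(len(labels))]

folds = assign_stratified_folds(labels, n_folds=10, seed=42)
balanced = oversample_within_folds(rows, labels, folds, seed=42)

# Count instances per (fold, class) after oversampling.
fold_counts = Counter((fold, label) for _, label, fold in balanced)
```

The fold number in each tuple would still have to be written back into the Orange table as a column that Test & Score can use for cross validation by feature; that step is not shown here. Note also that with this scheme the test folds are oversampled too, so the reported scores would describe a 50/50 class distribution and not the original 10/90 one.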