Stratified K-Fold Cross-Validation Python script #5129

emmabartholomeeusen · 2020-12-15T14:58:48Z


I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc.

My data set is imbalanced (10% churn, 90% no churn), so I want to oversample. However, Orange does not let me do the oversampling inside the cross-validation (the Test & Score widget).

Therefore, I want to do the following with my input data: first, generate 10 stratified folds, so that each fold preserves the 10% churn / 90% no-churn distribution. Then, oversample within each fold to get a 50/50 distribution. Then, add the fold number to each instance as a feature. Finally, in the Test & Score widget, use cross-validation by feature, with the fold number as that feature. I think I have to implement this myself in a Python script. Could anyone help me with this?
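A minimal, Orange-independent sketch of the steps above, using only the Python standard library. The function names (`assign_stratified_folds`, `oversample_within_folds`, `splits_by_fold`) are hypothetical, and oversampling is assumed to mean random duplication of rows from the smaller class. Turning the result into an Orange table (e.g. from `in_data` in the Python Script widget, with the fold number as an extra column) is not shown.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, k=10, seed=0):
    """Give every row a fold number 0..k-1 so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [None] * len(labels)
    start = 0  # rotate the starting fold so fold sizes stay balanced across classes
    for idx in by_class.values():
        rng.shuffle(idx)
        # deal this class's rows round-robin over the folds
        for pos, i in enumerate(idx):
            folds[i] = (start + pos) % k
        start = (start + len(idx)) % k
    return folds


def oversample_within_folds(labels, folds, seed=0):
    """Assumption: oversampling = randomly duplicating rows inside each fold
    until every class present is as frequent as the largest one (50/50 for a
    binary target). Returns (row_index, fold) pairs; duplicates repeat."""
    rng = random.Random(seed)
    rows = []
    for f in sorted(set(folds)):
        by_class = defaultdict(list)
        for i, (y, fi) in enumerate(zip(labels, folds)):
            if fi == f:
                by_class[y].append(i)
        target = max(len(idx) for idx in by_class.values())
        for idx in by_class.values():
            extra = [rng.choice(idx) for _ in range(target - len(idx))]
            rows.extend((i, f) for i in idx + extra)
    return rows


def splits_by_fold(rows):
    """Hold out one fold at a time, i.e. cross-validation by the fold feature."""
    for f in sorted({fi for _, fi in rows}):
        train = [i for i, fi in rows if fi != f]
        test = [i for i, fi in rows if fi == f]
        yield f, train, test


labels = ["churn"] * 10 + ["stay"] * 90
folds = assign_stratified_folds(labels, k=10)
rows = oversample_within_folds(labels, folds)
for f, train, test in splits_by_fold(rows):
    print(f, len(train), len(test))  # 162 training and 18 test rows per fold
```

Note that with this procedure the held-out fold is oversampled too, so each test fold here is 50/50 rather than the original 10/90; accuracy and F1 are then measured on balanced data. Oversampling only the training part of each split would avoid that.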

Thank you! Emma

ajdapretnar · 2020-12-18T08:21:26Z

Maintainer

There is a lengthy discussion of why we do not support over- and undersampling: #3269
