Open
Labels: enhancement (New feature or request)
Description
Hello,
In my Kaggle work I quite often use the IQR technique to replace out-of-scale values (outliers) with predefined or data-driven values.
I already have a scikit-learn-compatible implementation of this method, which I use in pipelines so that I can easily validate my models with KFold.
I think it would be a waste not to include this feature in Sklego, so I'm proposing it to the community. 🧑🤝🧑
Use case scenario:
import pandas as pd
import numpy as np

data = {
    'A': np.random.randint(10, 20, size=10),
    'B': np.random.randint(100, 200, size=10),
    'C': np.random.randint(50, 80, size=10),
    'D': np.random.randint(1, 3, size=10)
}
df = pd.DataFrame(data)
df = pd.concat([df, pd.DataFrame({
    # Adding by hand some out-of-scale values
    'A': [300, -100],
    'B': [1200, -200],
    'C': [360, -10],
    'D': [30, -40]
})], axis=0)
array([[ 11, 168, 62, 1],
[ 12, 154, 64, 2],
[ 16, 156, 76, 2],
[ 10, 176, 50, 2],
[ 19, 121, 57, 2],
[ 14, 130, 73, 1],
[ 17, 107, 56, 1],
[ 12, 184, 67, 1],
[ 17, 139, 60, 1],
[ 18, 128, 54, 2],
[ 300, 1200, 360, 30],
[-100, -200, -10, -40]])
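In this example I replace the out-of-scale values with the column mean, computed after excluding the values flagged by the IQR rule.
After transformation (this output comes from a separate random draw, so the first ten rows differ from the input shown above; the last two rows hold the fill values):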
array([[ 11. , 189. , 77. , 1. ],
[ 14. , 151. , 50. , 1. ],
[ 10. , 177. , 53. , 1. ],
[ 19. , 197. , 63. , 1. ],
[ 19. , 146. , 65. , 2. ],
[ 10. , 189. , 62. , 2. ],
[ 10. , 197. , 54. , 1. ],
[ 19. , 146. , 56. , 1. ],
[ 14. , 162. , 69. , 1. ],
[ 12. , 148. , 75. , 2. ],
[ 13.8, 170.2, 62.4, 1.3],
[ 13.8, 170.2, 62.4, 1.3]])
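The implementation itself is not shown in this post, so the following is only a minimal sketch of how such a transformer could look. The class name `IQROutlierFiller`, the `factor` parameter, and its default of 1.5 (the common Tukey fence) are assumptions made for illustration, not the proposed Sklego API. It flags values outside `[Q1 - factor * IQR, Q3 + factor * IQR]` per column and replaces them with the mean of the remaining values in that column.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class IQROutlierFiller(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: replace values outside the IQR fences
    with the column mean computed over the inliers only."""

    def __init__(self, factor=1.5):
        # factor=1.5 is an assumed default (the usual Tukey fence).
        self.factor = factor

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-column quartiles and fences learned from the training data.
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        # Column means over inliers only: mask outliers as NaN, then nanmean.
        inliers = (X >= self.lower_) & (X <= self.upper_)
        self.fill_values_ = np.nanmean(np.where(inliers, X, np.nan), axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        outliers = (X < self.lower_) | (X > self.upper_)
        # fill_values_ broadcasts across rows, one fill value per column.
        return np.where(outliers, self.fill_values_, X)


# The ten in-range rows from the "after transformation" output above,
# followed by the two hand-added out-of-scale rows.
X = np.array([
    [11, 189, 77, 1],
    [14, 151, 50, 1],
    [10, 177, 53, 1],
    [19, 197, 63, 1],
    [19, 146, 65, 2],
    [10, 189, 62, 2],
    [10, 197, 54, 1],
    [19, 146, 56, 1],
    [14, 162, 69, 1],
    [12, 148, 75, 2],
    [300, 1200, 360, 30],
    [-100, -200, -10, -40],
])

out = IQROutlierFiller().fit_transform(X)
print(out[-2:])  # both rows are approximately [13.8, 170.2, 62.4, 1.3]
```

Because the fences and fill values are learned in `fit` and only applied in `transform`, a transformer of this shape can be placed in a scikit-learn `Pipeline` and cross-validated with `KFold` without leaking statistics from the validation folds.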
Do you think such a feature would add value to the lego toolkit?