[FEATURE] Preprocessing - IQR Transformer  #717

@fabioscantamburlo

Description

Hello,

In my Kaggle work I often use the IQR (interquartile range) technique to replace out-of-scale values with predefined or data-driven values.

I already have a scikit-learn-compatible implementation of this method, which I use in pipelines so that I can easily validate my models with KFold.

I think it would be a waste not to include this feature in Sklego, so I'm proposing it to the community. 🧑‍🤝‍🧑

Use case scenario:

import pandas as pd
import numpy as np 


data = {
    'A': np.random.randint(10, 20, size=10),
    'B': np.random.randint(100, 200, size=10),
    'C': np.random.randint(50, 80, size=10),
    'D': np.random.randint(1, 3, size=10)
}
df = pd.DataFrame(data)
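# Append some out-of-scale values by hand
outliers = pd.DataFrame({
    'A': [300, -100],
    'B': [1200, -200],
    'C': [360, -10],
    'D': [30, -40]
})
# ignore_index avoids duplicate row labels after concatenation
df = pd.concat([df, outliers], axis=0, ignore_index=True)
df.to_numpy()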
array([[  11,  168,   62,    1],
       [  12,  154,   64,    2],
       [  16,  156,   76,    2],
       [  10,  176,   50,    2],
       [  19,  121,   57,    2],
       [  14,  130,   73,    1],
       [  17,  107,   56,    1],
       [  12,  184,   67,    1],
       [  17,  139,   60,    1],
       [  18,  128,   54,    2],
       [ 300, 1200,  360,   30],
       [-100, -200,  -10,  -40]])

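In this example I fill the out-of-scale values with the column mean, computed after excluding the values flagged by the IQR rule.
After transformation (the first ten rows differ from the array above, apparently because the random data comes from a different draw; the last two rows are the filled values):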

array([[ 11. , 189. ,  77. ,   1. ],
       [ 14. , 151. ,  50. ,   1. ],
       [ 10. , 177. ,  53. ,   1. ],
       [ 19. , 197. ,  63. ,   1. ],
       [ 19. , 146. ,  65. ,   2. ],
       [ 10. , 189. ,  62. ,   2. ],
       [ 10. , 197. ,  54. ,   1. ],
       [ 19. , 146. ,  56. ,   1. ],
       [ 14. , 162. ,  69. ,   1. ],
       [ 12. , 148. ,  75. ,   2. ],
       [ 13.8, 170.2,  62.4,   1.3],
       [ 13.8, 170.2,  62.4,   1.3]])
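
To make the proposal concrete, here is a minimal sketch of what such a transformer could look like. It is not the implementation mentioned above and not an existing Sklego API: the class name `IQRFiller`, the `factor` and `fill` parameters, and the conventional 1.5 multiplier for the IQR fences are all assumptions made for illustration. The input is the first array from this issue.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class IQRFiller(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: replace values outside the IQR fences."""

    def __init__(self, factor=1.5, fill="mean"):
        self.factor = factor  # assumed multiplier for the IQR fences
        self.fill = fill      # "mean" (data driven) or a constant number

    def _mask(self, X):
        # True where a value lies outside the per-column fences
        return (X < self.lower_) | (X > self.upper_)

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        if self.fill == "mean":
            # Column mean computed on the inliers only
            inliers = np.where(self._mask(X), np.nan, X)
            self.fill_ = np.nanmean(inliers, axis=0)
        else:
            self.fill_ = np.full(X.shape[1], float(self.fill))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.where(self._mask(X), self.fill_, X)


X = np.array([
    [11, 168, 62, 1], [12, 154, 64, 2], [16, 156, 76, 2],
    [10, 176, 50, 2], [19, 121, 57, 2], [14, 130, 73, 1],
    [17, 107, 56, 1], [12, 184, 67, 1], [17, 139, 60, 1],
    [18, 128, 54, 2], [300, 1200, 360, 30], [-100, -200, -10, -40],
])

out = IQRFiller(fill="mean").fit_transform(X)
# The two hand-added rows are replaced by the inlier means
# (14.6, 146.3, 61.9, 1.5); the first ten rows are unchanged.
print(out[-2:])
```

Because the fences and fill values are learned in `fit`, placing such a step inside a scikit-learn `Pipeline` means they are estimated on the training folds only during KFold validation.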

Do you think such a feature would add value to the lego toolkit?

Metadata

Labels: enhancement (New feature or request)