Problem Description
If I have unordered categorical data (aka nominal data), then in theory it doesn't matter how the LabelEncoder decides to order the categories.
However, in practice, certain orders are better than others. In particular, an ascending-then-descending pattern of frequency lets the encoded data more closely resemble a bell curve, which is useful for data science.
Expected behavior
Add another option for the order_by parameter called 'frequency_inverted_v' (name TBD).
When set, the transformer should:
1. Compute the frequency of each category.
2. Sort the categories by frequency in an ascending-then-descending pattern, so that the most common value is in the middle and the overall pattern is an inverted "V" shape (most similar to a bell-shaped curve).
Additional context
Empirically, this seems to produce drastically better results than the default.
Default ordering: Order is assigned first-come, first-served (categories are numbered in the order they first appear).
V-shaped ordering: Order is assigned by frequency, in an inverted V shape to resemble a bell-shaped distribution.
One way to accomplish this is by sorting the categories by frequency and then assigning them in an alternating fashion from the middle out.
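A minimal sketch of that middle-out assignment, in plain Python, might look like the following. The function names `inverted_v_order` and `inverted_v_mapping` are hypothetical and exist only for illustration; they are not part of the library, and the way ties and even-sized category sets are handled here is an assumption rather than a settled design.

```python
from collections import Counter


def inverted_v_order(values):
    """Order categories so frequencies rise to a peak in the middle, then fall.

    Hypothetical helper for illustration; not an existing library function.
    """
    counts = Counter(values)
    # Most frequent first. Counter.most_common() keeps first-seen order
    # among categories with equal counts.
    ranked = [category for category, _ in counts.most_common()]

    left, right = [], []
    for rank, category in enumerate(ranked):
        # Alternate sides while moving outward from the middle:
        # even ranks go to the right half, odd ranks to the left half.
        if rank % 2 == 0:
            right.append(category)
        else:
            left.append(category)

    # The left half was collected from the middle outward, so reverse it.
    return left[::-1] + right


def inverted_v_mapping(values):
    """Map each category to its integer code under the inverted-V order."""
    return {category: code for code, category in enumerate(inverted_v_order(values))}


# Frequencies: a=5, b=4, c=3, d=2, e=1
sample = list("aaaaabbbbcccdde")
order = inverted_v_order(sample)      # ['d', 'b', 'a', 'c', 'e']
mapping = inverted_v_mapping(sample)  # {'d': 0, 'b': 1, 'a': 2, 'c': 3, 'e': 4}
```

With an even number of categories there is no single middle slot, so this sketch places the most common category just right of center. A real implementation would need to pick and document one such convention.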