The order of categorical variables #93
Comments
Thanks for picking this up. Version 0.7.5 now respects the order of categorical variables. For example:
import pandas as pd
from tableone import TableOne
day_cat = pd.Categorical(["mon", "wed", "tue", "thu"],
categories=["wed", "thu", "mon", "tue"], ordered=True)
alph_cat = pd.Categorical(["a", "b", "c", "a"],
categories=["b", "c", "d", "a"], ordered=False)
mon_cat = pd.Categorical(["jan", "feb", "mar", "apr"],
categories=["feb", "jan", "mar", "apr"], ordered=True)
data = pd.DataFrame({"A": ["a", "b", "c", "a"]})
data["day"] = day_cat
data["alph"] = alph_cat
data["month"] = mon_cat
data
[Image: input DataFrame. Note the order specified in the DataFrame for day.]
# the categorical order reflects the order in the DataFrame
t1 = TableOne(data, label_suffix=False)
t1
[Image: Table 1 uses the order specified in the DataFrame. The order of day and month is retained in Table 1.]
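A minimal pandas-only sketch of the same idea, applied to string bins that do not sort naturally. The column name age_group and the sample values are made up for illustration, and TableOne itself is not called here; the assumption is that version 0.7.5 or later picks up the category order as described above.

```python
import pandas as pd

# Hypothetical age bins stored as plain strings
raw = pd.Series(["10-20", "<10", ">20", "<10"])

# Plain strings sort character by character, so "10-20" comes before "<10"
lexical_order = sorted(raw.unique())

# Declaring the categories explicitly fixes the intended order
age_cat = pd.Categorical(raw, categories=["<10", "10-20", ">20"], ordered=True)
df = pd.DataFrame({"age_group": age_cat})

# The order now travels with the column itself
category_order = list(df["age_group"].cat.categories)
sorted_values = df["age_group"].sort_values().tolist()

print(lexical_order)
print(category_order)
print(sorted_values)
```

Passing df to TableOne should then list the levels as <10, 10-20, >20, assuming the behaviour shown in the example above.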
|
@epimedplotly please test the sorting if you have the opportunity (the latest version can be pip/conda installed) and let us know if it works as expected.
Sounds reasonable to me. We haven't implemented this yet, but can look into it. |
Hello, @tompollard! I finally had the opportunity to test the latest version of tableone. It works just as I expected, thank you so much! I'll reiterate that if a categorical variable has no defined order, it would be awesome if it could be ordered by the percentage of each category, but the order argument is already making my life easier. Thanks! |
thanks @epimedplotly, glad to hear this helps :)
Point taken, and let's keep this issue open for now. If you come up with new bugs, suggestions, etc, please feel free to raise more issues. |
I'd also be interested in this functionality. It looks like it's already implemented in the Lines 1501 to 1516 in bfd6fba
|
For anyone who wants an ad hoc fix:
# Function to sort values in a column by frequency
def sort_by_frequency(series):
    freq = series.value_counts()  # counts, most frequent first
    sorted_values = freq.index.tolist()
    return sorted_values
# Apply the function to each column (`columns` is the list of column names to order)
sorted_values_by_column = {col: sort_by_frequency(df[col]) for col in df[columns].columns}
mytable = TableOne(df, ..., order=sorted_values_by_column)
|
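An alternative sketch, assuming version 0.7.5 or later: since category order is respected there, the frequency order can be stored in each column's categorical dtype instead of being passed through order. The DataFrame and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical data; no two levels in a column share the same count
df = pd.DataFrame({
    "smoker": ["no", "yes", "no", "no", "yes", "former"],
    "site": ["B", "A", "B", "C", "B", "C"],
})
categorical = ["smoker", "site"]

for col in categorical:
    # value_counts() returns levels sorted by descending count by default
    levels = df[col].value_counts().index.tolist()
    # Store that order in the column's categorical dtype
    df[col] = pd.Categorical(df[col], categories=levels)

smoker_levels = list(df["smoker"].cat.categories)
site_levels = list(df["site"].cat.categories)
print(smoker_levels)
print(site_levels)
```

With tied counts, the relative order of the tied levels is whatever value_counts returns, so an explicit list may be preferable when that matters.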
Hello.
I'd like to suggest allowing categorical variables to be ordered in TableOne.
For example:
Suppose I have a variable that can take the values "<10", "10-20", ">20".
I'd like to see it in TableOne in exactly the order above.
Instead, it seems to assume alphabetical order, like "10-20", "<10", ">20".
It would be useful to see the correct order.
Also, if there isn't an order for a categorical variable, it should be ordered by the percentage of each category, don't you agree?
Thanks for your attention.
Best regards,
Lunna