Extend support for ordinal data type #240

Open
dorisjlee opened this issue Jan 23, 2021 · 1 comment
Labels
easy Easy to fix; Good issues for newcomers

Comments

@dorisjlee
Member

dorisjlee commented Jan 23, 2021

Ordinal data are common in survey rating scales, as well as in attributes like Age or number of years of X.
Currently, ordinal data are classified as categorical, especially when the column contains NaN values.
The Young People Survey dataset on Kaggle is a good example of this, since it contains many rating-scale columns.
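A minimal sketch, in plain Python, of a detection check that is not thrown off by missing values: drop None/NaN first, then test whether the remaining values are integer-valued with only a few distinct levels. The function name `ordinal_levels` and the `max_levels=10` threshold are hypothetical illustrations, not part of Lux. Values like 3.0 must count as integers because pandas stores an integer column that contains NaN as float64.

```python
import math
from numbers import Real


def _is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def ordinal_levels(values, max_levels=10):
    """Return the sorted distinct levels if the column looks ordinal, else None.

    Hypothetical heuristic: after dropping missing values, every value must be
    an integer-valued number (3 and 3.0 both count) and there must be at most
    `max_levels` distinct levels.
    """
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return None
    for v in present:
        # bool is a subclass of int; exclude it so True/False columns stay nominal.
        if not isinstance(v, Real) or isinstance(v, bool):
            return None
        if not float(v).is_integer():
            return None
    levels = sorted({int(v) for v in present})
    return levels if len(levels) <= max_levels else None


# A 1-5 rating scale with gaps, as pandas would load it (floats plus NaN).
print(ordinal_levels([1.0, 5.0, float("nan"), 3.0, 4.0, None, 2.0]))  # [1, 2, 3, 4, 5]
print(ordinal_levels(["yes", "no"]))  # None
```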
This issue should extend support for detecting the ordinal data type and add better visualizations for ordinal data. For example, bar charts of ordinal data should follow the natural order of the levels instead of being sorted by the measure values. In addition, it would be useful to show correlations involving one or more ordinal attributes.

dorisjlee added the easy (Easy to fix; Good issues for newcomers) label on Jan 23, 2021
dorisjlee added this to the S2: February 2021 milestone on Feb 19, 2021
@dorisjlee
Member Author

The absenteeism dataset has a couple of very interesting columns (e.g., Body mass index, Height) that are quantitative, but because they are integer-valued and have low-to-medium cardinality, they are detected as nominal. I'm wondering whether this would be a good use case for the ordinal data type as an intermediate between nominal and quantitative. In particular, nominal seems especially inappropriate here: for something like BMI we would ideally want a scatterplot, and these columns should not show up in Filters as equality conditions.
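One way to picture the "intermediate" type is a hypothetical three-way rule for numeric columns based on the number of distinct values: very low cardinality stays nominal, medium cardinality becomes ordinal, and high cardinality is quantitative. The function names and the `nominal_max=5` / `ordinal_max=50` cutoffs are illustrative assumptions, not Lux's actual detection logic. Under such a rule, an ordinal column like BMI could be routed to scatterplots the way a quantitative one is.

```python
import math
from numbers import Real


def _present(values):
    """Drop missing entries (None and float NaN)."""
    return [v for v in values if not (v is None or (isinstance(v, float) and math.isnan(v)))]


def infer_type(values, nominal_max=5, ordinal_max=50):
    """Hypothetical three-way rule returning 'nominal', 'ordinal', or 'quantitative'."""
    present = _present(values)
    # Empty or non-numeric columns (strings, booleans) stay nominal.
    if not present or not all(isinstance(v, Real) and not isinstance(v, bool) for v in present):
        return "nominal"
    # Fractional values such as 22.7 indicate a genuinely continuous measure.
    if not all(float(v).is_integer() for v in present):
        return "quantitative"
    cardinality = len(set(present))
    if cardinality <= nominal_max:
        return "nominal"
    if cardinality <= ordinal_max:
        return "ordinal"  # the proposed intermediate type
    return "quantitative"


def allows_scatterplot(data_type):
    """Hypothetical: ordinal columns, like quantitative ones, may go on a scatterplot axis."""
    return data_type in ("ordinal", "quantitative")


toy_bmi = list(range(19, 39))  # toy integer values, not taken from the absenteeism data
print(infer_type(toy_bmi), allows_scatterplot(infer_type(toy_bmi)))  # ordinal True
```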

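```python
import pandas as pd
import lux  # importing lux enables the Lux widget and df.intent on pandas DataFrames

df = pd.read_csv("../lux-datasets/data/absenteeism.csv")
df.intent = ["Weight"]
df
```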

