Pre-Encoding for Correlations #488

reza1615 · 2021-05-20T15:30:03Z

As you know, Correlations only works on numerical columns; it does not work for categorical ones.
It would be nice to have an option to one-hot encode all categorical columns in one batch, so that the user can then get the Correlations.

aschonfeld · 2021-05-24T21:11:22Z

Can you refresh my memory on how the "one-hot" (encode) works? We made column builders for this, correct?

reza1615 · 2021-05-24T22:30:33Z

Yes, we have a column builder.
Imagine the data has 20 categorical columns. To one-hot encode all of them and get correlations, we have to create a one-hot column in the column builder 20 times.
This request is to create the one-hot encodings for all 20 columns in one request.
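
For illustration, here is roughly what the batch request amounts to in plain pandas. This is only a sketch, not D-Tale's implementation: the sample data and column names are made up, and `pd.get_dummies` stands in for whatever encoder the column builder uses.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Hypothetical sample data: one numeric column and two categorical ones.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 14.0],
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "L"],
})

# Collect every non-numeric, non-date column in one pass instead of
# building each one-hot column by hand.
cat_cols = [
    col for col in df.columns
    if not is_numeric_dtype(df[col]) and not is_datetime64_any_dtype(df[col])
]

# One call encodes all of them; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=cat_cols, dtype=float)

# Every column is numeric now, so the correlation matrix covers all of them.
corr = encoded.corr()
print(corr.round(2))
```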

aschonfeld · 2021-05-25T02:57:07Z

So what if I added a toggle which checks for any string or categorical columns and auto-converts them using OneHotEncoder?
Also, what if there was a toggle to add in date columns which have been converted to millisecond timestamps?
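
As a rough sketch of the date idea, a datetime column can be turned into milliseconds since the Unix epoch so that it takes part in the correlation like any other number. The data and the `day_ms` column name are assumptions for illustration, not what D-Tale does internally.

```python
import pandas as pd

# Hypothetical sample data with one date column.
df = pd.DataFrame({
    "sales": [100, 120, 90],
    "day": pd.to_datetime(["2021-05-01", "2021-05-02", "2021-05-03"]),
})

# Milliseconds elapsed since 1970-01-01, stored as plain integers.
epoch = pd.Timestamp("1970-01-01")
df["day_ms"] = (df["day"] - epoch) // pd.Timedelta(milliseconds=1)

# The converted column can now be correlated with the numeric ones.
corr = df[["sales", "day_ms"]].corr()
print(corr)
```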

reza1615 · 2021-05-25T10:34:28Z

In my opinion, when we open the correlation window:

1. A multi-select drop-down lists all dtype(object) columns.
2. An exclude multi-select drop-down lists all columns with high cardinality (df['A'].nunique()>50), to be excluded from one-hot encoding.
3. By default, the user only gets the correlation of the numerical columns.
4. On clicking a button, D-Tale converts all object columns, excluding the second list of high-cardinality columns, and then calculates the correlation.

Encoding high-cardinality columns generates many columns, which affects performance.
For the dates, why do you want to check milliseconds?
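
The cardinality filter described above could be sketched like this in pandas. The helper name `split_by_cardinality`, the `max_unique` parameter, and the sample data are hypothetical; only the 50-value threshold comes from the comment.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def split_by_cardinality(df, max_unique=50):
    """Split non-numeric columns into (low, high) cardinality lists."""
    candidates = [
        col for col in df.columns
        if not is_numeric_dtype(df[col]) and not is_datetime64_any_dtype(df[col])
    ]
    low = [col for col in candidates if df[col].nunique() <= max_unique]
    high = [col for col in candidates if col not in low]
    return low, high


# Hypothetical data: "region" has 2 distinct values, "user_id" has 100.
df = pd.DataFrame({
    "amount": list(range(100)),
    "region": ["north", "south"] * 50,
    "user_id": [f"u{i}" for i in range(100)],
})

low, high = split_by_cardinality(df)

# Drop the high-cardinality columns and one-hot encode only the rest,
# so the encoding adds 2 columns here instead of 102.
encoded = pd.get_dummies(df.drop(columns=high), columns=low, dtype=float)
corr = encoded.corr()
print(corr.round(2))
```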

aschonfeld · 2021-05-29T00:17:37Z

added in v1.48.0

aschonfeld added the enhancement New feature or request label May 21, 2021

aschonfeld added a commit that referenced this issue May 27, 2021

#488: string encoding for correlations

37a1d0d

aschonfeld added a commit that referenced this issue May 28, 2021

#488: string encoding for correlations

49a905e

aschonfeld closed this as completed May 29, 2021
