I like this idea. Before we implement it, I want to consider how this might affect other input values that aren't percentages.
For example, what if the user passes a DataFrame with class labels such as ">50%" and "<=50%"? We obviously don't want to parse those into numerical percentages, nor do we want to strip the % signs from them.
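To make the concern concrete, here is a minimal sketch (assuming pandas; the column names `score` and `label` are hypothetical) of what the suggested one-liner `df.replace('%', '', regex=True).astype('float')` does to such a frame: the labels lose their % signs, and the cast then fails.

```python
import pandas as pd

# Hypothetical DataFrame: one percentage column and one class-label column
df = pd.DataFrame({
    "score": ["95%", "82%"],
    "label": [">50%", "<=50%"],
})

# Stripping '%' everywhere also mangles the labels: '>50%' becomes '>50'
stripped = df.replace("%", "", regex=True)
print(stripped["label"].tolist())  # ['>50', '<=50']

# Casting the whole frame to float then fails on the label column
try:
    stripped.astype("float")
except ValueError as err:
    print("astype failed:", err)
```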
Perhaps one way to accomplish this is:
1. Check whether the column has dtype `object`. If not, it won't contain a '%' anyway.
2. Check whether any entry in the column contains a '%'. If not, skip the column.
3. Make a copy of the column and apply the transformation you suggested. If it doesn't crash, the column very likely held string encodings of percentages. If it does crash, it probably held some other strings that happen to contain '%'.
4. In the non-crashing case, apply the change to the column.
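The four steps above could be sketched roughly as follows. This is a minimal, hypothetical helper (`convert_percent_columns` is not an existing pandas or library function). As a small deviation from step 1, it also accepts pandas' dedicated string dtype, since newer pandas versions may store text columns that way instead of as `object`.

```python
import pandas as pd


def convert_percent_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert columns of strings like '95%' to floats.

    Columns that are not text, contain no '%', or fail to convert
    (e.g. class labels such as '>50%') are left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        # Step 1: only object (or string) dtype columns can hold '95%'
        if not pd.api.types.is_string_dtype(series.dtype):
            continue
        # Step 2: skip columns where no entry contains a '%'
        if not series.astype(str).str.contains("%", regex=False).any():
            continue
        # Step 3: try the transformation; replace() returns a new Series
        try:
            converted = series.replace("%", "", regex=True).astype("float")
        except (ValueError, TypeError):
            # Some other '%'-containing strings (e.g. '>50%'): leave as-is
            continue
        # Step 4: conversion succeeded, so keep the numeric column
        out[col] = converted
    return out


df = pd.DataFrame({"score": ["95%", "82%"], "label": [">50%", "<=50%"]})
result = convert_percent_columns(df)
print(result.dtypes)  # score becomes float64; label stays as strings
```

Catching the exception per column means a single label column does not block conversion of the genuine percentage columns.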
Are there any cases that such a procedure would miss and incorrectly encode?
Some datasets have percentage values stored as strings, e.g. '95%', '82%', etc.
It would be great if these could be handled automatically. On a pandas DataFrame this can be done with:
```python
# Strip every '%' character, then cast all columns to float
df = df.replace('%', '', regex=True).astype('float')
```
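For reference, a self-contained version of the one-liner, assuming every column holds percentage strings (the column names here are hypothetical). The second part restricts the same transformation to a hand-picked list of columns, which avoids touching label columns when you already know which columns are percentages.

```python
import pandas as pd

# Hypothetical data: every column holds percentage strings
df = pd.DataFrame({"accuracy": ["95%", "82%"], "recall": ["70%", "64%"]})

# Strip '%' from every cell, then cast the whole frame to float
df = df.replace("%", "", regex=True).astype("float")
print(df["accuracy"].tolist())  # [95.0, 82.0]

# If only some columns hold percentages, restrict the change to them
mixed = pd.DataFrame({"accuracy": ["95%", "82%"], "label": [">50%", "<=50%"]})
cols = ["accuracy"]  # hypothetical: columns known to contain percentages
mixed[cols] = mixed[cols].replace("%", "", regex=True).astype("float")
```

Note that the values stay on a 0 to 100 scale; dividing by 100 would be a separate choice.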