Figure out better autocleaning comparison #220

Open
1 task done
paddymul opened this issue Feb 14, 2024 · 0 comments
Labels
enhancement New feature or request

Checks

  • I have checked that this enhancement has not already been requested

How would you categorize this request. You can select multiple if not sure

Auto Cleaning, Performance

Enhancement Description

Polars makes some autocleaning functionality very difficult, particularly comparing original values to modified values across different dtypes. This makes it much harder to color and add tooltips to the resulting dataframe based on modifications.

import polars as pl
df = pl.DataFrame({'a_raw': ["not_parseable", "30"], 'a_cleaned': [None, 30]})
df.select(pl.col("a_raw").eq(pl.col("a_cleaned")))  # string column compared to integer column

These shouldn't equal each other because they're different types... but you can't do this either

df = pl.DataFrame({'a_raw': pl.Series(["not_parseable", 30], dtype=pl.Object), 'a_cleaned': [None, 30]})
df.select(pl.col("a_raw").eq(pl.col("a_cleaned")))  # Object column compared to integer column

you can't even do this

df = pl.DataFrame({'a_raw':["not_parseable", 30], 'a_cleaned': [None, 30]})
# map_elements receives each struct row as a dict keyed by field name
df.select(pl.struct(["a_raw", "a_cleaned"]).map_elements(lambda x: x["a_raw"] == x["a_cleaned"]))

Because you can't put an object into a struct.

Pseudo Code Implementation

This might require writing some custom expressions, particularly a version of cast that returns a struct with the original value.

Prior Art

N/A

@paddymul paddymul added the enhancement New feature or request label Feb 14, 2024