-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add cat.len_chars
and cat.len_bytes
#20211
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20211 +/- ##
=======================================
Coverage 79.62% 79.63%
=======================================
Files 1565 1565
Lines 218187 218243 +56
Branches 2475 2475
=======================================
+ Hits 173734 173794 +60
+ Misses 43886 43882 -4
Partials 567 567 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @mcrumiller. I've left some comments.
T: PolarsDataType, | ||
{ | ||
let ca = s.categorical()?; | ||
let (categories, phys) = match &**ca.get_rev_map() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this move into its own function, that saves monomorphizaton bloat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this resolve it? 2b9edc3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, great!
}; | ||
|
||
// Apply function to categories | ||
let categories = StringChunked::with_chunk(PlSmallStr::EMPTY, categories.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should take the name of s
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let categories = StringChunked::with_chunk(PlSmallStr::EMPTY, categories.clone()); | ||
let result = op(&categories).into_series(); | ||
|
||
let out = result.take(phys.idx()?)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can do a take_unchecked
.
@@ -171,6 +171,21 @@ impl PyStringFunction { | |||
} | |||
} | |||
|
|||
#[pyclass(name = "CategoricalFunction", eq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove these. I don't think we need those for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the whole file, 0a83dbe
@@ -793,8 +808,16 @@ pub(crate) fn into_py(py: Python<'_>, expr: &AExpr) -> PyResult<PyObject> { | |||
FunctionExpr::BinaryExpr(_) => { | |||
return Err(PyNotImplementedError::new_err("binary expr")) | |||
}, | |||
FunctionExpr::Categorical(_) => { | |||
return Err(PyNotImplementedError::new_err("categorical expr")) | |||
FunctionExpr::Categorical(catfun) => match catfun { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can remain NotImplemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted the whole file, 0a83dbe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. 👍
cat.str_len
and cat.str_bytes
cat.len_chars
and cat.len_bytes
Good stuff. Bringing I'd certainly be happy to see more fast-path |
@alexander-beedie as per the #9773 discussion I don't think we will get most string ops in the |
I think we should just map |
+1 for starts_with/ends_with/contains |
@connor-elliott that should now be an easy add, I'll take a look tonight when I'm home from work. |
The reason we often prefer categoricals is because they take up about 1/4 the space (4 bytes per element versus 16), not necessarily for performance reasons. In many of these cases, it would really nice to have a |
And that will be fixed by streaming. Where categoricals are yet again problematic and complex. They seem like a good solution. But are a band aid. There are better solutions;) |
This adds a fast path for operations that can be performed on the categories of a categorical series. I've added
len_bytes
andlen_chars
for now. If desired, it would be trivial to add a few others such asstarts_with
andends_with
.These require the
strings
feature because they dispatch tostr.len_bytes
andstr.len_chars
.