-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add cat.contains
and cat.contains_any
#20582
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20582 +/- ##
==========================================
+ Coverage 79.86% 79.90% +0.03%
==========================================
Files 1593 1596 +3
Lines 228268 228623 +355
Branches 2600 2608 +8
==========================================
+ Hits 182311 182678 +367
+ Misses 45360 45349 -11
+ Partials 597 596 -1 ☔ View full report in Codecov by Sentry. |
Any movement on this one? |
7744eed
to
63b98d5
Compare
1ffb831
to
a026db3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor nits aside. I think this is okay to merge.
Use the Aho-Corasick algorithm to find matches. | ||
|
||
Determines if any of the patterns are contained in the string representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. I think these two lines should be swapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this (with modifications) from the str.contains_any
description, but yes I agree. I've swapped, and also swapped for the str.contains_any
docstring as well since it's very minor. Updated in cface6b.
py-polars/polars/expr/categorical.py
Outdated
Use the Aho-Corasick algorithm to find matches. | ||
|
||
Determines if any of the patterns are contained in the string representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(all(feature = "strings", feature = "regex"))] | ||
pub(super) fn contains(s: &Column, pat: &str, literal: bool, strict: bool) -> PolarsResult<Column> { | ||
let ca = s.categorical()?; | ||
if literal { | ||
try_apply_to_cats(ca, |s| s.contains_literal(pat)) | ||
} else { | ||
try_apply_to_cats(ca, |s| s.contains(pat, strict)) | ||
} | ||
} | ||
|
||
#[cfg(all(feature = "strings", feature = "find_many"))] | ||
fn contains_many(s: &[Column], ascii_case_insensitive: bool) -> PolarsResult<Column> { | ||
let ca = s[0].categorical()?; | ||
let patterns = s[1].str()?; | ||
try_apply_to_cats(ca, |s| { | ||
strings::contains_any(s, patterns, ascii_case_insensitive) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Followup to #20257. Again, expression input patterns not allowed in
cat.contains
, only string inputs. Forcontains_any
, expression inputs are allowed, because these are interpreted as a list of possible patterns.