-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arrow-ord: lt
and eq
for nested list
#5408
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
I wonder if it might be more efficient to make use of DynComparator for this, especially for smaller list elements? |
I can apply DynComparator for non-nested part, for nested part, I did't find a good way to build something like |
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
arrow-ord/src/cmp.rs
Outdated
use arrow_schema::DataType::*; | ||
if let (List(_), List(_)) = (l_t, r_t) { | ||
// Process nested data types | ||
match op { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to construct the DynComparator once for l.values()
and r.values()
and then use the offsets to drive the comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure about using l.values and offsets for the comparison. Instead, I find looping through the index with .value(i)
much clearer and straightforward for me.
arrow-ord/src/cmp.rs
Outdated
let values = values.finish(); | ||
Ok(Some(values)) | ||
} | ||
Op::Equal => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplication could be eliminated by extracting out a function that maps the Ordering result of DynComparator to the boolean result
@@ -198,12 +316,16 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) -> Result<BooleanArray, | |||
let r = r_v.map(|x| x.values().as_ref()).unwrap_or(r); | |||
let r_t = r.data_type(); | |||
|
|||
if l_t != r_t || l_t.is_nested() { | |||
if l_t != r_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now allows hitting unreachable in the below code block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I return an error in process_nested
, so the nested type that is NYI will not go down there
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
28c7fc6
to
ac7e789
Compare
arrow-ord/src/cmp.rs
Outdated
let l = l.as_list::<i32>(); | ||
let r = r.as_list::<i32>(); | ||
|
||
Box::new(move |i, j| compare_list(l.value(i), r.value(j))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a lifetime error when moving this to fn that returns Dyncompartor. To avoid getting it too complex, I leave it here.
error: lifetime may not live long enough
--> arrow-ord/src/ord.rs:36:9
|
32 | fn compare_ist(l: &dyn Array, r: &dyn Array) -> DynComparator {
| - let's call the lifetime of this reference `'2`
...
36 | Box::new(move |i, j| compare_list(l.value(i), r.value(j)))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'2` must outlive `'static`
|
help: to declare that the trait object captures data from argument `r`, you can add a lifetime parameter `'a` in the type alias
|
30 | pub type DynComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync + 'a>;
| ++++ ++++
error: could not compile `arrow-ord` (lib test) due to 2 previous errors
I will try to find some time over the next few days to see if i can't simplify this |
@@ -702,4 +784,216 @@ mod tests { | |||
|
|||
neq(&col.slice(0, col.len() - 1), &col.slice(1, col.len() - 1)).unwrap(); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to see some tests of
- Scalar arguments
- Nulls masking non-empty slices
- DictionaryArray of ListArray (returning an error would be perfectly valid for this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Ok, so I have had a play with this and I think I would recommend the following course of action:
I would recommend doing these as separate PRs. This seems like the only way to support arbitrarily nested lists, including lists of lists. |
Actually I remember now why DynComparator lacks support for nested types, the ordering of nulls is not well defined. Tricky... 🤔 Edit: Filed #5426 with how I think we should proceed, feedback welcome |
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
I find a way to build dyncompare for list, but nulls handling is not considered yet. |
Which issue does this PR close?
Ref #5407
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?