Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support for custom sort build_compare_fn #1016

Merged
merged 1 commit into from
May 28, 2022

Conversation

b41sh
Copy link
Contributor

@b41sh b41sh commented May 27, 2022

The build_compare function only supports build compare functions for common data types and will generate errors for other data types like DataType::Extension. We have some custom data types stored as DataType::Extension need to be sorted, but there is no proper way to implement them.

This PR adds the following functions to support the user implement custom build_compare function for their own data types.

pub fn lexsort_to_indices_impl<I: Index>(
    columns: &[SortColumn],
    limit: Option<usize>,
    build_compare_fn: &dyn Fn(&dyn Array, &dyn Array) -> Result<DynComparator>,
) -> Result<PrimitiveArray<I>> {
}

pub(crate) fn build_compare_impl(
    array: &dyn Array,
    sort_option: SortOptions,
    build_compare_fn: &dyn Fn(&dyn Array, &dyn Array) -> Result<DynComparator>,
) -> Result<DynComparator> {
}

pub fn build_comparator_impl<'a>(
    pairs: &'a [(&'a [&'a dyn Array], &SortOptions)],
    build_compare_fn: &dyn Fn(&dyn Array, &dyn Array) -> Result<DynComparator>,
) -> Result<Comparator<'a>> {
}

@codecov
Copy link

codecov bot commented May 27, 2022

Codecov Report

Merging #1016 (a4133b7) into main (b5bbe9f) will increase coverage by 0.00%.
The diff coverage is 80.00%.

@@           Coverage Diff           @@
##             main    #1016   +/-   ##
=======================================
  Coverage   71.59%   71.59%           
=======================================
  Files         359      359           
  Lines       19879    19890   +11     
=======================================
+ Hits        14232    14240    +8     
- Misses       5647     5650    +3     
Impacted Files Coverage Δ
src/compute/sort/mod.rs 26.66% <ø> (ø)
src/compute/sort/lex_sort.rs 68.29% <75.00%> (+1.16%) ⬆️
src/compute/merge_sort/mod.rs 87.50% <100.00%> (+0.13%) ⬆️
src/compute/arithmetics/time.rs 25.68% <0.00%> (-0.92%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5bbe9f...a4133b7. Read the comment docs.

@jorgecarleitao
Copy link
Owner

Thanks a lot!

Just to understand, are the extension types using the same comparison as the datatype that they are wrapping, or is the comparison entirely different?

@b41sh
Copy link
Contributor Author

b41sh commented May 28, 2022

Thanks a lot!

Just to understand, are the extension types using the same comparison as the datatype that they are wrapping, or is the comparison entirely different?

The datatype is actually similar to JSON and stored as LargeBinary in arrow2. We define a set of comparison rules for it, which is entirely different from the inner datatype.

@jorgecarleitao jorgecarleitao changed the title sort support custom build_compare_fn Added support for custom sort build_compare_fn May 28, 2022
@jorgecarleitao jorgecarleitao added the feature A new feature label May 28, 2022
@jorgecarleitao jorgecarleitao merged commit 93bdde8 into jorgecarleitao:main May 28, 2022
@jorgecarleitao
Copy link
Owner

Thanks for the explanation. Got it. Yeap, makes a lot of sense this implementation, then!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature A new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants