Description
Is your feature request related to a problem? Please describe.
Sorting RDF terms is currently implemented by encoding RDF terms in a special Sortable encoding.
This causes the following problems:
- We need to materialize this encoding, which may hurt performance.
- Comparing some numeric values is difficult, as `f64` (the numeric value of the `Sortable` encoding) can't represent all other numeric values correctly (especially once we go to arbitrary-precision decimals/integers).
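To illustrate the second problem, here is a minimal sketch of how a float-based sort key loses information. The helper names `cmp_via_f64` and `cmp_exact` are made up for this example and are not part of the code base; the sketch only assumes that integers end up being compared through an `f64`.

```rust
use std::cmp::Ordering;

/// Compares two integers after widening them to f64, mimicking a
/// float-based sort key (hypothetical helper, not the project's code).
fn cmp_via_f64(a: i64, b: i64) -> Ordering {
    (a as f64).total_cmp(&(b as f64))
}

/// Compares the same integers exactly.
fn cmp_exact(a: i64, b: i64) -> Ordering {
    a.cmp(&b)
}
```

For `a = 9_007_199_254_740_992` (2^53) and `b = a + 1`, `cmp_exact` returns `Ordering::Less`, while `cmp_via_f64` returns `Ordering::Equal`, because both values round to the same `f64`.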
Describe the solution you'd like
Provide a custom comparison function that implements the ordering defined by the SPARQL specification, and sort directly on the typed value encoding. However, DataFusion does not currently support this. We had a PoC pull request that went stale. Maybe we should try to revive it and rebase it onto the newest version, possibly with some additional involvement from the DataFusion community.
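As a rough sketch of what such a comparison function could look like outside of DataFusion: the `Term` enum below is a heavily simplified, hypothetical stand-in for the typed value encoding, and `compare_terms` covers only a small part of the SPARQL `ORDER BY` rules (blank nodes before IRIs before literals). It is not how DataFusion would be extended, just an illustration of sorting without a materialized sort key.

```rust
use std::cmp::Ordering;

/// Hypothetical, heavily simplified stand-in for the typed value encoding.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    BlankNode(String),
    Iri(String),
    Integer(i64),
}

/// Rank of each term kind: blank nodes sort before IRIs,
/// which sort before literals.
fn kind_rank(term: &Term) -> u8 {
    match term {
        Term::BlankNode(_) => 0,
        Term::Iri(_) => 1,
        Term::Integer(_) => 2,
    }
}

/// Compares two terms directly on their typed representation.
/// Ordering blank nodes by label is an arbitrary choice made for this sketch.
fn compare_terms(a: &Term, b: &Term) -> Ordering {
    match (a, b) {
        (Term::BlankNode(x), Term::BlankNode(y)) => x.cmp(y),
        (Term::Iri(x), Term::Iri(y)) => x.cmp(y),
        // Integers are compared exactly, with no detour through f64.
        (Term::Integer(x), Term::Integer(y)) => x.cmp(y),
        // Different kinds: fall back to the kind rank.
        _ => kind_rank(a).cmp(&kind_rank(b)),
    }
}

/// Sorts terms in place using the comparator instead of a sort key column.
fn sort_terms(terms: &mut [Term]) {
    terms.sort_by(compare_terms);
}
```

With a comparator like this, the two integers 2^53 and 2^53 + 1 keep their correct relative order, which a key based on `f64` cannot guarantee.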
Describe alternatives you've considered
Creating a custom Sort operator. I believe this should be the last resort. Keeping the Sortable encoding may even be preferable to this solution, as implementing sorting efficiently will be non-trivial. We would also need a DataFusion (and maybe arrow-rs) fork just for that.
Additional context