Directly Sort Typed Value Encoding #1

@tobixdev

Description

Is your feature request related to a problem? Please describe.

Sorting RDF terms is currently implemented by encoding RDF terms in a special Sortable encoding.
This causes the following problems:

  • We need to materialize this encoding before sorting, which may hurt performance.
  • Comparing some numeric values is difficult because f64 (the numeric type of the Sortable encoding) cannot represent all other numeric values exactly, especially once we move to arbitrary-precision decimals/integers.
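
To make the second point concrete, here is a minimal sketch of the precision problem. The helper `f64_sort_key` is a hypothetical stand-in for "the numeric value stored in the Sortable encoding"; it is not the project's actual code. It only shows that two distinct 64-bit integers can collapse onto the same f64 key.

```rust
/// Hypothetical stand-in for the numeric key a Sortable-style encoding
/// would store for an integer (assumption: the key is a plain f64).
fn f64_sort_key(value: i64) -> f64 {
    value as f64
}

fn main() {
    let a: i64 = 9_007_199_254_740_992; // 2^53
    let b: i64 = 9_007_199_254_740_993; // 2^53 + 1

    // As integers the two values are distinct and ordered.
    assert!(a < b);

    // As f64 keys they are indistinguishable, so a sort on the key
    // cannot order them correctly.
    assert_eq!(f64_sort_key(a), f64_sort_key(b));

    println!("a < b as i64: {}", a < b);
    println!("keys equal as f64: {}", f64_sort_key(a) == f64_sort_key(b));
}
```

The same effect applies to any value type with more precision than f64 offers, which is why arbitrary-precision decimals/integers make the problem worse.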

Describe the solution you'd like

Provide a custom comparison function that implements the ordering defined by the SPARQL specification, and sort directly on the typed value encoding. However, DataFusion does not currently support this. We had a PoC pull request that went stale. Maybe we should revive it and rebase it onto the newest version, possibly with some additional involvement from the DataFusion community.
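
As a rough illustration of what such a comparison function could look like, here is a sketch under several assumptions. The `TypedValue` enum is a hypothetical, heavily simplified stand-in (the real typed value encoding is an Arrow-level layout, not a Rust enum), and the functions `kind_rank`, `cmp_i64_f64`, and `compare` are made up for this example. The kind ordering (unbound, then blank nodes, then IRIs, then literals) follows the SPARQL ORDER BY rules; placing NaN after all other numbers is an arbitrary choice made here. The point is that integers and doubles are compared exactly, without first converting the integer to f64.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified typed value; not the project's actual encoding.
#[derive(Debug, Clone, PartialEq)]
enum TypedValue {
    Unbound,
    BlankNode(String),
    Iri(String),
    Integer(i64),
    Double(f64),
}

/// Rank of each kind: unbound < blank nodes < IRIs < literals.
fn kind_rank(value: &TypedValue) -> u8 {
    match value {
        TypedValue::Unbound => 0,
        TypedValue::BlankNode(_) => 1,
        TypedValue::Iri(_) => 2,
        TypedValue::Integer(_) | TypedValue::Double(_) => 3,
    }
}

/// Compares an integer with a double without rounding the integer to f64.
/// Assumption: NaN sorts after every other number.
fn cmp_i64_f64(i: i64, f: f64) -> Ordering {
    const TWO_POW_63: f64 = 9_223_372_036_854_775_808.0;
    if f.is_nan() || f >= TWO_POW_63 {
        return Ordering::Less; // f is above every i64 (or NaN)
    }
    if f < -TWO_POW_63 {
        return Ordering::Greater; // f is below every i64
    }
    // The truncated value lies in [-2^63, 2^63), so the cast is exact.
    let truncated = f.trunc();
    match i.cmp(&(truncated as i64)) {
        // Integer parts match: the fractional part of f decides.
        Ordering::Equal => 0.0_f64
            .partial_cmp(&(f - truncated))
            .unwrap_or(Ordering::Equal),
        other => other,
    }
}

/// Compares two typed values directly, without a sortable intermediate.
fn compare(a: &TypedValue, b: &TypedValue) -> Ordering {
    use TypedValue::*;
    match (a, b) {
        (Integer(x), Integer(y)) => x.cmp(y),
        // partial_cmp fails only when a NaN is involved; NaN sorts last.
        (Double(x), Double(y)) => x
            .partial_cmp(y)
            .unwrap_or_else(|| x.is_nan().cmp(&y.is_nan())),
        (Integer(x), Double(y)) => cmp_i64_f64(*x, *y),
        (Double(x), Integer(y)) => cmp_i64_f64(*y, *x).reverse(),
        (BlankNode(x), BlankNode(y)) | (Iri(x), Iri(y)) => x.cmp(y),
        // Different kinds (or two unbound values): order by kind only.
        _ => kind_rank(a).cmp(&kind_rank(b)),
    }
}

fn main() {
    let mut values = vec![
        TypedValue::Double(2.5),
        TypedValue::Iri("http://example.org/a".to_string()),
        TypedValue::Integer(9_007_199_254_740_993), // 2^53 + 1
        TypedValue::Double(9_007_199_254_740_992.0), // 2^53
        TypedValue::Unbound,
        TypedValue::Integer(2),
    ];
    values.sort_by(compare);
    println!("{:?}", values);
}
```

A real implementation would operate on whole Arrow arrays rather than on individual values, and would also have to cover the remaining literal types (strings, decimals, dates, and so on), which this sketch leaves out.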

Describe alternatives you've considered

Creating a custom Sort operator. I believe this should be the last resort. Keeping the Sortable encoding may even be preferable to this solution, as implementing sorting efficiently is non-trivial, and we would need a DataFusion (and maybe arrow-rs) fork just for that.

Additional context

Labels: datafusion (Requires changes in DataFusion), enhancement (New feature or request), performance (Performance-related task)
