Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert datafusion table scan filter into iceberg table scan' filter. #585

Closed
liurenjie1024 opened this issue Aug 26, 2024 · 8 comments · Fixed by #588
Closed

Convert datafusion table scan filter into iceberg table scan' filter. #585

liurenjie1024 opened this issue Aug 26, 2024 · 8 comments · Fixed by #588
Labels
enhancement New feature or request
Milestone

Comments

@liurenjie1024
Copy link
Contributor

This would help to pruning unnecessary data when doing table scan from datafusion.

@liurenjie1024 liurenjie1024 added the enhancement New feature or request label Aug 26, 2024
@FANNG1
Copy link
Contributor

FANNG1 commented Aug 26, 2024

@liurenjie1024 have you worked on this? if not, I'd like to take it.

@liurenjie1024
Copy link
Contributor Author

Hi @FANNG1 Thanks for your interest. @a-agmon also volunteered to work on it.

@a-agmon
Copy link
Contributor

a-agmon commented Aug 26, 2024

Hi @FANNG1 ,
I started looking into this and asked @liurenjie1024 to open the issue.
I will update here and would be grateful for your review and advice.

@FANNG1
Copy link
Contributor

FANNG1 commented Aug 26, 2024

got it, @a-agmon please go ahead

@a-agmon
Copy link
Contributor

a-agmon commented Aug 26, 2024

Hi
@sdd , @FANNG1 , @liurenjie1024.
I just want to make sure im in the right direction with this and would be happy to hear your feedback on this.
The main functionality will be implemented in IcebergTableScan here.
we can pass the filters &[Expr] via the IcebergTableScan::new function, and then convert them into Iceberg Predicates. Something like this:

  pub(crate) fn new(table: Table, schema: ArrowSchemaRef, filters: &[Expr]) -> Self {
      let plan_properties = Self::compute_properties(schema.clone());
      let predicates = Self::convert_to_predicates(filters);
      Self {
          table,
          schema,
          plan_properties,
          predicates,
      }
  }

Then the predicates needs to be applied as followes:
The impl of the ExecutionPlan::execute() function, calls get_batch_stream(), which in turn calls table.scan() link
In this function, the predicate can be applied to the TableScanBuilder

Another important issue is that it seems that according to DataFusion docs, in order to get the filters passed to the scan() function, we need to impl the TableProvider::supports_filters_pushdown to return the filters we can push down [docs] (https://docs.rs/datafusion/41.0.0/datafusion/catalog/trait.TableProvider.html#method.supports_filters_pushdown)

what do you think? does this make sense?
(as a start I think I will try to implement just BinaryExpressions and see from there)

@FANNG1
Copy link
Contributor

FANNG1 commented Aug 27, 2024

LGTM

@liurenjie1024
Copy link
Contributor Author

Hi @a-agmon

Then the predicates needs to be applied as followes:
The impl of the ExecutionPlan::execute() function, calls get_batch_stream(), which in turn calls table.scan() link
In this function, the predicate can be applied to the TableScanBuilder

Yes, that's I think how it should work.

Another important issue is that it seems that according to DataFusion docs, in order to get the filters passed to the scan() function, we need to impl the TableProvider::supports_filters_pushdown to return the filters we can push down [docs] (https://docs.rs/datafusion/41.0.0/datafusion/catalog/trait.TableProvider.html#method.supports_filters_pushdown)

what do you think? does this make sense?
(as a start I think I will try to implement just BinaryExpressions and see from there)

Yes, it makes sense to me.

@a-agmon
Copy link
Contributor

a-agmon commented Aug 29, 2024

Hi,

I think this is ready for review (will also ping you on slack).
#588

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
3 participants