Expose filter APIs via Python #53

Closed
eddyxu opened this issue Jul 26, 2022 · 2 comments
Labels
enhancement New feature or request python

Comments

Contributor

eddyxu commented Jul 26, 2022

Problem Statement

The Lance format itself supports nested column projection pushdown, filter / predicate pushdown, and LIMIT / OFFSET pushdown (#45). However, Apache Arrow's ScanOptions and the DuckDB PyArrow integration do not fully support these optimizations yet (#46).

In the meantime, we could expose these filters through our own Python and C++ APIs to make them usable.

Desired Behavior

Extend the Python / C++ API to support nested column projection pushdown, filter / predicate pushdown, and LIMIT / OFFSET pushdown.

Proposed Python API change:

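# lance/__init__.py
from typing import Optional

import pyarrow.dataset


def dataset(
    uri: str,
    columns: Optional[list[str]] = None,  # projection pushdown (nested columns included)
    filters: Optional[pyarrow.dataset.Expression] = None,  # filter / predicate pushdown
    limit: Optional[int] = None,  # LIMIT pushdown
    offset: int = 0,  # OFFSET pushdown
) -> pyarrow.dataset.Dataset:
    ...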
eddyxu added the enhancement (New feature or request) and python labels Jul 26, 2022
Contributor Author

eddyxu commented Jul 28, 2022

So Arrow cannot directly support pushdown for (VERY) nested columns (list<struct>), see #60. We can use our own API to bypass the limitation.
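As an illustration of what projecting a column nested inside `list<struct>` means, the sketch below walks a dotted path through plain Python dicts and lists. The `project_path` helper, the dotted-path syntax, and the `annotations` sample field are hypothetical and are not part of Lance's or Arrow's API.

```python
from typing import Any


def project_path(value: Any, path: list[str]) -> Any:
    """Keep only the field named by `path`, descending through structs and lists."""
    if not path:
        return value
    if isinstance(value, list):
        # list<...>: apply the same projection to every element.
        return [project_path(item, path) for item in value]
    # struct: keep a single child field and continue below it.
    head, rest = path[0], path[1:]
    return {head: project_path(value[head], rest)}


row = {
    "id": 1,
    "annotations": [
        {"label": "cat", "box": [0, 0, 10, 10]},
        {"label": "dog", "box": [5, 5, 20, 20]},
    ],
}
projected = project_path(row, "annotations.label".split("."))
print(projected)
```

With the projection pushed down, a reader would only need to load the `label` child of each struct, and the `box` data would never be read.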

Contributor Author

eddyxu commented Aug 1, 2022

Done in #61

eddyxu closed this as completed Aug 1, 2022