Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move roadmap to a pinned issue #1129

Merged
merged 1 commit into from
Feb 5, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 2 additions & 41 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,45 +4,6 @@ Development on this project is mostly driven by volunteer contributors. We welco

Please take note of our [code of conduct](CODE_OF_CONDUCT.md).

## Good first issues
If you want to start contributing, first look at our good first issues: https://github.com/delta-io/delta-rs/contribute

These projects have been scoped and are available for new contributors to work on. If you are new to the project, consider picking one of these up. You can comment in the issue (linked) to claim it and ask any questions there; we are happy to guide your implementation.

* [ ] Write Rust user guide (#831): get the Rust docs homepage up to par with the Python user guide.
* [ ] Publish Python bindings to Conda (#735)
* [ ] Implement batch delete for faster vacuum (#408 / #407)
* [ ] Improve Python writer Pandas support (#686)
* [ ] Simple delete transactions (#832)

This list was last updated in September 2022. See more good first issues here: https://github.com/delta-io/delta-rs/contribute

And if you have an idea, feel free to open a new issue to discuss it.

## Roadmap 2022H2

These projects have been scoped and a contributor(s) who will work on them. They are also reflected on the official Delta Lake roadmap: https://delta.io/roadmap/

* [x] (P0) **Support V2 writer protocol** (#575; @wjones127) 2022H2
* [ ] (P1) **Data Acceptance Tests running** in CI (@wjones127) P0 for 2023H1 & P1 for 2022H2
* [ ] (P0) **First pass at Rust documentation**: Create some basic documentation that shows how to create a Delta Lake, how to read a Delta Lake into a Polars DataFrame, and how to append a Polars DataFrame to an existing Delta Lake. Publish the Rust documentation, so it’s easily accessible. (@MrPowers)
* [ ] (P0) **Blogging for Rust community**: grow Delta Rust awareness in the Rust community by blogging and posting in the Rust subreddit (@MrPowers)
* [ ] (P0) **Fully protocol compliant optimistic commit protocol (conflict resolution)**. - (#632) (@roeap) 2023H1 P0 for 2023H1 & P1 for 2022H2
* [ ] (P0) **Refactor Rust writer API to be flexible for others wishing to build upon delta-rs.** (#851) Separate three layers: First, a transaction layer for those who want to use their own Parquet writer to handle data writes (you write data; we write transaction); second, a parametrized writer layer, who want to use their own query engine but will use the built-in data writer (you verify data; we write data and transaction); third, a DataFusion-based writer that handles everything (verification, writing, transaction). Ideally we can design this so that delta-rs does most of the work enforcing the Delta Lake protocol (make it hard to mess up the table; possibly provide a test suite). (@wjones127, @roeap) 2023H1
* [ ] (P2) **Release improvements:** Commit to releasing more often; On Rust, at least in cadence with arrow-rs and datafusion. Enhance the change log. Try out dev releases of Python module. (@wjones127, @roeap) 2022H1

## Backlog

These issues are planned eventually, but need investigation or to be unblocked by work currently in the roadmap.

* Polars read support: make it possible to read a Delta Lake into a Polars DataFrame
* Polars write support: make it possible to write a Polars DataFrame to a Delta Lake
* Support operations that require rewriting data files (these are blocked on #851):
* Delete transactions via Rust: Implement proper Delta delete row operations via the Rust interface. Something like this: deltaTable.delete("birthDate < '1955-01-01'")
* Delete transactions via Python: Python bindings for the Delta delete row operations
* Merge transactions via Rust: Full support for merge operations via Rust
* Merge transactions via Python: Python bindings for merge operations
* Airbyte connector support to read/write Delta Table as a source using the Python binding CDK source and destination (Airbyte issue: 16322)
* Blog post on how to use delta-rs Airbyte connector
* Add Z-ORDER to the Rust bindings
* (P2) Some sort of file caching layer (#769). It would be equivalent to the Databricks disk cache, but as an object store (if that is viable)..
* (P2) Substrait support (#830): Generate query plans for operations requiring current table data as substrait (https://substrait.io) plans to integrate with different query backends
If you want to contribute something more substantial, see our "Projects seeking contributors" section on our roadmap: https://github.com/delta-io/delta-rs/issues/1128