docs: update Readme #1440
Changes from 9 commits
ac2f511
1967dc3
7c0f098
334b204
bf1c7ad
e88e6bb
004db54
ff08646
88855c3
fbc34b3
1c71be1
bd0f9fe
026b55d
f61684d
03213e8
332c8c4
76356f3
e5803d0
f75b77c
fa149f1
a44f148
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
<p align="center"> | ||
<a href="https://delta.io/"> | ||
<img src="https://github.com/delta-io/delta-rs/blob/main/logo.png?raw=true" alt="delta-rs logo" height="250"> | ||
</a> | ||
</p> | ||
<p align="center"> | ||
A native Rust library for Delta Lake, with bindings into Python | ||
<br> | ||
<a href="https://delta-io.github.io/delta-rs/python/">Python docs</a> | ||
· | ||
<a href="https://docs.rs/deltalake/latest/deltalake/">Rust docs</a> | ||
· | ||
<a href="https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md">Report a bug</a> | ||
· | ||
<a href="https://github.com/delta-io/delta-rs/issues/new?template=feature_request.md">Request a feature</a> | ||
· | ||
<a href="https://github.com/delta-io/delta-rs/issues/1128">Roadmap</a> | ||
<br> | ||
<br> | ||
<a href="https://pypi.python.org/pypi/deltalake"> | ||
<img alt="Deltalake" src="https://img.shields.io/pypi/l/deltalake.svg?style=flat-square&color=00ADD4&logo=apache"> | ||
</a> | ||
<a target="_blank" href="https://github.com/delta-io/delta-rs" style="background:none"> | ||
<img src="https://img.shields.io/github/stars/delta-io/delta-rs?logo=github&color=F75101"> | ||
</a> | ||
<a target="_blank" href="https://crates.io/crates/deltalake" style="background:none"> | ||
<img alt="Crate" src="https://img.shields.io/crates/v/deltalake.svg?style=flat-square&color=00ADD4&logo=rust" > | ||
</a> | ||
<a href="https://pypi.python.org/pypi/deltalake"> | ||
<img alt="Deltalake" src="https://img.shields.io/pypi/v/deltalake.svg?style=flat-square&color=F75101&logo=pypi" > | ||
</a> | ||
<a href="https://pypi.python.org/pypi/deltalake"> | ||
<img alt="Deltalake" src="https://img.shields.io/pypi/pyversions/deltalake.svg?style=flat-square&color=00ADD4&logo=python"> | ||
</a> | ||
<a target="_blank" href="https://go.delta.io/slack"> | ||
<img alt="#delta-rs in the Delta Lake Slack workspace" src="https://img.shields.io/badge/slack-delta-blue.svg?logo=slack&style=flat-square&color=F75101"> | ||
</a> | ||
Comment on lines +35 to +37: Could we add back a "Get involved" section? Could list:
|
||
</p> | ||
|
||
The delta-rs project aims to unlock the power of the Deltalake for as many users and projects as possible | ||
Review comment: Please update to "Delta Lake" |
||
by providing native low level APIs aimed at developers and integrators, as well as a high level operations | ||
API that lets you query, inspect, and operate your Deltalake with ease. | ||
Review comment: Please update to "Delta Lake" |
||
|
||
| Source | Downloads | Installation Command | Docs | | ||
| --------------------- | --------------------------------- | ----------------------- | --------------- | | ||
| **[PyPI][pypi]** | [![Downloads][pypi-dl]][pypi] | `pip install deltalake` | [Docs][py-docs] | ||
| **[Crates.io][crates]** | [![Downloads][crates-dl]][crates] | `cargo add deltalake` | [Docs][rs-docs] | ||
|
||
[pypi]: https://pypi.org/project/deltalake/ | ||
[pypi-dl]: https://img.shields.io/pypi/dm/deltalake?style=flat-square&color=00ADD4 | ||
[py-docs]: https://delta-io.github.io/delta-rs/python/ | ||
[rs-docs]: https://docs.rs/deltalake/latest/deltalake/ | ||
[crates]: https://crates.io/crates/deltalake | ||
[crates-dl]: https://img.shields.io/crates/d/deltalake?color=F75101 | ||
|
||
## Table of contents | ||
|
||
- [Quick Start](#quick-start) | ||
- [Integrations](#integrations) | ||
- [Features](#features) | ||
|
||
## Quick Start | ||
|
||
The deltalake library aim to adopt familiar patterns from other libraries in data processing, | ||
Review comment: Suggest either "The Delta Lake library aims to..." or "The |
||
so getting started should look familiar. | ||
|
||
```py3 | ||
from deltalake import DeltaTable, write_deltalake | ||
import pandas as pd | ||
|
||
# write some data into a delta table | ||
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]}) | ||
write_deltalake("./data/delta", df) | ||
|
||
# load data from delta table | ||
dt = DeltaTable("./data/delta") | ||
df2 = dt.to_pandas() | ||
|
||
assert df.equals(df2) | ||
``` | ||
|
||
The same table can also be loaded using the core Rust crate. | ||
roeap marked this conversation as resolved.
|
||
|
||
```rs | ||
use deltalake::{open_table, DeltaTableError}; | ||
|
||
#[tokio::main] | ||
async fn main() -> Result<(), DeltaTableError> { | ||
// open the table written in python | ||
let table = open_table("./data/delta").await?; | ||
|
||
// show all active files in the table | ||
let files = table.get_files(); | ||
println!("{files:?}"); | ||
|
||
Ok(()) | ||
} | ||
``` | ||
|
||
## Integrations | ||
|
||
- [polars](https://www.pola.rs/) | ||
- [datafusion][datafusion] | ||
- [ballista][ballista] | ||
- [DuckDB](https://duckdb.org/) | ||
- [Dask](https://github.com/dask-contrib/dask-deltatable) | ||
- [datahub](https://datahubproject.io/) | ||
- [Ray](https://github.com/delta-incubator/deltaray) | ||
- [AWS SDK for Pandas](https://github.com/aws/aws-sdk-pandas) | ||
|
||
## Features | ||
|
||
### Cloud Integrations | ||
|
||
| Storage | Rust | Python | Comment | | ||
| -------------------- | :-------------------: | :-------------------: | ----------------------------------- | | ||
| Local | ![done] | ![done] | | | ||
| S3 - AWS | ![done] | ![done] | requires lock for concurrent writes | | ||
Review comment: Should we point to the docs to explain how to spin up DynamoDB for the lock? |
||
| S3 - MinIO | ![done] | ![done] | requires lock for concurrent writes | | ||
| S3 - R2 | ![done] | ![done] | requires lock for concurrent writes | | ||
| Azure Blob | ![done] | ![done] | | | ||
| Azure ADLS Gen2 | ![done] | ![done] | | | ||
| Microsoft OneLake | [![open]][onelake-rs] | [![open]][onelake-rs] | | ||
| Google Cloud Storage | ![done] | ![done] | | | ||
|
||
### Supported Operations | ||
|
||
| Operation | Rust | Python | Description | | ||
| --------------------- | :-----------------: | :-----------------: | ------------------------------------- | | ||
| Create | ![done] | ![done] | Create a new table | | ||
| Read | ![done] | ![done] | Read data from a table | | ||
| Vacuum | ![done] | ![done] | Remove unused files and log entries | | ||
| Delete - partitions | | ![done] | Delete a table partition | | ||
| Delete - predicates | ![done] | | Delete data based on a predicate | | ||
| Optimize - compaction | ![done] | ![done] | Harmonize the size of data files | ||
| Optimize - Z-order | ![done] | | Place similar data into the same file | | ||
| Merge | [![open]][merge-rs] | [![open]][merge-py] | | | ||
| FS check | ![done] | | Remove corrupted files from table | | ||
|
||
### Protocol Support Level | ||
|
||
| Writer Version | Requirement | Status | | ||
| -------------- | --------------------------------------------- | :------------------: | | ||
| Version 2 | Append Only Tables | [![open]][roadmap] | | ||
| Version 2 | Column Invatiants | ![done] | | ||
Review comment: Little typo here (Invatiants => Invariants)
rtyler marked this conversation as resolved.
|
||
| Version 3 | Enforce `delta.checkpoint.writeStatsAsJson` | [![open]][writer-rs] | | ||
| Version 3 | Enforce `delta.checkpoint.writeStatsAsStruct` | [![open]][writer-rs] | | ||
| Version 3 | CHECK constraints | [![open]][writer-rs] | | ||
| Version 4 | Change Data Feed | | | ||
| Version 4 | Generated Columns | | | ||
| Version 5 | Column Mapping | | | ||
| Version 6 | Identity Columns | | | ||
| Version 7 | Table Features | | | ||
|
||
| Reader Version | Requirement | Status | | ||
| -------------- | ----------------------------------- | ------ | | ||
| Version 2 | Column Mapping | | ||
| Version 3 | Table Features (requires reader V7) | | | ||
|
||
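One way to read the writer table above: for the numbered protocol versions before Table Features, requirements accumulate, so a writer that claims version N is expected to honor everything listed at or below N. The sketch below encodes the writer rows as a plain lookup. `WRITER_REQUIREMENTS` and `requirements_for` are hypothetical names for illustration and are not part of the `deltalake` API.

```python
# Requirements per minimum writer version, copied from the writer table above.
WRITER_REQUIREMENTS = {
    2: ["Append Only Tables", "Column Invariants"],
    3: [
        "Enforce delta.checkpoint.writeStatsAsJson",
        "Enforce delta.checkpoint.writeStatsAsStruct",
        "CHECK constraints",
    ],
    4: ["Change Data Feed", "Generated Columns"],
    5: ["Column Mapping"],
    6: ["Identity Columns"],
    7: ["Table Features"],
}


def requirements_for(min_writer_version: int) -> list[str]:
    """Collect every requirement listed at or below the given writer version."""
    return [
        requirement
        for version in sorted(WRITER_REQUIREMENTS)
        if version <= min_writer_version
        for requirement in WRITER_REQUIREMENTS[version]
    ]


for requirement in requirements_for(3):
    print(requirement)
```

A table whose log declares a minimum writer version of 3, for example, would need all five requirements from versions 2 and 3 before a writer could safely commit to it.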
[datafusion]: https://github.com/apache/arrow-datafusion | ||
[ballista]: https://github.com/apache/arrow-ballista | ||
[polars]: https://github.com/pola-rs/polars | ||
[open]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueOpened.svg | ||
[done]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueClosed.svg | ||
wjones127 marked this conversation as resolved.
|
||
[roadmap]: https://github.com/delta-io/delta-rs/issues/1128 | ||
[merge-py]: https://github.com/delta-io/delta-rs/issues/1357 | ||
[merge-rs]: https://github.com/delta-io/delta-rs/issues/850 | ||
[writer-rs]: https://github.com/delta-io/delta-rs/issues/851 | ||
[onelake-rs]: https://github.com/delta-io/delta-rs/issues/1418 |
wjones127 marked this conversation as resolved.
|
Review comment: What do you think of adding a "Powered By" section and listing Apache Arrow and DataFusion?
Review comment: +1, especially from the context that we're updating delta.rs to keep up with them, eh?! ;)
Review comment: Should we then also submit a PR to the official Arrow "powered_by" section?
https://arrow.apache.org/powered_by/