docs: update Readme #1440

Merged: 21 commits merged on Sep 15, 2023
Changes from 9 commits
170 changes: 170 additions & 0 deletions README.md
Collaborator:

What do you think of adding a "Powered By" section and listing Apache Arrow and DataFusion?

Collaborator:

+1 especially from the context that we're updating delta.rs to keep up with them, eh?! ;)

Collaborator (PR author):

Should we then also submit a PR to the official Arrow "powered_by" section?

https://arrow.apache.org/powered_by/

@@ -0,0 +1,170 @@
<p align="center">
<a href="https://delta.io/">
<img src="https://github.com/delta-io/delta-rs/blob/main/logo.png?raw=true" alt="delta-rs logo" height="250">
</a>
</p>
<p align="center">
A native Rust library for Delta Lake, with bindings for Python
<br>
<a href="https://delta-io.github.io/delta-rs/python/">Python docs</a>
·
<a href="https://docs.rs/deltalake/latest/deltalake/">Rust docs</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md">Report a bug</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/new?template=feature_request.md">Request a feature</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/1128">Roadmap</a>
<br>
<br>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/l/deltalake.svg?style=flat-square&color=00ADD4&logo=apache">
</a>
<a target="_blank" href="https://github.com/delta-io/delta-rs" style="background:none">
<img src="https://img.shields.io/github/stars/delta-io/delta-rs?logo=github&color=F75101">
</a>
<a target="_blank" href="https://crates.io/crates/deltalake" style="background:none">
<img alt="Crate" src="https://img.shields.io/crates/v/deltalake.svg?style=flat-square&color=00ADD4&logo=rust" >
</a>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/v/deltalake.svg?style=flat-square&color=F75101&logo=pypi" >
</a>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/pyversions/deltalake.svg?style=flat-square&color=00ADD4&logo=python">
</a>
<a target="_blank" href="https://go.delta.io/slack">
<img alt="#delta-rs in the Delta Lake Slack workspace" src="https://img.shields.io/badge/slack-delta-blue.svg?logo=slack&style=flat-square&color=F75101">
</a>
Comment on lines +35 to +37
Collaborator:

Could we add back a "Get Involved" section?

Could list:

</p>

The delta-rs project aims to unlock the power of Delta Lake for as many users and projects as possible
Collaborator:

Please update to "Delta Lake"

by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations
API that lets you query, inspect, and operate your Delta Lake tables with ease.
Collaborator:

Please update to "Delta Lake"


| Source | Downloads | Installation Command | Docs |
| ----------------------- | --------------------------------- | ----------------------- | --------------- |
| **[PyPI][pypi]** | [![Downloads][pypi-dl]][pypi] | `pip install deltalake` | [Docs][py-docs] |
| **[Crates.io][crates]** | [![Downloads][crates-dl]][crates] | `cargo add deltalake` | [Docs][rs-docs] |

[pypi]: https://pypi.org/project/deltalake/
[pypi-dl]: https://img.shields.io/pypi/dm/deltalake?style=flat-square&color=00ADD4
[py-docs]: https://delta-io.github.io/delta-rs/python/
[rs-docs]: https://docs.rs/deltalake/latest/deltalake/
[crates]: https://crates.io/crates/deltalake
[crates-dl]: https://img.shields.io/crates/d/deltalake?color=F75101

## Table of contents

- [Quick Start](#quick-start)
- [Integrations](#integrations)
- [Features](#features)

## Quick Start

The deltalake library aims to adopt familiar patterns from other data processing libraries,
Collaborator:

Suggest either "The Delta Lake library aims to..." or "The deltalake library aims to..."

so getting started should look familiar.

```py3
from deltalake import DeltaTable, write_deltalake
import pandas as pd

# write some data into a delta table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("./data/delta", df)

# load data from delta table
dt = DeltaTable("./data/delta")
df2 = dt.to_pandas()

# `df == df2` compares element-wise, so compare the whole frames instead
pd.testing.assert_frame_equal(df, df2)
```

The same table can also be loaded with the core Rust crate.

```rs
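// Requires the `deltalake` crate and `tokio` with the "macros" and "rt-multi-thread" features.
use deltalake::{open_table, DeltaTableError};

#[tokio::main]
async fn main() -> Result<(), DeltaTableError> {
    // open the table written in python
    let table = open_table("./data/delta").await?;

    // show all active files in the table
    let files = table.get_files();
    // `Vec` has no `Display` impl, so print it with the debug formatter
    println!("{files:?}");

    Ok(())
}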
```
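For readers curious what "active files" means here: in the Delta protocol, a table's current file list is obtained by replaying the `add` and `remove` actions recorded in the newline-delimited JSON commit files under `_delta_log/`. The Python sketch below illustrates that replay under simplifying assumptions: it ignores checkpoint files and all other action types, and `active_files` and `write_commit` are hypothetical helpers, not part of the deltalake API.

```python
import json
import os
import tempfile


def active_files(table_path: str) -> set:
    """Replay add/remove actions from a table's JSON commits (checkpoints ignored)."""
    log_dir = os.path.join(table_path, "_delta_log")
    # Commit files use zero-padded version numbers, so sorting by name
    # yields commit order.
    commits = sorted(
        name for name in os.listdir(log_dir)
        if name.endswith(".json") and name[: -len(".json")].isdigit()
    )
    files = set()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


def write_commit(table_path: str, version: int, actions: list) -> None:
    """Hypothetical helper: write one newline-delimited JSON commit file."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, f"{version:020d}.json"), "w", encoding="utf-8") as fh:
        fh.writelines(json.dumps(a) + "\n" for a in actions)


# Usage: version 1 removes a.parquet and adds c.parquet.
with tempfile.TemporaryDirectory() as tmp:
    write_commit(tmp, 0, [{"add": {"path": "a.parquet"}}, {"add": {"path": "b.parquet"}}])
    write_commit(tmp, 1, [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}])
    print(sorted(active_files(tmp)))  # ['b.parquet', 'c.parquet']
```

A real reader also starts from the latest checkpoint and handles actions such as `metaData` and `protocol`, which delta-rs does internally.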

## Integrations

- [Polars](https://www.pola.rs/)
- [DataFusion][datafusion]
- [Ballista][ballista]
- [DuckDB](https://duckdb.org/)
- [Dask](https://github.com/dask-contrib/dask-deltatable)
- [DataHub](https://datahubproject.io/)
- [Ray](https://github.com/delta-incubator/deltaray)
- [AWS SDK for Pandas](https://github.com/aws/aws-sdk-pandas)

## Features

### Cloud Integrations

| Storage | Rust | Python | Comment |
| -------------------- | :-------------------: | :-------------------: | ----------------------------------- |
| Local | ![done] | ![done] | |
| S3 - AWS | ![done] | ![done] | requires lock for concurrent writes |
Collaborator:

Should we point to the docs to explain how to spin up DynamoDB for the lock?

| S3 - MinIO | ![done] | ![done] | requires lock for concurrent writes |
| S3 - R2 | ![done] | ![done] | requires lock for concurrent writes |
| Azure Blob | ![done] | ![done] | |
| Azure ADLS Gen2 | ![done] | ![done] | |
| Microsoft OneLake | [![open]][onelake-rs] | [![open]][onelake-rs] | |
| Google Cloud Storage | ![done] | ![done] | |
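The "requires lock for concurrent writes" note reflects how Delta commits work: a writer commits version N by creating `_delta_log/<N>.json` only if no other writer has created it first. Local filesystems provide that create-if-absent primitive natively, whereas S3 did not offer an atomic put-if-absent at the time of writing, hence the external lock. The sketch below is a simplified illustration on a local filesystem, not delta-rs's actual commit code; `try_commit` and `CommitConflict` are hypothetical names.

```python
import json
import os
import tempfile


class CommitConflict(Exception):
    """Hypothetical error: another writer already created this log version."""


def try_commit(table_path: str, version: int, actions: list) -> None:
    """Create _delta_log/<version>.json, failing if the file already exists."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL fails atomically when the file exists: the
        # put-if-absent guarantee a local filesystem gives for free.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        raise CommitConflict(f"version {version} already committed") from None
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.writelines(json.dumps(a) + "\n" for a in actions)


# Usage: two writers race for version 0; only the first succeeds.
with tempfile.TemporaryDirectory() as tmp:
    try_commit(tmp, 0, [{"add": {"path": "a.parquet"}}])
    try:
        try_commit(tmp, 0, [{"add": {"path": "b.parquet"}}])
    except CommitConflict as err:
        print("second writer must retry:", err)  # second writer must retry: version 0 already committed
```

A production writer would also need to keep readers from seeing a half-written commit, for example by writing a temporary file and renaming it only if the target is absent; on S3 the lock stands in for this guarantee.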

### Supported Operations

| Operation | Rust | Python | Description |
| --------------------- | :-----------------: | :-----------------: | ------------------------------------- |
| Create | ![done] | ![done] | Create a new table |
| Read | ![done] | ![done] | Read data from a table |
| Vacuum | ![done] | ![done] | Remove unused files and log entries |
| Delete - partitions | | ![done] | Delete a table partition |
| Delete - predicates | ![done] | | Delete data based on a predicate |
| Optimize - compaction | ![done] | ![done] | Harmonize the size of data files |
| Optimize - Z-order | ![done] | | Place similar data into the same file |
| Merge | [![open]][merge-rs] | [![open]][merge-py] | |
| FS check | ![done] | | Remove corrupted files from table |
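Z-order optimization ("Place similar data into the same file") is commonly built on a space-filling curve: the bits of several column values are interleaved into a single key, rows are sorted by that key, and the sorted rows are cut into files, so rows that are close in every column tend to land in the same file. The sketch below shows that idea for two small non-negative integer columns. It is illustrative only; how delta-rs maps real column values to sortable keys is not shown, and `interleave_bits` and `zorder_files` are hypothetical names.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key: x bits go to even positions, y bits to odd ones."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def zorder_files(rows: list, rows_per_file: int) -> list:
    """Sort (x, y) rows by their Z-order key and split them into 'files'."""
    ordered = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


# Usage: a 4x4 grid split into files of 4 rows gives one 2x2 quadrant per file.
grid = [(x, y) for x in range(4) for y in range(4)]
for file_rows in zorder_files(grid, rows_per_file=4):
    print(file_rows)
```

Sorting by a single column would instead put whole rows of the grid into each file, so a filter on the other column could skip nothing; the interleaved key keeps both columns' min/max ranges per file narrow.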

### Protocol Support Level

| Writer Version | Requirement | Status |
| -------------- | --------------------------------------------- | :------------------: |
| Version 2 | Append Only Tables | [![open]][roadmap] |
| Version 2 | Column Invariants | ![done] |
Contributor:

Little typo here (Invatiants => Invariants)

| Version 3 | Enforce `delta.checkpoint.writeStatsAsJson` | [![open]][writer-rs] |
| Version 3 | Enforce `delta.checkpoint.writeStatsAsStruct` | [![open]][writer-rs] |
| Version 3 | CHECK constraints | [![open]][writer-rs] |
| Version 4 | Change Data Feed | |
| Version 4 | Generated Columns | |
| Version 5 | Column Mapping | |
| Version 6 | Identity Columns | |
| Version 7 | Table Features | |

| Reader Version | Requirement | Status |
| -------------- | ----------------------------------- | ------ |
| Version 2 | Column Mapping | |
| Version 3 | Table Features (requires writer V7) | |
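The version numbers in these tables come from the table's `protocol` action, which records `minReaderVersion` and `minWriterVersion`, plus explicit `readerFeatures`/`writerFeatures` lists once a table uses table features (reader version 3 / writer version 7). A client compares those values with what it implements before reading or writing. The sketch below shows the writer-side check; `latest_protocol`, `can_write`, and the capability values in the usage lines are hypothetical, not delta-rs's actual support matrix.

```python
import json


def latest_protocol(commit_lines: list) -> dict:
    """Return the last `protocol` action found in newline-delimited JSON commits."""
    protocol = None
    for line in commit_lines:
        if line.strip():
            action = json.loads(line)
            if "protocol" in action:
                protocol = action["protocol"]  # a later protocol action replaces earlier ones
    if protocol is None:
        raise ValueError("no protocol action found")
    return protocol


def can_write(protocol: dict, max_writer_version: int, known_features: set) -> bool:
    """Decide whether a client supporting `max_writer_version` may write this table."""
    required = protocol["minWriterVersion"]
    if required >= 7:
        # With table features, the client must understand every listed writer feature.
        listed = set(protocol.get("writerFeatures", []))
        return max_writer_version >= 7 and listed <= known_features
    return required <= max_writer_version


# Usage with a hypothetical client that implements writer version 2 only.
log = [
    json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}),
    json.dumps({"add": {"path": "part-0.parquet"}}),
]
print(can_write(latest_protocol(log), max_writer_version=2, known_features=set()))  # True
```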

[datafusion]: https://github.com/apache/arrow-datafusion
[ballista]: https://github.com/apache/arrow-ballista
[polars]: https://github.com/pola-rs/polars
[open]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueOpened.svg
[done]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueClosed.svg
[roadmap]: https://github.com/delta-io/delta-rs/issues/1128
[merge-py]: https://github.com/delta-io/delta-rs/issues/1357
[merge-rs]: https://github.com/delta-io/delta-rs/issues/850
[writer-rs]: https://github.com/delta-io/delta-rs/issues/851
[onelake-rs]: https://github.com/delta-io/delta-rs/issues/1418
File renamed without changes.
7 changes: 5 additions & 2 deletions python/pyproject.toml
@@ -10,9 +10,12 @@ license = {file = "LICENSE.txt"}
requires-python = ">=3.7"
keywords = ["deltalake", "delta", "datalake", "pandas", "arrow"]
classifiers = [
-"Development Status :: 3 - Alpha",
+"Development Status :: 4 - Beta",
"License :: OSI Approved :: Apache Software License",
-"Programming Language :: Python :: 3 :: Only"
+"Programming Language :: Python :: 3.8",
+"Programming Language :: Python :: 3.9",
+"Programming Language :: Python :: 3.10",
+"Programming Language :: Python :: 3.11"
]
dependencies = [
"pyarrow>=7",