Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: update Readme #1440

Merged
merged 21 commits into from
Sep 15, 2023
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 0 additions & 104 deletions README.adoc

This file was deleted.

187 changes: 187 additions & 0 deletions README.md
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of adding a "Powered By" section and listing Apache Arrow and DataFusion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 especially from the context that we're updating delta.rs to keep up with them, eh?! ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we then also submit a PR to the official arrow "pwered_by" section?

https://arrow.apache.org/powered_by/

Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
<p align="center">
<a href="https://delta.io/">
<img src="https://github.com/delta-io/delta-rs/blob/main/logo.png?raw=true" alt="delta-rs logo" height="250">
</a>
</p>
<p align="center">
A native Rust library for Delta Lake, with bindings into Python
<br>
<a href="https://delta-io.github.io/delta-rs/python/">Python docs</a>
·
<a href="https://docs.rs/deltalake/latest/deltalake/">Rust docs</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md">Report a bug</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/new?template=feature_request.md">Request a feature</a>
·
<a href="https://github.com/delta-io/delta-rs/issues/1128">Roadmap</a>
<br>
<br>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/l/deltalake.svg?style=flat-square&color=00ADD4&logo=apache">
</a>
<a target="_blank" href="https://github.com/delta-io/delta-rs" style="background:none">
<img src="https://img.shields.io/github/stars/delta-io/delta-rs?logo=github&color=F75101">
</a>
<a target="_blank" href="https://crates.io/crates/deltalake" style="background:none">
<img alt="Crate" src="https://img.shields.io/crates/v/deltalake.svg?style=flat-square&color=00ADD4&logo=rust" >
</a>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/v/deltalake.svg?style=flat-square&color=F75101&logo=pypi" >
</a>
<a href="https://pypi.python.org/pypi/deltalake">
<img alt="Deltalake" src="https://img.shields.io/pypi/pyversions/deltalake.svg?style=flat-square&color=00ADD4&logo=python">
</a>
<a target="_blank" href="https://go.delta.io/slack">
<img alt="#delta-rs in the Delta Lake Slack workspace" src="https://img.shields.io/badge/slack-delta-blue.svg?logo=slack&style=flat-square&color=F75101">
</a>
Comment on lines +35 to +37
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add back a "Get involved section"?

Could list:

</p>

The Delta Lake project aims to unlock the power of the Deltalake for as many users and projects as possible
by providing native low level APIs aimed at developers and integrators, as well as a high level operations
API that lets you query, inspect, and operate your Delta Lake with ease.

| Source | Downloads | Installation Command | Docs |
| --------------------- | --------------------------------- | ----------------------- | --------------- |
| **[PyPi][pypi]** | [![Downloads][pypi-dl]][pypi] | `pip install deltalake` | [Docs][py-docs] |
| **[Crates.io][pypi]** | [![Downloads][crates-dl]][crates] | `cargo add deltalake` | [Docs][rs-docs] |

[pypi]: https://pypi.org/project/deltalake/
[pypi-dl]: https://img.shields.io/pypi/dm/deltalake?style=flat-square&color=00ADD4
[py-docs]: https://delta-io.github.io/delta-rs/python/
[rs-docs]: https://docs.rs/deltalake/latest/deltalake/
[crates]: https://crates.io/crates/deltalake
[crates-dl]: https://img.shields.io/crates/d/deltalake?color=F75101

## Table of contents

- [Quick Start](#quick-start)
- [Get Involved](#get-involved)
- [Integartions](#integrations)
- [Features](#features)

## Quick Start

The `deltalake` library aim to adopt familiar patterns from other libraries in data processing,
so getting started should look famililiar.

```py3
from deltalake import DeltaTable
from deltalake.write import write_deltalake
import pandas as pd

# write some data into a delta table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("./data/delta", df)

# load data from delta table
dt = DeltaTable("./data/delta")
df2 = dt.to_pandas()

assert df == df2
```

The same table written can also be loaded using the core Rust crate:

```rs
use deltalake::{open_table, DeltaTableError};

#[tokio::main]
async fn main() -> Result<(), DeltaTableError> {
// open the table written in python
let table = open_table("./data/delta").await?;

// show all active files in the table
let files = table.get_files();
println!("{files}");

Ok(())
}
```

## Get Involved

We encourage you to reach out, and are [commited](https://github.com/delta-io/delta-rs/blob/main/CODE_OF_CONDUCT.md)
to provide a welcoming community.

- [Join us in our Slack workspace](https://go.delta.io/slack)
- [Report an issue](https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md)
- Looking to contribute? See our [good first issues](https://github.com/delta-io/delta-rs/contribute).

## Integrations

Libraries and framewors that interoperate with delta-rs - in alphabetical order.

- [AWS SDK for Pandas](https://github.com/aws/aws-sdk-pandas)
- [ballista][ballista]
- [datafusion][datafusion]
- [Dask](https://github.com/dask-contrib/dask-deltatable)
- [datahub](https://datahubproject.io/)
- [DuckDB](https://duckdb.org/)
- [polars](https://www.pola.rs/)
- [Ray](https://github.com/delta-incubator/deltaray)

## Features

The following section outline some core features like supported [storage backends](#cloud-integrations)
and [operations](#supported-operations) that can be performed against tables. The state of implementation
of features outlined in the Delta [protocol][protocol] is also [tracked](#protocol-support-level).

### Cloud Integrations

| Storage | Rust | Python | Comment |
| -------------------- | :-------------------: | :-------------------: | ----------------------------------- |
| Local | ![done] | ![done] | |
| S3 - AWS | ![done] | ![done] | requires lock for concurrent writes |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we point to the docs to explain how to spin up DynamoDB for the lock?

| S3 - MinIO | ![done] | ![done] | requires lock for concurrent writes |
| S3 - R2 | ![done] | ![done] | requires lock for concurrent writes |
| Azure Blob | ![done] | ![done] | |
| Azure ADLS Gen2 | ![done] | ![done] | |
| Micorosft OneLake | [![open]][onelake-rs] | [![open]][onelake-rs] | |
| Google Cloud Storage | ![done] | ![done] | |

### Supported Operations

| Operation | Rust | Python | Description |
| --------------------- | :-----------------: | :-----------------: | ------------------------------------- |
| Create | ![done] | ![done] | Create a new table |
| Read | ![done] | ![done] | Read data from a table |
| Vacuum | ![done] | ![done] | Remove unused files and log entries |
| Delete - partitions | | ![done] | Delete a table partition |
| Delete - predicates | ![done] | | Delete data based on a predicate |
| Optimize - compaction | ![done] | ![done] | Harmonize the size of data file |
| Optimize - Z-order | ![done] | ![done] | Place similar data into the same file |
| Merge | [![open]][merge-rs] | [![open]][merge-py] | |
| FS check | ![done] | | Remove corrupted files from table |

### Protocol Support Level

| Writer Version | Requirement | Status |
| -------------- | --------------------------------------------- | :------------------: |
| Version 2 | Append Only Tables | [![open]][roadmap] |
| Version 2 | Column Invatiants | ![done] |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little typo here (Invatiants => Invariants)

rtyler marked this conversation as resolved.
Show resolved Hide resolved
| Version 3 | Enforce `delta.checkpoint.writeStatsAsJson` | [![open]][writer-rs] |
| Version 3 | Enforce `delta.checkpoint.writeStatsAsStruct` | [![open]][writer-rs] |
| Version 3 | CHECK constraints | [![open]][writer-rs] |
| Version 4 | Change Data Feed | |
| Version 4 | Generated Columns | |
| Version 5 | Column Mapping | |
| Version 6 | Identity Columns | |
| Version 7 | Table Features | |

| Reader Version | Requirement | Status |
| -------------- | ----------------------------------- | ------ |
| Version 2 | Collumn Mapping | |
| Version 3 | Table Features (requires reader V7) | |

[datafusion]: https://github.com/apache/arrow-datafusion
[ballista]: https://github.com/apache/arrow-ballista
[polars]: https://github.com/pola-rs/polars
[open]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueOpened.svg
[done]: https://cdn.jsdelivr.net/gh/Readme-Workflows/Readme-Icons@main/icons/octicons/IssueClosed.svg
wjones127 marked this conversation as resolved.
Show resolved Hide resolved
[roadmap]: https://github.com/delta-io/delta-rs/issues/1128
[merge-py]: https://github.com/delta-io/delta-rs/issues/1357
[merge-rs]: https://github.com/delta-io/delta-rs/issues/850
[writer-rs]: https://github.com/delta-io/delta-rs/issues/851
[onelake-rs]: https://github.com/delta-io/delta-rs/issues/1418
[protocol]: https://github.com/delta-io/delta/blob/master/PROTOCOL.md
7 changes: 5 additions & 2 deletions python/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,12 @@ license = {file = "LICENSE.txt"}
requires-python = ">=3.7"
keywords = ["deltalake", "delta", "datalake", "pandas", "arrow"]
classifiers = [
"Development Status :: 3 - Alpha",
"Development Status :: 4 - Beta",
wjones127 marked this conversation as resolved.
Show resolved Hide resolved
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3 :: Only"
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11"
]
dependencies = [
"pyarrow>=7",
Expand Down