Description
The docs should open by explaining what the basic use cases are for a lakehouse, and for delta-rs in particular. At present we have:
This is the documentation for the native Rust/Python implementation of Delta Lake. It is based on the delta-rs Rust library and requires no Spark or JVM dependencies. For the PySpark implementation, see delta-spark instead.
This module provides the capability to read, write, and manage Delta Lake tables with Python or Rust without Spark or Java. It uses Apache Arrow under the hood, so is compatible with other Arrow-native or integrated libraries such as pandas, DuckDB, and Polars.
This assumes prior knowledge of Delta Lake, and indeed of what a lakehouse is.
Proposed approach
The docs should open with a succinct paragraph that explains what deltalake is in a way that is understandable to anyone. Polars got a lot of feedback on their intro being too technical and ended up (after a lot of thought) with this:
Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.
As a first draft I propose these as the opening paragraphs for deltalake:
deltalake is an open-source library for managing tabular datasets that evolve over time. With deltalake you can add, delete, or overwrite rows in a dataset as new data arrives, and even time travel back to previous versions of a dataset. deltalake can be used to manage data stored on a local file system or in the cloud. deltalake integrates with data manipulation libraries such as pandas, Polars, DuckDB, and DataFusion.
deltalake is an example of the lakehouse approach to managing data storage. With the lakehouse approach, you manage your datasets through a DeltaTable object and deltalake manages the underlying files. With a DeltaTable, your data is stored in Parquet files, while deltalake stores metadata about the DeltaTable in a set of JSON files called the transaction log.
deltalake is a Rust-based re-implementation of the Delta Lake lakehouse protocol developed by Databricks. The deltalake library has APIs in Rust and Python, and its implementation has no dependencies on Java, Spark, or Databricks.
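
For context (this is not part of the proposed docs text), here is a minimal sketch of the workflow the first paragraph describes, using the Python API. The local path ./my_table and the sample data are made up for illustration:

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Create the table (version 0), then append new rows as they arrive (version 1).
write_deltalake("./my_table", pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}))
write_deltalake("./my_table", pd.DataFrame({"id": [3], "value": ["c"]}), mode="append")

# Read the current version of the dataset into pandas.
dt = DeltaTable("./my_table")
print(dt.version())    # 1
print(dt.to_pandas())  # all three rows

# Time travel: load the dataset as it was at version 0.
dt_v0 = DeltaTable("./my_table", version=0)
print(dt_v0.to_pandas())  # only the first two rows
```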
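
And a small sketch of the storage layout the second paragraph describes, walking the table directory created above (the Parquet file names are generated, so they will differ):

```python
import os

# Walk the table directory created above: row data lives in Parquet files,
# while metadata about each version lives in JSON files under _delta_log/
# (the transaction log).
for root, _dirs, files in os.walk("./my_table"):
    for name in sorted(files):
        print(os.path.join(root, name))

# Expected output (Parquet names will vary):
# ./my_table/0-<uuid>-0.parquet
# ./my_table/1-<uuid>-0.parquet
# ./my_table/_delta_log/00000000000000000000.json
# ./my_table/_delta_log/00000000000000000001.json
```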