docs: explain the value of deltalake on first page of docs #3017

Merged · 5 commits · Nov 22, 2024

Changes from all commits
78 changes: 67 additions & 11 deletions docs/index.md
@@ -1,22 +1,78 @@
# The deltalake package
`deltalake` is an open source library that makes working with tabular datasets easier, more robust and more performant. With `deltalake` you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage by compacting many small files into fewer large files.

`deltalake` can be used to manage data stored on a local file system or in the cloud. Under the hood it uses [Apache Arrow](https://arrow.apache.org/), so it integrates with Arrow-native data manipulation libraries such as [pandas](https://pandas.pydata.org/), [Polars](https://www.pola.rs/), [DuckDB](https://duckdb.org/) and DataFusion.

`deltalake` uses a lakehouse framework for managing datasets. With this lakehouse approach you manage your datasets with a `DeltaTable` object and `deltalake` takes care of the underlying files. Within a `DeltaTable` your data is stored in high-performance Parquet files, while metadata is stored in a set of JSON files called the transaction log.

`deltalake` is a Rust-based re-implementation of the [Delta Lake](https://delta.io/) protocol originally developed at Databricks. The `deltalake` library has APIs in Rust and Python and has no dependencies on Java, Spark or Databricks. For the PySpark implementation, see [delta-spark](https://docs.delta.io/latest/api/python/spark/index.html) instead.

## Important terminology

* `deltalake` refers to the Rust or Python API of delta-rs (no Spark dependency)
* "Delta Spark" refers to the Scala implementation of the Delta Lake transaction log protocol. It depends on Spark and Java.

## Why implement the Delta Lake transaction log protocol in Rust?

Delta Spark depends on Java and Spark, which is fine for many use cases, but not all Delta Lake users want to depend on these libraries. `deltalake` lets you manage your datasets with a Delta Lake approach without any Java or Spark dependencies, so it can be used from Rust, Python and other native projects where a JVM is not an option.

A `DeltaTable` on disk is simply a directory that stores metadata in JSON files and data in Parquet files.

## Quick start

You can install `deltalake` in Python with `pip`:
```bash
pip install deltalake
```
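The examples below also use pandas, Polars, DuckDB and DataFusion as client libraries. If you want to run all of them, one way to get the extra packages (assuming a recent Python environment) is:
```bash
pip install pandas polars duckdb datafusion
```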
We create a Pandas `DataFrame` and write it to a `DeltaTable`:
```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["Aadhya", "Bob", "Chen"],
    }
)

write_deltalake(
    table_or_uri="delta_table_dir",
    data=df,
)
```
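As noted above, the table is just a directory on disk: the data is stored in Parquet files and the transaction log lives under `_delta_log/`. A quick way to see this (the exact Parquet file names will differ on your machine):
```python
import os

# The table directory holds Parquet data files plus the _delta_log/
# directory that contains the JSON transaction log.
for name in sorted(os.listdir("delta_table_dir")):
    print(name)
# _delta_log
# part-00001-<uuid>-c000.snappy.parquet  (name will vary)
```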
We create a `DeltaTable` object that holds the metadata for the Delta table:
```python
dt = DeltaTable("delta_table_dir")
```
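The `DeltaTable` object reads the transaction log rather than the data, so inspecting the table is cheap. A few of the inspection methods (see the API reference for the full list):
```python
print(dt.version())   # current table version, e.g. 0
print(dt.files())     # Parquet files that make up this version
print(dt.schema())    # column names and types
print(dt.history())   # commit history recorded in the transaction log
```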
We load the data back into a Pandas `DataFrame` with the `to_pandas` method on the `DeltaTable`:
```python
new_df = dt.to_pandas()
```
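If you only need part of the table, `to_pandas` can also take a column selection so less data is read (a minimal sketch; see the API reference for the full set of arguments):
```python
# Read only the "id" column rather than the whole table.
ids_df = dt.to_pandas(columns=["id"])
```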

Or we can load the data into a Polars `DataFrame` with `pl.read_delta`:
```python
import polars as pl
new_df = pl.read_delta("delta_table_dir")
```
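Polars can also scan the table lazily, so filters and column selections are pushed down before any data is materialized (this assumes a recent Polars version that provides `pl.scan_delta`):
```python
import polars as pl

# Lazily scan the Delta table and only materialize the matching rows.
lazy_df = pl.scan_delta("delta_table_dir").filter(pl.col("id") > 1)
print(lazy_df.collect())
```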

Or we can load the data with DuckDB:
```python
import duckdb
duckdb.query("SELECT * FROM delta_scan('./delta_table_dir')")
```
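`delta_scan` is provided by DuckDB's `delta` extension; recent DuckDB releases can load it automatically, otherwise run `INSTALL delta; LOAD delta;` first. The call above builds a relation; to materialize the result, for example as a Pandas `DataFrame`:
```python
import duckdb

# Materialize the query result as a pandas DataFrame.
result_df = duckdb.query("SELECT * FROM delta_scan('./delta_table_dir')").df()
print(result_df)
```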

Or we can load the data with DataFusion:
```python
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_dataset("my_delta_table", dt.to_pyarrow_dataset())
ctx.sql("select * from my_delta_table")
```
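The operations described at the top of this page, such as appending new rows and time travelling back to an earlier version, only take a couple of lines. A minimal sketch, assuming the table was written as in the Quick start above:
```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Append new rows as a second commit in the transaction log.
new_rows = pd.DataFrame({"id": [4], "name": ["Dana"]})
write_deltalake("delta_table_dir", new_rows, mode="append")

# Time travel: load the table as it was before the append (version 0).
dt_v0 = DeltaTable("delta_table_dir", version=0)
print(dt_v0.to_pandas())  # only the original three rows
```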


## Contributing
