docs: update best practices
MrPowers authored and ion-elgreco committed Feb 19, 2024
1 parent 8fde83e commit a67ed9c
Showing 1 changed file with 1 addition and 11 deletions.
12 changes: 1 addition & 11 deletions docs/delta-lake-best-practices.md
@@ -34,7 +34,7 @@ Hive-style partitioning also has some significant downsides.

* It’s only suitable for low-cardinality columns.
* It can create many small files, especially if you use the wrong partition key or frequently update the Delta table.
- * It can cause some queries that don’t rely on the partition key to run slower (because of the excessive number of small files)
+ * It can cause some queries that don’t rely on the partition key to run slower (because of the excessive number of small files). A large number of small files is problematic for I/O throughput.

Hive-style partitioning can be a great data management tactic and a fantastic option for many Delta tables. Beware of the downsides before partitioning your tables.
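
To make the small-file concern in the list above concrete, here is a rough back-of-the-envelope sketch. It is not part of the commit or of any Delta Lake API: the helper `estimate_files` is hypothetical, and it assumes the data is spread evenly and that every distinct partition value ends up with at least one file.

```python
# Hypothetical helper (not a Delta Lake API): estimate how many files a
# Hive-style partitioned table holds and how large each file is on average.
# Assumptions: data is evenly distributed, and each distinct partition value
# gets `files_per_partition` files.
def estimate_files(total_bytes: int, partition_values: int, files_per_partition: int = 1):
    file_count = partition_values * files_per_partition
    avg_file_bytes = total_bytes / file_count
    return file_count, avg_file_bytes


GIB = 1024 ** 3
total = 100 * GIB  # illustrative table size

# Low-cardinality key (for example, 50 distinct values): few, large files.
low = estimate_files(total, partition_values=50)

# High-cardinality key (for example, 1,000,000 distinct values): many tiny files.
high = estimate_files(total, partition_values=1_000_000)

print(f"low cardinality:  {low[0]:>9,} files, ~{low[1] / GIB:.1f} GiB each")
print(f"high cardinality: {high[0]:>9,} files, ~{high[1] / 1024:.0f} KiB each")
```

Under these assumptions the same 100 GiB lands in 50 files of about 2 GiB with the low-cardinality key, but in a million files of roughly 105 KiB with the high-cardinality key, which is the kind of layout the I/O throughput warning refers to.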

@@ -80,16 +80,6 @@ You only need to vacuum when you perform operations that mark files for removal

Create a good vacuum strategy for your tables to minimize your storage costs.

- ## Registering tables in a metastore/catalog
-
- You can register Delta tables in a metastore or catalog, making them easier to access.
-
- Suppose you are working on a team with data engineers building ETL pipelines with Delta Lake. The data analysts want to query these tables via a SQL interface.
-
- It’s nice for the data engineers to register the Delta tables in a catalog so the data analysts can easily query them by name. It’s easier for an analyst to query the `students` table than having to `/know/the/path/for/every/table/like/students`.
-
- It’s good to register Delta tables in a catalog whenever a catalog is readily available in your production environment, making querying your tables easier.
-
## Delta Lake best practices to minimize costs

Delta Lake helps you minimize costs in many ways: