docs: move dynamo docs into new docs page (delta-io#2093)
Description: Adds the DynamoDB docs to our new docs. In the Python `write_deltalake` docs, I point to this guide, since it is quite extensive and only relevant to S3 users. @rtyler @dispanser
Commit 61ca275 (parent 5d020d4): 5 changed files with 78 additions and 25 deletions.
# Writing to S3 with a locking provider

When writing to S3, a locking mechanism is needed to prevent unsafe concurrent
writes to the same Delta Lake table.

### DynamoDB
At the moment, DynamoDB is the only locking provider available in delta-rs. To enable it, set `AWS_S3_LOCKING_PROVIDER` to `'dynamodb'`, either in `storage_options` or as an environment variable.

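A minimal sketch of both ways to supply the setting. The dict is what you would pass as the `storage_options` argument of `write_deltalake`; the environment variable applies to the whole process and should be set before delta-rs opens or writes the table (or exported in the shell that launches Python).

```python
import os

# Option 1: per call, passed as write_deltalake(..., storage_options=storage_options)
storage_options = {"AWS_S3_LOCKING_PROVIDER": "dynamodb"}

# Option 2: process-wide, as an environment variable. Set it before
# delta-rs opens or writes the table so that it is picked up.
os.environ["AWS_S3_LOCKING_PROVIDER"] = "dynamodb"
```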
Additionally, you must create a DynamoDB table named `delta_log`
so that delta-rs can recognize it automatically. Alternatively, you can
use a table name of your choice, but you must then set the `DELTA_DYNAMO_TABLE_NAME`
variable to match it. The required schema for the DynamoDB
table is as follows:

```json
{
  "Table": {
    "AttributeDefinitions": [
      {
        "AttributeName": "fileName",
        "AttributeType": "S"
      },
      {
        "AttributeName": "tablePath",
        "AttributeType": "S"
      }
    ],
    "TableName": "delta_log",
    "KeySchema": [
      {
        "AttributeName": "tablePath",
        "KeyType": "HASH"
      },
      {
        "AttributeName": "fileName",
        "KeyType": "RANGE"
      }
    ]
  }
}
```

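If the lock table already exists (for example, created by another team or tool), a small check can confirm it matches this schema. The sketch below assumes you pass it a DynamoDB `DescribeTable` response, which is the shape boto3's `client.describe_table(TableName="delta_log")` returns. The helper name `has_delta_lock_schema` is hypothetical and not part of delta-rs; it only checks the keys and their attribute types, not capacity or TTL settings.

```python
from typing import Any, Mapping

# Key schema the lock table needs, as (attribute name, key type) pairs
REQUIRED_KEYS = {("tablePath", "HASH"), ("fileName", "RANGE")}


def has_delta_lock_schema(description: Mapping[str, Any]) -> bool:
    """Return True if a DescribeTable-style response has the expected keys.

    `description` looks like {"Table": {"KeySchema": [...], "AttributeDefinitions": [...]}}.
    """
    table = description.get("Table", {})
    keys = {(k["AttributeName"], k["KeyType"]) for k in table.get("KeySchema", [])}
    # Both key attributes must also be declared as strings ("S")
    types = {a["AttributeName"]: a["AttributeType"] for a in table.get("AttributeDefinitions", [])}
    return keys == REQUIRED_KEYS and all(types.get(name) == "S" for name, _ in REQUIRED_KEYS)
```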
Here is an example of writing to S3 using this mechanism:

```python
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({'x': [1, 2, 3]})
# Enable DynamoDB locking and use a custom lock table name
storage_options = {'AWS_S3_LOCKING_PROVIDER': 'dynamodb', 'DELTA_DYNAMO_TABLE_NAME': 'custom_table_name'}
write_deltalake('s3a://path/to/table', df, storage_options=storage_options)
```

This locking mechanism is compatible with the one used by Apache Spark. The `tablePath` property, which holds the root URL of the Delta table itself, is part of the primary key, so all writers to the same table must use exactly the same value. In Spark, S3 URLs use the `s3a://` prefix, so a table in delta-rs must be configured with an `s3a://` URL as well.

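Because `tablePath` is the hash key, a delta-rs writer using `s3://bucket/table` and a Spark writer using `s3a://bucket/table` would write lock entries under different keys and would not coordinate. A minimal sketch of normalizing a table URI to the `s3a://` form before handing it to delta-rs, using only the standard library; the helper name `to_s3a_uri` is hypothetical, and it changes only the scheme, leaving bucket and path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a_uri(uri: str) -> str:
    """Rewrite an s3:// or s3a:// table URI to the s3a:// form Spark uses."""
    parts = urlsplit(uri)
    if parts.scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Only the scheme changes; bucket (netloc) and key path are kept as-is
    return urlunsplit(parts._replace(scheme="s3a"))
```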
The following command creates the necessary table with the AWS CLI:

```sh
aws dynamodb create-table \
--table-name delta_log \
--attribute-definitions AttributeName=tablePath,AttributeType=S AttributeName=fileName,AttributeType=S \
--key-schema AttributeName=tablePath,KeyType=HASH AttributeName=fileName,KeyType=RANGE \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
```

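If you provision infrastructure from Python instead, the same table can be created with boto3's DynamoDB `create_table` call. This is a sketch under that assumption: the helper name `create_delta_lock_table` is hypothetical, `client` is expected to be `boto3.client("dynamodb")`, and the capacity values simply mirror the CLI example. `create_table` returns before the table is active; boto3's `table_exists` waiter can block until it is.

```python
def create_delta_lock_table(client, table_name: str = "delta_log") -> dict:
    """Create the DynamoDB lock table; `client` is a boto3 DynamoDB client."""
    return client.create_table(
        TableName=table_name,
        # Both key attributes are strings
        AttributeDefinitions=[
            {"AttributeName": "tablePath", "AttributeType": "S"},
            {"AttributeName": "fileName", "AttributeType": "S"},
        ],
        # tablePath is the partition (HASH) key, fileName the sort (RANGE) key
        KeySchema=[
            {"AttributeName": "tablePath", "KeyType": "HASH"},
            {"AttributeName": "fileName", "KeyType": "RANGE"},
        ],
        # Same capacity as the CLI example; adjust for your workload
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )
```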
You can find additional information in the [Delta Lake documentation](https://docs.delta.io/latest/delta-storage.html#multi-cluster-setup), which also includes recommendations on configuring a time-to-live (TTL) for the table to keep it from growing indefinitely.

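A minimal sketch of turning on TTL from Python with boto3's `update_time_to_live` call. It assumes the TTL attribute is named `expireTime`, as in the Delta Lake multi-cluster setup guide; verify that name against the linked documentation before relying on it. DynamoDB only expires items whose TTL attribute holds a past epoch-seconds timestamp, so items without it are kept. The helper name `enable_lock_table_ttl` is hypothetical.

```python
def enable_lock_table_ttl(client, table_name: str = "delta_log",
                          attribute: str = "expireTime") -> dict:
    """Enable DynamoDB TTL on the lock table; `client` is a boto3 DynamoDB client.

    The default attribute name "expireTime" is an assumption taken from the
    Delta Lake multi-cluster setup guide.
    """
    return client.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": attribute},
    )
```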
### Enable unsafe writes in S3 (opt-in)
If for some reason you don't want to use DynamoDB as your locking mechanism, you can
set the `AWS_S3_ALLOW_UNSAFE_RENAME` variable to `true` to enable unsafe writes to S3. Without a lock, concurrent writers can overwrite each other's commits, so only do this when a single writer commits to the table at a time.
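A minimal sketch of the opt-in as `storage_options`, to be passed to `write_deltalake` in place of the DynamoDB settings. Values in `storage_options` are strings, so the flag is the string `"true"` rather than a Python boolean.

```python
# Opt in to writing to S3 without a locking provider. Only safe when
# a single writer commits to the table at a time.
storage_options = {"AWS_S3_ALLOW_UNSAFE_RENAME": "true"}
```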