- Initial migration of a large DynamoDB table (tens of GB and above) from an AWS global region to the China regions. Because the AWS China regions are separate from the AWS commercial partition (the global regions), the common practice of using DynamoDB global tables is not available for cross-partition migration.
- DynamoDB cross region replication provides a nice example of continuous bi-directional replication but doesn't cover the initial migration. This PoC can be combined with it to form a complete solution for DynamoDB migration and replication.
- Due to data privacy laws, the source tables must be filtered before migration.
- The cross-border Internet connection is not stable, so the migration step over the Internet requires a reliable architecture.
The general approach is to filter the data and export it to S3 in the source global region, then replicate it to China. In the target region, the data is imported from S3 into DynamoDB.
- Using Glue, crawl and filter the DynamoDB table and export it to S3 in the global region
- Replicate to S3 in China using the S3 Plugin of Data Transfer Hub
- Use Glue to load the data from S3 into DynamoDB
We will use the sample data generated by DynamoDB cross region replication: a fake user profile table in which every item has a "country" field. The Glue job filters on this field.
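To make the filtering rule concrete, here is a minimal sketch of the predicate in plain Python. It is not the script from the repository: `ALLOWED_COUNTRY`, `keep_item` and `filter_items` are hypothetical names, and the value "China" is taken from the verification step later in this walkthrough. Inside a Glue job, a predicate of this kind could be handed to the `Filter` transform, although the record type there differs from a plain dict.

```python
from typing import Any, Dict, Iterable, List

# Assumption: only items whose "country" equals this value may be exported.
ALLOWED_COUNTRY = "China"


def keep_item(item: Dict[str, Any]) -> bool:
    """Return True if the item may be exported (exact match on "country")."""
    return item.get("country") == ALLOWED_COUNTRY


def filter_items(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Local stand-in for the Glue filter step, useful for quick checks."""
    return [item for item in items if keep_item(item)]
```

Items without a "country" attribute are dropped by this sketch, which is the conservative choice when the filter exists for compliance reasons.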
- Glue Crawler
Set up a Glue crawler in the US region to crawl the source DynamoDB table.
The catalog will be as below
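For teams that prefer scripting over the console, the crawler step could be sketched with boto3 as follows. This is an illustration under assumptions, not part of the PoC: the crawler name, role ARN, database name and source table name are placeholders that you would replace with your own, and `build_crawler_params` and `create_and_start_crawler` are hypothetical helpers.

```python
from typing import Any, Dict


def build_crawler_params(name: str, role_arn: str, database: str,
                         table_name: str) -> Dict[str, Any]:
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # A DynamoDB target is identified by the table name.
        "Targets": {"DynamoDBTargets": [{"Path": table_name}]},
    }


def create_and_start_crawler(params: Dict[str, Any],
                             region: str = "us-west-2") -> None:
    """Create the crawler and start it (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])
```

Keeping the parameter builder separate from the AWS call makes it easy to review the configuration before anything is created.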
- Upload ETL script
git clone https://github.com/yizhizoe/dynamodb-init-migration.git
aws s3 cp source_ddb_filter.py s3://aws-glue-scripts-{account_id}-us-west-2/ --region us-west-2
- Set up and run Glue ETL job
Create a Glue job as below and set the script S3 path to "s3://aws-glue-scripts-{account_id}-us-west-2/source_ddb_filter.py"
In the job parameters, enter the key "--export_s3_bucket" with the name of the export bucket in the US region as its value. Set an appropriate number of workers and use Glue 2.0.
After creating the job, run it directly.
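The job run can also be triggered from code. The sketch below assumes the ETL script reads a parameter named `export_s3_bucket`; Glue passes job arguments with a leading `--` on the key. The job name is a placeholder, and `build_job_arguments` and `start_export_job` are hypothetical helpers, not functions from the repository.

```python
from typing import Dict


def build_job_arguments(export_s3_bucket: str) -> Dict[str, str]:
    """Build the Arguments map for start_job_run; keys carry a '--' prefix."""
    return {"--export_s3_bucket": export_s3_bucket}


def start_export_job(job_name: str, export_s3_bucket: str,
                     region: str = "us-west-2") -> str:
    """Start the Glue job and return its run id (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue", region_name=region)
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_job_arguments(export_s3_bucket),
    )
    return response["JobRunId"]
```

The returned run id can then be used to look up the state of the run in the Glue console or through the API.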
Follow the deployment guide to set up Data Transfer Hub. Add a replication task from the source <export_s3_path> to the S3 path <transfer_target_s3_path> in the China region. Data Transfer Hub transfers Amazon S3 objects between AWS China regions and global regions, with automatic retry and error handling to provide high resiliency for data transfer over the Internet. Because it also supports incremental data transfer, the S3 replication can be set up once, and the export path can be reused to replicate multiple tables.
- Set up a Glue crawler in the China region to crawl the S3 target <transfer_target_s3_path>. The crawler's role should have both the AWSGlueServiceRole managed policy and access to the S3 target path.
- The catalog should be similar to the one in the US region.
- Create the target DynamoDB table user_migrated_cn with the same partition key and sort key as in the US region. Set the capacity mode to "On-demand".
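As an illustration of the table settings described above, the following sketch builds the parameters for a boto3 `create_table` call with on-demand capacity. Several details are assumptions: the sort key name "SK" is a placeholder (only "PK" is mentioned in this walkthrough), both keys are assumed to be strings, and `build_table_params` and `create_target_table` are hypothetical helpers. Adjust them to match the source table.

```python
from typing import Any, Dict


def build_table_params(table_name: str, partition_key: str,
                       sort_key: str) -> Dict[str, Any]:
    """Build create_table arguments for an on-demand table with string keys."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},  # assumed type
            {"AttributeName": sort_key, "AttributeType": "S"},       # assumed type
        ],
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": sort_key, "KeyType": "RANGE"},
        ],
        # PAY_PER_REQUEST is the API name for "On-demand" capacity mode.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_target_table(params: Dict[str, Any],
                        region: str = "cn-north-1") -> None:
    """Create the table (requires boto3 and credentials for the China region)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("dynamodb", region_name=region).create_table(**params)
```

For example, `build_table_params("user_migrated_cn", "PK", "SK")` yields the configuration for the target table used in this walkthrough, under the key-name assumption stated above.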
- Upload ETL script
aws s3 cp dump_target_ddb.py s3://aws-glue-scripts-{account_id}-cn-north-1/ --region cn-north-1
- Set up and run Glue ETL job
Create a Glue job as below and set the script S3 path to "s3://aws-glue-scripts-{account_id}-cn-north-1/dump_target_ddb.py"
In the job parameters, add "--target_ddb_table_name=user_migrated_cn"
- Save and run the job. Note that the Glue crawler lowercases attribute names in the catalog, so it is important to double-check the attribute names of the target items; for example, we deliberately mapped "pk" to "PK" in the target table. After the job finishes, open the DynamoDB table "user_migrated_cn" and verify that the items were created and that they all have the attribute "country" = "China".
To further verify the number of items, run "Get live item count" in the DynamoDB console.
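The attribute-name mapping mentioned above can be pictured with a small plain-Python sketch. The mapping `{"pk": "PK"}` comes from this walkthrough; the function name `restore_attribute_names` is hypothetical, and in a real Glue job the rename would typically be done on the frame itself (for instance with a field rename or mapping transform) rather than item by item.

```python
from typing import Any, Dict


def restore_attribute_names(item: Dict[str, Any],
                            name_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename lowercased catalog names back to the target table's names.

    Keys that are not listed in name_map are kept unchanged.
    """
    return {name_map.get(key, key): value for key, value in item.items()}
```

Calling `restore_attribute_names({"pk": "u1", "country": "China"}, {"pk": "PK"})` renames only the key attribute and leaves "country" as it is.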
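If you would rather check the count from a script, one option is a paginated scan with `Select="COUNT"`, summing the `Count` field of each page. This is a sketch under assumptions: `total_count` and `count_items` are hypothetical helpers, and a full scan reads the whole table, so it consumes read capacity in proportion to the table size.

```python
from typing import Any, Dict, Iterable


def total_count(pages: Iterable[Dict[str, Any]]) -> int:
    """Sum the per-page "Count" values returned by a COUNT scan."""
    return sum(page.get("Count", 0) for page in pages)


def count_items(table_name: str, region: str = "cn-north-1") -> int:
    """Count items with a paginated scan (requires boto3 and credentials)."""
    import boto3  # imported lazily so total_count works without boto3

    client = boto3.client("dynamodb", region_name=region)
    paginator = client.get_paginator("scan")
    pages = paginator.paginate(TableName=table_name, Select="COUNT")
    return total_count(pages)
```

Running the same count against the filtered export in the source region gives a number to compare with the target table.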
- The PoC provides a serverless solution for one-time migration of a DynamoDB table from an AWS global region to China. Combined with the Data Transfer Hub solution, it provides a reliable transfer channel over the cross-border Internet.
- Compared with using EMR Hive to import and export S3 data to and from DynamoDB, this approach is more lightweight and lets developer-led application teams migrate DynamoDB tables on their own.
- Because the solution is based on Glue, a serverless ETL service, the cost of running the Glue jobs for the migration is minimal.