"S3-to-Blob" is a tool for copying objects from an AWS S3 bucket into an Azure Blob Storage container. The solution is built on top of Azure Data Factory (ADF) v2 and utilizes its built-in S3 connector for accessing S3 buckets. However, ADFv2 does not support incremental data copy out of the box. S3-to-Blob extends ADFv2 to support performing incremental copying from the source bucket into the target container so that only deltas are considered and copied over the network between the AWS and Azure data centers.
- One-time Run: S3-to-Blob can be used as a one-time tool for copying files and other artifacts from S3 to Azure Blob Storage.
- Scheduled: It can be used for monitoring an AWS S3 bucket and keeping track of its contents. This is done by virtue of using ADFv2's scheduled triggers which can be scheduled to run on a reguler interval. Upon detecting a new object being uploaded into the S3 bucket, it will be copied into an Azure Storage account (blob-only or general purpose, v1 or v2).
You can deploy this solution by clicking on the following button:
- Once the deployment finishes (it takes about 2 minutes), go to the resource group, click on the Azure Data Factory resource that was created and then click on "Author & Monitor".
- Click on the "Author" button on the left navigation menu. This loads the editor.
- Expand the "Pipelines" list and click on the "MasterCopyFromAmazonS3ToAzureBlob" pipeline. Click on the "Debug" button to test it.
- Under triggers, locate the "SyncObjectsTrigger" trigger. Edit it and check the "Activated" checkbox (it's unchecked by default). This trigger is configured to run once every 1 hour.
- This assumes that you have access to an S3 account (via a key and a secret) but cannot make midifications to the AWS account like creating an AWS Lambda function to spy on an S3 bucket and trigger the copying to the Azure Storage account.
The ADF documentation specifies the required permissions. The following policy definition can be used as an example:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt00001",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Resource": [
"arn:aws:s3:::mybucket/*"
]
},
{
"Sid": "Stmt00002",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:ListAllMyBuckets"
],
"Resource": [
"arn:aws:s3:::*"
]
}
]
}
ADF concepts: https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
Incremental data load: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview