
AP1011 - Custom archive bucket/folder #180

Merged: 2 commits into master from ap1011_custom_bucket_folder on Jun 23, 2021

Conversation

@llehtinen (Contributor) commented on Jun 21, 2021

Problem

The archive_load_files feature added in #178 only allows archiving into the same S3 bucket that is used for the Snowflake imports.

This means that S3 bucket versioning is either enabled for both the archived files and the temporary load files, or for neither.

Versioning makes sense for archived files, where accidental deletes could be problematic. It does not, however, make sense for the temporary load files, and enabling it for them incurs unnecessary cost.
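For context, a minimal sketch (assuming boto3; the bucket names are hypothetical) of enabling versioning only on a dedicated archive bucket, which is what separating the two destinations makes possible:

```python
# Sketch only: enable S3 versioning on the archive bucket while leaving the
# load bucket unversioned. Bucket names are hypothetical.
import boto3

s3 = boto3.client('s3')
s3.put_bucket_versioning(
    Bucket='my-archive-bucket',  # long-lived archives: protect against accidental deletes
    VersioningConfiguration={'Status': 'Enabled'},
)
# 'my-load-bucket' is deliberately left unversioned so that the short-lived
# temporary load files do not accumulate object versions (and cost).
```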

Proposed changes

This PR allows configuring a separate S3 bucket as the archive destination, as well as customizing the S3 prefix under which archived files are stored.
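A sketch of how the target configuration might look with these options; the exact option names below are assumptions based on this description, not confirmed against the code:

```python
# Hypothetical target-snowflake configuration illustrating the new options.
# Option names and bucket names are assumptions for illustration only.
config = {
    's3_bucket': 'my-load-bucket',                        # temporary Snowflake load files
    'archive_load_files': True,                           # archival feature added in #178
    'archive_load_files_s3_bucket': 'my-archive-bucket',  # new: separate archive bucket
    'archive_load_files_s3_prefix': 'archive',            # new: custom archive prefix
}
```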

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@llehtinen added the Draft (PR or issue still in draft mode) and enhancement (New feature or request) labels and removed the Draft label on Jun 21, 2021
@llehtinen force-pushed the ap1011_custom_bucket_folder branch from 261b96e to 1ee77ec on June 21, 2021 at 14:11
# Build the 'source-bucket/source-key' string expected by S3 copy requests
copy_source = '{}/{}'.format(source_bucket, s3_source_key)

self.logger.info('Copying %s to archive location %s', copy_source, prefixed_archive_key)
# The archive destination bucket may differ from the load bucket
self.upload_client.copy_object(copy_source, archive_bucket, prefixed_archive_key, s3_archive_metadata)
@koszti (Contributor) commented on Jun 21, 2021

How does authentication work if we copy between two buckets? Can we use the same s3 upload_client, initialised once, for both buckets?

The aws_profile, aws_access_key_id, aws_secret_access_key, aws_session_token, s3_region_name, and s3_endpoint_url optional parameters would have to be valid for both buckets. Is that fine?

@llehtinen (Contributor, Author) replied:

From the Boto3 docs:

All copy requests must be authenticated. Additionally, you must have read access to the source object and write access to the destination bucket. For more information, see REST Authentication. Both the Region that you want to copy the object from and the Region that you want to copy the object to must be enabled for your account.

So, if we want to use the copy_object method (rather than uploading the object to two different buckets using two sets of credentials), the credentials used for the Snowflake imports must be valid for writing to the archive bucket as well. I tested this in staging and it works. Not sure how a different region or endpoint URL would play out.
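As a sketch of that single-client approach (assuming boto3; bucket and key names are illustrative, not the repository's actual code):

```python
# One boto3 client, authenticated with the credentials used for the load
# bucket, copies an object into a different archive bucket. This works as
# long as those credentials can read the source and write the destination.
import boto3

s3 = boto3.client('s3')  # one set of credentials for both buckets
s3.copy_object(
    CopySource='my-load-bucket/path/to/load-file.csv',  # 'bucket/key' form
    Bucket='my-archive-bucket',
    Key='archive/path/to/load-file.csv',
)
```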

@koszti merged commit 9b90754 into master on Jun 23, 2021
@koszti deleted the ap1011_custom_bucket_folder branch on June 23, 2021 at 10:22