Add an owner.txt to the ingestion bucket #67

Merged
merged 1 commit into from
Jul 31, 2023

philerooski
Contributor

Synapse already has read/write access to the ingestion bucket, so the only thing to do to enable the use of the bucket as an external storage location is to write an owner.txt.

In develop, owner.txt is written to main.
In prod, owner.txt is written to the bucket root. This is where production data (e.g., adults/v1 and pediatric/v1) already live.
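
For illustration only, here is a minimal sketch of how the owner.txt key and contents could be derived from the parameters discussed in this PR. It is not the template's actual implementation: the function names are hypothetical, and it assumes owner.txt is a line-separated list of Synapse user or team IDs. An empty prefix corresponds to writing the file at the bucket root.

```python
def owner_txt_key(prefix: str = "") -> str:
    """Return the S3 key for owner.txt under an optional key prefix.

    Hypothetical helper: an empty prefix means the bucket root.
    """
    cleaned = prefix.strip("/")
    return f"{cleaned}/owner.txt" if cleaned else "owner.txt"


def owner_txt_body(synapse_ids: str) -> str:
    """Convert a comma-separated ID string into one ID per line.

    Assumption: owner.txt lists one Synapse user or team ID per line.
    """
    ids = [part.strip() for part in synapse_ids.split(",") if part.strip()]
    return "\n".join(ids) + "\n"


if __name__ == "__main__":
    print(owner_txt_key("main"))  # develop: namespaced prefix
    print(owner_txt_key(""))      # prod: bucket root
    print(owner_txt_body("3461799,3455604"), end="")
```

The resulting key and body could then be uploaded with any S3 client; the upload step is left out so the sketch stays free of external dependencies.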

@philerooski philerooski requested a review from a team as a code owner July 31, 2023 23:11
@philerooski philerooski temporarily deployed to develop July 31, 2023 23:13 — with GitHub Actions Inactive
Contributor

@rxu17 rxu17 left a comment

LGTM! Just one question

parameters:
  BucketName: !stack_output_external recover-dev-ingestion-bucket::Bucket
  SynapseIds: "3461799,3455604" # RecoverETL and synapse-service-sysbio-recover-data-storage-01
  OwnerTxtKeyPrefix: {{ stack_group_config.namespace }}
Contributor

Is there a particular reason we are removing this for the prod version and writing owner.txt to the root of the bucket? I think the prod config for the input bucket has this.

Contributor Author

We don't really control the organization of data in the production ingestion bucket. So far data has been written to two different folders at the bucket root (adults/v1 and pediatric/v1), so unless we move the production data elsewhere, digital health will need to have access to the bucket root and update their scripts to copy data to that "elsewhere" folder. Having everything at the bucket root doesn't really bother me, because there is only one set of production data. We might send that data through different namespaces (e.g., main and staging), but there isn't a separate set of production data for main and another for staging.
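
To make the layout argument concrete, here is a rough sketch under the assumption that a storage location rooted at the folder holding owner.txt only covers keys beneath that folder. The helper name and the example file name are hypothetical.

```python
def covered_by(owner_prefix: str, data_key: str) -> bool:
    """Return True if data_key sits under the folder that holds owner.txt.

    Assumption: an empty owner_prefix means owner.txt is at the bucket
    root, which covers every key in the bucket.
    """
    base = owner_prefix.strip("/")
    if not base:
        return True
    # Require a trailing slash so "main" does not match "mainline/...".
    return data_key.startswith(base + "/")


if __name__ == "__main__":
    key = "adults/v1/example.zip"  # hypothetical production object
    print(covered_by("", key))      # owner.txt at the bucket root
    print(covered_by("main", key))  # owner.txt under a namespace prefix
```

With owner.txt at the root, the existing adults/v1 and pediatric/v1 folders are covered without moving any data; under a namespace prefix such as main, they would not be.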

Contributor

Ah makes sense

@philerooski philerooski temporarily deployed to develop July 31, 2023 23:19 — with GitHub Actions Inactive
@philerooski philerooski temporarily deployed to develop July 31, 2023 23:24 — with GitHub Actions Inactive
@philerooski philerooski merged commit 4a74929 into main Jul 31, 2023
14 checks passed
@philerooski philerooski deleted the etl-506 branch July 31, 2023 23:33