Redshift Destination: update spec #12100
Conversation
misteryeo: Added some suggestions and comments.
@@ -110,10 +110,10 @@
      "maximum": 100,
      "examples": ["10"],
      "description": "Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.",
"description": "Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", | |
"description": "Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note: a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", |
@@ -95,13 +95,13 @@
      "access_key_id": {
        "type": "string",
        "description": "The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.",
"description": "The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.", | |
"description": "This ID grants access to the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.", |
"type": "string", | ||
"description": "The directory under the S3 bucket where data will be written. If not provided, then defaults to the root directory.", | ||
"examples": ["data_sync/test"] | ||
}, | ||
"s3_bucket_region": { | ||
"title": "S3 Bucket Region", | ||
"title": "S3 Bucket Region (Optional)", | ||
"type": "string", | ||
"default": "", | ||
"description": "The region of the S3 staging bucket to use if utilising a copy strategy.", |
"description": "The region of the S3 staging bucket to use if utilising a copy strategy.", | |
"description": "The region of the S3 staging bucket to use if utilising a COPY strategy.", |
    },
    "purge_staging_data": {
-     "title": "Purge Staging Files and Tables",
+     "title": "Purge Staging Files and Tables (Optional)",
      "type": "boolean",
      "description": "Whether to delete the staging files from S3 after completing the sync. See the docs for details. Only relevant for COPY. Defaults to true.",
Where are the docs? The description just says "See the docs for details."
"type": "string", | ||
"description": "The name of the staging S3 bucket to use if utilising a COPY strategy. COPY is recommended for production workloads for better speed and scalability. See <a href=\"https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html\">AWS docs</a> for more details.", | ||
"examples": ["airbyte.staging"] | ||
}, | ||
"s3_bucket_path": { | ||
"title": "S3 Bucket Path", | ||
"title": "S3 Bucket Path (Optional)", | ||
"type": "string", | ||
"description": "The directory under the S3 bucket where data will be written. If not provided, then defaults to the root directory.", |
Are there no relevant docs to link here?
"airbyte_secret": true | ||
}, | ||
"part_size": { | ||
"type": "integer", | ||
"minimum": 10, | ||
"maximum": 100, | ||
"examples": ["10"], | ||
"description": "Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", | ||
"title": "Stream Part Size" | ||
"description": "Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note: a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", |
Are there no docs to link here?
"type": "boolean", | ||
"description": "Whether to delete the staging files from S3 after completing the sync. See the docs for details. Only relevant for COPY. Defaults to true.", | ||
"description": "Whether to delete the staging files from S3 after completing the sync. See the docs for details. https://docs.airbyte.com/integrations/destinations/redshift#2a-fill-up-s3-info-for-copy-strategy. Only relevant for COPY. Defaults to true.", |
Why are we including the URL directly and not hyperlinking the text?
"type": "string", | ||
"default": "", | ||
"description": "The region of the S3 staging bucket to use if utilising a copy strategy.", | ||
"description": "The region of the S3 staging bucket to use if utilising a COPY strategy. See the docs for details. See the docs for details. https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html#:~:text=In-,Region,-%2C%20choose%20the%20AWS", |
Why is the URL not hyperlinked? And I think you've put the text in twice for 'see the docs..'?
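What the reviewer is asking for would follow the <a href> pattern already used in the s3_bucket_name description above. A sketch of the corrected line, with the duplicated sentence dropped and the text-fragment suffix trimmed from the URL (this is not the verbatim committed change):

      "description": "The region of the S3 staging bucket to use if utilising a COPY strategy. See <a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html\">AWS docs</a> for details.",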
@misteryeo added changes according to your suggestions
Just had one comment here where I wasn't seeing docs but otherwise LGTM!
"airbyte_secret": true | ||
}, | ||
"part_size": { | ||
"type": "integer", | ||
"minimum": 10, | ||
"maximum": 100, | ||
"examples": ["10"], | ||
"description": "Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", | ||
"title": "Stream Part Size" | ||
"description": "Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note: a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.", |
@misteryeo added link for part_size field
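The committed link itself is not visible in this excerpt. Following the same anchor pattern, the part_size description would plausibly end up reading like this; the target URL is an assumption, reusing the Airbyte Redshift docs page quoted earlier in the thread:

      "description": "Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note: a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care. See <a href=\"https://docs.airbyte.com/integrations/destinations/redshift\">docs</a> for details.",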
* Redshift Destination: update spec
* update spec.json
* update links in spec.json
* added more links to spec.json | refactoring
* updated docs with standard connector template
* added hyperlink to documentation for part_size field
What
Updated spec to follow connector release requirements
Recommended reading order
1. x.java
2. y.python
🚨 User Impact 🚨
none
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector

Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Unit & integration tests added and passing (./gradlew :airbyte-integrations:connectors:<name>:integrationTest)
- Documentation updated
  - Connector's README.md
  - Connector's bootstrap.md. See description and examples
  - docs/SUMMARY.md
  - docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  - docs/integrations/README.md
  - airbyte-integrations/builds.md

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released on Dockerhub by running the /publish command described here

Updating a connector

Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Unit & integration tests added and passing (./gradlew :airbyte-integrations:connectors:<name>:integrationTest)
- Documentation updated
  - Connector's README.md
  - Connector's bootstrap.md. See description and examples
  - docs/integrations/<source or destination>/<name>.md including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released on Dockerhub by running the /publish command described here

Connector Generator
- The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates, then checking in your changes

Tests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.