Destination Redshift: Introduces configurable value for file buffer count #20879
Affected Connector Report
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb | 1.0.34 | ✅ | ✅ |
source-alloydb-strict-encrypt | 1.0.34 | 🔵 (ignored) | 🔵 (ignored) |
source-bigquery | 0.2.3 | ✅ | ✅ |
source-clickhouse | 0.1.14 | ✅ | ✅ |
source-clickhouse-strict-encrypt | 0.1.14 | 🔵 (ignored) | 🔵 (ignored) |
source-cockroachdb | 0.1.18 | ✅ | ✅ |
source-cockroachdb-strict-encrypt | 0.1.18 | 🔵 (ignored) | 🔵 (ignored) |
source-db2 | 0.1.16 | ✅ | ✅ |
source-db2-strict-encrypt | 0.1.16 | 🔵 (ignored) | 🔵 (ignored) |
source-dynamodb | 0.1.0 | ✅ | ✅ |
source-e2e-test | 2.1.3 | ✅ | ✅ |
source-e2e-test-cloud | 2.1.1 | 🔵 (ignored) | 🔵 (ignored) |
source-elasticsearch | 0.1.1 | ✅ | ✅ |
source-jdbc | 0.3.5 | 🔵 (ignored) | 🔵 (ignored) |
source-kafka | 0.2.3 | ✅ | ✅ |
source-mongodb-strict-encrypt | 0.1.19 | 🔵 (ignored) | 🔵 (ignored) |
source-mongodb-v2 | 0.1.19 | ✅ | ✅ |
source-mssql | 0.4.26 | ✅ | ✅ |
source-mssql-strict-encrypt | 0.4.26 | 🔵 (ignored) | 🔵 (ignored) |
source-mysql | 1.0.18 | ✅ | ✅ |
source-mysql-strict-encrypt | 1.0.18 | 🔵 (ignored) | 🔵 (ignored) |
source-oracle | 0.3.21 | ✅ | ✅ |
source-oracle-strict-encrypt | 0.3.21 | 🔵 (ignored) | 🔵 (ignored) |
source-postgres | 1.0.34 | ✅ | ✅ |
source-postgres-strict-encrypt | 1.0.34 | 🔵 (ignored) | 🔵 (ignored) |
source-redshift | 0.3.15 | ✅ | ✅ |
source-relational-db | 0.3.1 | 🔵 (ignored) | 🔵 (ignored) |
source-scaffold-java-jdbc | 0.1.0 | 🔵 (ignored) | 🔵 (ignored) |
source-sftp | 0.1.2 | ✅ | ✅ |
source-snowflake | 0.1.27 | ✅ | ✅ |
source-tidb | 0.2.1 | ✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
❌ Destinations (46)
Connector | Version | Changelog | Publish |
---|---|---|---|
destination-azure-blob-storage | 0.1.6 | ✅ | ✅ |
destination-bigquery | 1.2.9 | ✅ | ✅ |
destination-bigquery-denormalized | 1.2.9 | ✅ | ✅ |
destination-cassandra | 0.1.4 | ✅ | ✅ |
destination-clickhouse | 0.2.1 | ✅ | ✅ |
destination-clickhouse-strict-encrypt | 0.2.1 | 🔵 (ignored) | 🔵 (ignored) |
destination-csv | 0.2.10 | ✅ | ✅ |
destination-databricks | 0.3.1 | ✅ | ✅ |
destination-dev-null | 0.2.7 | 🔵 (ignored) | 🔵 (ignored) |
destination-doris | 0.1.0 | ✅ | ✅ |
destination-dynamodb | 0.1.7 | ✅ | ✅ |
destination-e2e-test | 0.2.4 | ✅ | ✅ |
destination-elasticsearch | 0.1.6 | ✅ | ✅ |
destination-elasticsearch-strict-encrypt | 0.1.6 | 🔵 (ignored) | 🔵 (ignored) |
destination-gcs | 0.2.12 | ✅ | ✅ |
destination-iceberg | 0.1.0 | ✅ | ✅ |
destination-jdbc | 0.3.14 | 🔵 (ignored) | 🔵 (ignored) |
destination-kafka | 0.1.10 | ✅ | ✅ |
destination-keen | 0.2.4 | ✅ | ✅ |
destination-kinesis | 0.1.5 | ✅ | ✅ |
destination-local-json | 0.2.11 | ✅ | ✅ |
destination-mariadb-columnstore | 0.1.7 | ✅ | ✅ |
destination-mongodb | 0.1.9 | ✅ | ✅ |
destination-mongodb-strict-encrypt | 0.1.9 | 🔵 (ignored) | 🔵 (ignored) |
destination-mqtt | 0.1.3 | ✅ | ✅ |
destination-mssql | 0.1.22 | ✅ | ✅ |
destination-mssql-strict-encrypt | 0.1.22 | 🔵 (ignored) | 🔵 (ignored) |
destination-mysql | 0.1.20 | ✅ | ✅ |
destination-mysql-strict-encrypt | ❌ 0.1.21 (mismatch: 0.1.20) | 🔵 (ignored) | 🔵 (ignored) |
destination-oracle | 0.1.19 | ✅ | ✅ |
destination-oracle-strict-encrypt | 0.1.19 | 🔵 (ignored) | 🔵 (ignored) |
destination-postgres | 0.3.26 | ✅ | ✅ |
destination-postgres-strict-encrypt | 0.3.26 | 🔵 (ignored) | 🔵 (ignored) |
destination-pubsub | 0.2.0 | ✅ | ✅ |
destination-pulsar | 0.1.3 | ✅ | ✅ |
destination-r2 | 0.1.0 | ✅ | ✅ |
destination-redis | 0.1.4 | ✅ | ✅ |
destination-redpanda | 0.1.0 | ✅ | ✅ |
destination-redshift | 0.3.52 | ✅ | ✅ |
destination-rockset | 0.1.4 | ✅ | ✅ |
destination-s3 | 0.3.18 | ✅ | ✅ |
destination-s3-glue | 0.1.1 | ✅ | ✅ |
destination-scylla | 0.1.3 | ✅ | ✅ |
destination-snowflake | 0.4.41 | ✅ | ✅ |
destination-tidb | 0.1.0 | ✅ | ✅ |
destination-yugabytedb | 0.1.0 | ✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
👀 Other Modules (1)
- base-normalization
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
864975e to 1495ff8
/test connector=connectors/destination-redshift
Build Failed
1495ff8 to ffd7e72
ffd7e72 to ac6d789
/test connector=connectors/destination-redshift
Build Passed
    "file_buffer_size": {
      "title": "File Buffer Count",
      "type": "integer",
      "minimum": 15,
The new test checks that this value can be set as low as 10, but the spec says that the minimum is 15. Should the two minimums be the same?
Good catch. Yes, the minimums should be the same. I'll add a test and a change that ensure there are at least 15 file buffers.
ac6d789 to be9aba3
   * before another stream's buffer can be created. Increasing the default max will reduce likelihood
   * of thrashing but not entirely eliminate unless number of buffers equals streams to be synced
   */
  public static final int DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER = 15;
We are changing the default and minimum from 10 to 15. This raises two questions:
- How will this affect existing connections?
- How will this affect non-cloud users, who may be running this connector in containers with less than 1 GB of RAM?
- The likelihood of this affecting existing connections is low, since the minimum memory needed for all the default buffers will be 465 MB (15 * 31 MB), which is strictly less than the 1 GB MAX_TOTAL_BUFFER_SIZE_BYTES.
- Furthermore, a customer with low memory availability would have run into issues earlier, since our MAX_TOTAL_BUFFER_SIZE_BYTES was 1 GB before a flush. To avoid those issues, users would have had to tune the original DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER or lower MAX_TOTAL_BUFFER_SIZE_BYTES to the amount of memory available for the JVM.
It is reasonable to mark this change in the changelog as potentially breaking for users with less than 665 MB of memory (465 MB + 200 MB from MAX_PER_STREAM_BUFFER_SIZE_BYTES).
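The arithmetic in this reply can be checked with a small sketch. It uses only the figures quoted in the thread (31 MB per buffer, 200 MB per-stream maximum, 1 GB total). The class and helper names are hypothetical and are not taken from the connector's code.

```java
public class BufferMemoryEstimate {
    public static final long MB = 1024L * 1024L;

    // Figures quoted in the review thread; the constant names mirror the
    // thread's wording but this class is an illustration, not connector code.
    public static final long MIN_BYTES_PER_FILE_BUFFER = 31 * MB;
    public static final long MAX_PER_STREAM_BUFFER_SIZE_BYTES = 200 * MB;
    public static final long MAX_TOTAL_BUFFER_SIZE_BYTES = 1024 * MB;

    /** Minimum memory held when bufferCount buffers exist (hypothetical helper). */
    public static long minimumBufferBytes(int bufferCount) {
        return bufferCount * MIN_BYTES_PER_FILE_BUFFER;
    }

    /** The minimum plus one stream filled to its per-stream limit (hypothetical helper). */
    public static long suggestedFloorBytes(int bufferCount) {
        return minimumBufferBytes(bufferCount) + MAX_PER_STREAM_BUFFER_SIZE_BYTES;
    }

    public static void main(String[] args) {
        for (int count : new int[] {10, 15}) {
            System.out.printf(
                "%d buffers: minimum %d MB, suggested floor %d MB, under 1 GB total: %b%n",
                count,
                minimumBufferBytes(count) / MB,
                suggestedFloorBytes(count) / MB,
                minimumBufferBytes(count) < MAX_TOTAL_BUFFER_SIZE_BYTES);
        }
    }
}
```

For 15 buffers this reproduces the 465 MB and 665 MB figures above; for 10 buffers the same formula gives 310 MB and 510 MB.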
AFAIK, we used to run containers with 500 MB by default and recently switched to 1 GB. Why not leave the minimum as 10 instead?
MAX_TOTAL_BUFFER_SIZE_BYTES has been 1 GB since March, so most of this decision is based on that value. I didn't realize that you already had a comment about this in the community PR, but it makes sense to reduce the chance of breakage. It raises the question of how we can better communicate these increases to our community. Making the value configurable, with proper warnings, seems like one option.
Based on your comment, it seems better to keep the default at 10 and let the user configure it.
…ers to increase with adequate warnings
/publish connector=connectors/destination-redshift run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
EDIT: Failed test run here. Re-running the publish command since it failed to check out the Airbyte repository.
/publish connector=connectors/destination-redshift run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
Hi @ryankfu - I found a typo in this that prevents it from working correctly - buffers are still capped at 10. https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-redshift/src/main/java/io/airbyte/integrations/destination/redshift/RedshiftStagingS3Destination.java#L178 - this max is to the default, which is still 10. It should be a max with … Additionally, the default of 1 here doesn't seem correct - that should likely be …
Also - there's a discrepancy between the config keys used in the spec and the application, so I don't think it's even possible to configure this correctly.
@adam-bloom thank you for pointing out the mismatched … As for the buffers being capped at …
return Math.max(numOfFileBuffers, FileBuffer.DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER);
will return the larger of … In fact, there's a test that captures this: Lines 126 to 130 in 0e3d2ba
I think it was really the config mismatch that I was seeing the impacts of. The 10 would have been coming from max(1, 10). Just needed a second pass!
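The exchange above can be reproduced with a minimal sketch. It assumes the application falls back to 1 when its config key is absent and then applies the Math.max quoted in the thread. The spec key "file_buffer_size" comes from the spec snippet quoted earlier in the review. The key "file_buffer_count", the Map-based config, and the method name are hypothetical stand-ins, not the connector's real code.

```java
import java.util.Map;

public class FileBufferCountSketch {
    // Default discussed in the thread after the review feedback.
    public static final int DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER = 10;

    // Key shown in the spec snippet quoted in the review.
    public static final String SPEC_KEY = "file_buffer_size";

    // Hypothetical key, invented here to stand in for the mismatched name.
    public static final String MISMATCHED_APP_KEY = "file_buffer_count";

    // Fallback of 1 when the key is missing, as described in the thread.
    public static final int FALLBACK_WHEN_KEY_MISSING = 1;

    /** Reads the buffer count under the given key and never goes below the default. */
    public static int resolveBufferCount(Map<String, Integer> config, String keyReadByApp) {
        int numOfFileBuffers = config.getOrDefault(keyReadByApp, FALLBACK_WHEN_KEY_MISSING);
        return Math.max(numOfFileBuffers, DEFAULT_MAX_CONCURRENT_STREAM_IN_BUFFER);
    }

    public static void main(String[] args) {
        Map<String, Integer> userConfig = Map.of(SPEC_KEY, 25);

        // Key mismatch: the lookup misses, falls back to 1, and max(1, 10) is 10.
        System.out.println(resolveBufferCount(userConfig, MISMATCHED_APP_KEY)); // 10

        // Matching keys: the user's value is above the default and is kept.
        System.out.println(resolveBufferCount(userConfig, SPEC_KEY)); // 25

        // A value below the default is raised to the default.
        System.out.println(resolveBufferCount(Map.of(SPEC_KEY, 5), SPEC_KEY)); // 10
    }
}
```

Under these assumptions the Math.max call acts as a floor rather than a cap, so the observed value of 10 is explained by the key mismatch alone.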
What
Follows discussion in #13975 and partially addresses https://github.com/airbytehq/airbyte-internal-issues/issues/496, which asserts that interleaved messages (e.g. Change Data Capture, aka CDC) trigger frequent flush all commands. This reduces performance, since buffers thrash due to the constant need to switch buffers for different streams.
This change doesn't fully address those tickets, but it introduces work from a community member: a user-configurable field for the number of concurrent buffers to be created when a MessageConsumer is created for destinations. This change follows a discussion that mentions that each buffer requires a minimum of 31 MB and that an unbounded number of buffers is not feasible, primarily because the amount of memory would need to scale. Since Airbyte aims to not break existing customers' workflows, this number is increased with that consideration in mind.
How
Lets a user exceed the default, up to the hard cap limit. If the value exceeds the soft cap, the user is shown a warning explaining that a higher value only improves performance as long as it does not exceed the number of streams to be synced.
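A toy simulation can illustrate why extra buffers only help up to the number of streams. It assumes messages arrive round-robin across streams and that a flush all happens whenever a new stream needs a buffer while the buffer limit is already reached. This is a simplified model written for this note, with hypothetical names, and it is not the connector's actual buffering code.

```java
import java.util.HashSet;
import java.util.Set;

public class FlushAllSimulation {
    /**
     * Counts flush-all events for messageCount messages that arrive
     * round-robin over streamCount streams, with at most maxBuffers open buffers.
     */
    public static int countFlushAlls(int streamCount, int maxBuffers, int messageCount) {
        Set<Integer> openBuffers = new HashSet<>();
        int flushAlls = 0;
        for (int i = 0; i < messageCount; i++) {
            int stream = i % streamCount;
            if (!openBuffers.contains(stream)) {
                if (openBuffers.size() >= maxBuffers) {
                    // No room for another stream's buffer: flush and close everything.
                    openBuffers.clear();
                    flushAlls++;
                }
                openBuffers.add(stream);
            }
        }
        return flushAlls;
    }

    public static void main(String[] args) {
        int streams = 20;
        int messages = 200;
        System.out.println(countFlushAlls(streams, 10, messages)); // 19
        System.out.println(countFlushAlls(streams, 15, messages)); // 13
        System.out.println(countFlushAlls(streams, 20, messages)); // 0
    }
}
```

In this model, raising the limit from 10 to 15 reduces flushes but does not remove them, and they only disappear once the limit equals the stream count, which matches the warning described above.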
Recommended reading order
RedshiftStagingS3Destination.java
FileBuffer.java
RedshiftStagingS3DestinationTest.java
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
This should not have any breaking changes, as the increase in FileBuffer is within the amount of memory that is generally allocated across all users (1 GB), and the soft cap is within that amount even with "wiggle" room.