Increase Database Source SELECT Batch Size #19514
Conversation
Force-pushed from 2951f69 to d04404d
@edgao I'm network-bound at home testing this out... what's the process for getting a connector tested on cloud or a dev server?

I think roughly this? https://internal-docs.airbyte.io/Things-To-Know/Cloud-Development-Environments - you'd need to push your connector image to Dockerhub under some special tag and update the cloud connector version mask to point at that tag, but it's probably worth checking with the cloud folks for more details; I actually haven't done this before either :/
@@ -10,7 +10,7 @@ public final class FetchSizeConstants {
     // This size is not enforced. It is only used to calculate a proper
     // fetch size. The max row size the connector can handle is actually
     // limited by the heap size.
-    public static final double TARGET_BUFFER_SIZE_RATIO = 0.5;
+    public static final double TARGET_BUFFER_SIZE_RATIO = 0.6;
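For context, a minimal sketch of how a buffer-size ratio like this can feed a fetch-size estimate. The class, method, and clamping bounds below are hypothetical illustrations, not the connector's actual code:

```java
// Hypothetical sketch: how a target buffer ratio can drive a JDBC fetch size.
// Names and clamping bounds are illustrative, not Airbyte's implementation.
public final class FetchSizeSketch {

    private static final double TARGET_BUFFER_SIZE_RATIO = 0.6;
    private static final int MIN_FETCH_SIZE = 1;
    private static final int MAX_FETCH_SIZE = 1_000_000;

    /**
     * Estimate how many rows can be fetched per round trip while keeping the
     * row buffer within the target fraction of the heap.
     */
    static int estimateFetchSize(final long estimatedRowSizeBytes) {
        final long maxHeapBytes = Runtime.getRuntime().maxMemory();
        // The remaining (1 - ratio) of the heap is left for JVM and connector overhead.
        final double targetBufferBytes = maxHeapBytes * TARGET_BUFFER_SIZE_RATIO;
        final long rawFetchSize = (long) (targetBufferBytes / estimatedRowSizeBytes);
        return (int) Math.max(MIN_FETCH_SIZE, Math.min(MAX_FETCH_SIZE, rawFetchSize));
    }

    public static void main(final String[] args) {
        // e.g. with 1 KiB rows on a 2 GiB heap and a 0.6 ratio, the raw estimate
        // is ~1.26M rows, which the clamp caps at MAX_FETCH_SIZE.
        System.out.println(estimateFetchSize(1024L));
    }
}
```

This is why the comment says the buffer size is not enforced: the estimator only divides a target buffer by an estimated row size, so raising the ratio yields proportionally larger fetch sizes rather than a hard memory limit.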
curious, why not go higher?

If primary memory usage is due to this buffer, I think we can go as high as 80%. Usual Java overhead is ~10%, so a 20% buffer seems good enough.
I think we can keep ramping this up... but in an environment where I can't robustly test this, I want to go step by step.

I've done some local testing and things seem OK... @edgao / @davinchia, please let me know what I should do to be confident that this is safe to merge in.
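As a rough worked example of what the ratio means in practice (heap size hypothetical): with a 2 GiB heap, the old 0.5 ratio targets a ~1 GiB row buffer, 0.6 targets ~1.2 GiB, and the 0.8 ceiling suggested above would target ~1.6 GiB, leaving ~0.4 GiB for JVM overhead and everything else.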
Affected Connector Report

Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb | 1.0.17 | ✅ | ✅ |
source-alloydb-strict-encrypt | 1.0.17 | ✅ | ⚠ (not in seed) |
source-mysql | 1.0.14 | ✅ | ✅ |
source-mysql-strict-encrypt | 1.0.14 | ✅ | ⚠ (not in seed) |
source-postgres-strict-encrypt | 1.0.28 | ✅ | ⚠ (not in seed) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|---|---|---|
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items

Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
/publish connector=connectors/source-mysql

If you have connectors that successfully published but failed definition generation, follow step 4 here.

/publish connector=connectors/source-mysql-strict-encrypt

If you have connectors that successfully published but failed definition generation, follow step 4 here.

/publish connector=connectors/source-postgres

If you have connectors that successfully published but failed definition generation, follow step 4 here.

/publish connector=connectors/source-postgres-strict-encrypt

If you have connectors that successfully published but failed definition generation, follow step 4 here.
…/airbyte into evan/larger-db-buffers
#12400 added dynamic batch sizes to our SQL sources to prevent OOM problems. However, I believe the batch sizes we currently compute are too small for optimal speed. This is validated in 2 ways:

This PR does 2 things:
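For background on what the batch size controls at the driver level, here is a minimal standalone JDBC sketch (connection details are hypothetical placeholders): a larger fetch size means fewer network round trips per SELECT, at the cost of more rows buffered in memory at once.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class FetchSizeDemo {
    public static void main(final String[] args) throws Exception {
        // URL, user, and password are hypothetical placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/db", "user", "password")) {
            // The Postgres driver only streams results outside auto-commit mode;
            // with auto-commit on, it materializes the whole result set.
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                // Hint: buffer this many rows per round trip to the server.
                stmt.setFetchSize(10_000);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM some_table")) {
                    while (rs.next()) {
                        // process one row at a time
                    }
                }
            }
        }
    }
}
```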