-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept parallel source stream reading implementation for MySQL #26580
Proof of concept parallel source stream reading implementation for MySQL #26580
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
Affected Connector ReportNOTE
|
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb |
2.0.28 |
✅ | ✅ |
source-alloydb-strict-encrypt |
2.0.28 |
🔵 (ignored) |
🔵 (ignored) |
source-azure-blob-storage |
0.1.0 |
✅ | ✅ |
source-bigquery |
0.2.3 |
✅ | ✅ |
source-clickhouse |
0.1.17 |
✅ | ✅ |
source-clickhouse-strict-encrypt |
0.1.17 |
🔵 (ignored) |
🔵 (ignored) |
source-cockroachdb |
0.1.22 |
✅ | ✅ |
source-cockroachdb-strict-encrypt |
0.1.22 |
🔵 (ignored) |
🔵 (ignored) |
source-db2 |
0.1.20 |
✅ | ✅ |
source-db2-strict-encrypt |
❌ 0.1.19 (mismatch: 0.1.20 ) |
🔵 (ignored) |
🔵 (ignored) |
source-dynamodb |
0.1.2 |
✅ | ✅ |
source-e2e-test |
2.1.4 |
✅ | ✅ |
source-e2e-test-cloud |
2.1.4 |
🔵 (ignored) |
🔵 (ignored) |
source-elasticsearch |
0.1.1 |
✅ | ✅ |
source-jdbc |
0.3.5 |
🔵 (ignored) |
🔵 (ignored) |
source-kafka |
0.2.3 |
✅ | ✅ |
source-mongodb-strict-encrypt |
0.1.19 |
🔵 (ignored) |
🔵 (ignored) |
source-mongodb-v2 |
0.1.19 |
✅ | ✅ |
source-mssql |
1.0.19 |
✅ | ✅ |
source-mssql-strict-encrypt |
1.0.19 |
🔵 (ignored) |
🔵 (ignored) |
source-mysql |
2.0.25 |
✅ | ✅ |
source-mysql-strict-encrypt |
2.0.25 |
🔵 (ignored) |
🔵 (ignored) |
source-oracle |
0.3.25 |
✅ | ✅ |
source-oracle-strict-encrypt |
❌ 0.3.24 (mismatch: 0.3.25 ) |
🔵 (ignored) |
🔵 (ignored) |
source-postgres |
2.0.34 |
✅ | ✅ |
source-postgres-strict-encrypt |
2.0.34 |
🔵 (ignored) |
🔵 (ignored) |
source-redshift |
0.3.17 |
✅ | ✅ |
source-relational-db |
0.3.1 |
🔵 (ignored) |
🔵 (ignored) |
source-scaffold-java-jdbc |
0.1.0 |
🔵 (ignored) |
🔵 (ignored) |
source-sftp |
0.1.2 |
✅ | ✅ |
source-snowflake |
0.1.36 |
✅ | ✅ |
source-teradata |
0.1.0 |
✅ | ✅ |
source-tidb |
0.2.5 |
✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
❌ Destinations (50)
Connector | Version | Changelog | Publish |
---|---|---|---|
destination-azure-blob-storage |
0.2.0 |
✅ | ✅ |
destination-bigquery |
1.4.5 |
✅ | ✅ |
destination-bigquery-denormalized |
1.4.1 |
✅ | ✅ |
destination-cassandra |
0.1.4 |
✅ | ✅ |
destination-clickhouse |
0.2.5 |
✅ | ✅ |
destination-clickhouse-strict-encrypt |
0.2.5 |
🔵 (ignored) |
🔵 (ignored) |
destination-csv |
1.0.0 |
✅ | ✅ |
destination-databricks |
1.1.0 |
✅ | ✅ |
destination-dev-null |
0.3.0 |
🔵 (ignored) |
🔵 (ignored) |
destination-doris |
0.1.0 |
✅ | ✅ |
destination-dynamodb |
0.1.7 |
✅ | ✅ |
destination-e2e-test |
0.3.0 |
✅ | ✅ |
destination-elasticsearch |
0.1.6 |
✅ | ✅ |
destination-elasticsearch-strict-encrypt |
0.1.6 |
🔵 (ignored) |
🔵 (ignored) |
destination-exasol |
0.1.1 |
✅ | ✅ |
destination-gcs |
0.3.0 |
✅ | ✅ |
destination-iceberg |
0.1.0 |
✅ | ✅ |
destination-kafka |
0.1.10 |
✅ | ✅ |
destination-keen |
0.2.4 |
✅ | ✅ |
destination-kinesis |
0.1.5 |
✅ | ✅ |
destination-local-json |
0.2.11 |
✅ | ✅ |
destination-mariadb-columnstore |
0.1.7 |
✅ | ✅ |
destination-mongodb |
0.1.9 |
✅ | ✅ |
destination-mongodb-strict-encrypt |
0.1.9 |
🔵 (ignored) |
🔵 (ignored) |
destination-mqtt |
0.1.3 |
✅ | ✅ |
destination-mssql |
0.1.25 |
✅ | ✅ |
destination-mssql-strict-encrypt |
0.1.25 |
🔵 (ignored) |
🔵 (ignored) |
destination-mysql |
0.1.20 |
✅ | ✅ |
destination-mysql-strict-encrypt |
❌ 0.1.21 (mismatch: 0.1.20 ) |
🔵 (ignored) |
🔵 (ignored) |
destination-oracle |
0.1.19 |
✅ | ✅ |
destination-oracle-strict-encrypt |
0.1.19 |
🔵 (ignored) |
🔵 (ignored) |
destination-postgres |
0.3.27 |
✅ | ✅ |
destination-postgres-strict-encrypt |
0.3.27 |
🔵 (ignored) |
🔵 (ignored) |
destination-pubsub |
0.2.0 |
✅ | ✅ |
destination-pulsar |
0.1.3 |
✅ | ✅ |
destination-r2 |
0.1.0 |
✅ | ✅ |
destination-redis |
0.1.4 |
✅ | ✅ |
destination-redpanda |
0.1.0 |
✅ | ✅ |
destination-redshift |
0.4.9 |
✅ | ✅ |
destination-rockset |
0.1.4 |
✅ | ✅ |
destination-s3 |
0.4.2 |
✅ | ❌ (diff seed version) |
destination-s3-glue |
0.1.7 |
✅ | ✅ |
destination-scylla |
0.1.3 |
✅ | ✅ |
destination-selectdb |
0.1.0 |
✅ | ✅ |
destination-snowflake |
1.0.6 |
✅ | ✅ |
destination-starburst-galaxy |
0.0.1 |
✅ | ✅ |
destination-teradata |
0.1.1 |
✅ | ✅ |
destination-tidb |
0.1.4 |
✅ | ✅ |
destination-vertica |
0.1.0 |
✅ | ✅ |
destination-yugabytedb |
0.1.1 |
✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
(click to expand)
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch |
The version of the connector is different from its normal variant. Please bump the version of the connector. |
⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
|
Changelog | ⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
❌ changelog missing |
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog. | |
Publish | ⚠ not in seed |
The connector is not in the cloud or oss registry, so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that you have added a metadata.yaml file and the expected registries are enabled. |
*/ | ||
executorService = Executors.newFixedThreadPool(computeThreadPoolSize()); | ||
/* | ||
* TODO Expose the max capacity as an advanced connector setting. This would allow the end user to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, I wonder if it should be a configuration shared between the platform and the connector. My reasoning for sharing this config is that the platform orchestrator will also implement an internal buffer and will likely face the same issues than the connector if the records are too big. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benmoriceau Good point. I'm unclear at the moment as to whether or not an end user would even understand what this lever does. At a minimum, it should probably be in sync with the platform to ensure that we don't overrun it. That being said, the connector also has different memory restrictions than the platform, so the setting may not be equal to what the platform uses, but some smaller value or dynamic value that goes down on retry if the previous failure is due to an OOM. In any event, it needs to be configurable regardless of who sets the configuration.
|
||
// Block while waiting for the next result to be available | ||
final T result = internalQueue.take().data(); | ||
LOGGER.debug("Fetched next result {} from internal queue.", result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: this should not make it to prod, it activates the possibility of having client data in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I and @jdpgrailsdev went over the PR and we discussed two major things :
- The internalQueue inside ParallelCompositeIterator class has a high probability of becoming a bottleneck. We have this design cause the read method today returns one iterator. But what if we change the read method to return a list of iterators and each iterator represents one single stream. That way we could start a thread for each stream and all it would do is consume records from the iterator and print it to the console. This way we dont need the queue and ParallelCompositeIterator. The entire logic would live in IntegrationRunner and we could put it inside a feature flag to easily commit this to master and test it out.
- The exception handling also needs to make sure that an exception that happened while syncing a single stream is propagated properly downstream.
The PR has been updated to introduce a different approach suggested by @subodh1810. The new approach moves the concurrency to the
As mentioned above, I have left the cc: @davinchia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, I just have concerns around the working of the method watchForOrphanThreads. I think its complicating the working of parallelism and may cause problems which would be difficult to debug.
@subodh1810 Could you expound upon this? I do think there is a bug in the implementation -- I need to make sure that the executor service is stopped when the read is complete. That should take care of any threads created to do the concurrent read. I'll clean up the PR and get that change in there and we can continue the conversation about this once the logic to shut down the executor service is in place. |
0a7414d
to
cd28028
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, I am ready to approve. Just a compilation error and few changes that I am not sure belong in the PR.
@@ -250,8 +250,7 @@ protected void estimateFullRefreshSyncSize(final Database database, | |||
/* no-op */ | |||
} | |||
|
|||
private List<TableInfo<CommonField<DataType>>> discoverWithoutSystemTables( | |||
final Database database) | |||
private List<TableInfo<CommonField<DataType>>> discoverWithoutSystemTables(final Database database) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be protected, there is a compilation error I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange. I did not run into any issues building locally. I will make the suggested change to see if that helps.
public static final String FIELD_SELECTION_WORKSPACES = "FIELD_SELECTION_WORKSPACES"; | ||
|
||
@Override | ||
public boolean autoDisablesFailingConnections() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are no longer used. As I was in there adding a new flag, I figured it would be worth cleaning up the unused ones.
Can you also run the |
destination-pubsub test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-pubsub test
destination-dev-null test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-dev-null test
destination-bigquery-denormalized test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery-denormalized test
source-postgres test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-postgres test
source-mongodb-strict-encrypt test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-mongodb-strict-encrypt test
destination-redshift test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-redshift test
source-snowflake test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-snowflake test
destination-iceberg test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-iceberg test
source-dynamodb test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-dynamodb test
destination-s3 test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-s3 test
destination-mssql test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-mssql test
destination-postgres test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-postgres test
destination-e2e-test test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-e2e-test test
destination-csv test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-csv test
destination-vertica test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-vertica test
destination-doris test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-doris test
destination-gcs test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-gcs test
source-redshift test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-redshift test
destination-scylla test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-scylla test
source-clickhouse test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-clickhouse test
source-sftp test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-sftp test
destination-tidb test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-tidb test
source-elasticsearch test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-elasticsearch test
destination-yugabytedb test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-yugabytedb test
source-alloydb-strict-encrypt test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-alloydb-strict-encrypt test
source-cockroachdb test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-cockroachdb test
destination-databricks test report (commit
|
Step | Result |
---|
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-databricks test
…SQL (#26580) * Proof of concept parallel source stream reading implementation for MySQL * Automated Change * Add read method that supports concurrent execution to Source interface * Remove parallel iterator * Ensure that executor service is stopped * Automated Commit - Format and Process Resources Changes * Expose method to fix compilation issue * Use concurrent map to avoid access issues * Automated Commit - Format and Process Resources Changes * Ensure concurrent streams finish before closing source * Fix compile issue * Formatting * Exclude concurrent stream threads from orphan thread watcher * Automated Commit - Format and Process Resources Changes * Refactor orphaned thread logic to account for concurrent execution * PR feedback * Implement readStreams in wrapper source * Automated Commit - Format and Process Resources Changes * Add readStream override * Automated Commit - Format and Process Resources Changes * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * Debug logging * Reduce logging level * Replace synchronized calls to System.out.println when concurrent * Close consumer * Flush before close * Automated Commit - Format and Process Resources Changes * Remove charset * Use ASCII and flush periodically for parallel streams * Test performance harness patch * Automated Commit - Format and Process Resources Changes * Cleanup * Logging to identify concurrent read enabled * Mark parameter as final --------- Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com>
* Add everything for BQ but migrate, refactor interface after practical work * Make new default methods, refactor to single implemented method * MigrationInterface and BQ impl created * Trying to integrate with standard inserts * remove unnecessary NameAndNamespacePair class * Shimmed in * Java Docs * Initial Testing Setup * Tests! * Move Migrator into TyperDeduper * Functional Migration * Add Integration Test * Pr updates * bump version * bump version * version bump * Update to airbyte-ci-internal (#29026) * 🐛 Source Github, Instagram, Zendesk-support, Zendesk-talk: fix CAT tests fail on `spec` (#28910) * connectors-ci: better modified connectors detection logic (#28855) * connectors-ci: report path should always start with `airbyte-ci/` (#29030) * make report path always start with airbyte-ci * revert report path in orchestrator * add more test cases * bump version * Updated docs (#29019) * CDK: Embedded reader utils (#28873) * relax pydantic dep * Automated Commit - Format and Process Resources Changes * wip * wrap up base integration * add init file * introduce CDK runner and improve error message * make state param optional * update protocol models * review comments * always run incremental if possible * fix --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com> * 🤖 Bump minor version of Airbyte CDK * 🚨🚨 Low code CDK: Decouple SimpleRetriever and HttpStream (#28657) * fix tests * format * review comments * Automated Commit - Formatting Changes * review comments * review comments * review comments * log all messages * log all message * review comments * review comments * Automated Commit - Formatting Changes * add comment --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com> * 🤖 Bump minor version of Airbyte CDK * 🐛 Source Github, Instagram, Zendesk Support / Talk - revert `spec` changes and improve (#29031) * Source oauth0: new streams and fix incremental (#29001) * Add new streams Organizations,OrganizationMembers,OrganizationMemberRoles * relax schema definition to allow additional fields * Bump image tag version * revert some changes to the old schemas * Format python so gradle can pass * update incremental * remove unused print * fix unit test --------- Co-authored-by: Vasilis Gavriilidis <vasilis.gavriilidis@orfium.com> * 🐛 Source Mongo: Fix failing acceptance tests (#28816) * Fix failing acceptance tests * Fix failing strict acceptance tests * Source-Greenhouse: Fix unit tests for new CDK version (#28969) Fix unit tests * Add CSV options to the CSV parser (#28491) * remove invalid legacy option * remove unused option * the tests pass but this is quite messy * very slight clean up * Add skip options to csv format * fix some of the typing issues * fixme comment * remove extra log message * fix typing issues * skip before header * skip after header * format * add another test * Automated Commit - Formatting Changes * auto generate column names * delete dead code * update title and description * true and false values * Update the tests * Add comment * missing test * rename * update expected spec * move to method * Update comment * fix typo * remove unused import * Add a comment * None records do not pass the WaitForDiscoverPolicy * format * remove second branch to ensure we always go through the same processing * Raise an exception if the record is None * reset * Update tests * handle unquoted newlines * Automated Commit - Formatting Changes * Update test case so the quoting is explicit * Update comment * Automated Commit - Formatting Changes * Fail validation if skipping rows before header and header is autogenerated * always fail if a record cannot be parsed * format * set write line_no in error message * remove none check * Automated Commit - Formatting Changes * enable autogenerate test * remove duplicate test * missing unit tests * Update * remove branching * remove unused none check * Update tests * remove branching * format * extract to function * comment * missing type * type annotation * use set * Document that the strings are case-sensitive * public -> private * add unit test * newline --------- Co-authored-by: girarda <girarda@users.noreply.github.com> * Dagster: Add sentry logging (#28822) * Add sentry * add sentry decorator * Add traces * Use sentry trace * Improve duplicate logging * Add comments * DNC * Fix up issues * Move to scopes * Remove breadcrumb * Update lock * ✨Source Shortio: Migrate Python CDK to Low-code CDK (#28950) * Migrate Shortio to Low-Code * Update abnormal state * Format * Update Docs * Fix metadata.yaml * Add pagination * Add incremental sync * add incremental parameters * update metadata * rollback update version * release date --------- Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> * Update to new verbiage (#29051) * [skip ci] Metadata: Remove leading underscore (#29024) * DNC * Add test models * Add model test * Remove underscore from metadata files * Regenerate models * Add test to check for key transformation * Allow additional fields on metadata * Delete transform * Proof of concept parallel source stream reading implementation for MySQL (#26580) * Proof of concept parallel source stream reading implementation for MySQL * Automated Change * Add read method that supports concurrent execution to Source interface * Remove parallel iterator * Ensure that executor service is stopped * Automated Commit - Format and Process Resources Changes * Expose method to fix compilation issue * Use concurrent map to avoid access issues * Automated Commit - Format and Process Resources Changes * Ensure concurrent streams finish before closing source * Fix compile issue * Formatting * Exclude concurrent stream threads from orphan thread watcher * Automated Commit - Format and Process Resources Changes * Refactor orphaned thread logic to account for concurrent execution * PR feedback * Implement readStreams in wrapper source * Automated Commit - Format and Process Resources Changes * Add readStream override * Automated Commit - Format and Process Resources Changes * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * Debug logging * Reduce logging level * Replace synchronized calls to System.out.println when concurrent * Close consumer * Flush before close * Automated Commit - Format and Process Resources Changes * Remove charset * Use ASCII and flush periodically for parallel streams * Test performance harness patch * Automated Commit - Format and Process Resources Changes * Cleanup * Logging to identify concurrent read enabled * Mark parameter as final --------- Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com> * connectors-ci: disable dependency scanning (#29033) * updates (#29059) * Metadata: skip breaking change validation on prerelease (#29017) * skip breaking change validation * Move ValidatorOpts higher in call * Add prerelease test * Fix test * ✨ Source MongoDB Internal POC: Generate Test Data (#29049) * Add script to generate test data * Fix prose * Update credentials example * PR feedback * Bump Airbyte version from 0.50.12 to 0.50.13 * Bump versions for mssql strict-encrypt (#28964) * Bump versions for mssql strict-encrypt * Fix failing test * Fix failing test * 🎨 Improve replication method selection UX (#28882) * update replication method in MySQL source * bump version * update expected specs * update registries * bump strict encrypt version * make password always_show * change url * update registries * 🐛 Avoid writing records to log (#29047) * Avoid writing records to log * Update version * Rollout ctid cdc (#28708) * source-postgres: enable ctid+cdc implementation * 100% ctid rollout for cdc * remove CtidFeatureFlags * fix CdcPostgresSourceAcceptanceTest * Bump versions and release notes * Fix compilation error due to previous merge --------- Co-authored-by: subodh <subodh1810@gmail.com> * connectors-ci: fix `unhashable type 'set'` (#29064) * Add Slack Alert lifecycle to Dagster for Metadata publish (#28759) * DNC * Add slack lifecycle logging * Update to use slack * Update slack to use resource and bot * Improve markdown * Improve log * Add sensor logging * Extend sensor time * merge conflict * PR Refactoring * Make the tests work * remove unnecessary classes, pr feedback * more merging * Update airbyte-integrations/bases/base-typing-deduping-test/src/main/java/io/airbyte/integrations/base/destination/typing_deduping/BaseSqlGeneratorIntegrationTest.java Co-authored-by: Edward Gao <edward.gao@airbyte.io> * snowflake updates --------- Co-authored-by: Ben Church <ben@airbyte.io> Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com> Co-authored-by: Augustin <augustin@airbyte.io> Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com> Co-authored-by: Joe Reuter <joe@airbyte.io> Co-authored-by: flash1293 <flash1293@users.noreply.github.com> Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com> Co-authored-by: Vasilis Gavriilidis <vasilis.gavriilidis@orfium.com> Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io> Co-authored-by: Alexandre Girard <alexandre@airbyte.io> Co-authored-by: girarda <girarda@users.noreply.github.com> Co-authored-by: btkcodedev <btk.codedev@gmail.com> Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> Co-authored-by: Natalie Kwong <38087517+nataliekwong@users.noreply.github.com> Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com> Co-authored-by: Alexandre Cuoci <Hesperide@users.noreply.github.com> Co-authored-by: terencecho <terencecho@users.noreply.github.com> Co-authored-by: Lake Mossman <lake@airbyte.io> Co-authored-by: Benoit Moriceau <benoit@airbyte.io> Co-authored-by: subodh <subodh1810@gmail.com> Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Add everything for BQ but migrate, refactor interface after practical work * Make new default methods, refactor to single implemented method * MigrationInterface and BQ impl created * Trying to integrate with standard inserts * remove unnecessary NameAndNamespacePair class * Shimmed in * Java Docs * Initial Testing Setup * Tests! * Move Migrator into TyperDeduper * Functional Migration * Add Integration Test * Pr updates * bump version * bump version * version bump * Update to airbyte-ci-internal (airbytehq#29026) * 🐛 Source Github, Instagram, Zendesk-support, Zendesk-talk: fix CAT tests fail on `spec` (airbytehq#28910) * connectors-ci: better modified connectors detection logic (airbytehq#28855) * connectors-ci: report path should always start with `airbyte-ci/` (airbytehq#29030) * make report path always start with airbyte-ci * revert report path in orchestrator * add more test cases * bump version * Updated docs (airbytehq#29019) * CDK: Embedded reader utils (airbytehq#28873) * relax pydantic dep * Automated Commit - Format and Process Resources Changes * wip * wrap up base integration * add init file * introduce CDK runner and improve error message * make state param optional * update protocol models * review comments * always run incremental if possible * fix --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com> * 🤖 Bump minor version of Airbyte CDK * 🚨🚨 Low code CDK: Decouple SimpleRetriever and HttpStream (airbytehq#28657) * fix tests * format * review comments * Automated Commit - Formatting Changes * review comments * review comments * review comments * log all messages * log all message * review comments * review comments * Automated Commit - Formatting Changes * add comment --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com> * 🤖 Bump minor version of Airbyte CDK * 🐛 Source Github, Instagram, Zendesk Support / Talk - revert `spec` changes and improve (airbytehq#29031) * Source oauth0: new streams and fix incremental (airbytehq#29001) * Add new streams Organizations,OrganizationMembers,OrganizationMemberRoles * relax schema definition to allow additional fields * Bump image tag version * revert some changes to the old schemas * Format python so gradle can pass * update incremental * remove unused print * fix unit test --------- Co-authored-by: Vasilis Gavriilidis <vasilis.gavriilidis@orfium.com> * 🐛 Source Mongo: Fix failing acceptance tests (airbytehq#28816) * Fix failing acceptance tests * Fix failing strict acceptance tests * Source-Greenhouse: Fix unit tests for new CDK version (airbytehq#28969) Fix unit tests * Add CSV options to the CSV parser (airbytehq#28491) * remove invalid legacy option * remove unused option * the tests pass but this is quite messy * very slight clean up * Add skip options to csv format * fix some of the typing issues * fixme comment * remove extra log message * fix typing issues * skip before header * skip after header * format * add another test * Automated Commit - Formatting Changes * auto generate column names * delete dead code * update title and description * true and false values * Update the tests * Add comment * missing test * rename * update expected spec * move to method * Update comment * fix typo * remove unused import * Add a comment * None records do not pass the WaitForDiscoverPolicy * format * remove second branch to ensure we always go through the same processing * Raise an exception if the record is None * reset * Update tests * handle unquoted newlines * Automated Commit - Formatting Changes * Update test case so the quoting is explicit * Update comment * Automated Commit - Formatting Changes * Fail validation if skipping rows before header and header is autogenerated * always fail if a record cannot be parsed * format * set write line_no in error message * remove none check * Automated Commit - Formatting Changes * enable autogenerate test * remove duplicate test * missing unit tests * Update * remove branching * remove unused none check * Update tests * remove branching * format * extract to function * comment * missing type * type annotation * use set * Document that the strings are case-sensitive * public -> private * add unit test * newline --------- Co-authored-by: girarda <girarda@users.noreply.github.com> * Dagster: Add sentry logging (airbytehq#28822) * Add sentry * add sentry decorator * Add traces * Use sentry trace * Improve duplicate logging * Add comments * DNC * Fix up issues * Move to scopes * Remove breadcrumb * Update lock * ✨Source Shortio: Migrate Python CDK to Low-code CDK (airbytehq#28950) * Migrate Shortio to Low-Code * Update abnormal state * Format * Update Docs * Fix metadata.yaml * Add pagination * Add incremental sync * add incremental parameters * update metadata * rollback update version * release date --------- Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> * Update to new verbiage (airbytehq#29051) * [skip ci] Metadata: Remove leading underscore (airbytehq#29024) * DNC * Add test models * Add model test * Remove underscore from metadata files * Regenerate models * Add test to check for key transformation * Allow additional fields on metadata * Delete transform * Proof of concept parallel source stream reading implementation for MySQL (airbytehq#26580) * Proof of concept parallel source stream reading implementation for MySQL * Automated Change * Add read method that supports concurrent execution to Source interface * Remove parallel iterator * Ensure that executor service is stopped * Automated Commit - Format and Process Resources Changes * Expose method to fix compilation issue * Use concurrent map to avoid access issues * Automated Commit - Format and Process Resources Changes * Ensure concurrent streams finish before closing source * Fix compile issue * Formatting * Exclude concurrent stream threads from orphan thread watcher * Automated Commit - Format and Process Resources Changes * Refactor orphaned thread logic to account for concurrent execution * PR feedback * Implement readStreams in wrapper source * Automated Commit - Format and Process Resources Changes * Add readStream override * Automated Commit - Format and Process Resources Changes * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * 🤖 Auto format source-mysql code [skip ci] * Debug logging * Reduce logging level * Replace synchronized calls to System.out.println when concurrent * Close consumer * Flush before close * Automated Commit - Format and Process Resources Changes * Remove charset * Use ASCII and flush periodically for parallel streams * Test performance harness patch * Automated Commit - Format and Process Resources Changes * Cleanup * Logging to identify concurrent read enabled * Mark parameter as final --------- Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com> * connectors-ci: disable dependency scanning (airbytehq#29033) * updates (airbytehq#29059) * Metadata: skip breaking change validation on prerelease (airbytehq#29017) * skip breaking change validation * Move ValidatorOpts higher in call * Add prerelease test * Fix test * ✨ Source MongoDB Internal POC: Generate Test Data (airbytehq#29049) * Add script to generate test data * Fix prose * Update credentials example * PR feedback * Bump Airbyte version from 0.50.12 to 0.50.13 * Bump versions for mssql strict-encrypt (airbytehq#28964) * Bump versions for mssql strict-encrypt * Fix failing test * Fix failing test * 🎨 Improve replication method selection UX (airbytehq#28882) * update replication method in MySQL source * bump version * update expected specs * update registries * bump strict encrypt version * make password always_show * change url * update registries * 🐛 Avoid writing records to log (airbytehq#29047) * Avoid writing records to log * Update version * Rollout ctid cdc (airbytehq#28708) * source-postgres: enable ctid+cdc implementation * 100% ctid rollout for cdc * remove CtidFeatureFlags * fix CdcPostgresSourceAcceptanceTest * Bump versions and release notes * Fix compilation error due to previous merge --------- Co-authored-by: subodh <subodh1810@gmail.com> * connectors-ci: fix `unhashable type 'set'` (airbytehq#29064) * Add Slack Alert lifecycle to Dagster for Metadata publish (airbytehq#28759) * DNC * Add slack lifecycle logging * Update to use slack * Update slack to use resource and bot * Improve markdown * Improve log * Add sensor logging * Extend sensor time * merge conflict * PR Refactoring * Make the tests work * remove unnecessary classes, pr feedback * more merging * Update airbyte-integrations/bases/base-typing-deduping-test/src/main/java/io/airbyte/integrations/base/destination/typing_deduping/BaseSqlGeneratorIntegrationTest.java Co-authored-by: Edward Gao <edward.gao@airbyte.io> * snowflake updates --------- Co-authored-by: Ben Church <ben@airbyte.io> Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com> Co-authored-by: Augustin <augustin@airbyte.io> Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com> Co-authored-by: Joe Reuter <joe@airbyte.io> Co-authored-by: flash1293 <flash1293@users.noreply.github.com> Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com> Co-authored-by: Vasilis Gavriilidis <vasilis.gavriilidis@orfium.com> Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io> Co-authored-by: Alexandre Girard <alexandre@airbyte.io> Co-authored-by: girarda <girarda@users.noreply.github.com> Co-authored-by: btkcodedev <btk.codedev@gmail.com> Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> Co-authored-by: Natalie Kwong <38087517+nataliekwong@users.noreply.github.com> Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com> Co-authored-by: Alexandre Cuoci <Hesperide@users.noreply.github.com> Co-authored-by: terencecho <terencecho@users.noreply.github.com> Co-authored-by: Lake Mossman <lake@airbyte.io> Co-authored-by: Benoit Moriceau <benoit@airbyte.io> Co-authored-by: subodh <subodh1810@gmail.com> Co-authored-by: Edward Gao <edward.gao@airbyte.io>
What
How
Source
interface that returns the individual stream iterators instead of a composite iteratorIntegrationRunner
to use the newSource
interface method. This new logic submits the consumption of records from each iterator in the list to a thread pool, passing each stream directly to theproduceMessages
method of theIntegrationRunner
in parallel.Source
interface method to return a list of iterators.Open Questions
FeatureFlags
class, which relies on environment variables. This will allow the platform to set the environment variable on the pod by interpreting the real feature flag client, which supports contexts (e.g. scoping to a connection or workspace). Open to suggestions on this front.Recommended reading order
Source.java
IntegrationRunner.java
ConcurrentStreamConsumer.java
MySqlSource.java
StreamStatusUtils.java