JDBC Sources: validate actual source schema #21844

Merged

@VitaliiMaltsev VitaliiMaltsev commented Jan 25, 2023

What

From 21413
If the user manually changed the structure of a source table (columns were added or data types were changed) but did not refresh and save the new schema in airbyte-db, the sync may fail.

How

Validate the actual source schema against the catalog schema and log a warning if the source schema has changed.
schema-changed
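
A minimal sketch of the check described above, under simplifying assumptions: each schema is reduced to a map of column name to type name (the connector itself compares JSON schemas), and `SchemaDriftCheck` and `findChangedStreams` are hypothetical names used only for illustration. It returns the warning messages instead of logging them so the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the check described in this PR.
// A "schema" here is just column name -> type name, not a real JSON schema.
final class SchemaDriftCheck {

  // Returns one warning per table whose actual schema differs from the catalog schema.
  static List<String> findChangedStreams(Map<String, Map<String, String>> catalogSchemas,
                                         Map<String, Map<String, String>> actualSchemas) {
    List<String> warnings = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> entry : catalogSchemas.entrySet()) {
      String table = entry.getKey();
      Map<String, String> catalogSchema = entry.getValue();
      Map<String, String> actualSchema = actualSchemas.get(table);
      if (actualSchema == null) {
        continue; // table was not discovered; out of scope for this sketch
      }
      if (!catalogSchema.equals(actualSchema)) {
        warnings.add(String.format(
            "Source schema changed for table %s! Actual schema: %s. Catalog schema: %s",
            table, actualSchema, catalogSchema));
      }
    }
    return warnings;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> catalog = Map.of("public.users", Map.of("id", "integer"));
    Map<String, Map<String, String>> actual = Map.of("public.users", Map.of("id", "number"));
    // The sync continues; the drift is only reported.
    findChangedStreams(catalog, actual).forEach(System.err::println);
  }
}
```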

Recommended reading order

  1. AbstractDbSource.java

VitaliiMaltsev commented Jan 25, 2023

/test connector=connectors/source-clickhouse

🕑 connectors/source-clickhouse https://github.com/airbytehq/airbyte/actions/runs/4006580472
✅ connectors/source-clickhouse https://github.com/airbytehq/airbyte/actions/runs/4006580472
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
================= 13 passed, 6 skipped, 21 warnings in 15.56s ==================

VitaliiMaltsev commented Jan 25, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4006581521
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4006581521

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:22: `future_state` has a bypass reason, skipping.
================== 54 passed, 6 skipped in 716.78s (0:11:56) ===================

VitaliiMaltsev commented Jan 25, 2023

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/4006583017
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/4006583017

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestConnection.test_check: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestDiscovery.test_discover: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
================= 13 passed, 6 skipped, 21 warnings in 15.88s ==================

VitaliiMaltsev commented Jan 25, 2023

/test connector=connectors/source-redshift

🕑 connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/4006584977
✅ connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/4006584977

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:30: `future_state` not specified, skipping.
================== 27 passed, 3 skipped in 156.66s (0:02:36) ===================

VitaliiMaltsev commented Jan 25, 2023

/test connector=connectors/source-snowflake

🕑 connectors/source-snowflake https://github.com/airbytehq/airbyte/actions/runs/4006586479
✅ connectors/source-snowflake https://github.com/airbytehq/airbyte/actions/runs/4006586479

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:94: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:377: The previous and actual discovered catalogs are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:30: `future_state` not specified, skipping.
=================== 27 passed, 3 skipped in 63.70s (0:01:03) ===================

github-actions bot commented Jan 25, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (25)

Connector                           Version   Changelog      Publish
source-alloydb                      1.0.36
source-alloydb-strict-encrypt       1.0.36    🔵 (ignored)   🔵 (ignored)
source-bigquery                     0.2.3
source-clickhouse                   0.1.15
source-clickhouse-strict-encrypt    0.1.15    🔵 (ignored)   🔵 (ignored)
source-cockroachdb                  0.1.19
source-cockroachdb-strict-encrypt   0.1.19    🔵 (ignored)   🔵 (ignored)
source-db2                          0.1.17
source-db2-strict-encrypt           0.1.17    🔵 (ignored)   🔵 (ignored)
source-dynamodb                     0.1.0
source-jdbc                         0.3.5     🔵 (ignored)   🔵 (ignored)
source-mongodb-strict-encrypt       0.1.19    🔵 (ignored)   🔵 (ignored)
source-mongodb-v2                   0.1.19
source-mssql                        0.4.28
source-mssql-strict-encrypt         0.4.28    🔵 (ignored)   🔵 (ignored)
source-mysql                        1.0.21
source-mysql-strict-encrypt         1.0.21    🔵 (ignored)   🔵 (ignored)
source-oracle                       0.3.22
source-oracle-strict-encrypt        0.3.22    🔵 (ignored)   🔵 (ignored)
source-postgres                     1.0.42
source-postgres-strict-encrypt      1.0.42    🔵 (ignored)   🔵 (ignored)
source-redshift                     0.3.16
source-scaffold-java-jdbc           0.1.0     🔵 (ignored)   🔵 (ignored)
source-snowflake                    0.1.29
source-tidb                         0.2.2
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.


final JsonNode catalogSchema = stream.getJsonSchema();
if (!catalogSchema.equals(currentJsonSchema)) {
  LOGGER.warn("Source schema changed for table {}! Actual schema: {}. Catalog schema: {}",
Contributor

Is this an error state?
Should we do something other than warn?

Contributor

+1. Can we actually throw a ConfigErrorException here so that this is actionable for the user? Perhaps with a message along these lines: "The underlying schema changed for the table. Please refresh your source schema! Source schema changed for table {}! Actual schema: {}. Catalog schema: {}"

In addition, could you add a unit test as well?
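
For illustration, the stricter behaviour suggested in this comment might look roughly like the sketch below. `SchemaChangedException` is a hypothetical class that only stands in for the ConfigErrorException mentioned above, `requireUnchanged` is a hypothetical helper, and schemas are again simplified to column-to-type maps; none of this is the code in the PR.

```java
import java.util.Map;

// Hypothetical exception type for this sketch; it only stands in for the
// ConfigErrorException mentioned in the review and is not the Airbyte class.
final class SchemaChangedException extends RuntimeException {
  SchemaChangedException(String message) {
    super(message);
  }
}

final class StrictSchemaCheck {

  // Throws if the actual schema differs from the catalog schema, instead of logging a warning.
  static void requireUnchanged(String table,
                               Map<String, String> catalogSchema,
                               Map<String, String> actualSchema) {
    if (!catalogSchema.equals(actualSchema)) {
      throw new SchemaChangedException(String.format(
          "The underlying schema changed for table %s. Please refresh your source schema! "
              + "Actual schema: %s. Catalog schema: %s",
          table, actualSchema, catalogSchema));
    }
  }

  public static void main(String[] args) {
    try {
      requireUnchanged("public.users", Map.of("id", "integer"), Map.of("id", "number"));
    } catch (SchemaChangedException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

With this variant the first changed table aborts the caller, which is the trade-off debated in the replies that follow.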

Contributor Author

@rodireich @akashkulk
I'm not sure that throwing an exception here would be correct in all cases. The sync would fail if the data type changed from integer to float, but if the change went the other way (from float to integer) and all values in the database were integers, the sync would succeed. Likewise, if the user manually added some columns, the sync would succeed, but in some cases when a column is deleted, the sync would fail.
I think there may be other combinations that lead to a successful or unsuccessful sync, for example in incremental mode. At the same time, throwing an exception here would fail the entire sync, whereas only the one stream whose structure the user changed manually in the database should fail; all other streams should not.
WDYT?
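
To make the cases in this comment concrete, here is a small sketch that classifies how a single stream's schema drifted. `SchemaDiff`, `Kind`, and `diff` are hypothetical names, and schemas are simplified to column-to-type maps. The sketch deliberately does not decide whether a change breaks the sync, since, as noted above, that depends on the data and the sync mode.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: describe how one stream's schema drifted,
// without deciding whether the drift will break the sync.
final class SchemaDiff {

  enum Kind { COLUMN_ADDED, COLUMN_REMOVED, TYPE_CHANGED }

  // Returns column name -> kind of change; unchanged columns are omitted.
  static Map<String, Kind> diff(Map<String, String> catalogSchema, Map<String, String> actualSchema) {
    Map<String, Kind> changes = new TreeMap<>();
    // Union of column names from both schemas.
    TreeSet<String> columns = new TreeSet<>(catalogSchema.keySet());
    columns.addAll(actualSchema.keySet());
    for (String column : columns) {
      String before = catalogSchema.get(column);
      String after = actualSchema.get(column);
      if (before == null) {
        changes.put(column, Kind.COLUMN_ADDED);
      } else if (after == null) {
        changes.put(column, Kind.COLUMN_REMOVED);
      } else if (!before.equals(after)) {
        changes.put(column, Kind.TYPE_CHANGED);
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, Kind> changes = diff(
        Map.of("id", "integer", "name", "string"),
        Map.of("id", "number", "email", "string"));
    changes.forEach((column, kind) -> System.out.println(column + ": " + kind));
  }
}
```

Because the comparison runs per stream, a caller could report or fail only the affected stream and let the others proceed.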

Contributor

I see. So some changes can go through, but others will require refreshing the source schema.
My questions:
For a non-breaking change (e.g. the user adds a column): can we keep syncing indefinitely with the old schema?
For a breaking change (e.g. the user removes a column): at what stage will the sync fail? Is it deterministic, or does it depend on the sync type and other factors?

I think this change makes it easier to identify a state we're unaware of today until something breaks. I wonder if there's a good way to know what to do (maybe it's impossible).
+1 to Akash's suggestion to add a test.

Contributor Author

@VitaliiMaltsev VitaliiMaltsev Feb 1, 2023

@rodireich I added the unit test.

Answering your questions:
For a non-breaking change: yes, the sync could continue indefinitely with the old schema.
For a breaking change: I'm not sure we can determine the exact stage at which the sync would fail. Potentially, it could be:

  1. At the normalization step, as described here
  2. During WRITE (mapping Avro values to the old Avro schema, or during JSON schema formatting in BigQuery Denormalized)
  3. During READ (anywhere the old catalog is used)

@rodireich

@VitaliiMaltsev the change looks good.
I just wonder about the entire user flow: should we update the catalog in this case (run a discover)?
Or should we stop the sync and ask the user to intervene?

Otherwise the warning would keep getting logged, but the connection would still fail eventually, right?

@VitaliiMaltsev VitaliiMaltsev requested a review from a team as a code owner February 1, 2023 13:10
@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets February 1, 2023 13:12 — with GitHub Actions Inactive
github-actions bot commented Feb 1, 2023

Airbyte Code Coverage

There is no coverage information present for the Files changed

Total Project Coverage 24.52%

VitaliiMaltsev commented Feb 1, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4065070311
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4065070311
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              476    117    75%   53, 58, 97-108, 113-120, 124-125, 129-130, 380, 400, 438, 476-493, 506-517, 521-526, 532, 565-570, 608-615, 658-660, 663, 728-736, 748-751, 756, 812-813, 819, 822, 858-868, 881-906
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   58-65, 70-83, 246
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     114     13    89%   31-32, 39, 42, 66-69, 97, 121, 203-205
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1690    341    80%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_incremental.py:22: `future_state` has a bypass reason, skipping.
================== 66 passed, 6 skipped in 717.64s (0:11:57) ===================

github-actions bot commented Feb 1, 2023

Platform Test Results

   243 files  ±0     243 suites  ±0   23m 50s ⏱️ - 3m 10s
1 663 tests ±0  1 652 ✔️ ±0  11 💤 ±0  0 ±0 
1 682 runs  ±0  1 671 ✔️ ±0  11 💤 ±0  0 ±0 

Results for commit b19641f. ± Comparison against base commit ee00fcc.

♻️ This comment has been updated with latest results.

@@ -180,6 +183,35 @@ public AutoCloseableIterator<AirbyteMessage> read(final JsonNode config,
});
}

private void validateSourceSchema(Map<String, TableInfo<CommonField<DataType>>> fullyQualifiedTableNameToInfo,
ConfiguredAirbyteCatalog catalog,
Database database) {
Contributor

You can remove passing in the Database parameter. It's not being used

Contributor Author

> You can remove passing in the Database parameter. It's not being used

removed

Contributor

@akashkulk akashkulk left a comment

The platform already has an auto detect schema change feature which will catch these schema drifts, so I'm not sure if we need this logging.

@@ -253,6 +262,45 @@ public void testCanReadUtf8() throws Exception {
}
}

@Test
public void testValuesChangedOnChangedSourceSchema() throws Exception {
Contributor

I'm not sure I understand what's being tested in this test. IIUC, this test verifies that if the underlying schema is changed, the records read will be different.

But won't this always be true? Across two different reads, even without the changes in this PR, the read will always return different records.

Contributor Author

> I'm not sure I understand what's being tested in this test: IIUC this test is verifying that if the underlying schema is changed, the records read will be different
>
> But won't this always be true? Across two different reads, even without the changes in this PR the read will always return different records

I updated this test to perform 4 reads with the same catalog:
Read 1: initial sync (integers)
Read 2: nothing changed, so the values are the same as in read 1 (integers)
Read 3: 1 record added to the source table, so 1 Airbyte record is added (integers)
Read 4: the user changed the column definition from int to float, so the values in the Airbyte records become doubles

This test fully emulates the behaviour from this issue
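
The four reads can be pictured with a small in-memory stand-in. This is only an illustrative sketch: `FakeTable` and its methods are hypothetical and do not come from the test in this PR, which ran against a real database. The point is that a read emits values typed by the actual column definition, while the catalog still describes the column as an integer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a source table with one numeric column.
final class FakeTable {
  private final List<Integer> rows = new ArrayList<>();
  private boolean columnIsFloat = false; // flips when the "user" alters the column

  void insert(int value) {
    rows.add(value);
  }

  void alterColumnToFloat() {
    columnIsFloat = true;
  }

  // A read emits values typed by the actual column definition, not by the catalog.
  List<Number> read() {
    List<Number> out = new ArrayList<>();
    for (int value : rows) {
      if (columnIsFloat) {
        out.add(Double.valueOf(value));
      } else {
        out.add(Integer.valueOf(value));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable();
    table.insert(1);
    table.insert(2);
    System.out.println("read 1: " + table.read()); // initial sync
    System.out.println("read 2: " + table.read()); // nothing changed
    table.insert(3);
    System.out.println("read 3: " + table.read()); // one more record
    table.alterColumnToFloat();
    System.out.println("read 4: " + table.read()); // column is now float
  }
}
```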

Contributor

But you aren't actually checking for the actual change you're making here, correct? i.e. you aren't verifying the correct logs are being outputted.

In that case, I think you should remove this test altogether since that change is a logging one (and not a validation)
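
If a test for the logging itself were wanted, one common approach is to attach an in-memory handler to the logger and assert on the captured records. The sketch below uses `java.util.logging` purely to stay dependency-free, which is an assumption; the connector code may use a different logging framework. `CapturingHandler`, `LogAssertionSketch`, the logger name, and this `logSourceSchemaChange` signature are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Collects log records in memory so a test can assert on them.
final class CapturingHandler extends Handler {
  final List<LogRecord> records = new ArrayList<>();

  @Override
  public void publish(LogRecord record) {
    records.add(record);
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}
}

final class LogAssertionSketch {
  // Hypothetical logger name; a stand-in for the logger of the code under test.
  static final Logger LOGGER = Logger.getLogger("schema-drift-sketch");

  // Hypothetical method under test: logs a warning only when the schemas differ.
  static void logSourceSchemaChange(String table, String catalogSchema, String actualSchema) {
    if (!catalogSchema.equals(actualSchema)) {
      LOGGER.log(Level.WARNING, "Source schema changed for table {0}", table);
    }
  }

  // Runs the method with a capturing handler attached and returns what was logged.
  static List<LogRecord> capture(String table, String catalogSchema, String actualSchema) {
    CapturingHandler handler = new CapturingHandler();
    LOGGER.addHandler(handler);
    try {
      logSourceSchemaChange(table, catalogSchema, actualSchema);
    } finally {
      LOGGER.removeHandler(handler);
    }
    return handler.records;
  }
}
```

A test can then check that exactly one WARNING record is captured when the schemas differ and none when they match.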

Contributor Author

> But you aren't actually checking for the actual change you're making here, correct? i.e. you aren't verifying the correct logs are being outputted.
>
> In that case, I think you should remove this test altogether since that change is a logging one (and not a validation)

removed the test

@VitaliiMaltsev VitaliiMaltsev temporarily deployed to more-secrets February 2, 2023 09:55 — with GitHub Actions Inactive
@VitaliiMaltsev

> The platform already has an auto detect schema change feature which will catch these schema drifts, so I'm not sure if we need this logging.

I did not find any platform logging that compares the actual schema against the catalog schema when the source schema changes. Where can I find it?

@@ -180,6 +183,34 @@ public AutoCloseableIterator<AirbyteMessage> read(final JsonNode config,
});
}

private void validateSourceSchema(Map<String, TableInfo<CommonField<DataType>>> fullyQualifiedTableNameToInfo,
Contributor

I think this should be changed to be named logSourceSchemaChange

validate implies that if it isn't the same you'd be throwing an error

Contributor

Also add a quick comment on what this function is doing

Contributor Author

> Also add a quick comment on what this function is doing

renamed this method and added a comment

@akashkulk

It's not in the logs. However, a new "Review changes" interstitial has been introduced. https://www.loom.com/i/e17df917d481406faed839595d3e97c4

@VitaliiMaltsev VitaliiMaltsev merged commit 04b9703 into master Feb 8, 2023
@VitaliiMaltsev VitaliiMaltsev deleted the vmaltsev/21413-jdbc-read-warn-if-schema-changed branch February 8, 2023 10:32
danidelvalle pushed a commit to danidelvalle/airbyte that referenced this pull request Feb 9, 2023
* JDBC Sources: validate actual source schema

* add unit test

* updated test cases

* refactoring
Development

Successfully merging this pull request may close these issues.

Bigquery normalization failing with Bad int64 value: 1.0
5 participants