improve error message for tables with invalid columns as cursor #15317

Merged
merged 12 commits into from
Sep 13, 2022

Conversation

subodh1810
Contributor

@subodh1810 subodh1810 commented Aug 4, 2022

What

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g., a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g., a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@subodh1810 subodh1810 changed the title from "Validate cursor fields before sync" to "improve error message for tables with invalid columns as cursor" Aug 16, 2022
@subodh1810 subodh1810 marked this pull request as ready for review August 16, 2022 11:48
@subodh1810 subodh1810 requested a review from a team as a code owner August 16, 2022 11:48
@subodh1810 subodh1810 requested review from tuliren and edgao August 16, 2022 11:48
@subodh1810
Contributor Author

subodh1810 commented Aug 16, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2867669255
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2867669255
🐛 https://gradle.com/s/tf2sirgwr4ndu

Build Failed

Test summary info:

Could not find result summary

@subodh1810
Contributor Author

/test connector=connectors/source-mysql

@subodh1810 subodh1810 temporarily deployed to more-secrets August 16, 2022 11:50 Inactive
Contributor

@edgao edgao left a comment

LGTM, added a few minor comments.

@Override
public boolean isValidCursorType(final MysqlType cursorType) {
return switch (cursorType) {
case BIT, BOOLEAN, TINYINT, TINYINT_UNSIGNED, SMALLINT, SMALLINT_UNSIGNED, MEDIUMINT, MEDIUMINT_UNSIGNED, INT, INT_UNSIGNED, BIGINT, BIGINT_UNSIGNED, FLOAT, FLOAT_UNSIGNED, DOUBLE, DOUBLE_UNSIGNED, DECIMAL, DECIMAL_UNSIGNED, DATE, DATETIME, TIMESTAMP, TIME, YEAR, CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, ENUM, SET, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, BINARY, VARBINARY -> true;
Contributor

what do you think of this style?

return switch (cursorType) {
  case <mysql-specific types> -> true;
  default -> super.isValidCursorType(cursorType);
};

I.e., only define the types that JdbcSourceOperations doesn't already handle. (Not a huge win in this case, but I think it would make, e.g., getJsonType much nicer.)
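A minimal sketch of the delegation style suggested above, assuming a base class that already lists the common cursor types. BaseCursorOperations and ExampleDbCursorOperations are hypothetical stand-ins for JdbcSourceOperations and a database-specific subclass, and the type lists are illustrative subsets, not the PR's actual lists. Both levels use java.sql.JDBCType so the super call type-checks.

```java
import java.sql.JDBCType;

// Hypothetical stand-in for a shared base class such as JdbcSourceOperations:
// it lists the cursor types every JDBC source accepts (illustrative subset).
class BaseCursorOperations {

  boolean isValidCursorType(final JDBCType cursorType) {
    return switch (cursorType) {
      case BIGINT, DATE, INTEGER, TIMESTAMP, VARCHAR -> true;
      default -> false;
    };
  }
}

// Hypothetical database-specific subclass: it lists only the extra types it
// supports and delegates every other type to the base class.
class ExampleDbCursorOperations extends BaseCursorOperations {

  @Override
  boolean isValidCursorType(final JDBCType cursorType) {
    return switch (cursorType) {
      case TIMESTAMP_WITH_TIMEZONE -> true;
      default -> super.isValidCursorType(cursorType);
    };
  }
}
```

The trade-off is that the full list of supported types is now split across two classes.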

Contributor

I like this approach as well: it reduces duplication of cursor types, although it makes it harder to see the full list of supported cursor types without looking into the superclass. For extensibility, though, this seems better.

public record InvalidCursorInfo(String tableName, String cursorColumnName, String cursorSqlType) {

@Override
public String toString() {
Contributor

tiny nitpick: I'd prefer to define a new method prettyString instead of overriding toString (mostly in case someone actually wants to print one of these out for debugging or something)
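A sketch of the prettyString suggestion, assuming the record keeps the three components shown above. The method name comes from the comment; the output format is a hypothetical choice. Because toString() is left alone, the record keeps its compiler-generated form (InvalidCursorInfo[tableName=..., ...] in current JDKs) for debugging.

```java
// Keep the compiler-generated toString() for debugging and add a separate
// prettyString() for user-facing messages. The format below is an assumption.
record InvalidCursorInfo(String tableName, String cursorColumnName, String cursorSqlType) {

  String prettyString() {
    return "{tableName='" + tableName + "', cursorColumnName='" + cursorColumnName
        + "', cursorSqlType=" + cursorSqlType + "}";
  }
}
```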

public class InvalidCursorException extends RuntimeException {

public InvalidCursorException(final List<InvalidCursorInfo> tablesWithInvalidCursor) {
super("The following tables have invalid columns selected as cursor, please select a valid column as a cursor. " + tablesWithInvalidCursor.stream().map(InvalidCursorInfo::toString)
Contributor

nitpick: maybe replace "a valid column" with "a column with a well-defined ordering"? So that it's clear why the cursor isn't valid
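A sketch of the exception message with the suggested wording, so users see why the cursor was rejected. The per-table format (table.column (TYPE)) and the ", " separator are assumptions, since the original stream pipeline is cut off above; prettyString() is a hypothetical helper on the record.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in for the PR's record, with a hypothetical prettyString().
record InvalidCursorInfo(String tableName, String cursorColumnName, String cursorSqlType) {

  String prettyString() {
    return tableName + "." + cursorColumnName + " (" + cursorSqlType + ")";
  }
}

// The message states the reason (no well-defined ordering), then lists
// every offending table so all of them can be fixed in one pass.
class InvalidCursorException extends RuntimeException {

  InvalidCursorException(final List<InvalidCursorInfo> tablesWithInvalidCursor) {
    super("The following tables have invalid columns selected as cursor, please select a column "
        + "with a well-defined ordering as a cursor. "
        + tablesWithInvalidCursor.stream()
            .map(InvalidCursorInfo::prettyString)
            .collect(Collectors.joining(", ")));
  }
}
```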

stream.getName());
final boolean hasSourceDefinedCursor =
!Objects.isNull(airbyteStream.getStream().getSourceDefinedCursor()) && airbyteStream.getStream().getSourceDefinedCursor();
if (!tableNameToTable.containsKey(fullyQualifiedTableName) || airbyteStream.getSyncMode() != SyncMode.INCREMENTAL || hasSourceDefinedCursor) {
Contributor

for my understanding: is there any situation where tableNameToTable.containsKey(fullyQualifiedTableName) isn't true? (totally fair to have this condition either way, I'm just curious how tableNameToTable gets populated)
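The skip condition in the excerpt above can be read as one predicate. A sketch with simplified, hypothetical parameters in place of the PR's catalog and TableInfo types; Boolean.TRUE.equals(flag) is a null-safe equivalent of !Objects.isNull(flag) && flag.

```java
import java.util.Set;

// Hypothetical simplified sync mode mirroring the protocol's two modes.
enum SyncMode { FULL_REFRESH, INCREMENTAL }

final class CursorValidationFilter {

  // Returns true when a stream's cursor should NOT be type-checked: the table is
  // unknown, the stream is not incremental, or the source defines its own cursor.
  static boolean shouldSkip(final Set<String> knownTables,
                            final String fullyQualifiedTableName,
                            final SyncMode syncMode,
                            final Boolean sourceDefinedCursor) {
    // false for both null and FALSE, same as !Objects.isNull(x) && x
    final boolean hasSourceDefinedCursor = Boolean.TRUE.equals(sourceDefinedCursor);
    return !knownTables.contains(fullyQualifiedTableName)
        || syncMode != SyncMode.INCREMENTAL
        || hasSourceDefinedCursor;
  }
}
```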

.findFirst()
.orElseThrow();

if (isValidCursorType(cursorType)) {
Contributor

nitpick: this seems a little bit nicer (avoids using continue)

if (!isValidCursorType(cursorType)) {
  tablesWithInvalidCursor.add(new InvalidCursorInfo(fullyQualifiedTableName, cursorField, cursorType.toString()));
}

Contributor

+1 on this comment
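A sketch of the suggested if (!isValidCursorType(...)) shape inside a full collection loop. CursorSelection, CursorCollector, the nested Map of column types and the type subset are hypothetical simplifications, not the PR's code. Every invalid cursor is collected rather than failing on the first one.

```java
import java.sql.JDBCType;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical input: one incremental stream's table and chosen cursor column.
record CursorSelection(String fullyQualifiedTableName, String cursorField) {}

record InvalidCursorInfo(String tableName, String cursorColumnName, String cursorSqlType) {}

final class CursorCollector {

  // Illustrative subset of accepted types; not the PR's actual list.
  static boolean isValidCursorType(final JDBCType type) {
    return switch (type) {
      case BIGINT, DATE, INTEGER, TIMESTAMP, VARCHAR -> true;
      default -> false;
    };
  }

  // Assumes every selected table and column exists in columnTypesByTable.
  // A negated check adds the offender; no `continue` is needed.
  static List<InvalidCursorInfo> findInvalidCursors(
      final Map<String, Map<String, JDBCType>> columnTypesByTable,
      final List<CursorSelection> selections) {
    final List<InvalidCursorInfo> tablesWithInvalidCursor = new ArrayList<>();
    for (final CursorSelection selection : selections) {
      final JDBCType cursorType =
          columnTypesByTable.get(selection.fullyQualifiedTableName()).get(selection.cursorField());
      if (!isValidCursorType(cursorType)) {
        tablesWithInvalidCursor.add(new InvalidCursorInfo(
            selection.fullyQualifiedTableName(), selection.cursorField(), cursorType.toString()));
      }
    }
    return tablesWithInvalidCursor;
  }
}
```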

@@ -142,6 +146,41 @@ public AutoCloseableIterator<AirbyteMessage> read(final JsonNode config,
});
}

private void validateCursorFieldForIncrementalTables(final Map<String, TableInfo<CommonField<DataType>>> tableNameToTable, final ConfiguredAirbyteCatalog catalog) {
Contributor

@ryankfu ryankfu Aug 16, 2022

Unless I'm mistaken, shouldn't this method declare throws InvalidCursorException to indicate that it can throw?

Also, can we add a short javadoc comment along the lines of:

/**
 * Creates a list of incremental tables with invalid cursor columns (e.g. non-numeric types). Will also throw 
 * `InvalidCursorException` if at least one table includes an invalid cursor type
 */

EDIT: After chatting with Ed: since InvalidCursorException extends RuntimeException, you don't need to declare it on the method. I'd still prefer to document it, either in a javadoc comment or in the method signature, so that anyone who wants to catch this exception in the future knows it can be thrown.
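A sketch of what the EDIT describes: the exception is unchecked, so the throws clause is optional, but declaring it and adding @throws to the javadoc makes the failure mode visible to callers. The simplified signature (a precomputed list of table names) is hypothetical and only models the method's final step.

```java
import java.util.List;

// Minimal stand-in for the PR's unchecked exception.
class InvalidCursorException extends RuntimeException {
  InvalidCursorException(final List<String> tables) {
    super("Tables with invalid cursor columns: " + String.join(", ", tables));
  }
}

final class CursorValidation {

  /**
   * Validates the cursor columns of incremental tables (hypothetical simplified signature).
   *
   * @throws InvalidCursorException if at least one table uses a column without a
   *     well-defined ordering as its cursor
   */
  static void validateCursorFieldForIncrementalTables(final List<String> tablesWithInvalidCursor)
      throws InvalidCursorException {
    // The compiler does not require this throws clause for an unchecked
    // exception; it is there purely to document the failure mode.
    if (!tablesWithInvalidCursor.isEmpty()) {
      throw new InvalidCursorException(tablesWithInvalidCursor);
    }
  }
}
```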

@Override
public boolean isValidCursorType(final MysqlType cursorType) {
return switch (cursorType) {
case BIT, BOOLEAN, TINYINT, TINYINT_UNSIGNED, SMALLINT, SMALLINT_UNSIGNED, MEDIUMINT, MEDIUMINT_UNSIGNED, INT, INT_UNSIGNED, BIGINT, BIGINT_UNSIGNED, FLOAT, FLOAT_UNSIGNED, DOUBLE, DOUBLE_UNSIGNED, DECIMAL, DECIMAL_UNSIGNED, DATE, DATETIME, TIMESTAMP, TIME, YEAR, CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, ENUM, SET, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, BINARY, VARBINARY -> true;
Contributor

@ryankfu ryankfu Aug 16, 2022

Since this case list is large, could it be reordered lexicographically so the options are easier to scan?

EDIT: After chatting with Ed, I'm not set on sorting lexicographically, since some people may find the values easier to read as grouped sets. It's still worth considering how readable this is for someone trying to work out which types are supported.

@Override
public boolean isValidCursorType(final JDBCType cursorType) {
return switch (cursorType) {
case TIMESTAMP, TIME, DATE, BIT, BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, REAL, NUMERIC, DECIMAL, CHAR, NCHAR, NVARCHAR, VARCHAR, LONGVARCHAR, BINARY, BLOB -> true;
Contributor

Same as the previous comment: could this be reordered lexicographically so it's easier to tell whether a cursor type is supported?
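One way to reconcile both views: keep the grouping, but sort alphabetically within each group. A sketch using the 21 java.sql.JDBCType values from the excerpt above; GroupedCursorTypes and the group labels are our own.

```java
import java.sql.JDBCType;

final class GroupedCursorTypes {

  // Same 21 JDBCType values as the excerpt above, grouped by kind and
  // sorted alphabetically within each group.
  static boolean isValidCursorType(final JDBCType cursorType) {
    return switch (cursorType) {
      // boolean / bit
      case BIT, BOOLEAN -> true;
      // numeric
      case BIGINT, DECIMAL, DOUBLE, FLOAT, INTEGER, NUMERIC, REAL, SMALLINT, TINYINT -> true;
      // date and time
      case DATE, TIME, TIMESTAMP -> true;
      // character
      case CHAR, LONGVARCHAR, NCHAR, NVARCHAR, VARCHAR -> true;
      // binary
      case BINARY, BLOB -> true;
      default -> false;
    };
  }
}
```

Checking whether a given type is supported then only means scanning one short, sorted line.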

@Override
public boolean isValidCursorType(final MysqlType cursorType) {
return switch (cursorType) {
case BIT, BOOLEAN, TINYINT, TINYINT_UNSIGNED, SMALLINT, SMALLINT_UNSIGNED, MEDIUMINT, MEDIUMINT_UNSIGNED, INT, INT_UNSIGNED, BIGINT, BIGINT_UNSIGNED, FLOAT, FLOAT_UNSIGNED, DOUBLE, DOUBLE_UNSIGNED, DECIMAL, DECIMAL_UNSIGNED, DATE, DATETIME, TIMESTAMP, TIME, YEAR, CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, ENUM, SET, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, BINARY, VARBINARY -> true;
Contributor

@ryankfu ryankfu Aug 16, 2022

Nit: reorder the values lexicographically for easier parsing, and also move the

// since cursor are expected to be comparable, ...

comment to the abstractDbSource method, since that's the top-level declaration these methods override.
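A sketch of moving the documentation to the top-level declaration, assuming a generic abstract base class like the one the comment refers to. AbstractCursorSource and JdbcCursorSource are hypothetical names, and the javadoc wording is ours because the original comment is elided above. When docs are generated, overrides without their own javadoc inherit it from the abstract method.

```java
import java.sql.JDBCType;

// Hypothetical top-level base class: the contract is documented once,
// on the abstract method that every source overrides.
abstract class AbstractCursorSource<DataType> {

  /**
   * Returns whether a column of this type can be used as an incremental cursor.
   * Cursor values are compared to decide which rows are new, so only types with
   * a well-defined ordering should return true.
   */
  protected abstract boolean isValidCursorType(DataType cursorType);
}

// Subclasses only list their supported types (illustrative subset).
final class JdbcCursorSource extends AbstractCursorSource<JDBCType> {

  @Override
  protected boolean isValidCursorType(final JDBCType cursorType) {
    return switch (cursorType) {
      case BIGINT, INTEGER, TIMESTAMP, VARCHAR -> true;
      default -> false;
    };
  }
}
```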

Contributor

@ryankfu ryankfu left a comment

General question: should validateCursorFieldForIncrementalTables declare throws InvalidCursorException, since its last step throws one when an incremental table has an invalid cursor?

Minor nitpicks: reorder the case values so they're easier to parse, and move the comment about supported cursor types to the top-level abstractDbSource class and the JdbcCompatibleSourceOperations class.

# Conflicts:
#	airbyte-db/db-lib/src/main/java/io/airbyte/db/JdbcCompatibleSourceOperations.java
@subodh1810 subodh1810 temporarily deployed to more-secrets August 22, 2022 17:08 Inactive
@github-actions
Contributor

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-snowflake
  • source-db2
  • source-mongodb-strict-encrypt
  • source-oracle-strict-encrypt
  • source-tidb
  • source-bigquery
  • source-mysql
  • source-mysql-strict-encrypt
  • source-mongodb-v2
  • source-oracle
  • source-mssql
  • source-cockroachdb-strict-encrypt
  • source-scaffold-java-jdbc
  • source-mssql-strict-encrypt
  • source-cockroachdb
  • source-clickhouse-strict-encrypt
  • source-db2-strict-encrypt
  • source-jdbc
  • source-redshift
  • source-postgres-strict-encrypt
  • source-postgres
  • source-clickhouse

@subodh1810 subodh1810 temporarily deployed to more-secrets August 25, 2022 06:35 Inactive
@subodh1810 subodh1810 requested review from ryankfu and grishick August 25, 2022 06:39
@github-actions
Contributor

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-mysql
  • source-clickhouse-strict-encrypt
  • source-cockroachdb-strict-encrypt
  • source-postgres-strict-encrypt
  • source-clickhouse
  • source-mssql
  • source-mssql-strict-encrypt
  • source-oracle-strict-encrypt
  • source-mysql-strict-encrypt
  • source-scaffold-java-jdbc
  • source-postgres
  • source-jdbc
  • source-redshift
  • source-db2
  • source-oracle
  • source-mongodb-v2
  • source-snowflake
  • source-db2-strict-encrypt
  • source-bigquery
  • source-tidb
  • source-mongodb-strict-encrypt
  • source-cockroachdb

@subodh1810 subodh1810 temporarily deployed to more-secrets August 31, 2022 08:43 Inactive
@subodh1810
Contributor Author

subodh1810 commented Aug 31, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2964101455
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2964101455
🐛 https://gradle.com/s/cyhuy5l2eilyu

Build Failed

Test summary info:

Could not find result summary

@subodh1810
Contributor Author

subodh1810 commented Aug 31, 2022

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/2964102867
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/2964102867
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810
Contributor Author

/test connector=connectors/source-postgres

@github-actions
Contributor

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-mongodb-v2
  • source-mssql-strict-encrypt
  • source-mysql-strict-encrypt
  • source-jdbc
  • source-oracle
  • source-clickhouse-strict-encrypt
  • source-tidb
  • source-cockroachdb-strict-encrypt
  • source-postgres-strict-encrypt
  • source-oracle-strict-encrypt
  • source-clickhouse
  • source-cockroachdb
  • source-mysql
  • source-mongodb-strict-encrypt
  • source-snowflake
  • source-postgres
  • source-redshift
  • source-mssql
  • source-bigquery
  • source-db2
  • source-db2-strict-encrypt
  • source-scaffold-java-jdbc

@subodh1810 subodh1810 temporarily deployed to more-secrets August 31, 2022 15:38 Inactive
@github-actions
Contributor

github-actions bot commented Sep 7, 2022

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-mssql
  • source-db2-strict-encrypt
  • source-snowflake
  • source-jdbc
  • source-scaffold-java-jdbc
  • source-alloydb
  • source-oracle
  • source-bigquery
  • source-mssql-strict-encrypt
  • source-tidb
  • source-mongodb-strict-encrypt
  • source-cockroachdb-strict-encrypt
  • source-clickhouse
  • source-mysql-strict-encrypt
  • source-redshift
  • source-db2
  • source-mongodb-v2
  • source-mysql
  • source-clickhouse-strict-encrypt
  • source-postgres
  • source-postgres-strict-encrypt
  • source-cockroachdb
  • source-oracle-strict-encrypt

@subodh1810 subodh1810 temporarily deployed to more-secrets September 7, 2022 12:35 Inactive
@subodh1810
Contributor Author

subodh1810 commented Sep 7, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3009305842
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3009305842
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810
Contributor Author

subodh1810 commented Sep 7, 2022

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/3009307137
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/3009307137
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810
Contributor Author

subodh1810 commented Sep 12, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3036336715
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/3036336715
No Python unittests run

Build Passed

Test summary info:

All Passed

@subodh1810
Contributor Author

subodh1810 commented Sep 12, 2022

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/3036338716
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/3036338716
No Python unittests run

Build Passed

Test summary info:

All Passed

@github-actions
Contributor

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to run corresponding integration tests:

  • source-cockroachdb-strict-encrypt
  • source-redshift
  • source-alloydb
  • source-mysql
  • source-postgres
  • source-oracle
  • source-bigquery
  • source-clickhouse
  • source-snowflake
  • source-db2-strict-encrypt
  • source-cockroachdb
  • source-mongodb-strict-encrypt
  • source-postgres-strict-encrypt
  • source-mssql-strict-encrypt
  • source-mongodb-v2
  • source-mssql
  • source-scaffold-java-jdbc
  • source-clickhouse-strict-encrypt
  • source-mysql-strict-encrypt
  • source-jdbc
  • source-oracle-strict-encrypt
  • source-db2
  • source-tidb

@subodh1810 subodh1810 merged commit 43076ee into master Sep 13, 2022
@subodh1810 subodh1810 deleted the validate-cursor-fields-before-sync branch September 13, 2022 06:36
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
…ytehq#15317)

* implement validation for cursor type before reading data

* rename class

* add test

* fix merge conflicts

* address review comments

* fix test
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
…ytehq#15317)

* implement validation for cursor type before reading data

* rename class

* add test

* fix merge conflicts

* address review comments

* fix test

Successfully merging this pull request may close these issues.

JDBC sources: Improve error message when an invalid cursor column is selected
5 participants