Add previously missing cursor types to JDBC utils. #2600

Merged: 7 commits merged on Mar 29, 2021

@davinchia (Contributor) commented Mar 24, 2021

What

Incremental syncs using an NVARCHAR column as the cursor key were previously failing because we did not recognise the type.

This was reported by a user.

Also take the chance to fix the incorrect mapping of JDBC integer types to the JSON Schema number type, which results in the data being transformed into floats after normalisation.
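A minimal sketch of the integer part of this fix, assuming a switch over `java.sql.JDBCType` that returns a JSON Schema type name. `JsonSchemaTypes` and `toJsonSchemaType` are hypothetical names, not the actual `JdbcUtils` API, and the non-numeric branches are deliberately simplified:

```java
import java.sql.JDBCType;

// Hypothetical sketch, not the real JdbcUtils API: maps a JDBC column type
// to a JSON Schema primitive name.
final class JsonSchemaTypes {

  private JsonSchemaTypes() {}

  static String toJsonSchemaType(JDBCType type) {
    switch (type) {
      case TINYINT:
      case SMALLINT:
      case INTEGER:
      case BIGINT:
        // Whole numbers map to "integer" rather than "number", so
        // normalization does not turn them into floats.
        return "integer";
      case REAL:
      case FLOAT:
      case DOUBLE:
      case NUMERIC:
      case DECIMAL:
        return "number";
      case BIT:
      case BOOLEAN:
        return "boolean";
      default:
        // Simplification: everything else (CHAR, VARCHAR, NVARCHAR, dates, ...)
        // is treated as a string.
        return "string";
    }
  }
}
```

The point of the sketch is only the split between the integer family and the fractional types; the real class may group types differently.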

How

Add the missing types in.
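The cursor fix can be sketched as follows. This is a hypothetical illustration, not the real `JdbcUtils` code: it assumes the saved cursor value is a string that has to be converted according to the cursor column's `java.sql.JDBCType`, with unrecognised types falling through to a `default` branch that throws `IllegalArgumentException`. Listing `NVARCHAR` alongside the other character types is what keeps such cursors out of that branch.

```java
import java.math.BigDecimal;
import java.sql.JDBCType;

// Hypothetical sketch: converts a stored cursor string into a typed value.
// Types without a case label are rejected by the default branch.
final class CursorValues {

  private CursorValues() {}

  static Object parseCursorValue(JDBCType cursorFieldType, String value) {
    switch (cursorFieldType) {
      case TINYINT:
      case SMALLINT:
      case INTEGER:
      case BIGINT:
        return Long.parseLong(value);
      case REAL:
      case FLOAT:
      case DOUBLE:
        return Double.parseDouble(value);
      case NUMERIC:
      case DECIMAL:
        return new BigDecimal(value);
      case CHAR:
      case NCHAR:
      case VARCHAR:
      case NVARCHAR: // the previously missing type: without it, NVARCHAR cursors hit default
      case LONGVARCHAR:
      case LONGNVARCHAR:
        return value;
      default:
        throw new IllegalArgumentException(String.format("%s is not supported.", cursorFieldType));
    }
  }
}
```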

Bump all connector versions that use the class:

  • source-mssql
  • source-mysql
  • source-redshift
  • source-postgres
  • destination-postgres
  • destination-redshift
  • destination-snowflake

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. JdbcUtils.java

@davinchia marked this pull request as ready for review March 24, 2021 07:38
@davinchia removed the request for review from michel-tricot March 24, 2021 07:38
@davinchia (Contributor Author) commented Mar 24, 2021

/test connector=connectors/source-mssql

🕑 connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/682454352
✅ connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/682454352

@davinchia (Contributor Author) commented Mar 24, 2021

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/682459951
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/682459951

@davinchia (Contributor Author) commented Mar 24, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/682460279
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/682460279

@davinchia (Contributor Author) commented Mar 24, 2021

/test connector=connectors/source-redshift

🕑 connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/682460999
✅ connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/682460999

@ChristopheDuong (Contributor)

Oh great, since you are tackling this, could you maybe add support for integer as well?

Thanks!

#1006 (comment)

@@ -2,6 +2,6 @@
 "sourceDefinitionId": "decd338e-5647-4c0b-adf4-da0e75f5a750",
 "name": "Postgres",
 "dockerRepository": "airbyte/source-postgres",
-"dockerImageTag": "0.2.1",
+"dockerImageTag": "0.2.2",
@ChristopheDuong (Contributor) commented Mar 24, 2021

I think something we don't handle very well at the moment is conflicts when a connector's version is bumped in multiple branches at the same time... I'm not sure how we should resolve this, though.

So when you try to bump the versions of those connectors, it's going to fail because 0.2.2 has already been published but not yet merged:

https://hub.docker.com/r/airbyte/source-postgres/tags?page=1&ordering=name

coming from #2460
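The failure mode described here can be illustrated with a small check, assuming the already-published Docker tags are available as a set of strings (fetching them from Docker Hub is out of scope). `VersionBump`, `nextPatch`, and `conflicts` are hypothetical helpers, not part of the Airbyte tooling:

```java
import java.util.Set;

// Hypothetical sketch: detects whether bumping the patch version would collide
// with a tag that another (unmerged) branch has already published.
final class VersionBump {

  private VersionBump() {}

  // "0.2.1" -> "0.2.2"; expects a plain MAJOR.MINOR.PATCH string.
  static String nextPatch(String version) {
    String[] parts = version.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + version);
    }
    int patch = Integer.parseInt(parts[2]) + 1;
    return parts[0] + "." + parts[1] + "." + patch;
  }

  // True if the bumped tag already exists among the published tags.
  static boolean conflicts(String currentVersion, Set<String> publishedTags) {
    return publishedTags.contains(nextPatch(currentVersion));
  }
}
```

With `0.2.2` already pushed from another branch, `conflicts("0.2.1", published)` is `true`, so the bump has to be coordinated with that branch.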

Contributor Author

that makes sense. will coordinate with artem!

.put("bigint", JsonSchemaPrimitive.NUMBER)
.put("smallint", JsonSchemaPrimitive.INTEGER)
.put("int", JsonSchemaPrimitive.INTEGER)
.put("bigint", JsonSchemaPrimitive.INTEGER)
Contributor

Does this mean we expect destinations to handle all INTEGER as longs?

Contributor

  1. Do we need to add integer in this PR?
  2. How good do we feel that everything still works after adding a new type? Just running the tests is insufficient in this case because none of the tests cover the integer type. So it seems like we need to add tests for sources, destinations, etc. to make sure this gets supported sanely. We can do this, but that's a bigger PR (which goes back to question 1).

My feeling is that, as is, this is yoloing a kinda big change. So we should know why it is valuable right now and understand why we are yoloing it rather than doing it more carefully with adequate testing.

Contributor

Normalization converts a JsonSchemaPrimitive.INTEGER into dbt_utils.type_bigint (the largest integer type available, since we no longer have information on the size of the integer).

{% macro default__type_bigint() %}
    bigint
{% endmacro %}

{% macro bigquery__type_bigint() %}
    int64
{% endmacro %}

from https://github.com/fishtown-analytics/dbt-utils/blob/ceb28497769c642cae7e3d5d18f1fe6bb253ef59/macros/cross_db_utils/datatypes.sql#L71
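To make the precision concern concrete: a BIGINT value fits in a Java `long`, but a floating-point column (assumed here to be IEEE-754 double precision) can only represent every integer exactly up to 2^53. The hypothetical helper below round-trips a `long` through a `double`; it is an illustration of the numeric issue, not Airbyte or dbt code:

```java
// Hypothetical illustration: does an integer value survive being stored as a double?
final class IntegerPrecision {

  private IntegerPrecision() {}

  // Converts the long to a double and back, and reports whether the value is unchanged.
  static boolean survivesDoubleRoundTrip(long value) {
    double asDouble = (double) value;
    return (long) asDouble == value;
  }
}
```

For example, `survivesDoubleRoundTrip(9_007_199_254_740_992L)` (2^53) returns `true`, while `survivesDoubleRoundTrip(9_007_199_254_740_993L)` (2^53 + 1) returns `false`, because the double rounds it down to 2^53.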

default -> throw new IllegalArgumentException(String.format("%s is not supported.", cursorFieldType));
}
}

// the switch statement intentionally has duplicates so that its structure matches the type switch
// statement above.

// these json type fields are eventually consumed by the normalization process (if configured).
Contributor

Does something need to change in normalization to support the new primitive type INTEGER?

@ChristopheDuong (Contributor) commented Mar 24, 2021

No. Other sources were already producing integer columns as JSON primitives; this change only affects the JDBC-based Java sources.

Contributor

i think more broadly the question is just what needs to change downstream to support the integer type?

Contributor

this may be a little puritanical of me, but it seems a little odd to have a comment about normalization in this utils class. feel free to keep it if you feel it's helpful though. (2 of 10 on the scale)

Contributor

Normalization already handles integer fields.

For example, the Facebook catalog already produces such fields.

@cgardens (Contributor) left a comment

The NVARCHAR parts look good to me. See the comments below; I am wary about doing the integer piece. Curious to understand whether we need to do that now.

@cgardens (Contributor) left a comment

Talked offline. Agreed to split the integer work out into a separate project. Approving because once that's done this can be merged. Don't want to block due to the time zone difference.

@davinchia (Contributor Author) commented Mar 25, 2021

@chris I'm going to separate this out for now to make this change cleaner. Let's talk later today about how to approach this from a testing perspective. Looks like we might have other types to add as well.

I will redo the docker versions after #2460 is merged in.

@davinchia force-pushed the davinchia/fix-missing-cursor-type branch from 794796c to dcbbc7b March 25, 2021 04:01
@davinchia (Contributor Author) commented Mar 28, 2021

/test connector=connectors/source-mssql

🕑 connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695152731
✅ connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695152731

@davinchia (Contributor Author) commented Mar 28, 2021

/test connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/695153373
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/695153373

@davinchia (Contributor Author) commented Mar 28, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/695153562
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/695153562

@davinchia (Contributor Author) commented Mar 28, 2021

/test connector=connectors/source-redshift

🕑 connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/695153616
✅ connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/695153616

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/695169703
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/695169703

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/695170052
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/695170052

@davinchia (Contributor Author) commented Mar 28, 2021

/test connector=connectors/source-mssql

🕑 connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695170592
❌ connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695170592

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/source-mssql

🕑 connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695178337
✅ connectors/source-mssql https://github.com/airbytehq/airbyte/actions/runs/695178337

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/695188118
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/695188118

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/destination-postgres

🕑 connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/695188386
✅ connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/695188386

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/695188496
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/695188496

@davinchia (Contributor Author) commented Mar 28, 2021

/publish connector=connectors/source-redshift

🕑 connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/695223589
✅ connectors/source-redshift https://github.com/airbytehq/airbyte/actions/runs/695223589

5 participants