Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databases DateTime, Date, Time data type bug (MySQL, Postgres, etc.) #14628

Closed
subodh1810 opened this issue Jul 12, 2022 · 14 comments
Closed

Databases DateTime, Date, Time data type bug (MySQL, Postgres, etc.) #14628

subodh1810 opened this issue Jul 12, 2022 · 14 comments
Assignees

Comments

@subodh1810
Copy link
Contributor

Ref : https://airbytehq-team.slack.com/archives/C03C4AVJWG4/p1657271466983419
I was going through some of the data type tests in MySQL and I see that we convert all the time related data type values (TIMESTAMP, DATE TIME, DATE) into TIMESTAMP WITH TIEMZONE values. For instance

  addDataTypeTestData(
        TestDataHolder.builder()
            .sourceType("datetime")
            .airbyteType(JsonSchemaType.STRING)
            .addInsertValues("null", "'2005-10-10 23:22:21'")
            .addExpectedValues(null, "2005-10-10T23:22:21.000000Z")
            .build());

^ DATETIME column value 2005-10-10 23:22:21 is being converted into 2005-10-10T23:22:21.000000Z , similarly

addDataTypeTestData(
        TestDataHolder.builder()
            .sourceType("date")
            .airbyteType(JsonSchemaType.STRING)
            .addInsertValues("null", "'2021-01-01'")
            .addExpectedValues(null, "2021-01-01T00:00:00Z")
            .build());

^ DATE value 2021-01-01 is being converted into 2021-01-01T00:00:00Z .
The tests can be found in the class AbstractMySqlSourceDatatypeTest

@subodh1810
Copy link
Contributor Author

This problem is also existing in Postgres CDC ,

    addDataTypeTestData(
        TestDataHolder.builder()
            .sourceType("date")
            .airbyteType(JsonSchemaType.STRING)
            .addInsertValues("'January 7, 1999'", "'1999-01-08'", "'1/9/1999'", "'January 10, 99 BC'", "'January 11, 99 AD'", "null")
            .addExpectedValues("1999-01-07T00:00:00Z", "1999-01-08T00:00:00Z", "1999-01-09T00:00:00Z", "0099-01-10T00:00:00Z", "1999-01-11T00:00:00Z",
                null)
            .build());

take a look in CdcPostgresSourceDatatypeTest

@subodh1810 subodh1810 changed the title MySQL DateTime, Date, Time data type bug Databases DateTime, Date, Time data type bug (MySQL, Postgres, etc.) Jul 12, 2022
@grishick
Copy link
Contributor

grishick commented Jul 12, 2022

  • Test if we can get rid of debezium converter for data types after upgrading to new version of debezium
  • Fix DataTypeUtils
    We could break this down into two tasks, but the first task seems small enough.

@grishick
Copy link
Contributor

Hey team! Please add your planning poker estimate with ZenHub @edgao @ryankfu @subodh1810

@grishick
Copy link
Contributor

Related issue that should be fixed when this is fixed: #14590

@edgao
Copy link
Contributor

edgao commented Jul 27, 2022

the converter is still necessary after the 1.9.2 upgrade; debezium converts date into epoch days rather than formatted into 2022-01-23 strings

@edgao
Copy link
Contributor

edgao commented Aug 3, 2022

source-postgres is completed in #15259; there's still work remaining for other DB sources.

@edgao
Copy link
Contributor

edgao commented Aug 3, 2022

@grishick how do I remove this issue from the postgres epic?

@grishick
Copy link
Contributor

grishick commented Aug 3, 2022

@grishick how do I remove this issue from the postgres epic?

done (clicked on the cogwheel next to Epics in the side bar on the right and unchecked the checkbox next to postgres epic)

@grishick
Copy link
Contributor

grishick commented Aug 4, 2022

@subodh1810
This is still affecting MySQL and Snowflake sources (related OC issue). Ideally this can be fixed at a shared class level that will fix this for all JDBC sources. However, if you determine that this approach is not feasible, please scope this issue to MySQL and open a followup issue for Snowflake source.

@edgao
Copy link
Contributor

edgao commented Aug 4, 2022

for reference: I was able to repro the issue with source-snowflake; see https://github.com/airbytehq/oncall/issues/383#issuecomment-1204588122 - it looks like timestamp and date columns are both incorrectly treated as timestamptz

@edgao edgao removed their assignment Aug 4, 2022
@grishick
Copy link
Contributor

grishick commented Aug 4, 2022

@edgao @subodh1810 should this be renamed to MySQL scope only now that Postgres is fixed separately?

@edgao
Copy link
Contributor

edgao commented Aug 4, 2022

I think the scope of this ticket is any db/dw source, so it covers everything except postgres (e.g. the snowflake thing I mentioned above)

@grishick
Copy link
Contributor

grishick commented Aug 5, 2022

If the implementation will require connector-specific change, please file separate issues for each connector, so that we can prioritize accordingly.

@subodh1810
Copy link
Contributor Author

Am closing this cause we have fixed MySQL and Postgres. Also going to re-estimate it. This took longer than expected. WE merged 4 big PRs to fix the two sources. Making sure CDC and non-CDC output was the same was more challenging than expected. Will file separate issues for other db sources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants