🐛Source-mssql: added support for some datatypes #14449
/test connector=connectors/source-mssql
Build Passed
Please specify which data types are added in this PR as part of the commit message and PR description.
I see that you removed trailing "0"s from one of the tests - why?
@grishick The initial idea was not to transform the user's data: if the user's time values had millis, we would emit them, and if they did not, we would not add them. I understand your concern, though, and have updated the PR to emit millis by default (re-using the pattern that we already use in other connectors). I also updated the description as you requested.
/test connector=connectors/source-mssql
Build Passed
Hey, so this is actually related to work I'm currently trying to do in a fork. I know your PR doesn't cover "MONEY", "NUMERIC", and "DECIMAL", but could we tack those changes on?
private final Set<String> BINARY = Set.of("VARBINARY", "BINARY");
private static final String DATETIMEOFFSET = "DATETIMEOFFSET";
private static final String TIME_TYPE = "TIME";
private static final String SMALLMONEY_TYPE = "SMALLMONEY";
private static final String SMALLMONEY_TYPE = "SMALLMONEY";
private final Set<String> DECIMAL_TYPES = Set.of("SMALLMONEY", "MONEY", "NUMERIC", "DECIMAL");
and change the if else to use contains.
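For illustration, the suggestion above could be sketched roughly as follows. This is a minimal standalone sketch, not the connector's actual code: the class name `DecimalTypeCheck` and the method `isDecimalType` are hypothetical, and only the four type names come from the suggestion.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of MSSQLConverter.
final class DecimalTypeCheck {

  // Type names taken from the suggested DECIMAL_TYPES set.
  private static final Set<String> DECIMAL_TYPES =
      Set.of("SMALLMONEY", "MONEY", "NUMERIC", "DECIMAL");

  // Case-insensitive membership test, replacing a single equalsIgnoreCase check.
  // Locale.ROOT keeps the upper-casing independent of the JVM's default locale.
  static boolean isDecimalType(final String typeName) {
    return typeName != null && DECIMAL_TYPES.contains(typeName.toUpperCase(Locale.ROOT));
  }

  public static void main(final String[] args) {
    System.out.println(isDecimalType("money"));
    System.out.println(isDecimalType("datetimeoffset"));
  }
}
```

Upper-casing the incoming name before the lookup keeps the behavior of the original `equalsIgnoreCase` comparison while covering all four types with one branch.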
Hi @gsmith-schlesinger.
Could you please clarify what's wrong with those types?
If we apply the suggestion as is, some dataType tests start failing.
Also, if we update something for CDC, we need to make sure we do the same for normal sync. In the near future we'll have the same tests for both normal and CDC syncs.
Ex.:
org.opentest4j.AssertionFailedError: Returned value '999.0' from stream dbo_1_decimal is not in the expected list: [999.00, 5.10, 0.00, null] ==> expected:
Thanks
The issues described in #12949 and #5609 are exactly what we're currently experiencing; the only difference is that the column we're reading from the database is of type MONEY.
There's a PR in this area of the code that "fixed" the issue for SMALLMONEY, but it didn't cover all decimal-type columns from MSSQL.
You'll see the error logs in both tickets, and they are the same as what we're seeing in ours.
Failed to convert JSON to Avro: Could not evaluate union, field CTR is expected to be one of these: NULL, DOUBLE. If this is a complex type, check if offending field (path: CTR) adheres to schema: 0.00
There's a workaround using CSV or JSON, but that's not sufficient for our use case.
I could be wrong, and this may not be the right area of the code. I'm still trying to get Airbyte running and fighting IntelliJ to verify that this "fixes" our issue.
I should also note that this issue comes up only on CDC / Incremental Airbyte tasks. The Full Refresh / Overwrite task works successfully.
Do you have a PR where you've fixed this and aligned it with regular sync?
I'm asking because this PR is basically the first step toward merging the tests for normal and CDC syncs, to make sure that their behavior (output) is the same - #14379.
I was just asked to split it into 2 different PRs.
Thanks
I don't yet, but honestly I'll have to wait for your PR to be merged before I can submit mine anyway, as you've made a lot of changes I was already planning to make in my own fork. I only started working on this two days ago.
The main idea here is that if you change anything in CDC, you also need to make sure that normal sync returns exactly the same output. Unfortunately, the two use completely different approaches, so it may be pretty tricky to align them, especially when it comes to aligning zero decimal values (e.g. 10 vs 10.0 vs 10.00).
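To make the trailing-zero problem concrete, here is a small sketch of how the same database value can serialize differently depending on the conversion path. It assumes, purely for illustration, that one path goes through a `double` and the other keeps the decimal's textual form; which path each sync mode actually uses is not stated in this thread, and `DecimalScaleDemo` is a hypothetical class.

```java
import java.math.BigDecimal;

// Hypothetical demo class; it only illustrates why '999.0' and '999.00' can both appear.
final class DecimalScaleDemo {

  // Going through a double discards the declared scale of the column.
  static String viaDouble(final String dbValue) {
    return String.valueOf(new BigDecimal(dbValue).doubleValue());
  }

  // Keeping the BigDecimal preserves the scale exactly as it was read.
  static String viaPlainString(final String dbValue) {
    return new BigDecimal(dbValue).toPlainString();
  }

  public static void main(final String[] args) {
    System.out.println(viaDouble("999.00"));      // prints 999.0
    System.out.println(viaPlainString("999.00")); // prints 999.00
  }
}
```

A test that expects `999.00` fails against the first form, which matches the shape of the assertion error quoted earlier in this thread.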
@@ -34,11 +46,61 @@ public void converterFor(final RelationalColumn field,
registerDate(field, registration);
} else if (SMALLMONEY_TYPE.equalsIgnoreCase(field.typeName())) {
} else if (SMALLMONEY_TYPE.equalsIgnoreCase(field.typeName())) {
} else if (DECIMAL_TYPES.contains(field.typeName().toUpperCase())) {
There are 2 versions of MSSQLConverter; can you replicate your changes in airbyte-integrations/bases/debezium-v1-9-2/src/main/java/io/airbyte/integrations/debezium/internals/MSSQLConverter.java as well?
Also, please take a look at issue #14628. We want to avoid mutating the data types in any way, so please verify that this is not happening in this PR.
@@ -391,32 +391,27 @@ protected void initTests() {
.sourceType("datetimeoffset")
.airbyteType(JsonSchemaType.STRING)
.addInsertValues("'0001-01-10 00:00:00 +01:00'", "'9999-01-10 00:00:00 +01:00'", "null")
.addExpectedValues("0001-01-10 00:00:00 +01:00", "9999-01-10 00:00:00 +01:00", null)
.addExpectedValues("0001-01-10 00:00:00.0000000 +01:00",
why do we expect to have milliseconds when insert values do not have them?
I thought this was your concern in the comment above #14449 (review)
Previously we had millis in every case. I removed them for cases where the source didn't have them. I thought you had a concern about that, so I added them back, but aligned them with the format that we use for other connectors. I also aligned normal and CDC sync, because for now we have different behavior even inside the mssql connector. If that's not OK, please specify the behavior you expect. Thanks
...c/test-integration/java/io/airbyte/integrations/source/mssql/CdcMssqlSourceDatatypeTest.java
@@ -225,7 +225,7 @@ protected void initTests() {
.sourceType("time")
.airbyteType(JsonSchemaType.STRING)
.addInsertValues("null", "'13:00:01'", "'13:00:04Z'", "'13:00:04.123456Z'")
.addExpectedValues(null, "13:00:01.0000000", "13:00:04.0000000", "13:00:04.1234560")
.addExpectedValues(null, "13:00:01.000000", "13:00:04.000000", "13:00:04.123456")
same here - is it the correct behavior to expect milliseconds when inserting data w/o milliseconds?
I thought this was your concern in the comment above #14449 (review)
Previously we had millis in every case. I removed them for cases where the source didn't have them. I thought you had a concern about that, so I added them back, but aligned them with the format that we use for other connectors. If that's not OK, please specify the behavior you expect. Thanks
This was a misunderstanding. I wasn't expressing a concern, but asking for the underlying logic/reason for removing trailing zeros in some of the test cases. Generally speaking, we should not be changing source data. If source has time w/o millis we should not be adding them, if source has time with millis, we should not be removing them.
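As a rough illustration of the difference being discussed, the sketch below contrasts a fixed-width fractional pattern, which always emits six digits, with `LocalTime.toString()`, which omits the fraction when it is zero. The class `TimeFormatDemo` and its method names are hypothetical, and the pattern `HH:mm:ss.SSSSSS` is an assumption inferred from the expected values in the diff above, not a confirmed connector constant.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; the pattern below is assumed, not taken from the connector.
final class TimeFormatDemo {

  private static final DateTimeFormatter FIXED_MICROS =
      DateTimeFormatter.ofPattern("HH:mm:ss.SSSSSS");

  // Always emits six fractional digits, padding with zeros the source never had.
  static String fixedMicros(final String value) {
    return LocalTime.parse(value).format(FIXED_MICROS);
  }

  // Emits no fraction when the nanosecond part is zero.
  static String asParsed(final String value) {
    return LocalTime.parse(value).toString();
  }

  public static void main(final String[] args) {
    System.out.println(fixedMicros("13:00:01")); // prints 13:00:01.000000
    System.out.println(asParsed("13:00:01"));    // prints 13:00:01
  }
}
```

Note that `LocalTime` does not remember how many fractional digits the input carried, and `toString()` rounds its output to 0, 3, 6, or 9 digits, so fully preserving the source representation would likely require carrying the original string through rather than re-formatting a parsed value.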
@@ -225,7 +225,7 @@ protected void initTests() {
.sourceType("time")
.airbyteType(JsonSchemaType.STRING)
.addInsertValues("null", "'13:00:01'", "'13:00:04Z'", "'13:00:04.123456Z'")
It looks like we expect to lose "Z" - is this the correct behavior?
I just aligned it to what we already use in other connectors (re-used existing time format). Should I add an additional format to add it back? Thanks
@@ -381,8 +381,8 @@ protected void initTests() {
TestDataHolder.builder()
.sourceType("time")
.airbyteType(JsonSchemaType.STRING)
.addInsertValues("'00:00:00.0000000'", "'23:59:59.9999999'", "'00:00:00'", "'23:58'", "null")
.addExpectedValues("00:00:00", "23:59:59.9999999", "00:00:00", "23:58:00", null)
.addInsertValues("'00:00:00.000000'", "'23:59:59.999999'", "'00:00:00'", "'23:58'", "null")
maybe I'm missing something - do we no longer support 7-decimal-digit precision in timestamps?
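To show what is at stake in the precision question, here is a minimal sketch of how a six-digit fractional pattern truncates a seven-digit value such as the `23:59:59.9999999` used in the previous test data. The class `TimePrecisionDemo` and its `format` method are hypothetical; whether the connector formats values this way is an assumption.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class showing truncation by a fixed-width fraction pattern.
final class TimePrecisionDemo {

  // Formats a time string with the given number of fractional-second digits.
  // The 'S' pattern letter truncates (does not round) to the requested width.
  static String format(final String value, final int fractionDigits) {
    final DateTimeFormatter formatter =
        DateTimeFormatter.ofPattern("HH:mm:ss." + "S".repeat(fractionDigits));
    return LocalTime.parse(value).format(formatter);
  }

  public static void main(final String[] args) {
    System.out.println(format("23:59:59.9999999", 6)); // prints 23:59:59.999999
    System.out.println(format("23:59:59.9999999", 7)); // prints 23:59:59.9999999
  }
}
```

With six digits the last 100 nanoseconds of the source value are silently dropped, which is the kind of data mutation the reviewers ask to avoid.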
@gsmith-schlesinger, you saved my day. Your suggestion resolved the issue. Additional Failure Information: tech.allegro.schema.json2avro.converter.AvroConversionException: Failed to convert JSON to Avro: Could not evaluate union, field TotalAmount is expected to be one of these: NULL, DOUBLE. If this is a complex type, check if offending field (path: TotalAmount) adheres to schema: 0.00 A PR is recommended.
What
Currently, we have 2 different classes for integration datatype tests: CdcMssqlSourceDatatypeTest and MssqlSourceDatatypeTest. They have the same input args but differ in the "expected result" for some types.
Some CDC types were not supported at all.
How
Aligned the normal and CDC datatype tests to make them ready for merging.
This PR is basically the first part of splitting #14379 into 2 PRs, as requested. Once it is merged, I will create a second PR that merges the tests, so that the same tests run for both sync modes (to verify that the output is the same for both).
Normal sync:
CDC:
Recommended reading order
🚨 User Impact 🚨
Shouldn't introduce any breaking changes.