source-mysql: datatype fixes #49918
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/util/Jsons.kt
.../bulk/core/extract/src/testFixtures/kotlin/io/airbyte/cdk/read/DynamicDatatypeTestFactory.kt
with("${key}.type", converterClass.canonicalName)
}
}
}
syntactic sugar for registering converters
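For readers unfamiliar with the mechanism: Debezium registers custom converters through a `converters` property listing symbolic names, plus one `<name>.type` property per name giving the implementing class. Below is a minimal sketch of what such sugar might look like on top of plain `java.util.Properties`. The helper `withConverters` and the class `ExampleConverter` are hypothetical and are not taken from this PR.

```kotlin
import java.util.Properties

// Hypothetical stand-in for a real Debezium CustomConverter implementation.
class ExampleConverter

// Hypothetical helper: registers each class under a symbolic name
// derived from its lowercased simple name.
fun Properties.withConverters(vararg converterClasses: Class<*>): Properties {
    val names = converterClasses.map { it.simpleName.lowercase() }
    // Debezium reads the comma-separated symbolic names from "converters" ...
    setProperty("converters", names.joinToString(","))
    // ... and the implementing class of each one from "<name>.type".
    for ((name, cls) in names.zip(converterClasses)) {
        setProperty("$name.type", cls.canonicalName)
    }
    return this
}

fun main() {
    val props = Properties().withConverters(ExampleConverter::class.java)
    println(props.getProperty("converters"))
    println(props.getProperty("exampleconverter.type"))
}
```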
MysqlType.ENUM,
MysqlType.SET -> StringFieldType
MysqlType.JSON -> JsonStringFieldType
Were JSON values always rendered as strings, or is this a recent regression?
in old mysql's MySqlSourceOperations.copyToJsonField:
case … JSON, … -> putString(json, columnName, resultSet, colIndex);
oof, OK
I'm thinking that we should fix this in a different PR
}
MysqlType.VECTOR,
MysqlType.UNKNOWN,
null -> PokemonFieldType
might as well list all values explicitly for safety
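To illustrate why listing every value helps: a Kotlin `when` used as an expression over an enum must be exhaustive, so omitting `else` makes the compiler flag any constant added later without a mapping. This is a sketch with hypothetical, trimmed-down stand-ins (`ColumnType`, `FieldKind`) rather than the real `MysqlType` and field type classes.

```kotlin
// Hypothetical, trimmed-down stand-ins for the real type hierarchies.
enum class ColumnType { ENUM, SET, JSON, VECTOR, UNKNOWN }

enum class FieldKind { STRING, JSON_STRING, FALLBACK }

// No `else` branch: if a new ColumnType constant is added without a mapping,
// this `when` expression stops compiling instead of silently falling through.
fun toFieldKind(type: ColumnType?): FieldKind =
    when (type) {
        ColumnType.ENUM,
        ColumnType.SET -> FieldKind.STRING
        ColumnType.JSON -> FieldKind.JSON_STRING
        ColumnType.VECTOR,
        ColumnType.UNKNOWN,
        null -> FieldKind.FALLBACK
    }

fun main() {
    println(toFieldKind(ColumnType.JSON))
    println(toFieldKind(null))
}
```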
// This is to make sure that numbers are represented as strings.
.with("decimal.handling.mode", "string")
// This is to make sure that temporal data is represented without loss of precision.
.with("time.precision.mode", "adaptive_time_microseconds")
this is the default value for this setting, but I prefer to declare it explicitly
@@ -373,6 +391,10 @@ class MySqlDebeziumOperations(
// This to make sure that binary data represented as a base64-encoded String.
// https://debezium.io/documentation/reference/2.2/connectors/mysql.html#mysql-property-binary-handling-mode
.with("binary.handling.mode", "base64")
// This is to make sure that numbers are represented as strings.
.with("decimal.handling.mode", "string")
this wraps large numbers in quotes, which will get turned into BigDecimal again by the snippet I added further above
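A minimal sketch of the round trip described here, assuming the quoted value arrives as a plain decimal string. The helper `parseDecimal` is hypothetical and is not the snippet from this PR; it only shows why going through a string preserves digits that a `Double` would lose.

```kotlin
import java.math.BigDecimal

// Hypothetical helper: turns the string emitted under
// decimal.handling.mode=string back into an exact BigDecimal.
fun parseDecimal(text: String): BigDecimal = BigDecimal(text)

fun main() {
    val big = "12345678901234567890.123456789"
    // The string form round-trips digit for digit.
    println(parseDecimal(big).toPlainString())
    // Going through a Double first yields a different numeric value.
    println(BigDecimal(big.toDouble()).compareTo(parseDecimal(big)) == 0)
}
```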
@@ -175,7 +174,8 @@ object MySqlDatatypeTestOperations :
 val dateTimeValues =
 mapOf(
 "'2024-09-13 14:30:00'" to """"2024-09-13T14:30:00.000000"""",
-"'2024-09-13T14:40:00+00:00'" to """"2024-09-13T14:40:00.000000""""
+"'2024-09-13T14:40:00+00:00'" to """"2024-09-13T14:40:00.000000"""",
+"'1752-09-01 14:30:00'" to """"1752-09-01T14:30:00.000000"""",
I don't think I have enough edge cases in these temporal test cases.
In any case, MySQL doesn't do BC DATEs, doesn't do pre-year-1000 DATETIMEs, and doesn't do TIMESTAMPs before Unix epoch 0.
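As a possible starting point for more edge cases, here is a sketch that renders boundary values in the same microsecond-precision shape as the expected strings in the test map. The boundaries come from MySQL's documented supported ranges (DATETIME from 1000-01-01 00:00:00 to 9999-12-31 23:59:59, TIMESTAMP from 1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC). The names `MICROS`, `expected`, `datetimeBoundaries` and `timestampBoundariesUtc` are hypothetical, and whether the connector actually emits these exact strings for the boundaries is untested here.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Microsecond-precision shape used by the expected values in the test map.
val MICROS: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSS")

// Wraps the formatted value in double quotes, like the JSON string literals in the test.
fun expected(value: LocalDateTime): String = "\"" + value.format(MICROS) + "\""

// Candidate boundary values from MySQL's documented supported ranges.
val datetimeBoundaries = listOf(
    LocalDateTime.of(1000, 1, 1, 0, 0, 0),
    LocalDateTime.of(9999, 12, 31, 23, 59, 59),
)
val timestampBoundariesUtc = listOf(
    LocalDateTime.of(1970, 1, 1, 0, 0, 1),
    LocalDateTime.of(2038, 1, 19, 3, 14, 7),
)

fun main() {
    (datetimeBoundaries + timestampBoundariesUtc).forEach { println(expected(it)) }
}
```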
data.set<JsonNode>(field.id, Jsons.readTree(textNode.textValue()))
}
else -> continue
}
This feels like a bad abstraction: we (the implementor of DebeziumOperations) have to iterate through the schema types. Just some food for thought.
Yeah, it's not great. Perhaps Stream should have pre-computed groupings of Fields per Airbyte type. Or per FieldType. Not sure.
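One way the pre-computed grouping idea could look, as a sketch only. `Kind`, `Field` and `Stream` below are hypothetical, simplified stand-ins and do not reflect the CDK's actual classes.

```kotlin
// Hypothetical, simplified stand-ins for the CDK's field type, Field and Stream.
enum class Kind { STRING, JSON_STRING, NUMBER }

data class Field(val id: String, val kind: Kind)

class Stream(val fields: List<Field>) {
    // Computed once per stream, on first access, so per-record code can
    // look up the fields it cares about instead of scanning the schema.
    val fieldsByKind: Map<Kind, List<Field>> by lazy { fields.groupBy { it.kind } }
}

fun main() {
    val stream = Stream(
        listOf(Field("a", Kind.JSON_STRING), Field("b", Kind.STRING), Field("c", Kind.JSON_STRING))
    )
    // Only the JSON-string fields need re-parsing for each record.
    println(stream.fieldsByKind[Kind.JSON_STRING].orEmpty().map { it.id })
}
```

With such a grouping, an implementor would iterate only over the fields of the relevant type rather than over the whole schema.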
@rodireich I looked into the debezium-connector-mysql code, and as far as I can tell all of the weird edge cases in the legacy converter logic apply only to Debezium snapshots, which we just don't do. When using Debezium in regular, non-snapshot mode, the Debezium engine stays on the happy path with no weird Java types.
@rodireich actually there is one edge case which this PR doesn't handle correctly: default values! I'm working on adding test cases for these.
I added test cases for default values, which proved very useful.
What
This PR tries to get source-mysql's datatypes correct by leveraging the improved MySqlDatatypeIntegrationTest.
How
n/a
Review guide
n/a
User Impact
Source-mysql will handle some datatypes differently:
Can this PR be safely reverted and rolled back?