Destination-S3: Correctly generate int64 values #23466
/test connector=connectors/destination-s3
Build Passed
/test connector=connectors/destination-gcs
Build Passed
/test connector=connectors/destination-databricks
Build Passed
/test connector=connectors/destination-bigquery
Build Passed
/test connector=connectors/destination-redshift
Build Passed
/test connector=connectors/destination-snowflake
Build Passed
Affected Connector Report
✅ Destinations (10)
Connector | Version | Changelog | Publish |
---|---|---|---|
destination-bigquery | 1.2.15 | ✅ | ✅ |
destination-bigquery-denormalized | 1.2.15 | ✅ | ✅ |
destination-databricks | 0.3.1 | ✅ | ✅ |
destination-gcs | 0.2.15 | ✅ | ✅ |
destination-jdbc | 0.3.14 | 🔵 (ignored) | 🔵 (ignored) |
destination-r2 | 0.1.0 | ✅ | ✅ |
destination-redshift | 0.4.1 | ✅ | ✅ |
destination-s3 | 0.3.21 | ✅ | ✅ |
destination-s3-glue | 0.1.2 | ✅ | ✅ |
destination-snowflake | 0.4.51 | ✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
👀 Other Modules (1)
- base-normalization
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
/test connector=connectors/destination-bigquery-denormalized
Build Passed
    return Schema.createUnion(newUnionTypes);
} else {
    return Schema.createUnion(unionTypes);
}
Logic is fine. I got a little carried away, so here's an alternate approach:
```java
private Schema createUnionAndCheckLongTypesDuplications(List<Schema> unionTypes) {
  Predicate<Schema> isALong = type -> type.getType() == Schema.Type.LONG;
  Predicate<Schema> isPlainLong = isALong.and(type -> Objects.isNull(type.getLogicalType()));
  Predicate<Schema> isTimestampMicrosLong = isALong.and(type ->
      Objects.nonNull(type.getLogicalType()) && "timestamp-micros".equals(type.getLogicalType().getName()));

  boolean hasPlainLong = unionTypes.stream().anyMatch(isPlainLong);
  boolean hasTimestampMicrosLong = unionTypes.stream().anyMatch(isTimestampMicrosLong);

  // Drop the timestamp-micros long only when a plain long is also present in the union.
  Predicate<Schema> removeTimestampType =
      type -> !(hasPlainLong && hasTimestampMicrosLong && isTimestampMicrosLong.test(type));
  return Schema.createUnion(unionTypes.stream().filter(removeTimestampType).collect(Collectors.toList()));
}
```
Updated to use your approach. Thanks!
This reverts commit 151c85d.
/publish connector=connectors/destination-s3
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/destination-gcs
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/destination-bigquery
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/destination-bigquery-denormalized
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/destination-snowflake
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/destination-redshift
If you have connectors that successfully published but failed definition generation, follow step 4 here
* [17564] Updated s3 avro to use long instead of int --------- Co-authored-by: etsybaev <etsybaev@users.noreply.github.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
As noted in #14362 (comment), dest-s3's Avro and Parquet output formats currently use int32. This causes problems for users with int64 values. We should have dest-s3 produce int64 values for integer inputs.
We currently use int32 because we want to support union types, which behave unexpectedly in some edge cases.
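To make the int32 limitation concrete, here is a minimal, self-contained sketch in plain Java. It is not the connector's code, and the PR does not say how the failure actually surfaces for users; the class name `Int64Demo` and the sample value are hypothetical. It only shows that a value above `Integer.MAX_VALUE` cannot be represented in 32 bits.

```java
final class Int64Demo {

    // True when the value can be stored in a 32-bit integer without loss.
    static boolean fitsInInt32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    // A narrowing cast keeps only the low 32 bits, so large values wrap around.
    static int narrowToInt32(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long big = 3_000_000_000L; // hypothetical record value, larger than Integer.MAX_VALUE
        System.out.println(fitsInInt32(big));   // prints false
        System.out.println(narrowToInt32(big)); // prints -1294967296
    }
}
```

A 64-bit `long` holds the same value unchanged, which is the motivation for emitting int64 for integer inputs.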
How
Updated destination-s3 to use the Avro LONG type (`Schema.Type.LONG`) instead of INT (`Schema.Type.INT`).
🚨 User Impact 🚨
At first glance, no breaking changes are expected. In edge cases, users might need to reset the connection and re-sync their data.
Important note: if a oneOf contains both "long" and "long" with a specific timestamp logical type, the value is treated as a plain long.
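The rule in the note above can be sketched without the Avro library. The `Member` class below is a hypothetical stand-in for an Avro union member (a type name plus an optional logical type), and `dedupeLongs` is an illustrative name, not the connector's method. The sketch assumes the timestamp variant is `timestamp-micros`, as in the review thread, and mirrors the filtering logic discussed there.

```java
import java.util.List;
import java.util.stream.Collectors;

final class UnionLongDedupSketch {

    // Hypothetical stand-in for an Avro union member.
    static final class Member {
        final String type;
        final String logicalType; // null when the member has no logical type

        Member(String type, String logicalType) {
            this.type = type;
            this.logicalType = logicalType;
        }

        boolean isPlainLong() {
            return "long".equals(type) && logicalType == null;
        }

        boolean isTimestampMicrosLong() {
            return "long".equals(type) && "timestamp-micros".equals(logicalType);
        }

        @Override
        public String toString() {
            return logicalType == null ? type : type + "(" + logicalType + ")";
        }
    }

    // Drops the timestamp-micros long only when a plain long is also in the union.
    static List<Member> dedupeLongs(List<Member> members) {
        boolean hasPlainLong = members.stream().anyMatch(Member::isPlainLong);
        boolean hasTimestampLong = members.stream().anyMatch(Member::isTimestampMicrosLong);
        boolean dropTimestamp = hasPlainLong && hasTimestampLong;
        return members.stream()
            .filter(m -> !(dropTimestamp && m.isTimestampMicrosLong()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Member> union = List.of(
            new Member("null", null),
            new Member("long", null),
            new Member("long", "timestamp-micros"));
        System.out.println(dedupeLongs(union)); // prints [null, long]
    }
}
```

A union that contains only the timestamp long is left untouched, so the logical type is lost only in the ambiguous case where both variants appear.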