[Feature][Connector-V2] Support write cdc changelog event into hudi sink #7845

Merged
liugddx merged 3 commits into apache:dev on Oct 18, 2024


happyboy1024
Contributor

@happyboy1024 happyboy1024 commented Oct 15, 2024

Purpose of this pull request

Support writing CDC changelog events to the Hudi sink, and fix #7837.
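
As a rough illustration of what applying a CDC changelog to a keyed, upsert-capable sink involves, here is a minimal sketch. It is not the code in this PR: `ChangelogApplySketch`, `ChangeKind`, and `apply` are hypothetical names, and an in-memory map stands in for the table. The assumption is that inserts and update after-images become upserts by key, deletes remove the key, and update before-images can be skipped because the after-image overwrites the same key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; these names are hypothetical and not part of this PR. */
final class ChangelogApplySketch {

    /** Hypothetical change kinds, following the usual CDC insert/update/delete split. */
    enum ChangeKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Applies one change event to a keyed table held in memory. */
    static void apply(Map<String, String> table, ChangeKind kind, String key, String value) {
        switch (kind) {
            case INSERT:
            case UPDATE_AFTER:
                table.put(key, value); // upsert by key
                break;
            case DELETE:
                table.remove(key);     // delete by key
                break;
            case UPDATE_BEFORE:
                // Before-image: skipped, the matching UPDATE_AFTER overwrites the key.
                break;
        }
    }

    /** Replays a small changelog and returns the resulting table state. */
    static Map<String, String> replayExample() {
        Map<String, String> table = new LinkedHashMap<>();
        apply(table, ChangeKind.INSERT, "1", "a");
        apply(table, ChangeKind.INSERT, "2", "b");
        apply(table, ChangeKind.UPDATE_BEFORE, "2", "b");
        apply(table, ChangeKind.UPDATE_AFTER, "2", "c");
        apply(table, ChangeKind.DELETE, "1", null);
        return table;
    }

    public static void main(String[] args) {
        System.out.println(replayExample());
    }
}
```

After the replay, only key `2` remains, holding the after-image of its update.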

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@RoderickAdriance

RoderickAdriance commented Oct 15, 2024

Your Hudi connector has a bug. Please contact me on WeChat (wx 15345737051). @happyboy1024
We can discuss further.

@happyboy1024
Contributor Author


I have added you. Of course, you can also open an issue describing the problem.

@Hisoka-X
Member

cc @liugddx

@@ -115,9 +115,9 @@ Note: When this configuration corresponds to a single table, you can flatten the

`max_commits_to_keep` The max commits to keep of hudi table.

### auto_commit [boolean]
Member


Why delete auto_commit?

Contributor Author

@happyboy1024 happyboy1024 Oct 16, 2024



1. HoodieJavaWriteClient auto-commits by default when this option is not configured.

2. Because of how Hudi timelines work, data needs to be committed immediately after it is written or deleted. Otherwise the next flush is not aware of the state of the previous batch, which may result in duplicate data or lost deletes.
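
To make the second point concrete, here is a toy model, not Hudi code. `ToyTimelineSketch` and `ToyTable` are hypothetical names, and the one assumption is that a key lookup only sees batches that were already committed. Under that assumption, skipping the commit between two flushes turns a repeated upsert into a duplicate row and drops a delete that targets the previous batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these names are Hudi or SeaTunnel APIs. */
final class ToyTimelineSketch {

    /** Hypothetical table whose key lookup only sees committed batches. */
    static final class ToyTable {
        private final List<String> rows = new ArrayList<>();
        private final Set<String> committedKeys = new HashSet<>();
        private final Set<String> pendingKeys = new HashSet<>();

        void upsert(String key) {
            if (committedKeys.contains(key)) {
                return; // known key: update in place, row count unchanged
            }
            rows.add(key); // unknown key: written as a new row
            pendingKeys.add(key);
        }

        void delete(String key) {
            if (committedKeys.remove(key)) {
                rows.remove(key); // removes the first matching row
            }
            // A key that was never committed is not found, so the delete is dropped.
        }

        void commit() {
            committedKeys.addAll(pendingKeys);
            pendingKeys.clear();
        }

        long count(String key) {
            return rows.stream().filter(key::equals).count();
        }
    }

    /** Two flushes that upsert the same key. */
    static long rowsAfterTwoUpserts(boolean commitBetweenBatches) {
        ToyTable table = new ToyTable();
        table.upsert("id-1");
        if (commitBetweenBatches) {
            table.commit();
        }
        table.upsert("id-1");
        table.commit();
        return table.count("id-1");
    }

    /** One flush that inserts a key, then one flush that deletes it. */
    static long rowsAfterInsertThenDelete(boolean commitBetweenBatches) {
        ToyTable table = new ToyTable();
        table.upsert("id-1");
        if (commitBetweenBatches) {
            table.commit();
        }
        table.delete("id-1");
        table.commit();
        return table.count("id-1");
    }

    public static void main(String[] args) {
        System.out.println("two upserts, commit between: " + rowsAfterTwoUpserts(true));
        System.out.println("two upserts, no commit between: " + rowsAfterTwoUpserts(false));
        System.out.println("insert+delete, commit between: " + rowsAfterInsertThenDelete(true));
        System.out.println("insert+delete, no commit between: " + rowsAfterInsertThenDelete(false));
    }
}
```

In this model, committing after every flush leaves one row after the two upserts and none after the delete. Without the intermediate commit, the upserts leave two rows and the deleted row survives.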

@liugddx
Member

liugddx commented Oct 17, 2024

Overall LGTM.

liugddx previously approved these changes Oct 17, 2024
-    convertToSchema(seaTunnelRowType.getFieldType(i)),
+    convertToSchema(
+            seaTunnelRowType.getFieldType(i),
+            ROW_NAME + "_" + seaTunnelRowType.getFieldNames()[i]),
Member


Is this a standard way to set the field name? org.apache.seatunnel.avro.generated.record_fieldname?

Contributor Author



I have modified the schema generation logic to follow Hudi's official approach. PTAL. @liugddx @Hisoka-X

@Hisoka-X
Member

Thanks @happyboy1024

Member

@liugddx liugddx left a comment


+1

@liugddx liugddx merged commit 934434c into apache:dev Oct 18, 2024
9 checks passed
@happyboy1024 happyboy1024 deleted the cdc_to_hudi branch October 22, 2024 01:48
Successfully merging this pull request may close these issues.

[Bug] [Hudi sink connector] sink mysql decimal column type error
4 participants