Issue #498: Add Hudi Codebase #499

Open
wants to merge 9 commits into
base: main
from


JosepSampe (Member)

Description

Type of change

New Hudi table format support

Checklist:

Here is the list of things you should do before submitting this pull request:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add logging to the code following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is up to date with the main branch (dependent changes have been merged).

JosepSampe self-assigned this on Dec 2, 2024
Jiaweihu08 (Member) left a review:

Great work! Just some comments here.

metaClient,
isOverwriteOperation,
updatedConfig,
hasRevisionUpdate = true // Force update the metadata
Review comment (Member):

We need to study the isolation level of the different operations in Hudi. In Delta, for instance, metadata updates are Serializable, and more rigorous checks are performed on concurrent writes.

writeStatus
}

/**
Review comment (Member):

Are the following methods the same for Delta? If so, could we centralize them to avoid duplication? What do you think?

JosepSampe (Member, Author) replied:

encodeBlocks is the same, while decodeBlocks is slightly different. We can probably make both identical, though.
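One way to picture the centralization discussed in this thread is a minimal sketch in which both integrations share a single codec trait. Everything in it is a hypothetical illustration: the names BlockStats, BlockCodec, DeltaBlockCodec and HudiBlockCodec and the "cubeId:elementCount" text format are not taken from the project, and the sketch does not reproduce the real block encoding. The sketch assumes that decodeBlocks can be made identical for Delta and Hudi, so that each format only mixes in the shared trait.

```scala
// Hypothetical sketch: BlockStats, BlockCodec and the "cubeId:elementCount"
// text format are illustrative assumptions, not the project's real encoding.
final case class BlockStats(cubeId: String, elementCount: Long)

// Shared codec that both table-format integrations could mix in,
// so encodeBlocks/decodeBlocks are implemented only once.
trait BlockCodec {

  // Serialize each block as "cubeId:elementCount", separated by ';'.
  def encodeBlocks(blocks: Seq[BlockStats]): String =
    blocks.map(b => s"${b.cubeId}:${b.elementCount}").mkString(";")

  // Inverse of encodeBlocks; an empty string decodes to no blocks.
  // Assumes cube ids never contain ':' or ';'.
  def decodeBlocks(encoded: String): Seq[BlockStats] =
    if (encoded.isEmpty) Seq.empty
    else
      encoded.split(";").toSeq.map { entry =>
        val parts = entry.split(":", 2)
        BlockStats(parts(0), parts(1).toLong)
      }
}

// Hypothetical per-format objects reusing the same logic.
object DeltaBlockCodec extends BlockCodec
object HudiBlockCodec extends BlockCodec
```

If the two formats turn out to diverge, only the format that needs different behavior would have to override decodeBlocks. The shared encoding path would stay in one place.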

osopardo1 (Member) left a review:

Nice to see Hudi support advancing. Congrats!!

I know it's a draft, but I would like to fully understand the changes and try to avoid the mistakes we made in the Delta integration. I've added a few comments (some of them about semantics, don't worry...).

JosepSampe marked this pull request as ready for review on December 18, 2024 15:05

3 participants