Skip to content

Conversation

@dentiny
Copy link
Contributor

@dentiny dentiny commented Apr 30, 2025

Which issue does this PR close?

This PR sync up with upstream iceberg spec on deletion vector related fields.
Reference PR: apache/iceberg#11240
Reference file: https://github.com/apache/iceberg/blob/main/format/spec.md#manifests

Context:
I'm working on iceberg related projects in my work, one of the features I'm working on is deletion vector.
Because there's no existing rust support for it, I need to do some of the work my own (i.e. deletion vector read and write).

My philosophy is to reduce discrepancy with upstream is:

  • Avoid touching existing code unless unavoidable, and collaborate with open community if possible;
  • For deletion vector issue, I could make no other changes and keep puffin write/read completely change on my end;
  • The schema change in this PR is something hard to work-around. :(

I personally think the schema change alone (without actual deletion vector feature implementation) make sense, because it's a sync-up from iceberg spec sync (which has already been acknowledged by the community), as the PR.

I'm happy to discuss more on this PR, I'm willing to provide more on the motivation, my work-around and followup;
I'm also willing to contribute if the community requires more hands on it.

Are these changes tested?

This PR is a no-op change, I confirm I could build + link with no problem.

@dentiny dentiny changed the title [WIP] Sync manifest entry schema Sync manifest entry schema Apr 30, 2025
@Xuanwo Xuanwo changed the title Sync manifest entry schema feat: Add deletion vector related fields in spec types Apr 30, 2025
@Xuanwo
Copy link
Member

Xuanwo commented Apr 30, 2025

Hi @dentiny, I have updated the PR title to better reflect the changes. I have to say that I completely agree with your "upstream first" philosophy, and I appreciate any kind of help. Welcome to join the community!

Xuanwo
Xuanwo previously approved these changes Apr 30, 2025
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes look good me.

Do you have other concerns? @Fokko @liurenjie1024 @sdd

@dentiny dentiny changed the title feat: Add deletion vector related fields in spec types [WIP] Sync manifest entry schema Apr 30, 2025
@dentiny
Copy link
Contributor Author

dentiny commented Apr 30, 2025

Sorry there're some test compilation failures, let me fix them.

@dentiny dentiny marked this pull request as draft April 30, 2025 06:58
@dentiny dentiny marked this pull request as ready for review April 30, 2025 08:06
@dentiny dentiny changed the title [WIP] Sync manifest entry schema Sync manifest entry schema Apr 30, 2025
@dentiny dentiny requested a review from Xuanwo April 30, 2025 08:07
equality_ids: vec![],
sort_order_id: None,
partition_spec_id: 0,
referenced_data_file: None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I manually assigned these fields is to keep coding style consistency.

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dentiny for this pr, generally a good start for building v3 support. Just one minor hint.

@Xuanwo Xuanwo changed the title Sync manifest entry schema feat: Add deletion vector related fields in spec types Apr 30, 2025
@dentiny dentiny requested a review from liurenjie1024 April 30, 2025 17:12
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dentiny for this pr!

@liurenjie1024 liurenjie1024 merged commit b4bc6dd into apache:main May 1, 2025
17 checks passed
dentiny added a commit to dentiny/iceberg-rust that referenced this pull request May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants