Delta Lake + Unity Catalog exporter: Delta Lake table metadata #7527

Merged

Jonathan-Rosenberg
Contributor

Closes #7302

Change Description

  • When returning the Delta Log, also return the table's metadata.
  • Unity Catalog: if the external Delta Lake table has a description, create the Unity table with that description as its comment.

FYI @talSofer

Jonathan-Rosenberg added the include-changelog label (PR description should be included in next release changelog) Mar 4, 2024
github-actions bot commented Mar 4, 2024

♻️ PR Preview 5e9af10 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

github-actions bot commented Mar 4, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

Member

N-o-Z left a comment

Overall looks very good!
I have some comments about the documentation.
Also, can we please add an esti test?

Comment on lines 11 to 12
luautil "github.com/treeverse/lakefs/pkg/actions/lua/util"

Member

Suggested change
luautil "github.com/treeverse/lakefs/pkg/actions/lua/util"
luautil "github.com/treeverse/lakefs/pkg/actions/lua/util"

Contributor Author

don't mess with goimports

Member

Don't mess with it, just remove the redundant newline 📏

Contributor Author

sure thing sure thing

Comment on lines 10 to 11
"github.com/csimplestring/delta-go/action"

Member

Suggested change
"github.com/csimplestring/delta-go/action"
"github.com/csimplestring/delta-go/action"

Comment on lines 377 to 378
- the first is a table of the format `{number, {string}}` where `number` is a version in the Delta Log, and the mapped `{string}`
table (list) contains JSON strings of the different Delta Lake log operations listed in the mapped version entry.
Member

Sentences are not very clear. Table? List? JSON strings?? I'm not sure what I'm getting here.

Contributor Author

Jonathan-Rosenberg Mar 4, 2024

I'll clarify

The format of the response is two tables:
- the first is a table of the format `{number, {string}}` where `number` is a version in the Delta Log, and the mapped `{string}`
table (list) contains JSON strings of the different Delta Lake log operations listed in the mapped version entry.
- the second is a table of the metadata of the table, it consists of the following fields:
Member

Isn't the metadata versioned as well?
I find it confusing that on the one hand we provide a list of versions for the table and on the other hand we provide only the metadata of the current snapshot. Perhaps a clarification is needed.

Contributor Author

It's the latest version's metadata.
The intention is to make it easier to get the current state of the table to initialize it in other systems faster.
I'll clarify
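To make the distinction discussed here concrete (a versioned log next to a single, latest-snapshot metadata table), a minimal Go sketch of the response shape follows. `DeltaExport`, `Versions` and `ActionTypes` are hypothetical names, not lakeFS or Lua exporter APIs, and because the metadata field list is not shown in this thread it is modeled as a generic map. The sketch relies only on the Delta Log convention that each action is a JSON object with a single top-level key such as `add`, `remove`, `metaData` or `commitInfo`.

```go
package main

import (
	"encoding/json"
	"sort"
)

// DeltaExport is a hypothetical Go mirror of the two returned tables:
// Log maps each Delta Log version to the JSON strings of the actions in that
// version's entry, while Metadata describes only the latest version.
type DeltaExport struct {
	Log      map[int64][]string
	Metadata map[string]any
}

// Versions returns the Delta Log versions in ascending order.
func (e DeltaExport) Versions() []int64 {
	vs := make([]int64, 0, len(e.Log))
	for v := range e.Log {
		vs = append(vs, v)
	}
	sort.Slice(vs, func(i, j int) bool { return vs[i] < vs[j] })
	return vs
}

// ActionTypes returns the top-level key of each JSON action in the given
// version, in the order the actions appear in that version's entry.
func (e DeltaExport) ActionTypes(version int64) ([]string, error) {
	var types []string
	for _, raw := range e.Log[version] {
		var action map[string]json.RawMessage
		if err := json.Unmarshal([]byte(raw), &action); err != nil {
			return nil, err
		}
		// Each Delta Log action has exactly one top-level key.
		for k := range action {
			types = append(types, k)
		}
	}
	return types, nil
}
```

A consumer that only needs to initialize the table elsewhere can read `Metadata` directly, and replay `Log` in `Versions()` order only when history matters.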

Jonathan-Rosenberg requested a review from N-o-Z March 4, 2024 13:22
Member

N-o-Z left a comment

Thanks for adding the tests!

Jonathan-Rosenberg merged commit f9370a0 into master Mar 4, 2024
36 checks passed
Jonathan-Rosenberg deleted the feature/add-metadata-to-delta-export-and-unity branch March 4, 2024 15:04
Labels
include-changelog PR description should be included in next release changelog
Development

Successfully merging this pull request may close these issues.

Unity catalog exporter: Exported tables should retain original table field documentation
2 participants