Task: Support Azure Unity Catalog export #7554

Merged
merged 6 commits into from
Mar 14, 2024

Member

@N-o-Z N-o-Z commented Mar 13, 2024

Closes #7551
Closes #7553

Change Description

Add support for the Unity Catalog exporter on Azure.
Fix the Delta exporter to support the abfss scheme for both the Delta log path and the table's physical address.

Testing Details

Added an esti test case.

Breaking Change?

No

After the fix, this is the output of delta_exporter:

  delta_exporter(completed in 4.949s)

Delta Lake exported table "test-table"'s location: abfss://esti-system-testing@esti4hns.dfs.core.windows.net/cnp12lbck6tb6j1l84lg/testdeltacatalogexportabfss/_lakefs/exported/main/d39916/test_table

Delta Lake exported table "test-table"'s metadata:

	partition_columns = []
	configuration = {}
	created_time = 1707066829815
	description = This is the description of the table
	id = db5e0917-1716-4b0f-a009-c25e5b7304a1
	name = 
	schema_string = {"type":"struct","fields":[{"name":"registration_dttm","type":"timestamp","nullable":true,"metadata":{}},{"name":"id","type":"integer","nullable":true,"metadata":{}},{"name":"first_name","type":"string","nullable":true,"metadata":{}},{"name":"last_name","type":"string","nullable":true,"metadata":{}},{"name":"email","type":"string","nullable":true,"metadata":{}},{"name":"gender","type":"string","nullable":true,"metadata":{}},{"name":"ip_address","type":"string","nullable":true,"metadata":{}},{"name":"cc","type":"string","nullable":true,"metadata":{}},{"name":"country","type":"string","nullable":true,"metadata":{}},{"name":"birthdate","type":"string","nullable":true,"metadata":{}},{"name":"salary","type":"double","nullable":true,"metadata":{}},{"name":"title","type":"string","nullable":true,"metadata":{}},{"name":"comments","type":"string","nullable":true,"metadata":{}},{"name":"__index_level_0__","type":"long","nullable":true,"metadata":{}}]}

@N-o-Z N-o-Z added the include-changelog and export-hooks labels Mar 13, 2024
@N-o-Z N-o-Z self-assigned this Mar 13, 2024
github-actions bot commented Mar 13, 2024

♻️ PR Preview 156c05b has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

E2E Test Results - DynamoDB Local - Local Block Adapter

10 passed

Contributor

Can you update the export_delta_log method with the new parameter?

Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Very cool!
Thanks for the quick fix
(just the docs thingy)

AzureStorageAccount: viper.GetString("azure_storage_account"),
AzureAccessKey: viper.GetString("azure_storage_access_key"),
}
//blockstore := setupCatalogExportTestByStorageType(t, testData)
Contributor

delete?

Contributor

@Isan-Rivkin Isan-Rivkin left a comment

Looks good; my only request is around the docs.
I wonder about friction (I'm not sure, so it's more of a question):
Unity users will have to use this path transformer. Maybe we can somehow make it the default?
For example, let it be the default in delta_exporter (maybe it's a bad idea).
:)

docs/howto/hooks/lua.md
@N-o-Z
Member Author

N-o-Z commented Mar 14, 2024

Looks good, request only around docs. I wonder about friction - not sure so it's more of a question: Unity users will have to use this path transformer, maybe we can make it somehow default? For example let it be the default in delta_exporter (maybe it's a bad idea). :)

I don't think making it the default behavior for delta export is a good idea. For delta export use cases, it might even be a point of friction for users who have set up an Azure Blob Storage account (rather than an ADLS Gen2 storage account).

@@ -739,6 +745,10 @@ The registration will use the following paths to register the table:
`<catalog>.<branch name>.<table_name>` where the branch name will be used as the schema name.
The return value is a table with mapping of table names to registration request status.

**Note: (Azure users)** Databricks catalog external locations is supported only for ADLS Gen2 storage accounts.
When exporting Delta tables using the `lakefs/catalogexport/delta_exporter.export_delta_log` function, the `path_transformer` must be
used to convert the paths scheme to `abfss`. the built-in `azure` lua library provides this functionality in `transformPathToAbfss`.
Contributor

Suggested change
- used to convert the paths scheme to `abfss`. the built-in `azure` lua library provides this functionality in `transformPathToAbfss`.
+ used to convert the paths scheme to `abfss`. The built-in `azure` lua library provides this functionality in `transformPathToAbfss`.

Just had to...
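
For readers following the docs note in this thread, here is a minimal sketch of the kind of path rewrite a `path_transformer` would perform. It is not the lakeFS `transformPathToAbfss` implementation. The helper name `toAbfss`, the regular expression, and the assumed input shape (`https://<account>.blob.core.windows.net/<container>/<path>`) are illustrative assumptions, as are the example account and container names.

```go
package main

import (
	"fmt"
	"regexp"
)

// blobURL matches an assumed input shape:
//   https://<account>.blob.core.windows.net/<container>/<path>
// Storage account names are lowercase letters and digits only.
var blobURL = regexp.MustCompile(`^https://([a-z0-9]+)\.blob\.core\.windows\.net/([^/]+)/(.+)$`)

// toAbfss is a hypothetical helper (not part of lakeFS) that rewrites an
// Azure Blob https URL into the abfss form used by ADLS Gen2:
//   abfss://<container>@<account>.dfs.core.windows.net/<path>
func toAbfss(path string) (string, error) {
	m := blobURL.FindStringSubmatch(path)
	if m == nil {
		return "", fmt.Errorf("not an Azure Blob https URL: %q", path)
	}
	account, container, rest := m[1], m[2], m[3]
	return fmt.Sprintf("abfss://%s@%s.dfs.core.windows.net/%s", container, account, rest), nil
}

func main() {
	// Example account and container names are placeholders.
	out, err := toAbfss("https://myaccount.blob.core.windows.net/mycontainer/dir/table")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Returning an error for unrecognized inputs, rather than passing them through unchanged, keeps a mis-configured storage namespace from silently registering a non-abfss location.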

Contributor

@Jonathan-Rosenberg Jonathan-Rosenberg left a comment

Awesome (one typo change request that doesn't block)

Contributor

@Isan-Rivkin Isan-Rivkin left a comment

LGTM!

@N-o-Z
Member Author

N-o-Z commented Mar 14, 2024

Thanks - waiting to verify the fix completely before merging

@N-o-Z N-o-Z merged commit ccd6db6 into master Mar 14, 2024
36 checks passed
@N-o-Z N-o-Z deleted the task/unity-exporter-support-for-azure-7551 branch March 14, 2024 17:05
Labels
export-hooks, include-changelog
3 participants