This is an enhancement request. #1021 allows users to import a Glue database that was not originally created in the same AWS account as the S3 Bucket. This scenario is very similar to the one described in this blogpost: there are data producer accounts where data is stored in S3, and a central catalog account where all Glue databases are created. The Glue databases are then shared back with the data producer accounts as resource link databases using Lake Formation. More schematically:
In AWS:
Account A - Central Catalog - Original Glue database + data lake location registered in Lake Formation
Account B - Data producer account - S3 Bucket + Resource link database
In data.all:
Environment A
Environment B - Imported Dataset with S3 Bucket + Resource link database + ANOTHER registration in Lake Formation + Glue Crawler + IAM role that can access Bucket+resource link database
Data sharing detects the source catalog and shares the Original Glue database, provided the pre-requisites are met: Environment A is onboarded in data.all and the Original Glue database is tagged as explained in #1021.
Issues:
The second registration in Lake Formation is not needed and pollutes LF
The Glue crawler in the producer account targets the resource link database, which does not make much sense. Instead, if anything, it should create tables in the Original database as explained in this blogpost.
No "heads up" in the UI indicating that the pre-reqs are needed
No visibility on whether the pre-requisites are fulfilled from the UI
Solutions
(we can implement more than one or other alternatives)
Add documentation in user guide - planned as part of 2.3 release
Store in the Dataset metadata whether a database is a resource link database:
Show it in the UI, plus info that the Original database must be tagged and its environment onboarded
Avoid creating Glue crawler and registering the data lake location in LF for resource link databases
Potentially simplify sharing checks
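To illustrate the metadata and "avoid creating" ideas above, here is a minimal sketch, not data.all's actual implementation. It assumes the input is the `Database` structure returned by the Glue `get_database` API, where a resource link carries a `TargetDatabase` entry pointing at the original catalog and database. The names `ResourceLinkInfo`, `describe_database` and `resources_to_create` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceLinkInfo:
    """Hypothetical dataset metadata describing where a Glue database really lives."""
    is_resource_link: bool
    source_catalog_id: Optional[str] = None
    source_database_name: Optional[str] = None


def describe_database(database: dict) -> ResourceLinkInfo:
    """Derive resource link metadata from a Glue `Database` structure.

    Assumption: a resource link database has a non-empty `TargetDatabase`
    with `CatalogId` and `DatabaseName`; a regular database does not.
    """
    target = database.get("TargetDatabase")
    if not target:
        return ResourceLinkInfo(is_resource_link=False)
    return ResourceLinkInfo(
        is_resource_link=True,
        source_catalog_id=target.get("CatalogId"),
        source_database_name=target.get("DatabaseName"),
    )


def resources_to_create(info: ResourceLinkInfo) -> dict:
    """Decide which producer-side resources an imported dataset needs.

    For a resource link database, skip the Glue crawler and the second
    Lake Formation location registration, as proposed in this issue.
    """
    needed = not info.is_resource_link
    return {"glue_crawler": needed, "lf_location_registration": needed}


# Example input shaped like a Glue `Database` structure (values are made up).
link_db = {
    "Name": "producer_link_db",
    "TargetDatabase": {"CatalogId": "111111111111", "DatabaseName": "original_db"},
}
link_info = describe_database(link_db)
plain_info = describe_database({"Name": "plain_db"})
```

In practice the dictionary would come from something like `boto3.client("glue").get_database(Name=...)["Database"]`. The stored `source_catalog_id` could also feed the UI hint about onboarding the central catalog environment.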
As a relatively new user of data.all, I would love to see more instructions directly in the UI. Maybe they shouldn't be shown by default, but it would be nice to have a (?) icon that links to the relevant paragraph in the user guide.
As for LF locations, I think it's some kind of bug (feature?): we should register the location afterwards. I think we need to put some effort into researching this behaviour.