-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/looker): browse path followups #10217
feat(ingest/looker): browse path followups #10217
Conversation
b3e3431
to
f280f3b
Compare
model as container of explores project as container of views
a7cb96a
to
eadcbe2
Compare
due to multithreading, occassionally browse path v2 was not generated correctly for some folders.
@@ -325,7 +339,7 @@ def auto_browse_path_v2( | |||
yield MetadataChangeProposalWrapper( | |||
entityUrn=urn, | |||
aspect=BrowsePathsV2Class( | |||
path=_prepend_platform_instance([], platform, platform_instance) | |||
path=prepend_platform_instance([], platform, platform_instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when using a platform instance with manually generated browse path v2, is the source responsible for adding the platform instance or does this helper do it?
I want to make sure we don't have cases where it's prepended twice or more for folders deep in the hierarchy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to keep it as responsibility of source to have platform instance prepended in case its manually emitting browse path v2 but I see a corner case now, where the platform instance may get prepended twice - from within helper - to children of the container that emits such browse path v2.
I can
- add a check in
prepend_platform_instance
not to prepend if browse path already has a platform instance entry at its start - or leave the responsibility of prepending platform instance only to source helper - across all sources - i.e. no source should ever prepend platform instance in browse path v2.
I'm inclined to do 2nd for looker but I also think, it wouldn't harm to have a safety net (as mentioned in point 1). WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s do the 2nd one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cls.get_platform_id = lambda: id or platform_name.lower().replace(" ", "-") | ||
cls.get_platform_id = lambda *args: id or platform_name.lower().replace( | ||
" ", "-" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
browse path source helper relies on platform
attribute of source or platform
attribute of source's config to generate platform instance however, the platform
is set to lookml
in lookml source and is used in stale metadata removal job id generation. Earlier, this caused platform instance to be emitted with lookml platform.
This PR changes browse path source helper to use get_platform_id
as preferred method to get platform name. get_platform_id
added as part of @platform_name
wrapper (used for grouping sources together in documentation) can be used as a standard mechanism for identifying platform
of a source. It accepts separate platform name and id and can be flexibly used to separate display name from platform's id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is not working for s3 where platform is picked from config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted this logic and added logic to prefer "platform_name" from config if its present.
paths: Dict[str, List[BrowsePathEntryClass]] = {} | ||
|
||
emitted_urns: Set[str] = set() | ||
containers_used_as_parent: Set[str] = set() | ||
for urn, batch in _batch_workunits_by_urn(stream): | ||
container_path: Optional[List[BrowsePathEntryClass]] = None | ||
legacy_path: Optional[List[BrowsePathEntryClass]] = None | ||
has_browse_path_v2 = False | ||
browse_path_v2: Optional[List[BrowsePathEntryClass]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of has_browse_path_v2
replaced by browse_path_v2 is not None
Follow up on #10147
Checklist