fix(ingest): stateful-ingestion - keep dataset urn case in checkpoints #6244

Merged
Changes from 2 commits
6 changes: 6 additions & 0 deletions metadata-ingestion/src/datahub/emitter/mce_builder.py
@@ -122,6 +122,12 @@ def dataset_urn_to_key(dataset_urn: str) -> Optional[DatasetKeyClass]:
     return None
 
 
+def dataset_key_to_urn(key: DatasetKeyClass) -> str:
+    return (
+        f"urn:li:dataset:(urn:li:dataPlatform:{key.platform},{key.name},{key.origin})"
+    )
+
+
 def make_container_new_urn(guid: str) -> str:
     return f"urn:dh:container:0:({guid})"

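For anyone reviewing without a DataHub checkout, here is a minimal sketch of what the helper added to `mce_builder.py` does. `DatasetKey` is a hypothetical stand-in for `DatasetKeyClass` that carries only the three fields used in the diff, and the sample platform and table name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetKey:
    """Hypothetical stand-in for DatasetKeyClass (only the fields used in the diff)."""

    platform: str
    name: str
    origin: str


def dataset_key_to_urn(key: DatasetKey) -> str:
    # Same f-string as in the diff: each field is interpolated verbatim,
    # so the case of the dataset name is left untouched.
    return (
        f"urn:li:dataset:(urn:li:dataPlatform:{key.platform},{key.name},{key.origin})"
    )


# Illustrative input with a mixed-case name.
urn = dataset_key_to_urn(
    DatasetKey(platform="snowflake", name="DB.Schema.MyTable", origin="PROD")
)
print(urn)
```

Because the helper only formats the key's fields, whatever case the name has going in is the case it has in the resulting URN.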
@@ -1,6 +1,10 @@
 from typing import Iterable, List, Set
 
-from datahub.emitter.mce_builder import dataset_urn_to_key, make_dataset_urn
+from datahub.emitter.mce_builder import (
+    dataset_key_to_urn,
+    dataset_urn_to_key,
+)
+from datahub.metadata.schema_classes import DatasetKeyClass
 
 
 class CheckpointStateUtil:
@@ -35,4 +39,6 @@ def get_dataset_urns_not_in(
         )
         for encoded_urn in difference:
             platform, name, env = encoded_urn.split(CheckpointStateUtil.get_separator())
-            yield make_dataset_urn(platform, name, env)
+            yield dataset_key_to_urn(
+                DatasetKeyClass(platform=platform, name=name, origin=env)
+            )
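A rough sketch of the decode path after this change may help when reading the hunk above. Several parts are assumptions and not taken from the PR: the `"||"` separator stands in for whatever `CheckpointStateUtil.get_separator()` returns, the set difference stands in for the elided call that produces `difference`, and `DatasetKey` is a hypothetical stand-in for `DatasetKeyClass`.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: placeholder for the value of CheckpointStateUtil.get_separator().
SEPARATOR = "||"


@dataclass
class DatasetKey:
    """Hypothetical stand-in for DatasetKeyClass (only the fields used in the diff)."""

    platform: str
    name: str
    origin: str


def dataset_key_to_urn(key: DatasetKey) -> str:
    # Mirrors the helper added in mce_builder.py: plain interpolation, no case changes.
    return (
        f"urn:li:dataset:(urn:li:dataPlatform:{key.platform},{key.name},{key.origin})"
    )


def get_dataset_urns_not_in(
    encoded_urns_1: List[str], encoded_urns_2: List[str]
) -> Iterable[str]:
    # Assumption: the real method computes the difference elsewhere; a sorted
    # set difference is used here only to keep the sketch deterministic.
    difference = sorted(set(encoded_urns_1) - set(encoded_urns_2))
    for encoded_urn in difference:
        platform, name, env = encoded_urn.split(SEPARATOR)
        yield dataset_key_to_urn(DatasetKey(platform=platform, name=name, origin=env))


# Illustrative checkpoint contents: one mixed-case entry is missing from the second list.
previous = ["snowflake||DB.Tbl||PROD", "snowflake||db.other||PROD"]
current = ["snowflake||db.other||PROD"]
missing = list(get_dataset_urns_not_in(previous, current))
print(missing)
```

The point of the sketch is only that the name read back from the encoded checkpoint entry reaches the URN unchanged, which is what the PR title asks for.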