Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingestion): Add typeUrn handling to ownership transformers #9370

Merged

Conversation

skrydal
Copy link
Collaborator

@skrydal skrydal commented Dec 1, 2023

Changed behavior of both ownership transformers by adding possibility to define ownership_type_urn, as well as removing default value of ownership_type (it will be populated as CUSTOM if user defines ownership_type_urn). This way the transformers maintain backward compatibility as long as the users don't base on default value for ownership_type (if ownership_type is set but not ownership_type_urn transformers work as before, setting the ownership without typeUrn.

Fixes #8441

@skrydal
Copy link
Collaborator Author

skrydal commented Dec 3, 2023

Seems like the failing test case is not related at all to the code change (it's Cypress UI test)

@skrydal
Copy link
Collaborator Author

skrydal commented Dec 6, 2023

Seems like this PR might be also fixing problems such as #8441

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did a similar thing here #9332

I'd like to do something similar, where you can provide either an urn or a legacy enum value in the ownership_type field. Main reason is that I think having two separate fields will be confusing for end users

@skrydal
Copy link
Collaborator Author

skrydal commented Dec 7, 2023

Hi @hsheth2, thank you for your answer. Providing typeUrn in old type field was a topic of discussion within my team as we were aiming to bring consistency into ownership type handling internally. We decided to go this way as we based on a particular assumption:
Enum-based ownership type setting is obsolete and will be discontinued at some point in the future (especially considering that some logic in policyEngine expects typeUrn to be present instead of enums).

If above assumption is correct, then we are currently in transition period supporting both ownership type "styles" and, at some point, setting ownership based on type (enum value) will not be possible/correct. If we use approach proposed in this PR, I think at such moment, if there are Datahub users still having old-style ownership set in a ingestor/transformer configuration it will be way clearer for them what is wrong (ingestor will simply fail to run and they will need to see docs which will explain everything to them).
It's worth mentioning there is a need for more PRs changing other parts of Datahub metadata-ingestion, such as here in business glossary.
What do you think?

@hsheth2
Copy link
Collaborator

hsheth2 commented Dec 7, 2023

I think ultimately the public interface that people think about is "this ownership is of this type", and don't really care/distinguish between type and typeUrn.

In the dbt PR, what we do is look at the user-provided type - if it begins with urn:li:ownershipType, then we assume it's a custom ownership type, otherwise we assume it's a legacy ownership type.

I don't see us getting rid of the old ownership types anytime soon, so I suspect we'll be in this hybrid state for a bit.

@skrydal
Copy link
Collaborator Author

skrydal commented Dec 11, 2023

@hsheth2
I applied changes to make the config use type field for both styles of ownership type. I wondered what should be type set in case we provide typeUrn. My understanding was that the only correct value in such case is CUSTOM but in referenced PR I can see different approach:
https://github.com/datahub-project/datahub/pull/9332/files#diff-d4c6fce50d23d7d5137010d6c33f662b176aef3bcdfc4f1a1deefa4ee86a222dR287-R288
Could you please explain to me what is agreed strategy here?

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good - thanks for adding the tests too

There's some additional simplification that we could do here, those aren't blockers for merging this PR

@@ -102,7 +98,7 @@ def transform_aspect(


class DatasetOwnershipBaseConfig(TransformerSemanticsConfigModel):
ownership_type: Optional[str] = OwnershipTypeClass.DATAOWNER
ownership_type: Optional[str]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to do this in this PR, but it might simplify some of the cast calls away

Suggested change
ownership_type: Optional[str]
ownership_type: str = OwnershipTypeClass.DATAOWNER

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to that (changed it in first place due to addition of another field, now removed).

if ownership_type is None:
return OwnershipTypeClass.DATAOWNER, None
try:
ownership_type_str = cast(str, ownership_type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cast call isn't required

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, refactored and simplified this function

@@ -210,7 +210,8 @@ def test_simple_dataset_ownership_transformation(mock_time):
"owner_urns": [
builder.make_user_urn("person1"),
builder.make_user_urn("person2"),
]
],
"ownership_type": "DATAOWNER",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these defaults necessary - it looks like validate_ownership_type will add this default right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly just want to double check that this change is backwards compatible

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed "ownership_type" from this test case, kept it in other one so that we have better coverage of different cases.

Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hsheth2 hsheth2 merged commit a495d65 into datahub-project:master Dec 13, 2023
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata poc-marathon-dec-2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ownership transforms do not work with new ownership types
3 participants