feat(ingest): lineage for SageMaker model endpoints and groups #2894

Merged 27 commits on Jul 19, 2021

@kevinhu kevinhu commented Jul 16, 2021

Uses the SageMaker lineage graph to extract and infer lineage between models and endpoints as well as models and model groups.
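For readers unfamiliar with the idea, here is a minimal sketch of inferring model-to-endpoint lineage from a directed graph of associations. It is not the PR's implementation: the function name `infer_model_endpoints`, the `(source, destination)` edge shape, and the node identifiers are all assumptions made for illustration.

```python
def infer_model_endpoints(edges, node_types):
    """Collect, per model, the endpoints reachable through directed edges.

    edges: iterable of (source_id, destination_id) pairs (hypothetical shape).
    node_types: dict mapping node id -> "model", "endpoint", or anything else.
    """
    # Build an adjacency list from the edge pairs.
    graph = {}
    for source, destination in edges:
        graph.setdefault(source, set()).add(destination)

    lineage = {}
    for node, kind in node_types.items():
        if kind != "model":
            continue
        # Depth-first walk from the model, recording every endpoint reached.
        seen, stack, endpoints = {node}, [node], set()
        while stack:
            current = stack.pop()
            for neighbour in graph.get(current, ()):
                if neighbour in seen:
                    continue
                seen.add(neighbour)
                if node_types.get(neighbour) == "endpoint":
                    endpoints.add(neighbour)
                stack.append(neighbour)
        lineage[node] = endpoints
    return lineage
```

Walking through intermediate nodes is what makes this "inferred" rather than direct lineage: a model can be linked to an endpoint even when the graph only connects them through some other artifact.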

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@kevinhu kevinhu marked this pull request as ready for review July 16, 2021 17:22
"Deleting": EndpointStatusClass.DELETING,
"Failed": EndpointStatusClass.FAILED,
"Unknown": EndpointStatusClass.UNKNOWN,
}
Collaborator

do we need this level of granularity in our model as well?

Contributor Author

I think so – it may be a little too detailed for our current pull-based integration, but it will be useful once we have push-based functionality.
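To make the granularity question concrete, here is a rough sketch of how a status mapping like the one in the diff excerpt might be used. `EndpointStatus` is a hypothetical stand-in for `EndpointStatusClass`, reproducing only the three members visible in the excerpt; `to_endpoint_status` and its fall-back to `UNKNOWN` are assumptions, not code from this PR.

```python
from enum import Enum


class EndpointStatus(Enum):
    # Hypothetical stand-in for EndpointStatusClass; only the members
    # visible in the diff excerpt are reproduced here.
    DELETING = "DELETING"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# Maps the source system's status strings to the metadata model's values.
STATUS_MAP = {
    "Deleting": EndpointStatus.DELETING,
    "Failed": EndpointStatus.FAILED,
    "Unknown": EndpointStatus.UNKNOWN,
}


def to_endpoint_status(raw_status):
    """Translate a raw status string, defaulting to UNKNOWN when unmapped."""
    return STATUS_MAP.get(raw_status, EndpointStatus.UNKNOWN)
```

Defaulting to `UNKNOWN` means a status string the mapping has never seen does not break ingestion, at the cost of hiding the unrecognised value.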

@@ -55,6 +55,7 @@ def check_golden_file(
     # if updating a golden file that doesn't exist yet, load the output again
     if update_golden and not golden_exists:
         golden = load_json_file(output_path)
         shutil.copyfile(str(output_path), str(golden_path))
Collaborator

If the golden file doesn't exist, wouldn't it make sense to just set it to None or []? Then there is a diff, and we update the underlying file.

This logic seems a bit too complex, but we can address this in a follow up
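The simplification suggested here could look roughly like the sketch below. It is one reading of the comment, not the follow-up that was actually made: the signature of `check_golden_file`, its boolean return value, and this version of `load_json_file` are assumptions for illustration.

```python
import json
import shutil
from pathlib import Path


def load_json_file(path):
    """Read and parse a JSON file."""
    with open(path) as f:
        return json.load(f)


def check_golden_file(output_path, golden_path, update_golden=False):
    """Return True if the output matches the golden file (refreshing it if asked)."""
    output = load_json_file(output_path)
    # A missing golden file is treated as None, which never equals parsed
    # list output, so the ordinary "there is a diff" path handles it.
    golden = load_json_file(golden_path) if Path(golden_path).exists() else None
    if output == golden:
        return True
    if update_golden:
        # Any diff, including a missing golden file, is resolved by copying.
        shutil.copyfile(str(output_path), str(golden_path))
        return True
    return False
```

With this shape there is no separate branch for "golden file does not exist yet"; the missing file is simply one more way for the comparison to fail.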

Ownership,
Status,
Deprecation,
BrowsePaths
Contributor

will ml model endpoints be browsable? probably not, right? I imagine these will just be linked via ml models

Contributor Author

No they won't – will remove

Comment on lines 22 to 30
/**
 * Name of the MLModelEndpoint
 */
@Searchable = {
  "fieldType": "TEXT_PARTIAL",
  "enableAutocomplete": true,
  "boostScore": 10.0
}
name: string
Contributor

where do these names come from?

Contributor Author

SageMaker – you can set a name for an endpoint

Contributor

@dexter-mh-lee dexter-mh-lee left a comment

LGTM!

Contributor

@shirshanka shirshanka left a comment

LGTM!

@shirshanka shirshanka merged commit 6abd5e1 into datahub-project:master Jul 19, 2021
5 participants