feat: Column Lineage API #280

Merged
merged 5 commits into from
Mar 12, 2021

Conversation

allisonsuarez
Contributor

@allisonsuarez allisonsuarez commented Mar 10, 2021

Summary of Changes

  • Created Column Lineage API
  • Manually tested

Tests

What tests did you add or modify and why? If no tests were added or modified, explain why. Remove this line

Documentation

What documentation did you add or modify and why? Add any relevant links then remove this line

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
@codecov-io

codecov-io commented Mar 12, 2021

Codecov Report

Merging #280 (101b52c) into master (2752492) will increase coverage by 3.51%.
The diff coverage is 72.83%.

@@            Coverage Diff             @@
##           master     #280      +/-   ##
==========================================
+ Coverage   74.10%   77.61%   +3.51%     
==========================================
  Files          25       27       +2     
  Lines        1255     1385     +130     
  Branches      136      163      +27     
==========================================
+ Hits          930     1075     +145     
+ Misses        297      263      -34     
- Partials       28       47      +19     
Impacted Files                                Coverage Δ
metadata_service/api/popular_tables.py        100.00% <ø> (ø)
metadata_service/api/system.py                66.66% <ø> (ø)
metadata_service/api/user.py                  100.00% <ø> (ø)
metadata_service/proxy/statsd_utilities.py    81.25% <ø> (ø)
metadata_service/util.py                      100.00% <ø> (ø)
metadata_service/api/column.py                54.54% <28.57%> (-45.46%) ⬇️
metadata_service/proxy/shared.py              28.57% <28.57%> (ø)
metadata_service/api/badge.py                 61.29% <61.29%> (ø)
metadata_service/proxy/neo4j_proxy.py         71.55% <61.53%> (-3.45%) ⬇️
metadata_service/proxy/base_proxy.py          67.07% <73.33%> (-0.07%) ⬇️
... and 17 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed93cab...101b52c.

@allisonsuarez allisonsuarez requested a review from feng-tao March 12, 2021 01:14
@allisonsuarez allisonsuarez marked this pull request as ready for review March 12, 2021 01:14
@allisonsuarez
Contributor Author

cc: @youngyjd @dkunitskiy

@feng-tao
Member

hey @allisonsuarez, I wonder how you test it, is it by building a sub private class with CW to use the API directly?

Contributor

@dorianj dorianj left a comment

Code LGTM

direction = args.get('direction', 'both')
depth = args.get('depth', 0)
try:
    lineage = self.client.get_lineage(id=f"{table_uri}/{column_name}",

Contributor

If this id generation is a common pattern, should it be handled by a util or wrapper?

Contributor Author

Good question. I don't think we have a generic method that covers both tables and columns. Methods that deal with columns, like get_column_description on the neo4j proxy, reach the column through its association with the table node. But if we already have the entire column key, we can call directly with that key rather than querying neo4j for table->column->what_you_want.

Contributor

ah, yeah -- i guess ultimately this is tied to ColumnMetadata.COLUMN_KEY_FORMAT -- we're "getting lucky" in that column shares a prefix with TableMetadata.TABLE_KEY_FORMAT. I'm sure there are other places that rely on this behavior
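
One way to act on this thread would be a small helper that keeps the id construction in one place. The sketch below is hypothetical: `build_column_key` and the two local format constants are assumptions for illustration, standing in for `TableMetadata.TABLE_KEY_FORMAT` and `ColumnMetadata.COLUMN_KEY_FORMAT`. It only holds as long as the column key really is the table key plus `/{col}`.

```python
# Assumed key formats for illustration only; the real values live on
# TableMetadata.TABLE_KEY_FORMAT and ColumnMetadata.COLUMN_KEY_FORMAT.
TABLE_KEY_FORMAT = "{db}://{cluster}.{schema}/{tbl}"
COLUMN_KEY_FORMAT = TABLE_KEY_FORMAT + "/{col}"


def build_column_key(table_uri: str, column_name: str) -> str:
    """Hypothetical helper: append the column name to an existing table key."""
    if not table_uri or not column_name:
        raise ValueError("table_uri and column_name must both be non-empty")
    return f"{table_uri}/{column_name}"


# The helper agrees with the column format only because of the shared prefix.
table_key = TABLE_KEY_FORMAT.format(db="hive", cluster="gold", schema="core", tbl="rides")
column_key = build_column_key(table_key, "ride_id")
expected = COLUMN_KEY_FORMAT.format(
    db="hive", cluster="gold", schema="core", tbl="rides", col="ride_id"
)
assert column_key == expected
```

If the two formats ever diverge, the final assertion is the kind of check that would catch it, which is the main argument for a single helper over inline f-strings.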

@allisonsuarez
Contributor Author

> hey @allisonsuarez, I wonder how you test it, is it by building a sub private class with CW to use the API directly?

In this case I just used a private subclass for neo4j proxy and returned some mock data from the get_lineage method. Still working on getting column lineage from CW
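
For readers who want to reproduce that kind of manual test, here is a minimal sketch of the approach described above. Everything in it is an assumption made for illustration: `FakeBaseProxy` stands in for the real proxy class, the response shape is invented, and the `direction` and `depth` keyword arguments are guessed because the `get_lineage` call is truncated in the review excerpt.

```python
from typing import Any, Dict


class FakeBaseProxy:
    """Stand-in for the service's proxy class (hypothetical, not Neo4jProxy)."""

    def get_lineage(self, *, id: str, direction: str, depth: int) -> Dict[str, Any]:
        raise NotImplementedError


class MockLineageProxy(FakeBaseProxy):
    """Subclass that returns canned lineage instead of querying a backend."""

    def get_lineage(self, *, id: str, direction: str, depth: int) -> Dict[str, Any]:
        # The response shape here is an assumption made for this sketch.
        return {
            "key": id,
            "direction": direction,
            "depth": depth,
            "upstream_entities": [{"key": f"{id}_source", "level": 1}],
            "downstream_entities": [],
        }


def fetch_column_lineage(client: FakeBaseProxy, table_uri: str,
                         column_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the handler logic quoted in the review: defaults, then one client call."""
    direction = args.get("direction", "both")
    depth = args.get("depth", 0)
    return client.get_lineage(id=f"{table_uri}/{column_name}",
                              direction=direction, depth=depth)


lineage = fetch_column_lineage(MockLineageProxy(), "hive://gold.core/rides",
                               "ride_id", {"depth": 1})
assert lineage["key"] == "hive://gold.core/rides/ride_id"
assert lineage["direction"] == "both"
```

Swapping the mock for a real client should only require that the client expose a compatible `get_lineage` method.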

@allisonsuarez allisonsuarez merged commit 681893f into master Mar 12, 2021
@allisonsuarez allisonsuarez deleted the asm-column-lineage-api branch March 12, 2021 18:04
4 participants