Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: fill data in column lineage input nodes. #2742

Merged

Conversation

JDarDagran
Copy link
Contributor

Problem

As stated in #2725 data is null for output nodes in column lineage endpoint.

Solution

Fill data field.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Feb 2, 2024
Copy link

netlify bot commented Feb 2, 2024

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit e791b91
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/65c5d3d437ad5a0008dee469

@JDarDagran JDarDagran force-pushed the add_data_to_column_lineage_edges branch from dfef1ae to dbf7793 Compare February 2, 2024 09:57
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
@JDarDagran JDarDagran force-pushed the add_data_to_column_lineage_edges branch from dbf7793 to 65c7c8d Compare February 2, 2024 10:40
@JDarDagran JDarDagran marked this pull request as ready for review February 2, 2024 10:42
@JDarDagran JDarDagran changed the title draft: fill data in column lineage input nodes. api: fill data in column lineage input nodes. Feb 2, 2024
this.transformationDescription = data.transformationDescription;
this.transformationType = data.transformationType;
this.inputFields = ImmutableList.of();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by some of these fields like fieldType and inputFields. Would these ever be set here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The truth is that if I set InputFieldNodeData to implement NodeData interface jackson couldn't deduce what was type of NodeData was in the output node (the one that previously had data field null). For that reason I basically made a workaround to simply create ColumnLineageNodeData from InputFieldNodeData.

I'd need more guidance how to fix jackson deduction :|

@tatiana
Copy link
Collaborator

tatiana commented Feb 6, 2024

@wslulciuc @phixMe - while talking to @JDarDagran, he mentioned that there were concerns this PR may not solve the original issue and that you'd be logging an issue with further details. Could you confirm if that is the case?

@wslulciuc
Copy link
Member

@tatiana: yeah, although this is a "fix", I think the issue is more fundemtail to our colum lineage query. That is, @phix and I looked into this a few days ago and walked through every OL call (@JDarDagran provided us) and it was enlightening. I’d like to share what we found, just haven’t had a window of time to write up our findings but the tl;dr is: the output_dataset_field_uuid in our column lineage query doesn’t return values for the ColumnLineageNodes on the edges of the graph (ie. graph nodes with no input columns, see below). I’d like to further confirm this, but given that our seed data works just fine, there’s something we are not accounting for.

IMG_6440

@pawel-big-lebowski
Copy link
Collaborator

@tatiana: yeah, although this is a "fix", I think the issue is more fundemtail to our colum lineage query. That is, @phix and I looked into this a few days ago and walked through every OL call (@JDarDagran provided us) and it was enlightening. I’d like to share what we found, just haven’t had a window of time to write up our findings but the tl;dr is: the output_dataset_field_uuid in our column lineage query doesn’t return values for the ColumnLineageNodes on the edges of the graph (ie. graph nodes with no input columns, see below). I’d like to further confirm this, but given that our seed data works just fine, there’s something we are not accounting for.

Hi @wslulciuc, I spent some time investigating nulls and I think @JDarDagran's PR solves the problem we were able to reproduce. The PR modifies a test and then applies changes need to fix it. That's why I think it is worth merging from that perspective.

What do you say? Once you provide more details on other issue, we can resume discussion while not blocking this.

@wslulciuc
Copy link
Member

wslulciuc commented Feb 9, 2024

@pawel-big-lebowski yeah, I'll merge 😉. I'm fine with that as we don't want to further delay the merge of #2725. Mind opening an issue to dive deeper into the column-level lineage query and move our discussion there?

Copy link
Member

@wslulciuc wslulciuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @JDarDagran! We'll want to open an issue to dive deeper into the column-level lineage query and move our discussion there. But, giving this a +1 to not block #2742

@wslulciuc wslulciuc enabled auto-merge (squash) February 9, 2024 07:31
@wslulciuc wslulciuc merged commit b111d64 into MarquezProject:main Feb 9, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants