feature: Dataset dependencies in separate fields #3097

@arturowczarek (Collaborator) commented Sep 17, 2024

  • Controlled by the flag columnLineage.deprecatedMechanismEnabled: the dataset dependencies are extracted from the property field into a separate field (dataset)

Problem

Dataset dependencies are mixed with field dependencies.

Relates to: #3084
This is a continuation of: #3098
The next change is #3100

Solution

Dataset dependencies should be kept separate from field dependencies.
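
As a rough illustration of the idea, the sketch below moves dataset-level dependencies out of the per-field lists into a separate `dataset` list. It is not the code from this PR. The dictionary shape, the function name `split_dataset_dependencies`, and the rule that a dependency marked `INDIRECT` counts as dataset-level are all assumptions made for the example.

```python
from typing import Any, Dict, List

Dependency = Dict[str, Any]


def split_dataset_dependencies(
    fields: Dict[str, List[Dependency]],
) -> Dict[str, Any]:
    """Separate dataset-level dependencies from field-level ones.

    Assumption (hypothetical): a dependency whose "type" is "INDIRECT"
    applies to the whole output dataset rather than to a single field.
    """
    new_fields: Dict[str, List[Dependency]] = {}
    dataset_deps: List[Dependency] = []
    seen = set()

    for field_name, deps in fields.items():
        kept = []
        for dep in deps:
            if dep.get("type") == "INDIRECT":
                # The same dataset-level dependency is repeated on every
                # field in the mixed form, so record it only once.
                key = (dep["namespace"], dep["name"], dep["field"])
                if key not in seen:
                    seen.add(key)
                    dataset_deps.append(dep)
            else:
                kept.append(dep)
        new_fields[field_name] = kept

    return {"fields": new_fields, "dataset": dataset_deps}


# Mixed form: the filter column is attached to every output field.
filter_dep = {"namespace": "ns", "name": "t", "field": "c", "type": "INDIRECT"}
mixed = {
    "a": [{"namespace": "ns", "name": "t", "field": "a", "type": "DIRECT"}, filter_dep],
    "b": [{"namespace": "ns", "name": "t", "field": "b", "type": "DIRECT"}, filter_dep],
}
separated = split_dataset_dependencies(mixed)
```

In the separated form, each output field keeps only its own direct inputs, and the shared dependency appears once under `dataset` instead of once per field.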

Note: All schema changes require discussion. Please link the issue for context.

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports S3 and GCS filesystem operations, tested with AWS EMR).

One-line summary:

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (not required for changes to tests, docs, or CI config)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project

@boring-cyborg bot added labels Sep 17, 2024: area:client/java (openlineage-java), area:integration/spark, area:spec (Specifications and standards for the project), area:tests (Testing code), language:java (Uses Java programming language)
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch 2 times, most recently from b2655ad to a2c9738 Compare September 18, 2024 08:47
@boring-cyborg bot added labels Sep 18, 2024: area:client/python (openlineage-python), language:python (Uses Python programming language)
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch 2 times, most recently from 8e67233 to 3462074 Compare September 18, 2024 12:29
codecov-commenter commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (a97dae9) to head (4b3d6b8).

Additional details and impacted files
@@             Coverage Diff              @@
##             main     #3097       +/-   ##
============================================
+ Coverage   85.70%   100.00%   +14.29%     
============================================
  Files          54         5       -49     
  Lines        3112       173     -2939     
============================================
- Hits         2667       173     -2494     
+ Misses        445         0      -445     


@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch 2 times, most recently from d98f987 to f8943cc Compare September 18, 2024 13:03
@boring-cyborg bot added the area:documentation (Improvements or additions to documentation) label Sep 18, 2024
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch from f8943cc to a5a42e4 Compare September 18, 2024 14:19
@arturowczarek arturowczarek mentioned this pull request Sep 18, 2024
10 tasks
@arturowczarek arturowczarek marked this pull request as ready for review September 18, 2024 14:36
@arturowczarek arturowczarek requested a review from a team as a code owner September 18, 2024 14:36
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch 3 times, most recently from 02d9d44 to d30daf5 Compare September 25, 2024 08:27
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch from d30daf5 to fb283e0 Compare September 27, 2024 10:14
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch from fb283e0 to 901a214 Compare September 27, 2024 13:09
@pawel-big-lebowski (Collaborator) left a comment

I think the original idea behind this PR was to copy the existing tests, because implementing the dataset lineage feature would change a lot of them. The problem with this reasoning is that only 4 existing tests turned out to be affected by the new feature, yet this approach requires copying 900 lines of code.

Is my understanding correct?

@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch from 901a214 to 946ec61 Compare October 1, 2024 06:14
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch 2 times, most recently from 2add4cd to d6ddf99 Compare October 1, 2024 15:03
* When flag columnLineage.deprecatedMechanismEnabled, then the dataset dependencies are extracted from property field into separate field (dataset)

Signed-off-by: Artur Owczarek <owczarek.artur@gmail.com>
@arturowczarek arturowczarek force-pushed the feature/dataset-dependencies branch from d6ddf99 to 447cf2c Compare October 2, 2024 05:23
@pawel-big-lebowski pawel-big-lebowski merged commit 59218be into OpenLineage:main Oct 2, 2024
49 checks passed
Labels: area:client/java (openlineage-java), area:client/python (openlineage-python), area:documentation (Improvements or additions to documentation), area:integration/spark, area:spec (Specifications and standards for the project), area:tests (Testing code), language:java (Uses Java programming language), language:python (Uses Python programming language)
4 participants