feature: Dataset dependencies in separate fields #3097
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage Diff (main vs. #3097)
            main      #3097     +/-
Coverage    85.70%    100.00%   +14.29%
Files       54        5         -49
Lines       3112      173       -2939
Hits        2667      173       -2494
Misses      445       0         -445
I think the original idea behind this PR was: let's copy the existing tests, because implementing the dataset lineage feature will require a lot of test changes. The problem with this reasoning is that only 4 existing tests turned out to be affected by the new feature, yet this approach requires copying 900 lines of code.
Is my understanding correct?
* When the flag columnLineage.deprecatedMechanismEnabled is set, the dataset dependencies are extracted from the property field into a separate field (dataset).
Signed-off-by: Artur Owczarek <owczarek.artur@gmail.com>
Problem
The dataset dependencies are mixed with the field dependencies.
Relates to: #3084
This is a continuation of: #3098
The next change is #3100
Solution
The dataset dependencies should be kept separate from the field dependencies.
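To illustrate the idea, here is a minimal sketch of separating dataset-level dependencies from per-field dependencies. It is not the code from this PR: the class `DatasetDependencySplit`, the records `Dependency` and `Split`, and the `datasetLevel` marker are all hypothetical names introduced for illustration, and the sketch assumes each dependency already knows whether it applies to the whole dataset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DatasetDependencySplit {

    // Hypothetical, simplified dependency: an input column plus a marker saying
    // whether it affects the whole dataset or only a single output field.
    public record Dependency(String inputField, boolean datasetLevel) {}

    // Hypothetical result: per-field dependencies and dataset dependencies kept apart.
    public record Split(Map<String, List<Dependency>> fields, List<Dependency> dataset) {}

    // Moves every dataset-level dependency out of the per-field lists
    // and collects it once in a separate list.
    public static Split split(Map<String, List<Dependency>> mixed) {
        Map<String, List<Dependency>> fields = new LinkedHashMap<>();
        // A set removes duplicates when the same dependency is repeated under several fields.
        Set<Dependency> dataset = new LinkedHashSet<>();
        for (Map.Entry<String, List<Dependency>> entry : mixed.entrySet()) {
            List<Dependency> own = new ArrayList<>();
            for (Dependency dependency : entry.getValue()) {
                if (dependency.datasetLevel()) {
                    dataset.add(dependency);
                } else {
                    own.add(dependency);
                }
            }
            fields.put(entry.getKey(), own);
        }
        return new Split(fields, new ArrayList<>(dataset));
    }

    public static void main(String[] args) {
        Map<String, List<Dependency>> mixed = new LinkedHashMap<>();
        mixed.put("id", List.of(new Dependency("src.id", false), new Dependency("src.flag", true)));
        mixed.put("name", List.of(new Dependency("src.name", false), new Dependency("src.flag", true)));

        Split result = split(mixed);
        System.out.println("fields:  " + result.fields());
        System.out.println("dataset: " + result.dataset());
    }
}
```

In this sketch the dependency on `src.flag`, repeated under both output fields in the mixed form, ends up once in the separate dataset list, while each output field keeps only its own dependencies.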
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project