[AIP-62] Translate AIP-60 URI to OpenLineage #40173
Merged
boring-cyborg
bot
added
area:core-operators
Operators, Sensors and hooks within Core Airflow
area:lineage
area:providers
provider:amazon-aws
AWS/Amazon - related issues
provider:openlineage
AIP-53
provider:trino
labels
Jun 11, 2024
kacpermuda
changed the title
WIP - translate AIP-60 URI to OpenLineage
WIP [AIP-62] Translate AIP-60 URI to OpenLineage
Jun 11, 2024
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
4 times, most recently
from
June 19, 2024 08:49
05096aa
to
918cc0d
Compare
kacpermuda
changed the title
WIP [AIP-62] Translate AIP-60 URI to OpenLineage
[AIP-62] Translate AIP-60 URI to OpenLineage
Jun 19, 2024
kacpermuda
requested review from
eladkal,
o-nikolas and
mobuchowski
as code owners
June 19, 2024 09:19
mobuchowski
added
AIP-62
Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation
and removed
provider:trino
labels
Jun 19, 2024
@uranusjr this is how we want to use AIP-60 datasets in OpenLineage.
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
from
June 19, 2024 10:30
918cc0d
to
01756fd
Compare
uranusjr
reviewed
Jun 20, 2024
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
3 times, most recently
from
July 9, 2024 17:22
e0fc1fa
to
37490be
Compare
I believe this PR is ready for review; however, it should probably NOT be merged before #40335 is merged.
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
from
July 10, 2024 10:45
37490be
to
a0b337f
Compare
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
6 times, most recently
from
July 11, 2024 13:47
741c4b8
to
84c1b09
Compare
uranusjr
reviewed
Jul 12, 2024
uranusjr
reviewed
Jul 12, 2024
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
3 times, most recently
from
July 16, 2024 07:50
f894908
to
29fe565
Compare
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
4 times, most recently
from
July 22, 2024 10:13
f57c53e
to
9dbc9ea
Compare
kacpermuda
commented
Jul 22, 2024
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
2 times, most recently
from
July 22, 2024 17:27
3a17c9f
to
b2ba021
Compare
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
kacpermuda
force-pushed
the
aip62-ol-dataset-mapping
branch
from
July 22, 2024 18:07
b2ba021
to
bd2ef8c
Compare
mobuchowski
approved these changes
Jul 23, 2024
ephraimbuddy
added
the
changelog:skip
Changes that should be skipped from the changelog (CI, tests, etc..)
label
Jul 24, 2024
This was referenced Jul 28, 2024
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Sep 16, 2024
Fix unit tests: - test_does_not_double_import_entrypoint_provider_plugins - in apache-airflow-providers-databricks==6.8.0 was added DatabricksWorkflowPlugin (apache/airflow#40724) - test_dataset - in apache-airflow-providers-amazon==8.27.0 changed Dataset URI format validation (apache/airflow#40173) Change-Id: Iae902e544aae2086ea4495b0850c19f813aa7069 GitOrigin-RevId: 7d5b7a9ead32610f7e3864230e55bb3a17bf6da5
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Sep 20, 2024
Changes: - add suffix +composer to version - remove http and sqlite from pre-installed providers as they are included to Composer dependencies already - add pre-commit configuration file - set Composer pypi dependencies - adjust Airflow configs required for unit tests in order to prevent them from being cleaned up during testing - fix test_dataset as in apache-airflow-providers-amazon==8.27.0 changed Dataset URI format validation (apache/airflow#40173) Change-Id: Iac6842a49929d9e2c4b8ed29353312dbc450de8a GitOrigin-RevId: 1680fdc22961fa22517b2fd21ca67e8240e1f16a
Labels
AIP-62
Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation
area:core-operators
Operators, Sensors and hooks within Core Airflow
area:lineage
area:providers
changelog:skip
Changes that should be skipped from the changelog (CI, tests, etc..)
provider:amazon-aws
AWS/Amazon - related issues
provider:openlineage
AIP-53
closes: #38767
For the important changes, look at the first commit; for an example implementation, look at the second commit.
For the Airflow Dataset I've added:
- `_get_normalized_scheme()` function, which still does `scheme.lower()` underneath, but can now also be used in the OL provider, so we can be sure the same mechanism is used everywhere.
- `Dataset.normalized_uri` property, which lets us retrieve a normalized, AIP-60-compliant URI, or None in all other cases (not a URI, no scheme, no normalizer, etc.). At first I thought that in Airflow 3 we could just use `Dataset.uri`, as it will raise an error when a normalizer fails, but there can still be schemes without a normalizer defined, so I felt this is needed.
Also a small adjustment to ProvidersManager: the dataset-uris part of provider.yml is getting complex, so I rewrote the `_discover_dataset_uri_handlers` method to be more flexible for future expansions (e.g. OL to AIP-60 converters).
This PR should only be merged AFTER #40335.
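To illustrate the behaviour described above, here is a minimal, standalone sketch; it is not the code from this PR. The names `NORMALIZERS`, `get_normalized_scheme`, `normalized_uri`, and `s3_to_openlineage` are hypothetical stand-ins, and the registry is a plain dict rather than anything discovered through ProvidersManager. The S3 mapping assumes the OpenLineage naming convention of namespace `s3://{bucket}` and name `{object key}`.

```python
from __future__ import annotations

from typing import Callable
from urllib.parse import SplitResult, urlsplit


def _normalize_s3(parts: SplitResult) -> SplitResult:
    # Hypothetical normalizer: an S3 URI without a bucket is rejected.
    if not parts.netloc:
        raise ValueError("s3 URI must contain a bucket name")
    return parts


# Hypothetical registry mapping a normalized scheme to its normalizer.
NORMALIZERS: dict[str, Callable[[SplitResult], SplitResult]] = {
    "s3": _normalize_s3,
}


def get_normalized_scheme(uri: str) -> str:
    # Single place where the scheme is lower-cased, so every caller agrees.
    return urlsplit(uri).scheme.lower()


def normalized_uri(uri: str) -> str | None:
    # Return a normalized URI, or None when the input has no scheme,
    # no registered normalizer, or fails normalization.
    scheme = get_normalized_scheme(uri)
    normalizer = NORMALIZERS.get(scheme) if scheme else None
    if normalizer is None:
        return None
    try:
        parts = normalizer(urlsplit(uri)._replace(scheme=scheme))
    except ValueError:
        return None
    return parts.geturl()


def s3_to_openlineage(uri: str) -> tuple[str, str] | None:
    # Assumed convention: namespace is "s3://<bucket>", name is the object key.
    normalized = normalized_uri(uri)
    if normalized is None or get_normalized_scheme(normalized) != "s3":
        return None
    parts = urlsplit(normalized)
    return f"s3://{parts.netloc}", parts.path.lstrip("/") or "/"
```

With this sketch, `normalized_uri("S3://bucket/dir/file.csv")` yields a lower-cased-scheme URI, while a scheme without a normalizer (for example `foo://x/y`) or a plain string yields None, which is the fallback case the description argues for.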