Scheduling based on dataset aliases #40693
Merged
uranusjr
reviewed
Jul 15, 2024
…during scheduling
…take dataset_alias
…iter to take dataset_alias" This reverts commit 22d772b06be7cbfde67ccab6a87569112dec136e.
… relationship defined
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>
All the comments were addressed. Please let me know if anyone wants to take a deeper look. I'm planning on merging this one later today.
phanikumv
approved these changes
Jul 22, 2024
Labels: area:db-migrations (PRs with DB migration), area:serialization, kind:documentation, type:new-feature (Changelog: New Features)
Related: #40039
What
In #40478, we introduced a new class, DatasetAlias, which allows emitting DatasetEvents or creating Datasets in a task. This PR allows scheduling a DAG run based on a DatasetAlias.

Example
In this example, the DAG "dataset-consumer" is scheduled on Dataset("s3://bucket/my-task"), which the DAG "dataset-producer" updates, while the DAG "dataset-alias-consumer" is scheduled on DatasetAlias("example-alias"), which the DAG "dataset-alias-producer" resolves at runtime.

Before the DAG "dataset-alias-producer" is executed, the dataset alias DatasetAlias("example-alias") is not yet resolved to Dataset("s3://bucket/my-task"). Consequently, completing the execution of the DAG "dataset-producer" triggers only the DAG "dataset-consumer", not the DAG "dataset-alias-consumer". Once the DAG "dataset-alias-producer" runs, DatasetAlias("example-alias") is resolved to Dataset("s3://bucket/my-task"), and the run produces a dataset event that triggers the DAG "dataset-consumer". From that point on, because DatasetAlias("example-alias") is resolved to Dataset("s3://bucket/my-task"), completing the execution of either "dataset-producer" or "dataset-alias-producer" triggers both "dataset-consumer" and "dataset-alias-consumer".