WIP/Idea: Pass task output as outlet to dataset trigger params #37888
Conversation
FYI @jedcunningham @uranusjr WDYT? (I thought it would be more complex, but walking through the code it looks quite simple...)
```python
# Merge the extras of all triggering dataset events into the run conf.
run_conf = {}
for item in dataset_events:
    event: DatasetEvent = item
    extra: dict | None = event.extra
    if extra:
        run_conf.update(extra)
```
This feels a bit too heavy-handed, but I like the idea of passing in event extras as the downstream DAG run parameters. (Should we use conf or params for this?)
The code was just an idea - not fully thought through. If you say "heavy-handed"... what do you mean? Too brute-force, so the user does not know what comes out? Or do you mean we need a better merging mechanism? Or some hook to be able to inject a custom merging strategy? Or just "the code is ugly"? :-D
Background: I would assume that in 90% of cases a single DAG triggers a dataset. There might be cases where multiple events come together to trigger. In such a case we need to merge the `extra`s. I'd assume most times it is "conflict free", but you never know. It might be a feature to have "last property wins" to collect events; otherwise, if users feel there are too many conflicts, individual extras can also be produced "conflict free" with individual keys.
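Here is a minimal sketch of such a "last property wins" merge with an optional strict mode; the `merge_extras` helper and its `strict` flag are illustrative only, not an existing Airflow API:

```python
from typing import Any, Iterable


def merge_extras(extras: Iterable[dict[str, Any] | None], strict: bool = False) -> dict[str, Any]:
    """Merge dataset event extras; later events win on key conflicts.

    With strict=True, raise on conflicting keys instead of overwriting.
    """
    merged: dict[str, Any] = {}
    for extra in extras:
        if not extra:
            continue
        if strict:
            conflicts = merged.keys() & extra.keys()
            if conflicts:
                raise ValueError(f"conflicting extra keys: {sorted(conflicts)}")
        merged.update(extra)  # last property wins
    return merged
```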
`conf` vs. `params`:
Yes, `params` and `dag_run.conf` should somehow be merged. I believe this is a leftover in the API from the past. `conf` is the dict which is used to trigger a DAG; it is persisted as a blob with the DagRun. During runtime the conf is available in the context as a dict, representing 1:1 the conf used to trigger. No validation, just a dict.
`params`, in contrast, have default values; conf sets values on top, and the result is JSON-schema validated.
Both `params` and `conf` are available in the context and can be used. I believe mid-term we should deprecate the usage of conf in the DAG and consolidate to the (more capable and better functioning) params. But as of today, params only exist during runtime.
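To illustrate the difference, a hedged sketch of a DAG using both (the `Param` default and the trigger conf value are made-up examples; conf overriding params relies on the default `dag_run_conf_overrides_params` setting):

```python
import pendulum
from airflow.decorators import dag, task
from airflow.models.param import Param


@dag(
    start_date=pendulum.datetime(2024, 1, 1),
    schedule=None,
    # params carry defaults and are JSON-schema validated
    params={"batch_size": Param(100, type="integer")},
)
def conf_vs_params():
    @task
    def show(**context):
        # conf: the raw dict used to trigger the run, persisted as-is, unvalidated
        print(context["dag_run"].conf)          # e.g. {"batch_size": 500}
        # params: defaults with conf merged on top, validated against the schema
        print(context["params"]["batch_size"])  # 500, or 100 if not overridden

    show()


conf_vs_params()
```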
@hussein-awala made an attempt here, but it did not make it to the finish line: #29174
My organization messed up the airflow repo fork - data is gone - will need to re-open the PR later when recovered :-(
Repo at Bosch was restored, re-opening discussion :-D
Ufff
Other PR supersedes this.
This PR is a WIP proposal to fix/resolve feature request #37810.
NOTE: It is just a code preview, therefore WIP.
Idea:
- Pass the task output as the outlet event's `extra` (if not provided in the Dataset reference)
- On a dataset-triggered run, collect the `extra` of the triggering events and use this as params
- Pass the merged `extra` to the dataset-triggered DAG as `params`

(A hypothetical end-to-end sketch follows below.)
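This is how the proposal could look from the DAG author's side; the producer/consumer DAG names are made up, and the behavior described in the comments (return value becoming the event `extra`, extras surfacing as `params`) is what this PR proposes, not current Airflow behavior:

```python
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

example_ds = Dataset("s3://bucket/processed/")


@dag(start_date=pendulum.datetime(2024, 1, 1), schedule=None)
def producer():
    @task(outlets=[example_ds])
    def process() -> dict:
        # Under this proposal, the returned dict would be attached to the
        # dataset event as its `extra` (if not provided in the Dataset reference).
        return {"rows_written": 1234, "partition": "2024-03-01"}

    process()


@dag(start_date=pendulum.datetime(2024, 1, 1), schedule=[example_ds])
def consumer():
    @task
    def use(**context):
        # ...and the merged extras would surface here as params.
        print(context["params"].get("rows_written"))

    use()


producer()
consumer()
```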
Open items:
closes: #37810