-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advanced Params using json-schema #17100
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly looks good, but some main themes to addres:
- Don't mutate
v.default
to validate the value - Questions about serialization
- Possibly confusing names with Param and DagParam.
airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py
Outdated
Show resolved
Hide resolved
44b2a6d
to
07265cd
Compare
bc691a4
to
fc0116a
Compare
see also |
airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py
Outdated
Show resolved
Hide resolved
airflow/providers/google/marketing_platform/operators/display_video.py
Outdated
Show resolved
Hide resolved
57f5ebd
to
452e2d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good -- a couple of small improvements that you can take or leave.
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestions
bash_command="echo {{ params.int_param }} {{ params.str_param }} {{ params.old_param }} " | ||
"{{ params.simple_param }} {{ params.email_param }} {{ params.task_param }}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still un-solved
Airflow takes a params dictionary at the DAG level or at a Task level that can be overridden by providing dag_run.conf values. However, this params dictionary is quite static in nature and doesn't provide much value addition.
There have been quite some requests made by the community on this already, like #11054, #16430, #17085
Goal
Proposal
Param
which can be used in place of the value part of params dictionary.Approaches
pydantic
Pydantic is one of the fastest Python libraries to provide data & type validations (benchmark). I'd implemented various params classes in it (see sample) but did not like the way I had to write validators for each field separately. Also, the order you define fields matters a lot how one can access them in those validator methods.
attrs
Have used attrs previously and it's also in use within Airflow already. attrs simplifies writing classes and also exposes various in-build validators & pre-post init methods. Using attrs it was quite easy to create these classes (see this), though we've to fill in the logic by ourselves to do the data validation. We also felt that more & more such data validation requirements would come from the users and it could turn into a big pile of code in itself.
json-schema
We are using json-schema for DAG serialization already. json-schema has a very powerful & extensive way to define properties (validations) on a field in a language-agnostic way. It has implementation libs in almost all major languages. The custom code using json-schema is pretty minimum (here) & provides very extensive validations.
We should be able to use its Javascript implementation and validate data on the UI itself. The only concern here is that the json-schema rules can become pretty complex easily & users might found it hard to read and understand.
Trigger DAG page
DAG details API
DAG Trigger API
DAG trigger via CLI
Tasks test via CLI
Thanks a lot to @ashb & @kaxil for their inputs.