Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate expected types for args for DAG, BaseOperator and TaskGroup #40269

Merged
merged 4 commits into from
Jun 21, 2024

Conversation

pankajkoti
Copy link
Member

@pankajkoti pankajkoti commented Jun 16, 2024

Validates that some commonly used arguments by DAG authors
conform to the expected types. If the provided values do not
match the expected types, a TypeError is raised, resulting in
DAG import errors that appear in the Airflow UI.

Closes: #40187


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

airflow/models/baseoperator.py Outdated Show resolved Hide resolved
airflow/models/baseoperator.py Outdated Show resolved Hide resolved
@pankajkoti pankajkoti marked this pull request as ready for review June 16, 2024 19:25
@uranusjr
Copy link
Member

Generally looks good to me, except I think we don’t need to declare the types in a separate dict, but can use typing.get_type_hints to extract the type hints from __init__. I am not entirely sure, but let’s give it a try.

Most of the entries should be simple, only email needs some extra work to handle the Union type. We can use typing.get_origin(...) is typing.Union to check if the instance is a Union, and use typing.get_args to extract the arguments.

@pankajkoti
Copy link
Member Author

pankajkoti commented Jun 17, 2024

Generally looks good to me, except I think we don’t need to declare the types in a separate dict, but can use typing.get_type_hints to extract the type hints from __init__. I am not entirely sure, but let’s give it a try.

Most of the entries should be simple, only email needs some extra work to handle the Union type. We can use typing.get_origin(...) is typing.Union to check if the instance is a Union, and use typing.get_args to extract the arguments.

I did try to get the type hints using get_type_hints method from the typing module, but it does help for Python < 3.10, for type annotations that use a bitwise or | in the annotations instead of the Union annotation, also did try typing.get_origin(...) is typing.Union but it did not yield expected results. Additionally using type hints directly in the isinstance methods may not work well always e.g. when the annotation is of type Iterable[str]
Hence, I have taken the manual approach 🙂 in this PR

@uranusjr
Copy link
Member

uranusjr commented Jun 17, 2024

That makes sense, getting annotations at runtime (instead of in Mypy) is something Python has been working in recently but I was not sure exactly when it became usable. Looks like we can’t do that yet (maybe two years later…)

Let’s extract the dicts and checks outside the classes, and add a todo comment that says maybe consider refactoring when we require 3.10+ (not sure about the exact version but we don’t need to be exact)

@pankajkoti
Copy link
Member Author

Let’s extract the dicts and checks outside the classes

@uranusjr, could you clarify where exactly we should extract the validation logic? Do you mean move the dicts to be within the same module but outside the classes? The arguments we validate are specific to certain classes (DAG, BaseOperator, TaskGroup) and vary across them. Moreover, since these argument values could be passed directly or via default arguments, we perform the validation check after the instance has been assigned all argument values, following the evaluation of positional/keyword arguments and default arguments. In my opinion, keeping these dicts localized within their respective classes is better. However, if you could explain where and why we should extract them, it would help me understand your perspective better. Also, the helper validation method is already outside the classes in utils which we call from the respective classes.

@uranusjr
Copy link
Member

Your correct checks are fine, I just want to move the dicts out of the classes, and add comments to hint about future refactors (and to stop people from refactoring before the supported Python versions allow us to).

@pankajkoti
Copy link
Member Author

pankajkoti commented Jun 17, 2024

@uranusjr okay, moved the dicts out and added comments as suggested. Please review when possible, thanks.

@dstandish
Copy link
Contributor

my suggestions:

  • keep it limited to the attrs that the scheduler cares about
  • but, if we're going to do all params, use a pre-commit for it, or some existing library that does it more invisibly

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add a pre-commit hook to ensure the dicts are up-to-date with the __init__ arguments. But this lgtm.

@pankajkoti
Copy link
Member Author

my suggestions:

  • keep it limited to the attrs that the scheduler cares about
  • but, if we're going to do all params, use a pre-commit for it, or some existing library that does it more invisibly

I will dwelve into this later to check if we can curtail the attrs list and/or if we have a good library that could help us. For now, would like to merge this and have a preventive workaround for crashes we're seeing users reporting.

@pankajkoti pankajkoti merged commit 832099c into apache:main Jun 21, 2024
52 checks passed
@pankajkoti pankajkoti deleted the validate-arg-types branch June 21, 2024 09:47
@pankajkoti pankajkoti added this to the Airflow 2.9.3 milestone Jun 21, 2024
@utkarsharma2 utkarsharma2 added the type:bug-fix Changelog: Bug Fixes label Jul 1, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
…pache#40269)

Validates that some commonly used arguments by DAG authors
conform to the expected types. If the provided values do not
match the expected types, a TypeError is raised, resulting in
DAG import errors that appear in the Airflow UI.

Closes: apache#40187
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler Crashes When Passing Invalid Value to Argument in default_args
4 participants