[AIRFLOW-5390] Remove provide context #6074

Closed
wants to merge 5 commits into from

Conversation

Fokko
Contributor

@Fokko Fokko commented Sep 10, 2019

Add some more docs. I agree with you @mik-laj that the docs were a bit minimal on the subject. WDYT?

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

@@ -34,7 +34,8 @@ Passing in arguments
 ^^^^^^^^^^^^^^^^^^^^

 Use the ``op_args`` and ``op_kwargs`` arguments to pass additional arguments
-to the Python callable.
+to the Python callable. If you use any of the :doc:`context variables <../../macros-ref>`
+as an argument of the provided callable, the value will be automatically injected as shown below:
Member

@mik-laj mik-laj Sep 10, 2019

All context variables can still be provided with a double-asterisk argument:

def myfunc(**context):
    print(context)  # all variables will be provided to context

python_operator = PythonOperator(task_id='mytask', python_callable=myfunc)

I think this part of UPDATING.md is missing in docs.

In my opinion, there is still too much documentation in the UPDATING.md file.

Maybe it's worth moving some content out of the UPDATING.md file and linking to the other documentation instead? I mentioned this file, but the docstring is also useful documentation; it could discuss the reserved parameter names.

Contributor Author

I think the main goal was to simplify this, so that as a user of Airflow you don't have to care about how it works. Besides that, I don't like the ** notation:

def myfunc(**context):
    print(context['ds'])

is equivalent to:

def myfunc(ds):
    print(ds)

So I don't want to point users toward an overly verbose way of doing this. In Python you should generally be careful with **kwargs anyway.
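
To make the equivalence concrete, here is a minimal sketch of signature-based injection. It is not Airflow's actual implementation: `select_kwargs` and the sample `context` dict are hypothetical names used only for illustration, and the real task context holds many more keys.

```python
import inspect


def select_kwargs(func, context):
    """Return the subset of ``context`` that ``func`` can accept.

    Hypothetical helper for illustration; not part of Airflow.
    """
    params = inspect.signature(func).parameters
    # A ``**kwargs``-style parameter means the callable accepts every key.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(context)
    # Otherwise pass only the keys that match a parameter name.
    return {name: value for name, value in context.items() if name in params}


def with_kwargs(**context):
    return context["ds"]


def with_named(ds):
    return ds


# Stand-in for the task context (assumed keys, for illustration only).
context = {"ds": "2019-09-10", "task_id": "mytask"}

result_kwargs = with_kwargs(**select_kwargs(with_kwargs, context))
result_named = with_named(**select_kwargs(with_named, context))
```

Both callables receive the same `ds` value; the named-parameter version simply never sees the keys it did not ask for.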

Contributor Author

There aren't really reserved parameter names anymore. The only clash that is possible is if you do something like:

def fn(dag, ds, **context):
    print(dag)

PythonOperator(
    task_id='mytask',
    op_args=[1],
    python_callable=fn
)

In this case dag == 1, and ds is just the execution date. If you changed the code to:

def fn(dag, ds, **context):
    print(dag)

PythonOperator(
    task_id='mytask',
    op_args=[1, 2],
    python_callable=fn
)

Then ds would be 2. To avoid confusion, the arguments that you provide through op_args cannot also be keys in the context. This is an edge case that will not happen in practice, and we throw an error just to preserve the users' sanity.
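
The clash described here can be sketched as a check that maps the positional arguments onto the callable's parameters and rejects any that share a name with a context key. This is an illustration under stated assumptions rather than the operator's real code: `check_clash` and `context_keys` are hypothetical, and the set of context keys is abbreviated.

```python
import inspect


def check_clash(func, op_args, context_keys):
    """Raise ValueError if a positional op_arg lands on a context key name.

    Hypothetical helper for illustration; not part of Airflow.
    """
    params = inspect.signature(func).parameters.values()
    # Only these parameter kinds can be filled by a positional argument.
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    positional = [p.name for p in params if p.kind in positional_kinds]
    # The first len(op_args) positional parameters are the ones op_args fills.
    taken = positional[: len(op_args)]
    clashes = [name for name in taken if name in context_keys]
    if clashes:
        raise ValueError(f"op_args would overwrite context keys: {clashes}")


def fn(dag, ds, **context):
    return dag


# Abbreviated stand-in for the names available in the task context.
context_keys = {"dag", "ds", "execution_date"}

try:
    check_clash(fn, [1, 2], context_keys)
    error = None
except ValueError as exc:
    error = str(exc)
```

With `op_args=[1, 2]` both `dag` and `ds` are claimed positionally, so the check refuses to proceed instead of silently shadowing the context values.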

@mik-laj
Member

mik-laj commented Sep 10, 2019

Here is my proposal for the documentation, based on the note in UPDATING.md. This is not a finished change, but I wanted to show an example.


UPDATING.md

provide_context argument on the PythonOperator was removed. The signature of the callable passed to the PythonOperator is now inferred and argument values are always automatically provided. There is no need to explicitly provide or not provide the context anymore.
For more information: [python.rst]

The change is backwards compatible: setting provide_context will add the provide_context variable to the kwargs (but won't do anything).

PR: #5990


python.rst

The signature of the callable passed to the PythonOperator is inferred and argument values are always automatically provided. For example:

def myfunc(execution_date):
    print(execution_date)

python_operator = PythonOperator(task_id='mytask', python_callable=myfunc, dag=dag)

Notice you don't have to set provide_context=True; variables from the task context are now automatically detected and provided.

All context variables can still be provided with a double-asterisk argument:

def myfunc(**context):
    print(context)  # all variables will be provided to context

python_operator = PythonOperator(task_id='mytask', python_callable=myfunc)

The task context variable names are reserved names in the callable function, hence a clash with op_args and op_kwargs results in an exception:

def myfunc(dag):
    # raises a ValueError because "dag" is a reserved name
    # valid signature example: myfunc(mydag)
    print(dag)

python_operator = PythonOperator(
    task_id='mytask',
    op_args=[1],
    python_callable=myfunc,
)

docstring

The signature of the callable passed to the PythonOperator is inferred and argument values are always automatically provided.

.. warning::
    The task context variable names are reserved names in the callable function, hence a clash with op_args and op_kwargs results in an exception.

@Fokko
Contributor Author

Fokko commented Sep 13, 2019

Rebased

@Fokko
Contributor Author

Fokko commented Sep 13, 2019

Looks like master is failing 😭

@mik-laj
Member

mik-laj commented Sep 13, 2019

@nuclearpinguin Can you look at this?

@mik-laj
Member

mik-laj commented Sep 13, 2019

@Fokko Fixed and merged. I rebased your branch. Fingers crossed.

@Fokko
Contributor Author

Fokko commented Sep 14, 2019

Thanks @mik-laj for fixing this, appreciate it. Let's wait for Travis's verdict.

@mik-laj
Member

mik-laj commented Sep 14, 2019

Travis is sad 😿 Can you comfort him?

======================================================================
8) FAIL: test_next_execution (tests.cli.test_cli.TestCLI)
----------------------------------------------------------------------
   Traceback (most recent call last):
    tests/cli/test_cli.py line 274 in test_next_execution
      self.assertEqual(stdout[-1], expected_output[i])
   AssertionError: '2019-09-15 00:00:00+00:00' != 'None'
   - 2019-09-15 00:00:00+00:00
   + None

@mik-laj
Member

mik-laj commented Sep 21, 2019

Travis is sad. Can you rebase?

@Fokko
Contributor Author

Fokko commented Sep 22, 2019

I've rebased the branch, let's see if we can make Travis happy again.

@OmerJog
Contributor

OmerJog commented Oct 3, 2019

@Fokko the static check failed: inline code in rst takes two backticks (``code``).

@stale

stale bot commented Nov 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 20, 2019
@stale stale bot closed this Nov 27, 2019