[AIRFLOW-103] Allow jinja templates to be used in task params #1488

withnale · 2016-05-11T12:57:19Z

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

https://issues.apache.org/jira/browse/AIRFLOW-103

Reminders for contributors:

Your PR's title must reference an issue on Airflow's JIRA. For example, a PR called "[AIRFLOW-1] My Amazing PR" would close JIRA issue Improving the search functionality in the graph view #1. Please open a new issue if required!
You must add an Apache License header to all new files.
Please squash your commits when possible and follow the 7 rules of good Git commits.

mistercrunch · 2016-05-11T15:39:16Z

airflow/models.py

@@ -1414,6 +1411,15 @@ def get_template_context(self, session=None):
            'test_mode': self.test_mode,
        }

+        # Allow task level param definitions to be rendered via jinja
+        # using the context available up until this point
+        if task.params:


This should be done in the render_templates and apply only on strings (use isinstance(obj, basestring) to check)

This comment still holds. It hasn't been addressed or contested.

mistercrunch · 2016-05-11T15:56:48Z

I think this can result in a change of behavior that could create some troubles or slow downs. If many tasks are referencing a params object (which is a common pattern), it will get "rendered" many times. This could result in either unexpected results or inefficiencies / regressions.

It'd be wise to add a new bool param to BaseOperator named render_params, set to False by default and only apply where necessary. I personally wouldn't approve this PR without this safeguard.

If double rendering was an issue (I think with render_params it's less of a concern), the DAG object could keep track of object ids (using Python's id(obj)) that have been templated and implement some logic in render_template to avoid re-rendering objects that have been rendered before.

withnale · 2016-05-11T21:44:02Z

If double rendering was an issue (I think with render_params it's less of a concern), the DAG object could keep track of object ids (using Python's id(obj)) that have been templated and implement some logic in render_template to avoid re-rendering objects that have been rendered before.

Yes, I can see that effectively caching values would avoid the potential slowdown of multiple jinja renders. However, would it not be possible to just store the rendered params on first call within a TI and serve this value of subsequent calls rather than delving into the tracking individual ids walking through the params dict?

Also, I'm confused why you'd say it lives in render_template rather than get_template_context. I wouldn't like to treat it as just another field in template_fields, since you would always want the template_fields to be able to make use of the rendered params. Or are you suggesting a separate block in render_templates?

mistercrunch · 2016-05-12T00:39:01Z

well if it's the same object it's already been rendered in place, so all you want to do it to not re-render it . Maintaining a list of object ids that have been rendered would allow to skip double-rendering
get_template_context returns the context used for rendering, it does not render anything. render_templates is the method that renders things, so params should be rendered in there along with everything else.

landscape-bot · 2016-05-24T13:13:29Z

Repository health increased by 0.11% when pulling 69be5e4 on withnale:jinja_task_params into ee24855 on apache:master.

No new problems were introduced.
1 problem was fixed (including 0 errors and 1 code smell).

withnale · 2016-05-24T16:06:01Z

Have updated the existing PR to include a BaseOperator parameter of render_params=False and tested as appropriate in get_template_context(). Still trying to work out whether the natural home of this should be render_templates() or get_template_context().

For context sake, here is an example operator from one of my DAGs. The config and steps params are long YAML files which define our EMR clusters and most of these are very similar hence the need to parameterise. Some of the values passed in via the parameters are dynamic and created from other TaskInstances.

create_cluster = EmrSensor(
    config='resources/emr/cluster/config.yaml',
    steps='resources/emr/steps/steps.yaml',
    params={
        'db_host': "{{ task_instance.xcom_pull(task_ids='create_db') }}",
        'year': "{{ (execution_date - macros.timedelta(days=18)).year }}",
        'month': "{{ (execution_date - macros.timedelta(days=18)).month }}"
    },
    task_id='create_cluster',
    dag=dag,
)

So if implemented in `render_templates()`` are you thinking of something like this (not tested as yet)...?

    def render_templates(self):
        task = self.task
        jinja_context = self.get_template_context()
        if hasattr(self, 'task') and hasattr(self.task, 'dag'):
            if self.task.dag.user_defined_macros:
                jinja_context.update(
                    self.task.dag.user_defined_macros)

        rt = self.task.render_template  # shortcut to method
        if self.render_params:
            jinja_context['params'] = rt('params', jinja_context['params'], jinja_context)

        for attr in task.__class__.template_fields:
            content = getattr(task, attr)
            if content:
                rendered_content = rt(attr, content, jinja_context)
                setattr(task, attr, rendered_content)

I guess it comes down to whether you view that the rendered params are special cases of normal templated variables or whether they are actually now the context for rendering - this was my original view, but after writing out what I think the updated render_templates would look like, it's a little more six of one and half a dozen of the other now. I'm happy to resubmit the PR in whichever form seems best...

NB: It seems that get_template_context() is already being called twice during the execution of TaskInstance.run(). If there is a worry about performance of the templating then perhaps we should make render_templates() accept an optional context to avoid the double call, something like...

        if not mark_success:
            context = self.get_template_context()

            task_copy = copy.copy(task)
            self.task = task_copy

            def signal_handler(signum, frame):
                '''Setting kill signal handler'''
                logging.error("Killing subprocess")
                task_copy.on_kill()
                raise AirflowException("Task received SIGTERM signal")
            signal.signal(signal.SIGTERM, signal_handler)

            self.render_templates(context=context)

    ...

    def render_templates(self, context=None):
        task = self.task
        jinja_context = context or self.get_template_context()

landscape-bot · 2016-05-24T16:07:20Z

Repository health increased by 0.11% when pulling 4025764 on withnale:jinja_task_params into ee24855 on apache:master.

No new problems were introduced.
1 problem was fixed (including 0 errors and 1 code smell).

withnale · 2016-05-27T19:33:04Z

@mistercrunch any chance of an update?

jlowin · 2016-05-30T15:33:03Z

airflow/hooks/hive_hooks.py

@@ -489,7 +489,8 @@ def to_csv(
            schema='default',
            delimiter=',',
            lineterminator='\r\n',
-            output_header=True):
+            output_header=True,
+            fetch_size=1000):


Does this have the potential to break existing implementations of HiveHook, if someone is expecting > 1000 rows? I think this should default to None.

Also, is this related to the PR?

jlowin · 2016-05-30T15:37:27Z

I think this looks good and the explicit render_params option should alleviate the concerns about performance. +1 from me on the params bit but @withnale not sure about the Hive Hook bit.

@mistercrunch are you ok with it?

withnale · 2016-05-31T06:52:48Z

Ah. The hive-hook shouldn't be there. I think I must have included it from someone elses commit when doing a rebase. Will tidy it up and do a force push later today.

codecov-io · 2016-06-03T20:08:15Z

Current coverage is 67.78%

Merging #1488 into master will increase coverage by <.01%

@@             master      #1488   diff @@
==========================================
  Files           116        116          
  Lines          8285       8289     +4   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           5611       5618     +7   
+ Misses         2674       2671     -3   
  Partials          0          0

Powered by Codecov. Last updated by 89edb6f...0d41e15

withnale · 2016-06-03T20:40:48Z

A bit later than planned but the PR has been rebased again to current.

bolkedebruin · 2016-06-07T06:34:15Z

@withnale Thanks for all the hard work! Sorry this one got overlooked a bit. As @mistercrunch felt a bit strongly about this I would like hime to chime in before merging. So I hope you still have a bit of patience with us ;).

msumit · 2016-10-15T12:43:08Z

Hi @withnale.. sorry for long delay in reviewing this PR. If you still wants to pursue this PR, please resolve the conflict and address the concern raised by @mistercrunch. You would also need to change the commit message, so that it doesn't cross the 50 char limit.

We've 1 weeks time before this PR gets marked as Abandoned and gets closed without being merged to master.

….1 (apache#1488) Signed-off-by: Renovate Bot <bot@renovateapp.com> Co-authored-by: Renovate Bot <bot@renovateapp.com>

mistercrunch reviewed May 11, 2016
View reviewed changes

withnale force-pushed the jinja_task_params branch from a8309cf to 69be5e4 Compare May 24, 2016 13:10

withnale force-pushed the jinja_task_params branch from 69be5e4 to 4025764 Compare May 24, 2016 16:04

jlowin reviewed May 30, 2016
View reviewed changes

withnale force-pushed the jinja_task_params branch 2 times, most recently from 0d41e15 to 5548dbb Compare June 3, 2016 19:50

[AIRFLOW-103] Allow jinja templates to be used in task params

d1387b3

withnale force-pushed the jinja_task_params branch from 5548dbb to d1387b3 Compare June 3, 2016 19:50

asfgit closed this in dcef468 Oct 22, 2016

alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017

closes apache#1488 *PR abandonned by submitter*

2a4ac8d

LvffY mentioned this pull request Aug 5, 2020

Unable to log into an Azure blob storage #10191

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AIRFLOW-103] Allow jinja templates to be used in task params #1488

[AIRFLOW-103] Allow jinja templates to be used in task params #1488

withnale commented May 11, 2016

mistercrunch May 11, 2016

mistercrunch Jun 9, 2016 •

edited

Loading

mistercrunch commented May 11, 2016

withnale commented May 11, 2016

mistercrunch commented May 12, 2016

landscape-bot commented May 24, 2016

withnale commented May 24, 2016

landscape-bot commented May 24, 2016

withnale commented May 27, 2016

jlowin May 30, 2016

jlowin May 30, 2016

jlowin commented May 30, 2016 •

edited

Loading

withnale commented May 31, 2016

codecov-io commented Jun 3, 2016 •

edited

Loading

withnale commented Jun 3, 2016

bolkedebruin commented Jun 7, 2016

msumit commented Oct 15, 2016

[AIRFLOW-103] Allow jinja templates to be used in task params #1488

[AIRFLOW-103] Allow jinja templates to be used in task params #1488

Conversation

withnale commented May 11, 2016

mistercrunch May 11, 2016

Choose a reason for hiding this comment

mistercrunch Jun 9, 2016 • edited Loading

Choose a reason for hiding this comment

mistercrunch commented May 11, 2016

withnale commented May 11, 2016

mistercrunch commented May 12, 2016

landscape-bot commented May 24, 2016

withnale commented May 24, 2016

landscape-bot commented May 24, 2016

withnale commented May 27, 2016

jlowin May 30, 2016

Choose a reason for hiding this comment

jlowin May 30, 2016

Choose a reason for hiding this comment

jlowin commented May 30, 2016 • edited Loading

withnale commented May 31, 2016

codecov-io commented Jun 3, 2016 • edited Loading

Current coverage is 67.78%

withnale commented Jun 3, 2016

bolkedebruin commented Jun 7, 2016

msumit commented Oct 15, 2016

mistercrunch Jun 9, 2016 •

edited

Loading

jlowin commented May 30, 2016 •

edited

Loading

codecov-io commented Jun 3, 2016 •

edited

Loading