Jinja Substitution Within a HiveOperator #257

Closed
Vandiver247 opened this issue Aug 13, 2015 · 6 comments

@Vandiver247

I'm sorry if this is an obvious question; I'm not really familiar with Jinja, and the documentation has me lost. The HiveOperator can translate hiveconf templating into Jinja templating. What I can't figure out is how to define the Jinja template variables within the DAG's context. For instance, if I have the hiveconf variable ${var}, which gets translated to {{ var }}, how do I give {{ var }} a value? I know I can pass template variables to operators using params. However, if I then go into the HQL and change the variable names to ${params.var}, the hiveconf-to-Jinja translation fails completely.

@mistercrunch
Member

@Vandiver247, you've hit something that is not very well documented or intuitive.

The key is to pass a dictionary with your parameters to the user_defined_macros parameter of the DAG constructor. These will be in the main namespace of all templates for that DAG.

dag = DAG(dag_id="foo", user_defined_macros={'var': 'hello'})
enables

{{ var }} and ${var}

Let me know if it doesn't work.
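As a rough illustration of why ${params.var} is left untranslated while ${var} works, here is a minimal sketch of a hiveconf-to-Jinja translation. The pattern and the function name translate_hiveconf are assumptions made for illustration, not necessarily Airflow's exact code; the point is that a character class without a dot will not match dotted names.

```python
import re

# Assumed pattern (illustrative, not necessarily Airflow's exact regex):
# only letters, digits, underscores, and spaces are allowed inside ${...}.
HIVECONF_VAR = re.compile(r"\$\{([ a-zA-Z0-9_]*)\}")


def translate_hiveconf(hql: str) -> str:
    """Rewrite ${name} placeholders as Jinja {{ name }} expressions."""
    return HIVECONF_VAR.sub(r"{{ \g<1> }}", hql)


# A plain name matches and is translated.
translated = translate_hiveconf("SELECT * FROM t WHERE col = '${var}'")

# A dotted name contains '.', which the pattern does not allow,
# so the placeholder is left as-is.
untouched = translate_hiveconf("SELECT * FROM t WHERE col = '${params.var}'")

print(translated)
print(untouched)
```

Under that assumption, keeping plain names such as ${var} in the HQL and supplying their values through user_defined_macros sidesteps the problem.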

@Vandiver247
Author

Ok. That seems to work for passing most things. However, I'm still running into a problem. I'm trying to set one of the variables to the execution date, which is currently available as the default macro

{{ ds }}

However, I need the timestamp in a different format so I really want to use something like

{{ execution_date.strftime('%H:%M:%S') }}

When I do this outside of the DAG declaration it works fine returning me the string in the format I want. However, if I attempt to access the macro in the user_defined_macros section, it instead returns the following string

"{{ execution_date.strftime('%H:%M:%S') }}"

In other words, this works:

procTm = '''{{ execution_date.strftime('%H:%M:%S') }}'''

But this does not:

dag = DAG('extractData',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    start_date=datetime(2016, 6, 3, 00, 49),
    end_date=datetime(2020, 8, 31, 23),
    user_defined_macros={
        'procDt': '''{{ execution_date.strftime('%H:%M:%S') }}''',
    },
    )

I assume that this is due to how Jinja works when processing the context. That said, is there a way for me to either make this work, or to access the current execution date outside of macros? Finally, one other question: where are the default variables like {{ ds }} set for Airflow? I've been digging through the code and can't seem to find them.

Thanks for the help.

@mistercrunch
Member

Here is where the context object is created:
https://github.com/airbnb/airflow/blob/master/airflow/models.py#L914

Jinja doesn't do multi-pass rendering, so when it resolves {{ procDt }} it replaces it with '{{ execution_date.strftime('%H:%M:%S') }}', and that is what you end up with. If we did another pass with Jinja, it would resolve as you expected, but we don't.
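The single-pass behavior can be reproduced with jinja2 directly, outside Airflow. This is only a sketch: the context dictionary and the fixed datetime are made-up stand-ins for what Airflow would supply at run time.

```python
from datetime import datetime

from jinja2 import Template

# Hypothetical context: a fixed execution_date plus a "macro" whose
# value is itself a string containing Jinja syntax.
context = {
    "execution_date": datetime(2015, 8, 13, 14, 30, 5),
    "procDt": "{{ execution_date.strftime('%H:%M:%S') }}",
}

# One pass: the string value is inserted verbatim, braces and all.
first_pass = Template("{{ procDt }}").render(**context)

# A second pass over that output would evaluate the inner expression,
# but that second pass is an extra step added here for demonstration.
second_pass = Template(first_pass).render(**context)

# Writing the expression directly in the template needs only one pass.
direct = Template("{{ execution_date.strftime('%H:%M:%S') }}").render(**context)

print(first_pass)
print(second_pass)
print(direct)
```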

So clearly {{ execution_date.strftime('%H:%M:%S') }} just works and that's fine, but what if you did want to apply a complex function that is not exposed as the object's method? Here's an example of that using a lambda, but you could have arbitrarily complex things instead:

dag = DAG('extractData',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    start_date=datetime(2016, 6, 3, 00, 49),
    end_date=datetime(2020, 8, 31, 23),
    user_defined_macros={
        'hms_formatter': lambda dt: dt.strftime('%H:%M:%S'),
        'macro_lib': my_gigantic_macro_library_and_more,
    },
)

and in the template:
{{ hms_formatter(execution_date) }}
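To see the callable-macro approach in isolation, the following sketch renders such a template with jinja2 alone. The merged context and the fixed datetime are assumptions standing in for what Airflow builds at run time; only hms_formatter comes from the example above.

```python
from datetime import datetime

from jinja2 import Template

# Same idea as the user_defined_macros entry above: a callable, not a string.
user_defined_macros = {
    "hms_formatter": lambda dt: dt.strftime("%H:%M:%S"),
}

# Hypothetical stand-in for the context Airflow would provide.
context = {"execution_date": datetime(2015, 8, 13, 14, 30, 5)}
context.update(user_defined_macros)

# The callable is invoked while the template renders, so a single
# pass is enough to produce the formatted time.
template = Template("SELECT '{{ hms_formatter(execution_date) }}' AS proc_time")
rendered = template.render(**context)

print(rendered)
```

Because the macro is a function rather than a pre-rendered string, it is evaluated against the execution date of each run.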

@mistercrunch
Member

This commit clarifies some of this in the docs
966d7f0

@Vandiver247
Author

Thank you! I appreciate the help.
