Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging handler TaskHandlerWithCustomFormatter adds prefix three times #35622

Closed
1 of 2 tasks
Aakcht opened this issue Nov 14, 2023 · 6 comments · Fixed by #38502
Closed
1 of 2 tasks

Logging handler TaskHandlerWithCustomFormatter adds prefix three times #35622

Aakcht opened this issue Nov 14, 2023 · 6 comments · Fixed by #38502
Assignees
Labels

Comments

@Aakcht
Copy link
Contributor

Aakcht commented Nov 14, 2023

Apache Airflow version

2.7.3

What happened

I tried using airflow.utils.log.task_handler_with_custom_formatter.TaskHandlerWithCustomFormatter handler for task logging and noticed that the prefix specified in task_log_prefix_template is added three times.

What you think should happen instead

The prefix should be added only one time.

How to reproduce

At first I encountered this issue in Kubernetes deployment with custom logging configuration, but it is also reproduced locally with the default configuration using airflow standalone command:

  1. Copy logging template from https://github.com/apache/airflow/blob/main/airflow/config_templates/airflow_local_settings.py and place it to $AIRFLOW_HOME/config/custom_logging_config.py

  2. In the logging template change task handler to

        "task": {
            "class": "airflow.utils.log.task_handler_with_custom_formatter.TaskHandlerWithCustomFormatter",
            "formatter": "airflow",
            "filters": ["mask_secrets"],
            "stream": "ext://sys.stdout",
        },
  1. In airflow.cfg set:
[logging]
...
logging_config_class = custom_logging_config.DEFAULT_LOGGING_CONFIG
...
task_log_prefix_template = {{ ti.dag_id }}-{{ ti.task_id }}-{{ execution_date }}-{{ try_number }}
  1. Run airflow standalone and start any DAG in the UI.

  2. Check task logs in the terminal, for example:

scheduler  | example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:[2023-11-14 10:19:33,087] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: example_xcom.puller manual__2023-11-14T07:19:15.633728+00:00 [queued]>
scheduler  | example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:example_xcom-puller-2023-11-14T07:19:15.633728+00:00-:[2023-11-14 10:19:33,087] {taskinstance.py:1361} INFO - Starting attempt 1 of 1

Operating System

Debian "11 (Bullseye)" or Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

Can be reproduced both locally and using airflow helm chart.

Anything else

I found this issue #10202 , but it looks like it was about airflow 1.

I wouldn't mind submitting a PR for this issue, but I'm not sure how to fix it except for adding a check in https://github.com/apache/airflow/blob/main/airflow/utils/log/task_handler_with_custom_formatter.py#L37 if the context was already set. But this doesn't look like correct approach.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Aakcht Aakcht added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Nov 14, 2023
@Aakcht Aakcht changed the title Logger TaskHandlerWithCustomFormatter adds prefix three times Logging handler TaskHandlerWithCustomFormatter adds prefix three times Nov 14, 2023
@Taragolis
Copy link
Contributor

@Aakcht Sorry for the late response. Is it still a case now, maybe you found a solution?
Asked because that I do not know that we have this task handler, and only one reference into the documentation about configuration prefix, and that is all: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#task-log-prefix-template 😢

As I could see this handler initially added as part of Airflow Contrib (deprecated in Airflow 2.0)

And after migration to the Airflow 2.0 it moved into the core: #9395

Unfortunetly after that I can't find any significant changes in it so this might be a signal that it is broken for a long time or even abandoned

@Taragolis Taragolis added pending-response and removed needs-triage label for new issues that we didn't triage yet labels Feb 19, 2024
@Aakcht
Copy link
Contributor Author

Aakcht commented Feb 20, 2024

Hi, @Taragolis ! Just retested it on Airflow 2.8.1 version - the issue is still there. Haven't gotten around to trying to figure out what the problem is yet , so no solution for now 🙁. I guess the idea from my original comment about checking if a context is already set in TaskHandlerWithCustomFormatter.set_context method would work, but this approach doesn't look correct .

@Taragolis
Copy link
Contributor

The most I can do now is mark it as Good First Issue, and some one (maybe you?) could invest their time a investigate is possible to fix that, and if so it would be nice also create some documentation about this handler.

@TiDeane
Copy link
Contributor

TiDeane commented Mar 4, 2024

Hello, can I be assigned to this?

I've replicated the bug in version 2.8.2, and @Aakcht's suggestion of checking if the context was already set in the TaskHandlerWithCustomFormatter.set_context method really did fix it, but I'd like to take some time to try and find out why it's being called multiple times in the first place and fix it that way.

@brki
Copy link
Contributor

brki commented Mar 22, 2024

@TiDeane If you're working on this, perhaps you can make a related doc fix too? I noticed that the example given for https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#task-log-prefix-template does not work.

Given example is:

{ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}

That doesn't work (it results in no prefix being added). Corrected, it would be:

{{ti.dag_id}}-{{ti.task_id}}-{{execution_date}}-{{ti.try_number}}

@TiDeane
Copy link
Contributor

TiDeane commented Mar 23, 2024

@brki I am, and thank you for the suggestion! I'll update the docs along with the bugfix.

A little progress update for anyone interested: from the testing I've done, it seems like the bug is caused because TaskHandlerWithCustomFormatter's set_context is called multiple times from logging_mixin’s set_context. Also, I'm not sure if it's intentional, but some TaskHandlerWithCustomFormatter's have their Formatter as a TimezoneAware object and each time they still add another prefix. The number of prefixes increases with each call, but caps at 3 (subsequent calls don't add more prefixes).

It might just be that the simplest and most efficient solution would be to simply check if the context has already been set at the beginning of TaskHandlerWithCustomFormatter's set_context, like was initially suggested. Any alternatives I can think of would either require changing the handlers' structure or slow down execution for every other handler.

By adding a simple if self.prefix_jinja_template is not None: return at the beginning of the TaskHandlerWithCustomFormatter's set_context method the bug is completely fixed, and something similar is done by the ElasticSearch es_task_handler as well: https://github.com/apache/airflow/blob/main/airflow/providers/elasticsearch/log/es_task_handler.py#L443.

TiDeane added a commit to TiDeane/airflow that referenced this issue Mar 25, 2024
… once

When using the TaskHandlerWithCustomFormatter to add a prefix to logs, it was previously adding the prefix multiple times. This happened because it was being called multiple times from logging_mixin.py, and worsened because even when the handler's formatter was a TimezoneAware formatter (to include UTC offset), it was still adding an additional prefix. Because of this, I felt that any solution outside of the TaskHandlerWithCustomFormatter itself would either require a restructuring of the handlers' structure or slow down execution for all other handlers. And so, the solution I settled on was to add to TaskHandlerWithCustomFormatter's initial 'if' statement a simple 'or self.prefix_jinja_template is not None', so that it returns if the prefix had already been set. This is similar to what is done by the ElasticSearch es_task_handler.py.

Note: also fixed the documentation's example for the handler, as the previous one was incorrect and didn't work.
TiDeane added a commit to TiDeane/airflow that referenced this issue Apr 30, 2024
… once

When using the TaskHandlerWithCustomFormatter to add a prefix to logs, it was previously adding the prefix multiple times. This happened because it was being called multiple times from logging_mixin.py, and worsened because even when the handler's formatter was a TimezoneAware formatter (to include UTC offset), it was still adding an additional prefix. Because of this, I felt that any solution outside of the TaskHandlerWithCustomFormatter itself would either require a restructuring of the handlers' structure or slow down execution for all other handlers. And so, the solution I settled on was to add to TaskHandlerWithCustomFormatter's initial 'if' statement a simple 'or self.prefix_jinja_template is not None', so that it returns if the prefix had already been set. This is similar to what is done by the ElasticSearch es_task_handler.py.

Note: also fixed the documentation's example for the handler, as the previous one was incorrect and didn't work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants