
Add task parameter to set custom logger name #34964

Merged: 26 commits into apache:main on Nov 3, 2023

Conversation

Joffreybvn (Contributor) commented Oct 16, 2023:

Hello,

This PR adds a new parameter, logger_name, to every Operator and Hook. The goal is to make the logging configuration more granular.

Example: have one task show everything at DEBUG while the others stay at WARNING. I can configure two loggers and assign them in my DAGs.

Example of SQLExecuteQueryOperator:

SQLExecuteQueryOperator(
    ...,
    logger_name="my.custom.logger",
    hook_params={"logger_name": "my.custom.logger"}
)

Use case:

I pass data between tasks (typically from APIs to a data warehouse). I use the SQL operators and templated Jinja to do the inserts. But huge inserts clog up the logging view in the UI, and sometimes the webserver crashes because of it.

I would like to be able to:

  • Enable INFO logging for some tasks, while keeping others at WARNING.
  • For the INFO tasks, I would like multiple handlers:
    • One that keeps only warnings and errors locally (for the webserver)
    • One that sends everything to Elastic, so that the user can visit Kibana to see everything

This is doable with an advanced logging configuration plus the ability to set the logger name at task level.
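For concreteness, here is a minimal sketch of the kind of logging configuration this relies on, using Airflow's standard logging_config_class mechanism. The logger names (airflow.task.quiet_inserts, airflow.task.debug_tasks) and the module name log_config are illustrative assumptions, not part of this PR; where exactly task-level loggers should live in the hierarchy is discussed in the review below.

# log_config.py -- sketch only; logger names below are hypothetical examples.
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# Quiet logger for the bulky insert tasks: only WARNING and above are emitted.
LOGGING_CONFIG["loggers"]["airflow.task.quiet_inserts"] = {
    "handlers": ["task"],  # keep the per-task file handler so logs stay with the task
    "level": "WARNING",
    "propagate": False,
}

# Verbose logger for the tasks I actually want to debug.
LOGGING_CONFIG["loggers"]["airflow.task.debug_tasks"] = {
    "handlers": ["task"],
    "level": "DEBUG",
    "propagate": False,
}

airflow.cfg would then point [logging] logging_config_class at log_config.LOGGING_CONFIG, and individual tasks would opt into one logger or the other via the new logger_name parameter.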



The boring-cyborg bot added the area:core-operators (Operators, Sensors and hooks within Core Airflow) label on Oct 16, 2023.
uranusjr (Member) commented:

I wonder if it’s better to put the logic in LoggingMixin instead.

class LoggingMixin:
    _log: logging.Logger
    _logger_name: str | None = None

    @staticmethod
    def _get_log(obj, clazz) -> Logger:
        if obj._log is None:
            obj._log = logging.getLogger(
                obj._logger_name if obj._logger_name is not None
                else f"{clazz.__module__}.{clazz.__name__}"
            )
        return obj._log

class Something(LoggingMixin):
    def __init__(self, ..., logger_name: str | None = None) -> None:
        ...
        self._logger_name = logger_name
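
As a purely hypothetical usage sketch (the hook class, connection id and logger name are invented for illustration, and self.log is assumed to resolve through the mixin's _get_log as in the existing LoggingMixin), a subclass would then only need to forward the parameter:

class MyApiHook(LoggingMixin):
    def __init__(self, conn_id: str, logger_name: str | None = None) -> None:
        super().__init__()
        self.conn_id = conn_id
        self._logger_name = logger_name

    def fetch(self) -> None:
        # Emits through "my.custom.logger" when logger_name was given,
        # otherwise through the default "<module>.MyApiHook" name.
        self.log.info("Fetching data with connection %s", self.conn_id)

hook = MyApiHook(conn_id="my_api", logger_name="my.custom.logger")
hook.fetch()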

potiuk (Member) left a comment:

Can you please double-check that this works as expected? I am not sure it will. IMHO there is a good reason why airflow.task is used as the base logger.

In the default logging configuration it uses the "task" handler, which:

  • uses FileTaskHandler, which redirects the logs to the right file depending on the task
  • if remote logging is configured, the "task" handler is pointed at the appropriate remote logging handler
  • when a task starts, FileTaskHandler's (or the appropriate remote task handler's) set_context() method is called to set the current task, allowing the handler to choose the right file or inject task information
  • secrets are masked using the mask_secrets filter

Unless I am mistaken, none of that happens when you just override the logger to "custom.logger" (or use the default module/name). By default, operator logs will then go to the "console" handler, i.e. the stdout of the process that runs them, and in many cases logs from multiple tasks will be inter-mixed in the output (for example the stdout of the Celery worker, or the scheduler logs if LocalExecutor is used).

Also, you will have no idea which task the logs come from, because no task information is added. The console handler does not use the context information set by the local task runner when the task runs.

This can be mitigated by manually configuring loggers in the logging config to use the "task" handler, but with the current implementation that would have to be done for each such operator/package you want to use, and it could potentially interfere with standard scheduler/triggerer/webserver log handling.

I doubt this is the intention of the change, but this is IMHO how it will behave (unless I missed something, of course).

I think if you want to make selective logging like this possible, you should (maybe) append the custom logger name to "airflow.task.operators". That would correctly use the "task" handler propagated from the "airflow.task" parent logger, and the logs would find their way to the right log file.

For example, if you set the custom logger name to "custom_logger", the resulting logger would be:

airflow.task.operators.custom_logger

or, if you leave it as None:

airflow.task.operators.airflow.providers.common.sql.hooks.sql.DbApiHook

This would, however, also require extending the logging how-to and explaining in detail how to get selective logging.

Actually, in this case maybe you would not even need None as a special value. I believe we could simply use airflow.task.operators.<module>.<name> as the default right away.

Since we have no "child" loggers configured by default under the airflow.task logger, logging will be handled by the "airflow.task" parent logger, so you would not have to handle the None value at all. Simply always appending the logger name to airflow.task.operators should do the trick (see the small sketch below).

Still, a "howto" update explaining how to configure your loggers to get more selective logging is an absolutely necessary part of this PR, IMHO.
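
To illustrate the propagation argument with plain standard-library logging (a standalone sketch, not Airflow code; the StreamHandler stands in for the real FileTaskHandler attached to "airflow.task"):

import logging

# Stand-in for the "task" handler that the default config attaches to "airflow.task".
task_handler = logging.StreamHandler()
task_handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

parent = logging.getLogger("airflow.task")
parent.addHandler(task_handler)
parent.setLevel(logging.INFO)

# A child logger needs no handler of its own: records propagate up to "airflow.task",
# so they still pass through the task handler (and, in Airflow, its file routing and masking).
child = logging.getLogger("airflow.task.operators.custom_logger")
child.setLevel(logging.DEBUG)  # selective verbosity for just this one logger
child.debug("reaches the task handler via propagation")

# A logger outside the hierarchy bypasses the task handler entirely,
# which is the problem described above.
outside = logging.getLogger("custom.logger")
outside.warning("ends up on the root/console handlers instead")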

Joffreybvn (Contributor, Author) commented:

Internally, I indeed only tested custom loggers named like airflow.task.operators.custom_logger, and never tried to change the root. Thanks for pointing out the issues! I will adapt the PR and add how-to docs.

Same for LoggingMixin: I did not dare to touch that object (because it is used in so many places!). So, thanks for pointing that out!

potiuk (Member) left a comment:

Fantastic. Pending the tests succeeding, but it looks great, including the docs. Thanks for being so responsive!

uranusjr changed the title from "Add parameter for custom logger name in Dag's tasks" to "Add task parameter to set custom logger name" on Oct 20, 2023.
potiuk (Member) commented Oct 20, 2023:

BTW, you will need to rebase to make this one work.

potiuk (Member) commented Oct 20, 2023:

Static check fixed in main

eladkal added this to the Airflow 2.8.0 milestone on Oct 22, 2023.
eladkal added and then removed the type:improvement (Changelog: Improvements) label on Oct 22, 2023.
potiuk merged commit 2bcd450 into apache:main on Nov 3, 2023.
45 checks passed
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
Labels: area:core-operators (Operators, Sensors and hooks within Core Airflow), type:new-feature (Changelog: New Features)