Add task parameter to set custom logger name #34964
Conversation
Force-pushed from e1c63f1 to f1a2242.
I wonder if it’s better to put the logic in LoggingMixin:

class LoggingMixin:
    _log: logging.Logger
    _logger_name: str | None = None

    @staticmethod
    def _get_log(obj, clazz) -> Logger:
        if obj._log is None:
            obj._log = logging.getLogger(
                obj._logger_name if obj._logger_name is not None
                else f"{clazz.__module__}.{clazz.__name__}"
            )
        ...

class Something(LoggingMixin):
    def __init__(self, ..., logger_name: str | None = None) -> None:
        ...
        self._logger_name = logger_name
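A runnable minimal version of this idea might look as follows (a sketch only; the class names are illustrative and this is not Airflow's actual implementation):

```python
from __future__ import annotations

import logging


class LoggingMixin:
    """Sketch: resolve the logger lazily, preferring a custom name when set."""

    _log: logging.Logger | None = None
    _logger_name: str | None = None

    @property
    def log(self) -> logging.Logger:
        if self._log is None:
            clazz = type(self)
            # Fall back to "<module>.<class>" when no custom name was given.
            name = self._logger_name or f"{clazz.__module__}.{clazz.__name__}"
            self._log = logging.getLogger(name)
        return self._log


class Something(LoggingMixin):
    def __init__(self, logger_name: str | None = None) -> None:
        self._logger_name = logger_name


print(Something(logger_name="my.custom").log.name)   # my.custom
print(Something().log.name.endswith(".Something"))   # True
```

The point of resolving the name in the mixin's lazy getter is that subclasses only need to store `_logger_name`; no per-class logging code is required.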
Force-pushed from 21b7400 to 5d90d13.
Can you please double-check that it works as expected? I am not sure it will. IMHO there is a good reason why airflow.task is used as the base logger.
In the default logging configuration it is wired to the "task" handler, which:
- uses FileTaskHandler to redirect the logs to the right file depending on the task
- is configured to use the appropriate remote logging handler if remote logging is enabled
- has its set_context() method (on FileTaskHandler, or the appropriate remote task handler) called when a task starts, to set the right task and allow the handler to choose the right file or inject task information
- masks secrets via the mask_secrets filter
Unless I am mistaken, none of this happens when you just override the logger to be "custom.logger" (or use the default module/name). By default, operator logs will then go to the "console" handler, i.e. the stdout of the process that runs them, and in many cases logs from multiple tasks will be inter-mixed in the output (for example the stdout of the Celery worker, or the scheduler logs if LocalExecutor is used).
Also, you will have no idea which task a log line comes from, because no task information is added: the console handler does not use the context information set by the local task when the task is run.
This can be mitigated by manually configuring handlers in the logging config to use the "task" handler, but with the current implementation that would require configuring handlers manually for each operator/package you want to use this way, and it could potentially interfere with standard scheduler/triggerer/webserver log handling.
I doubt this is the intention of the change, but this is IMHO how it will behave (unless I missed something, of course).
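The propagation issue described above can be illustrated with plain stdlib logging (a sketch; the StreamHandler here is a stand-in for Airflow's FileTaskHandler, and the logger names are illustrative):

```python
from __future__ import annotations

import logging


def effective_handlers(logger: logging.Logger) -> list[logging.Handler]:
    """Collect every handler a record emitted on ``logger`` would reach."""
    handlers: list[logging.Handler] = []
    current: logging.Logger | None = logger
    while current is not None:
        handlers.extend(current.handlers)
        if not current.propagate:
            break
        current = current.parent
    return handlers


# Stand-in for the "task" handler attached to the "airflow.task" logger.
task_handler = logging.StreamHandler()
logging.getLogger("airflow.task").addHandler(task_handler)

inside = logging.getLogger("airflow.task.operators.my_op")
outside = logging.getLogger("custom.logger")

print(task_handler in effective_handlers(inside))   # True
print(task_handler in effective_handlers(outside))  # False
```

A record only propagates up its own dotted-name chain, so a logger named "custom.logger" never reaches handlers configured on "airflow.task".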
I think if you want to make selective logging like this possible, you should (maybe) append the logger information to "airflow.task.operators". This would correctly use the "task" handler propagated from the "airflow.task" parent logger, and the logs would find their way to the right log file.
For example, if you use "custom_logger" as the custom logger name, the resulting logger should be:
airflow.task.operators.custom_logger
or, in case you set it to None:
airflow.task.operators.airflow.providers.common.sql.hooks.sql.DbApiHook
This would, however, also require extending the logging howto to explain in detail how to get selective logging.
Actually, in this case you might not even need None as a special value. I believe we could simply use airflow.task.operators.<module>.<name> as the default immediately.
Since we have no "child" loggers configured by default for the airflow.task logger, logging will be handled by "airflow.task" as the parent logger, so you would not have to handle the None value at all. Simply always appending the suffix to airflow.task.operators should do the trick.
Still, in this case a "howto" update explaining how to configure your loggers to get more selective logging is an absolutely necessary part of this PR, IMHO.
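The naming scheme suggested above can be sketched as a small helper (illustrative only; `build_logger_name` is a hypothetical function, not part of the PR, and the stand-in class here only mimics the DbApiHook example):

```python
from __future__ import annotations


def build_logger_name(logger_name: str | None, clazz: type) -> str:
    """Always nest under "airflow.task.operators" so records propagate
    to whatever handler "airflow.task" is configured with."""
    suffix = logger_name if logger_name else f"{clazz.__module__}.{clazz.__qualname__}"
    return f"airflow.task.operators.{suffix}"


class DbApiHook:  # stand-in for airflow.providers.common.sql.hooks.sql.DbApiHook
    pass


print(build_logger_name("custom_logger", DbApiHook))
# airflow.task.operators.custom_logger
print(build_logger_name(None, DbApiHook))
# airflow.task.operators.<module>.DbApiHook (module differs outside Airflow)
```

Because the prefix is applied unconditionally, None needs no special handling: the class-derived default name lands under the same parent and inherits its handlers.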
Internally, I indeed only tested custom loggers named like this. Same about the LoggingMixin: I did not dare to touch this object (because it is used in so many places!). So, thanks for pointing it out!
Force-pushed from 2f45c35 to a870316.
Force-pushed from a870316 to e05db71.
Force-pushed from 5ee2b5b to 6656ae2.
Fantastic. Pending tests succeeding, but it looks great, including the docs. Thanks for being so responsive!
Force-pushed from 6530ee4 to 45b668a.
BTW, you will need to rebase to make this one work.
Force-pushed from 4bcec7d to 72a0645.
Static check fixed in main.
Force-pushed from 72a0645 to 3dc17ca.
…_config_logger_name`
It is implicitly created when the object is resumed
Force-pushed from feafee1 to ded2f91.
Hello,
This PR implements a new parameter, logger_name, for all Operators and Hooks. The goal is to make the logging configuration more granular. Example: have one task show all DEBUG logs while the others stay on WARNING; I can configure two loggers and assign them in my DAGs.
Example of SQLExecuteQueryOperator:
Use case:
I am passing data between tasks (typically from APIs to a data warehouse). I use the SQL operators and templated Jinja to do the inserts. But huge inserts clog up the logging view in the UI; sometimes the webserver even crashes because of it.
I would like to be able to:
This is doable with an advanced logging configuration, combined with the ability to configure the name of the logger at task level.
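As a standalone sketch of that goal, plain stdlib logging already shows how two logger names at different levels give per-task verbosity (the logger names here are illustrative, not part of the PR):

```python
import logging

# Two loggers a task could select via the proposed logger_name parameter:
# one chatty, one that only lets warnings and above through.
logging.getLogger("airflow.task.verbose").setLevel(logging.DEBUG)
logging.getLogger("airflow.task.quiet").setLevel(logging.WARNING)

verbose = logging.getLogger("airflow.task.verbose")
quiet = logging.getLogger("airflow.task.quiet")

print(verbose.isEnabledFor(logging.DEBUG))   # True
print(quiet.isEnabledFor(logging.DEBUG))     # False
print(quiet.isEnabledFor(logging.WARNING))   # True
```

A noisy insert task would be pointed at the quiet logger, while a task under investigation gets the verbose one, without touching the other tasks' output.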
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.