Rework remote task log handling for the structlog era. #48491
Conversation
amoghrajesh left a comment:
LGTM +1
Some nits
Resolved review comments on:
- providers/amazon/src/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py
- providers/amazon/tests/unit/amazon/aws/log/test_cloudwatch_task_handler.py
Does this affect Elasticsearch / OpenSearch logging? Asking because I don't see changes in this PR for these providers.
Oh yes, I forgot to deal with this. My plan for now is to make it fall back to the existing airflow.task handler. (I also really don't like that we have two almost identical providers for ES and OS. Thanks a lot, Elastic.)
Previously this feature was built on top of the stdlib logging.Handler interface, and it worked but had a few issues (even before we switched to structlog for the Task SDK):

- We had to use what is in many ways a hack with "set_context" to get information down into the task handler.
- Discovery of the configured task handler was somewhat baroque.
- The whole thing is complex due to the features of stdlib logging (loggers, propagate, handler levels, etc.).
- The upload was triggered somewhat "automatically" inside close, which from an abstraction point of view is messy.

This changes things to have a more explicit interface purpose-made for uploading task log files and for reading them, and, perhaps more crucially for things like CloudWatch Logs, it (re)adds the ability to install a structlog processor that will receive every log message as it happens (see the sketch below).

The return types for the read et al. functions were confusing the living daylights out of me, so I've created type aliases to give the return types explicit names to reduce (my) confusion.
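For illustration only, here is a minimal sketch of what a per-message structlog processor along these lines could look like. This is not the provider code from this PR; the class name, the batching strategy, and the `send_batch` callback are assumptions made for the example.

```python
# Illustrative sketch only -- not the handler interface added in this PR.
# A structlog processor is any callable taking (logger, method_name, event_dict);
# putting one in the processor chain means it sees every log event as it happens.
import structlog


class RemoteLogForwarder:
    """Queue each event dict and ship it to a remote sink in small batches."""

    def __init__(self, send_batch, batch_size: int = 100):
        self.send_batch = send_batch      # e.g. a wrapper around CloudWatch put_log_events
        self.batch_size = batch_size
        self._buffer: list[dict] = []

    def __call__(self, logger, method_name, event_dict):
        self._buffer.append(dict(event_dict))   # copy so later processors can mutate freely
        if len(self._buffer) >= self.batch_size:
            self.flush()
        return event_dict                        # pass the event through unchanged

    def flush(self):
        if self._buffer:
            self.send_batch(self._buffer)
            self._buffer.clear()


forwarder = RemoteLogForwarder(send_batch=print, batch_size=2)
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        forwarder,                               # receives every event before rendering
        structlog.processors.JSONRenderer(),
    ]
)
structlog.get_logger().info("task started", dag_id="example_dag", try_number=1)
structlog.get_logger().info("task finished", dag_id="example_dag", try_number=1)
```

Wiring the processor in via structlog.configure is just for the example; how the real providers hook into the chain is defined by the new interface in this PR.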
We are missing handling for HdfsTaskHandler, but I believe this is a niche one and we can do it later. Created #48685 as a follow-up.
The mypy providers failure is fixed in #48686.
I haven't changed deps; everything but the lowest provider deps is passing, so I'm merging this now.
Resolved review comment on:
- providers/microsoft/azure/src/airflow/providers/microsoft/azure/log/wasb_task_handler.py
@jason810496 ES and OS are a bit of a mess, tbh. They are entirely too specialized, and 95% of that isn't needed anymore now that the Task SDK writes out JSON logs natively.
We can decide that these two providers (Elasticsearch, OpenSearch) will be Airflow 3+ compatible from the next release.