Skip to content

Conversation

@ashb
Copy link
Member

@ashb ashb commented Mar 28, 2025

Previously this feature was built on top of the stdlib logging.Handler
interface, and it worked but had a few issues (even before we switched to
structlog for Task SDK):

  • we had to use what is in many ways a hack with the "set_context" to get
    information down in to the task handler.
  • Discovering of the configured task handler was somewhat baroque
  • The whole thing is just complex due to the features of stdlib logging
    (loggers, propagate, handler levels etc etc.)
  • The upload was triggered somewhat "automatically" inside close, which from
    an abstraction point of view is messy.

This changes things to have a more explicit interface purpose made for
uploading task log files and for reading them, and perhaps more crucially for
things like CloudWatch Logs, it (re)adds the ability to install a structlog
processor that will recieve every log message as it happens.

The return types for the read et al functions were confusing the living
daylights out of me, so I've created type alias to give the return types
explicit names to reduce (my) confusion.

@boring-cyborg boring-cyborg bot added area:logging area:providers area:task-sdk provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues labels Mar 28, 2025
@ashb ashb changed the title Rework remote task log handleing for the structlog era. 🚧 Rework remote task log handleing for the structlog era. Mar 28, 2025
@ashb ashb changed the title 🚧 Rework remote task log handleing for the structlog era. 🚧 Rework remote task log handling for the structlog era. Mar 28, 2025
@ashb

This comment was marked as outdated.

@ashb ashb force-pushed the overhaul-task-logging branch 2 times, most recently from e25bd96 to e43c7bb Compare March 31, 2025 15:48
@ashb ashb changed the title 🚧 Rework remote task log handling for the structlog era. Rework remote task log handling for the structlog era. Mar 31, 2025
@ashb ashb marked this pull request as ready for review March 31, 2025 15:49
@ashb ashb requested a review from jedcunningham as a code owner March 31, 2025 16:33
Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
Some nits

@eladkal
Copy link
Contributor

eladkal commented Apr 1, 2025

Does this affect elasticsearch / opensearch logging? asking because I don't see changes in this PR for these providers

@ashb
Copy link
Member Author

ashb commented Apr 1, 2025

Does this affect elasticsearch / opensearch logging? asking because I don't see changes in this PR for these providers

Oh yes, I forgot to deal with this. My plan for now is to make it fall back to the existing airflow.task handler.

(I also really don't like we have two almost identical providers for ES and OS. Thanks a lot Elastic.

@ashb ashb force-pushed the overhaul-task-logging branch from b6cea26 to 1d0c3eb Compare April 1, 2025 09:26
@ashb ashb force-pushed the overhaul-task-logging branch 3 times, most recently from 6600ce2 to 6c2a9f3 Compare April 2, 2025 08:47
@eladkal

This comment was marked as resolved.

@ashb ashb force-pushed the overhaul-task-logging branch 2 times, most recently from 5baf022 to 062fb4b Compare April 2, 2025 12:18
Previously this feature was built on top of the stdlib logging.Handler
interface, and it worked but had a few issues (even before we switched to
structlog for Task SDK):

- we had to use what is in many ways a hack with the "set_context" to get
  information down in to the task handler.
- Discovering of the configured task handler was somewhat baroque
- The whole thing is just complex due to the features of stdlib logging
  (loggers, propagate, handler levels etc etc.)
- The upload was triggered somewhat "automatically" inside close, which from
  an abstraction point of view is messy.

This changes things to have a more explicit interface purpose made for
uploading task log files and for reading them, and perhaps more crucially for
things like CloudWatch Logs, it (re)adds the ability to install a structlog
processor that will receive every log message as it happens.

The return types for the read et al functions were confusing the living
daylights out of me, so I've created type alias to give the return types
explicit names to reduce (my) confusion.
@ashb ashb force-pushed the overhaul-task-logging branch from 062fb4b to 6191c13 Compare April 2, 2025 12:44
@eladkal
Copy link
Contributor

eladkal commented Apr 2, 2025

We are missing handling for HdfsTaskHandler but I believe this is a niche one and we can do it later. created #48685 to followup

@ashb
Copy link
Member Author

ashb commented Apr 2, 2025

mypy providers failure is fixed in #48686

@ashb
Copy link
Member Author

ashb commented Apr 2, 2025

I haven't changed deps, everything but lowest provider deps is passing, so I'm merging this now.

@ashb ashb merged commit c1088b6 into apache:main Apr 2, 2025
86 of 87 checks passed
@ashb ashb deleted the overhaul-task-logging branch April 2, 2025 14:10
@ashb ashb linked an issue Apr 2, 2025 that may be closed by this pull request
2 tasks
@jason810496
Copy link
Member

We are missing handling for HdfsTaskHandler but I believe this is a niche one and we can do it later. created #48685 to followup

I think ElasticsearchTaskHandler and RedisTaskHandler are missing in this PR. I will create separate issues to track them.

@ashb
Copy link
Member Author

ashb commented Apr 3, 2025

@jason810496 ES and OS are a bit of a mess tbh. They are entirely too specialized, and 95% of that isn't needed anymore now that Task SDK writes out JSON logs natively.

@eladkal
Copy link
Contributor

eladkal commented Apr 3, 2025

We can decide that these two providers (Elastic, open search) would be Airflow 3+ compatible from next release.
they get code changes very rarely anyway. If it simplify stuff we can do that.

nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
Previously this feature was built on top of the stdlib logging.Handler
interface, and it worked but had a few issues (even before we switched to
structlog for Task SDK):

- we had to use what is in many ways a hack with the "set_context" to get
  information down in to the task handler.
- Discovering of the configured task handler was somewhat baroque
- The whole thing is just complex due to the features of stdlib logging
  (loggers, propagate, handler levels etc etc.)
- The upload was triggered somewhat "automatically" inside close, which from
  an abstraction point of view is messy.

This changes things to have a more explicit interface purpose made for
uploading task log files and for reading them, and perhaps more crucially for
things like CloudWatch Logs, it (re)adds the ability to install a structlog
processor that will receive every log message as it happens.

The return types for the read et al functions were confusing the living
daylights out of me, so I've created type alias to give the return types
explicit names to reduce (my) confusion.
diogotrodrigues pushed a commit to diogotrodrigues/airflow that referenced this pull request Apr 6, 2025
Previously this feature was built on top of the stdlib logging.Handler
interface, and it worked but had a few issues (even before we switched to
structlog for Task SDK):

- we had to use what is in many ways a hack with the "set_context" to get
  information down in to the task handler.
- Discovering of the configured task handler was somewhat baroque
- The whole thing is just complex due to the features of stdlib logging
  (loggers, propagate, handler levels etc etc.)
- The upload was triggered somewhat "automatically" inside close, which from
  an abstraction point of view is messy.

This changes things to have a more explicit interface purpose made for
uploading task log files and for reading them, and perhaps more crucially for
things like CloudWatch Logs, it (re)adds the ability to install a structlog
processor that will receive every log message as it happens.

The return types for the read et al functions were confusing the living
daylights out of me, so I've created type alias to give the return types
explicit names to reduce (my) confusion.
simonprydden pushed a commit to simonprydden/airflow that referenced this pull request Apr 8, 2025
Previously this feature was built on top of the stdlib logging.Handler
interface, and it worked but had a few issues (even before we switched to
structlog for Task SDK):

- we had to use what is in many ways a hack with the "set_context" to get
  information down in to the task handler.
- Discovering of the configured task handler was somewhat baroque
- The whole thing is just complex due to the features of stdlib logging
  (loggers, propagate, handler levels etc etc.)
- The upload was triggered somewhat "automatically" inside close, which from
  an abstraction point of view is messy.

This changes things to have a more explicit interface purpose made for
uploading task log files and for reading them, and perhaps more crucially for
things like CloudWatch Logs, it (re)adds the ability to install a structlog
processor that will receive every log message as it happens.

The return types for the read et al functions were confusing the living
daylights out of me, so I've created type alias to give the return types
explicit names to reduce (my) confusion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging area:providers area:task-sdk provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task SDK cannot upload remote logs due to missing handlers

8 participants