
Conversation

@ashb
Member

@ashb ashb commented Jul 24, 2025

Often remote logging is done using automatic instance profiles, but not
always. If you tried to configure a logger with a connection defined in the
metadata DB it would not have worked (it either caused the supervise job to
fail early, or just behaved as if the connection didn't exist, depending on
the hook's behaviour).

Unfortunately, the default connection ID that the various hooks use is not
easily discoverable, at least not from the outside (we can't look at
`remote.hook`, as for most log providers that would try to load the
connection, failing in exactly the way we are trying to fix), so I updated
the log config module to keep track of the default conn ID for the modern
log providers.

Once we have the connection ID (or at least a good idea that we've got the
right one), we pre-emptively check the secrets backends for it, and if it is
not found there we load it from the API server. Either way, if we find a
connection we put it in the env variable so that it is available.

The reason we use this approach is that we are running in the supervisor
process itself, so SUPERVISOR_COMMS is not, and cannot be, set yet.
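The lookup order described above can be sketched roughly as follows. This is a hedged illustration, not the code in this PR: the helper names (`lookup_in_secrets_backends`, `fetch_from_api_server`) and the stub data are assumptions; only the secrets-backends-then-API-server order and the `AIRFLOW_CONN_<ID>` env-var convention come from the description.

```python
import os
from typing import Optional


def lookup_in_secrets_backends(conn_id: str) -> Optional[str]:
    """Stand-in for checking the configured secrets backends (assumed name)."""
    _fake_backend: dict = {}  # pretend no secrets backend has this connection
    return _fake_backend.get(conn_id)


def fetch_from_api_server(conn_id: str) -> Optional[str]:
    """Stand-in for asking the API server for the connection (assumed name)."""
    _fake_api = {"wasb_default": "wasb://myaccount@"}
    return _fake_api.get(conn_id)


def preload_remote_logging_conn(conn_id: str) -> None:
    env_key = f"AIRFLOW_CONN_{conn_id.upper()}"
    if env_key in os.environ:
        return  # an explicitly-set env var wins; nothing to do
    # 1. check secrets backends first, 2. fall back to the API server
    uri = lookup_in_secrets_backends(conn_id) or fetch_from_api_server(conn_id)
    if uri is not None:
        # Expose the connection via the env var so the hook can resolve it
        # later, since SUPERVISOR_COMMS is not available in this process yet.
        os.environ[env_key] = uri


preload_remote_logging_conn("wasb_default")
```

The key design point is that this all happens in the supervisor before any hook tries to resolve the connection, so a failing lookup degrades gracefully instead of crashing the supervise job.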

Discovered when digging into #52501 -- it might fix the problem


@ashb
Member Author

ashb commented Jul 24, 2025

This is messier and more complex than I would like -- any ideas on a way of cleaning this up are greatly appreciated.

@ashb ashb added backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch and removed area:ConfigTemplates labels Jul 24, 2025
@potiuk
Member

potiuk commented Jul 24, 2025

This is messier and more complex than I would like -- any ideas on a way of cleaning this up are greatly appreciated.

We should change remote logging templates for loggers to be part of provider packages, as separate files dropped into the same directory where the "generic" Airflow config lives; our logging configuration should then discover the remote logging file dropped there and read it from there. Such a remote logging config, copy-pasteable from the provider's sources (and docs), should have everything the core needs to configure it in a generic way.

That should remove the coupling of airflow core to what is essentially a provider feature and make it a bit less messy.
I am assuming your "messy" comment here is about precisely this: airflow-core needing to contain some provider-specific parts. Maybe there are other "messy" parts that I do not realize :)

Contributor

@amoghrajesh amoghrajesh left a comment


I would approach solving this problem in multiple phases. As a short-/mid-term fix to unblock users who need remote logging to load connections right away, I see this solution as OK.

Over the longer term, I would like to second @potiuk's suggestion here, which is asking every provider to ship its own logging config files, decoupling core from knowing what should be present in a provider's logging config; core would just know how to discover it.

For example, google could define a providers/google/logging/gcs_remote_logging.yaml with details such as:

logging_schemes:
  - scheme: "gs://"
    handler_class: "airflow.providers.google.cloud.log.gcs_task_handler.GCSRemoteLogIO"
    default_connection_id: "google_cloud_default"
    required_config:
      - "logging.remote_base_log_folder"
      - "logging.google_key_path"

and be done with it. Core can then define a discovery mechanism to consume it.

For now, I am ok with this PR.
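To make the discovery idea above concrete, here is a rough, dependency-free sketch under stated assumptions: the file naming pattern (`*_remote_logging.json`), the directory scan, and the scheme-matching logic are all illustrative inventions, and JSON stands in for the YAML above purely to keep the sketch stdlib-only.

```python
import json
import tempfile
from pathlib import Path


def discover_logging_configs(config_dir: Path) -> list:
    """Collect every logging-scheme entry from provider-dropped config files."""
    configs = []
    for path in sorted(config_dir.glob("*_remote_logging.json")):
        configs.extend(json.loads(path.read_text())["logging_schemes"])
    return configs


def resolve_handler(remote_base: str, configs: list):
    """Pick the first provider config whose scheme matches the log folder URL."""
    for cfg in configs:
        if remote_base.startswith(cfg["scheme"]):
            return cfg
    return None


# Demo: pretend the google provider dropped its config file into the shared
# config directory (contents mirror the YAML example above).
with tempfile.TemporaryDirectory() as d:
    Path(d, "gcs_remote_logging.json").write_text(json.dumps({
        "logging_schemes": [{
            "scheme": "gs://",
            "handler_class": "airflow.providers.google.cloud.log.gcs_task_handler.GCSRemoteLogIO",
            "default_connection_id": "google_cloud_default",
        }]
    }))
    cfg = resolve_handler("gs://my-bucket/logs", discover_logging_configs(Path(d)))
    print(cfg["default_connection_id"])
```

With something like this, core never hard-codes any provider's handler class or default connection ID; it only knows where to look and how to match schemes.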

@amoghrajesh
Contributor

@potiuk the suggestions from you are, as you said, more relevant to the process. This PR addresses the fact that the connection to be used still comes from configuration, and even with better discovery etc., nothing changes on that part.

@potiuk
Member

potiuk commented Jul 25, 2025

Yep. I was just responding to @ashb's call for suggestions :) "any ideas on a way of cleaning this up greatly appreciated". Generally speaking the PR looks good to me as well, besides (as Ash mentioned) being messy :)

@potiuk
Member

potiuk commented Jul 25, 2025

And the messiness is because of legacy "embedding" of provider things in core - not because the PR is messy on its own :D

@amoghrajesh amoghrajesh merged commit e4fb686 into apache:main Jul 25, 2025
82 checks passed
@amoghrajesh amoghrajesh deleted the load-remote-logg-conn-from-apiserver branch July 25, 2025 14:26
github-actions bot pushed a commit that referenced this pull request Jul 25, 2025
…he API Server (#53719)

(cherry picked from commit e4fb686)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
@github-actions

Backport successfully created: v3-0-test

Status Branch Result
v3-0-test PR Link

ashb added a commit that referenced this pull request Jul 29, 2025
…he API Server (#53719)

(cherry picked from commit e4fb686)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
ashb added a commit that referenced this pull request Jul 29, 2025
…he API Server (#53719) (#53761)

(cherry picked from commit e4fb686)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
@potiuk potiuk linked an issue Aug 7, 2025 that may be closed by this pull request
2 tasks
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 7, 2025
apache#53719)

fweilun pushed a commit to fweilun/airflow that referenced this pull request Aug 11, 2025
apache#53719)

@ashb ashb added this to the Airflow 3.0.4 milestone Aug 14, 2025

Labels

area:logging area:task-sdk backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remote Logging to Azure Blob Storage Broke in Airflow 3.0

3 participants