@dheerajturaga commented Oct 29, 2025

The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
AirflowNotFoundException: The conn_id <conn_id> isn't defined

Root cause: DagProcessorManager was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.

The `DagProcessorManager` (parent process) needs database access for connection
retrieval during bundle initialization (e.g., `GitDagBundle.__init__` → `GitHook`
needs git credentials). Child `DagFileProcessorProcess` instances run user code
and should remain isolated from direct database access.

This ensures correct secrets backend chains (when no external secrets backend is configured):

  • Manager (parent): EnvironmentVariablesBackend → MetastoreBackend (database access)
  • Parser (child): EnvironmentVariablesBackend
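
To make the difference between the two chains concrete, here is a minimal toy model of a chained lookup. It is a sketch only: `EnvBackend`, `DbBackend`, `ConnectionNotFound`, and `get_connection_uri` are hypothetical names invented for illustration and are not Airflow's classes. The only real convention assumed is the `AIRFLOW_CONN_<CONN_ID>` environment variable naming.

```python
# Toy model only: every class and function name here is hypothetical,
# not Airflow's real implementation.
from typing import Optional


class ConnectionNotFound(Exception):
    """Stands in for AirflowNotFoundException in this sketch."""


class EnvBackend:
    """Looks up connections from an environment-like mapping."""

    def __init__(self, env: dict[str, str]) -> None:
        self.env = env

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.env.get(f"AIRFLOW_CONN_{conn_id.upper()}")


class DbBackend:
    """Looks up connections from a dict standing in for the metadata database."""

    def __init__(self, rows: dict[str, str]) -> None:
        self.rows = rows

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.rows.get(conn_id)


def get_connection_uri(conn_id: str, chain: list) -> str:
    """Return the first hit in the chain, or raise if every backend misses."""
    for backend in chain:
        uri = backend.get_conn_uri(conn_id)
        if uri is not None:
            return uri
    raise ConnectionNotFound(f"The conn_id `{conn_id}` isn't defined")


env = EnvBackend({})  # no AIRFLOW_CONN_* variables set
db = DbBackend({"git_default": "https://user:token@example.com/repo.git"})

manager_chain = [env, db]  # parent: environment variables, then database
parser_chain = [env]       # child: environment variables only

manager_result = get_connection_uri("git_default", manager_chain)

try:
    get_connection_uri("git_default", parser_chain)
    parser_error = None
except ConnectionNotFound as exc:
    parser_error = str(exc)
```

With the environment-only chain the lookup falls off the end and raises, which mirrors the failure reported above. With the database backend appended, the same `conn_id` resolves.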

Note: This is temporary until AIP-92 removes direct DB access from DagProcessorManager.
Long-term, the manager should use the Execution API instead of direct database access.

Affects: DAG bundle processing with GitHook and any other hooks that rely on
database-stored connections during bundle initialization in the manager process.

@dheerajturaga

cc: @kaxil this is a critical bug and we should aim for 3.1.2

@kaxil added this to the Airflow 3.1.2 milestone Oct 30, 2025
…ontext

  The dag_processor was unable to retrieve connections from the database,
  causing GitHook (and other hooks) to fail with:
    AirflowNotFoundException: The conn_id `<conn_id>` isn't defined

  Root cause: dag_processor was running in FALLBACK context, which only
  loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
  connections stored in the database were inaccessible.

  The dag_processor is a server-side component that needs database access
  for connection retrieval, similar to the scheduler and API server.

  Fix: Set _AIRFLOW_PROCESS_CONTEXT=server in DagProcessorJobRunner._execute()
  to enable MetastoreBackend, matching the pattern already used in:
  - SchedulerJobRunner (scheduler_job_runner.py:1064)
  - API FastAPI server (api_fastapi/main.py:24)

  This ensures the dag_processor uses the correct secrets backend chain:
    EnvironmentVariablesBackend → MetastoreBackend (database access)

  Affects: DAG bundle processing with GitHook and any other hooks that
  rely on database-stored connections during DAG parsing.

Signed-off-by: Kaxil Naik <kaxilnaik@gmail.com>
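
The context switch described in the commit message above can be sketched roughly as follows. The variable name `_AIRFLOW_PROCESS_CONTEXT` and the value `server` come from the commit message. The helper names, the `"fallback"` label, and the idea of restoring the previous value afterwards are assumptions made for illustration and may not match how Airflow actually does it.

```python
import os
from contextlib import contextmanager

# The variable name comes from the commit message; everything else here
# (function names, the "fallback" label) is a hypothetical illustration.
CONTEXT_VAR = "_AIRFLOW_PROCESS_CONTEXT"


def backend_chain_for_current_process() -> list[str]:
    """Pick a backend chain from the process context environment variable."""
    context = os.environ.get(CONTEXT_VAR, "fallback")
    if context == "server":
        return ["EnvironmentVariablesBackend", "MetastoreBackend"]
    # Unset or unrecognised context in this toy: environment variables only.
    return ["EnvironmentVariablesBackend"]


@contextmanager
def process_context(value: str):
    """Temporarily set the context variable, restoring the old value on exit."""
    previous = os.environ.get(CONTEXT_VAR)
    os.environ[CONTEXT_VAR] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(CONTEXT_VAR, None)
        else:
            os.environ[CONTEXT_VAR] = previous


os.environ.pop(CONTEXT_VAR, None)  # start from the unset (fallback) state
before = backend_chain_for_current_process()
with process_context("server"):
    during = backend_chain_for_current_process()
after = backend_chain_for_current_process()
```

Scoping the variable with a context manager is one way to keep the wider chain from leaking into code that runs later in the same process. Whether the real fix scopes it this way is not stated in this thread.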
@kaxil force-pushed the bugfix/connections-missing-in-dag-processor branch from 218f394 to 85dec24 October 30, 2025 21:46
@kaxil changed the title from "Fix GitHook connection retrieval in dag_processor by setting server context" to "Fix connection retrieval in DagProcessorManager for bundle initialization" Oct 30, 2025
@kaxil merged commit ae2a4fd into apache:main Oct 30, 2025
62 checks passed
kaxil pushed a commit that referenced this pull request Oct 30, 2025
…zation (#57459)

The dag_processor was unable to retrieve connections from the database,
causing GitHook (and other hooks) to fail with:
  AirflowNotFoundException: The conn_id `<conn_id>` isn't defined

Root cause: DagProcessorManager was running in FALLBACK context, which only
loads EnvironmentVariablesBackend, not MetastoreBackend. This meant
connections stored in the database were inaccessible.

The `DagProcessorManager` (parent process) needs database access for connection
retrieval during bundle initialization (e.g., `GitDagBundle.__init__` → `GitHook`
needs git credentials). Child `DagFileProcessorProcess` instances run user code
and should remain isolated from direct database access.

This ensures correct secrets backend chains (when no external secrets backend is configured):
- Manager (parent): `EnvironmentVariablesBackend` → `MetastoreBackend` (database access)
- Parser (child): `EnvironmentVariablesBackend`

Note: This is temporary until AIP-92 removes direct DB access from DagProcessorManager.
Long-term, the manager should use the Execution API instead of direct database access.

Affects: DAG bundle processing with GitHook and any other hooks that rely on
database-stored connections during bundle initialization in the manager process.

(cherry picked from commit ae2a4fd)
@dheerajturaga deleted the bugfix/connections-missing-in-dag-processor branch October 30, 2025 23:19
@dheerajturaga

Thanks @kaxil !

@ephraimbuddy added the type:bug-fix (Changelog: Bug Fixes) label Nov 10, 2025
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
…zation (apache#57459)
