Docs: Clarify that masking in Connection 'extra' JSON is keyword-dependent #58514

@varaprasadregani

What do you see as an issue?

Docs link: Masking Sensitive Data

Source code link: secrets_masker.py

The documentation currently states:

“Airflow will by default mask Connection passwords and sensitive Variables and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.”

This statement is misleading because it implies that all keys in a Connection’s extra JSON field are masked. However, only keys whose names contain known sensitive keywords are actually redacted.

The complete list of sensitive keywords from the source code is:
access_token, api_key, apikey, authorization, passphrase, passwd, password, private_key, secret, token, keyfile_dict, service_account
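
To make the matching rule concrete, here is a minimal sketch of a keyword-dependent name check. It is a simplified model and not Airflow's actual code: the function name `name_is_sensitive` is hypothetical, and the case-insensitive substring comparison is an assumption of this sketch.

```python
# Simplified model of keyword-dependent masking; not Airflow's actual code.
DEFAULT_SENSITIVE_KEYWORDS = frozenset({
    "access_token", "api_key", "apikey", "authorization", "passphrase",
    "passwd", "password", "private_key", "secret", "token",
    "keyfile_dict", "service_account",
})


def name_is_sensitive(name: str, keywords=DEFAULT_SENSITIVE_KEYWORDS) -> bool:
    """Return True if any keyword occurs as a substring of the key name.

    Case-insensitive matching is an assumption of this sketch.
    """
    lowered = name.strip().lower()
    return any(keyword in lowered for keyword in keywords)


# Key names taken from the reproduction DAG in this issue.
for key in ("keyfile_dict", "param1_token", "hello"):
    print(f"{key}: {'masked' if name_is_sensitive(key) else 'not masked'}")
```

Under this model, `param1_token` counts as sensitive only because its name contains `token`, while `hello` matches nothing and is left alone.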

Code used to reproduce this:
I verified this behavior with the following DAG, which reads values from the extra field of a Connection (conn_id bigquery_connection_id) and from Airflow Variables:

from airflow import DAG
from airflow.operators.bash import BashOperator
import pendulum
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# Fetch connection and extract 'extra' JSON
conn = BaseHook.get_connection(conn_id="bigquery_connection_id")
extra_data = conn.extra_dejson

# Test specific keys from 'extra'
keyfile_dict = extra_data.get("keyfile_dict", "not found") # Contains 'keyfile_dict'
param1_token = extra_data.get("param1_token", "not found") # Contains 'token'
hello = extra_data.get("hello", "not found")               # No sensitive keyword

# Test Variables
test_keyfile_dict = Variable.get("test_keyfile_dict")
service_account = Variable.get("service_account")

with DAG(
    dag_id="test_masking",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:

    test_masking = BashOperator(
        task_id="masking_task",
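        # Values are joined with '|' inside one quoted string; an unquoted '>' would be parsed as shell redirection.
        bash_command=f"echo '{keyfile_dict} | {param1_token} | {hello} | {test_keyfile_dict} | {service_account}'",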
    )

From my testing:

  • extra__google_cloud_platform__keyfile_dict (from the connection’s extra JSON) → Masked everywhere (Rendered templates, UI, logs).
  • hello (no sensitive keyword) → Not masked.
  • Variable.get("test_keyfile_dict") → Masked only in Variables UI.
  • Variable.get("service_account") → Masked in Variables UI, Rendered templates, and logs.

The docs should clarify that not all keys in a Connection’s extra JSON are masked—only those containing a sensitive keyword.
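
The observations above are consistent with a two-step model: values stored under sensitive key names are registered as secrets, and those values are then replaced wherever they appear in text. The sketch below illustrates that model only. It is not Airflow's implementation, the helper names `collect_secrets` and `redact` are hypothetical, the keyword tuple is a subset of the default list, and the extra JSON values are made up for illustration.

```python
# Simplified model: values under sensitive key names are registered as
# secrets, then replaced wherever they appear in text. Not Airflow's code.
SENSITIVE_KEYWORDS = ("keyfile_dict", "token", "secret", "password")  # subset of the defaults


def collect_secrets(extra: dict) -> set:
    """Gather values whose key name contains a sensitive keyword."""
    return {
        str(value)
        for key, value in extra.items()
        if value and any(word in key.lower() for word in SENSITIVE_KEYWORDS)
    }


def redact(text: str, secrets: set) -> str:
    """Replace every registered secret value in the text with '***'."""
    # Replace longer secrets first so overlapping values are fully hidden.
    for secret in sorted(secrets, key=len, reverse=True):
        text = text.replace(secret, "***")
    return text


# Hypothetical extra JSON, modeled on the keys used in the DAG above.
extra = {"keyfile_dict": "kf-value", "param1_token": "tok-value", "hello": "world"}
secrets = collect_secrets(extra)
log_line = "echo kf-value tok-value world"
print(redact(log_line, secrets))  # prints: echo *** *** world
```

The value of `hello` survives because its key name never caused it to be registered as a secret, which is the gap the current docs wording hides.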

Screenshots of observations (images not included): Rendered Templates, Logs, Variables UI.

Solving the problem

I suggest two specific updates to the documentation to fix this ambiguity and clarify the scope of masking:

1. Update the default masking paragraph
Clarify that masking is conditional on the key name.

Current Text:

Airflow will by default mask Connection passwords and sensitive Variables and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.

Proposed Text:

Airflow will by default mask Connection passwords, sensitive Variables, and keys from a Connection’s extra (JSON) field whose names contain one or more of the sensitive keywords when they appear in Task logs, in the Variables UI, and in the Rendered fields views of the UI. Keys in the extra JSON that do not include any of these sensitive keywords will not be redacted automatically.

2. Update the "Sensitive field names" section
In the Sensitive field names section, explicitly list the default keywords and add a table showing how the variable source and keyword affect where the data is masked.

Suggested Addition:

Default Sensitive Keywords:
access_token, api_key, apikey, authorization, passphrase, passwd, password, private_key, secret, token, keyfile_dict, service_account.

Examples of Masking Behavior:

| Source | Key / Variable Name | Matching Keyword | Masking Scope |
| --- | --- | --- | --- |
| Connection Extra | google_keyfile_dict | keyfile_dict | Everywhere (Logs, Rendered Templates, UI) |
| Connection Extra | hello | None | Not Masked |
| Variable | service_account | service_account | Everywhere (Logs, Rendered Templates, UI) |
| Variable | test_keyfile_dict | keyfile_dict | Variables UI Only |

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
