Description
What do you see as an issue?
Docs link: Masking Sensitive Data
Source code link: secrets_masker.py
The documentation currently states:
“Airflow will by default mask Connection passwords and sensitive Variables and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.”
This statement is misleading because it implies that all keys in a Connection’s extra JSON field are masked. However, only keys whose names contain known sensitive keywords are actually redacted.
The complete list of sensitive keywords from the source code is:
access_token, api_key, apikey, authorization, passphrase, passwd, password, private_key, secret, token, keyfile_dict, service_account
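The name-based rule described above can be sketched in plain Python. This is a simplified stand-in and not the code from secrets_masker.py: `SENSITIVE_KEYWORDS` copies the list above, the helper name `matching_keywords` is hypothetical, and the case-insensitive substring match is an assumption that is consistent with the behavior reported in this issue.

```python
# Simplified illustration only; not Airflow's implementation.
# The keyword set is copied from the list quoted in this issue.
SENSITIVE_KEYWORDS = frozenset(
    {
        "access_token",
        "api_key",
        "apikey",
        "authorization",
        "passphrase",
        "passwd",
        "password",
        "private_key",
        "secret",
        "token",
        "keyfile_dict",
        "service_account",
    }
)


def matching_keywords(name: str) -> list[str]:
    """Return the sensitive keywords contained in `name` (hypothetical helper).

    Assumes a case-insensitive substring match on the key name.
    """
    lowered = name.strip().lower()
    return sorted(k for k in SENSITIVE_KEYWORDS if k in lowered)


if __name__ == "__main__":
    # Key names taken from the reproduction in this issue.
    for key in ("keyfile_dict", "param1_token", "hello"):
        print(key, "->", matching_keywords(key) or "no sensitive keyword")
```

Under this assumption, a key such as `hello` matches nothing and would be left unredacted, while `param1_token` matches because it contains `token`.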
Code used to reproduce this:
I verified this behavior using the following DAG, extracting values from a Connection's extra field (bigquery_connection_id) and Airflow Variables:
from airflow import DAG
from airflow.operators.bash import BashOperator
import pendulum
from airflow.hooks.base import BaseHook
from airflow.models import Variable

# Fetch connection and extract 'extra' JSON
conn = BaseHook.get_connection(conn_id="bigquery_connection_id")
extra_data = conn.extra_dejson

# Test specific keys from 'extra'
keyfile_dict = extra_data.get("keyfile_dict", "not found")  # Contains 'keyfile_dict'
param1_token = extra_data.get("param1_token", "not found")  # Contains 'token'
hello = extra_data.get("hello", "not found")  # No sensitive keyword

# Test Variables
test_keyfile_dict = Variable.get("test_keyfile_dict")
service_account = Variable.get("service_account")

with DAG(
    dag_id="test_masking",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    test_masking = BashOperator(
        task_id="masking_task",
        bash_command=f"echo '{ keyfile_dict }' > { param1_token } > { hello } > { test_keyfile_dict } > { service_account }"
    )

From my testing:
- extra__google_cloud_platform__keyfile_dict (from the connection’s extra JSON) → Masked everywhere (Rendered templates, UI, logs).
- hello (no sensitive keyword) → Not masked.
- Variable.get("test_keyfile_dict") → Masked only in Variables UI.
- Variable.get("service_account") → Masked in Variables UI, Rendered templates, and logs.
The docs should clarify that not all keys in a Connection’s extra JSON are masked—only those containing a sensitive keyword.
Screenshots of observations: Rendered Templates, Logs, Variables UI.
Solving the problem
I suggest two specific updates to the documentation to fix this ambiguity and clarify the scope of masking:
1. Update the default masking paragraph
Clarify that masking is conditional on the key name.
Current Text:
Airflow will by default mask Connection passwords and sensitive Variables and keys from a Connection’s extra (JSON) field when they appear in Task logs, in the Variable and in the Rendered fields views of the UI.
Proposed Text:
Airflow will by default mask Connection passwords, sensitive Variables, and keys from a Connection’s extra (JSON) field whose names contain one or more of the sensitive keywords when they appear in Task logs, in the Variables UI, and in the Rendered fields views of the UI. Keys in the extra JSON that do not include any of these sensitive keywords will not be redacted automatically.
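To make the proposed wording concrete, here is a minimal sketch of what "only keys containing a sensitive keyword are redacted" would mean for an extra dictionary. It is an illustration under stated assumptions, not Airflow's redaction code: the function name `redact_extra`, the sample values, and the `***` placeholder are all hypothetical choices for this example, and the match is assumed to be a case-insensitive substring test on the key name.

```python
# Illustrative sketch only; not the logic in secrets_masker.py.
# Keywords copied from the list quoted earlier in this issue.
SENSITIVE_KEYWORDS = (
    "access_token", "api_key", "apikey", "authorization",
    "passphrase", "passwd", "password", "private_key",
    "secret", "token", "keyfile_dict", "service_account",
)


def redact_extra(extra: dict, placeholder: str = "***") -> dict:
    """Return a copy of `extra` with values of sensitive-looking keys replaced.

    Hypothetical helper: a key counts as sensitive when its lowercased name
    contains any keyword. Values under other keys are passed through as-is.
    """
    redacted = {}
    for key, value in extra.items():
        is_sensitive = any(k in str(key).lower() for k in SENSITIVE_KEYWORDS)
        redacted[key] = placeholder if is_sensitive else value
    return redacted


if __name__ == "__main__":
    # Made-up sample values; only the key names mirror the reproduction.
    sample = {"keyfile_dict": "{...}", "param1_token": "abc123", "hello": "world"}
    print(redact_extra(sample))
```

The point of the sketch is the asymmetry: `hello` keeps its value even though it lives in the same extra JSON as the redacted keys.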
2. Update the "Sensitive field names" section
In the Sensitive field names section, explicitly list the default keywords and add a table showing how the variable source and keyword affect where the data is masked.
Suggested Addition:
Default Sensitive Keywords:
access_token, api_key, apikey, authorization, passphrase, passwd, password, private_key, secret, token, keyfile_dict, service_account.

Examples of Masking Behavior:

| Source | Key / Variable Name | Matching Keyword | Masking Scope |
| --- | --- | --- | --- |
| Connection Extra | google_keyfile_dict | keyfile_dict | Everywhere (Logs, Rendered Templates, UI) |
| Connection Extra | hello | None | Not Masked |
| Variable | service_account | service_account | Everywhere (Logs, Rendered Templates, UI) |
| Variable | test_keyfile_dict | keyfile_dict | Variables UI Only |
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct