Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secret masking fails on io objects #16071

Closed
madison-ookla opened this issue May 25, 2021 · 4 comments · Fixed by #16118
Closed

Secret masking fails on io objects #16071

madison-ookla opened this issue May 25, 2021 · 4 comments · Fixed by #16118
Assignees
Labels
area:logging kind:bug This is a clearly a bug
Milestone

Comments

@madison-ookla
Copy link
Contributor

madison-ookla commented May 25, 2021

Apache Airflow version: 2.1.0

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment: *NIX

  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools:
  • Others:

What happened:

Due to the new secrets masker, logging will fail when an IO object is passed to a logging call.

What you expected to happen:

Logging should succeed when an IO object is passed to the logging cal.

How to reproduce it:

Sample DAG:

import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


log = logging.getLogger(__name__)


def log_io():
    file = open("/tmp/foo", "w")
    log.info("File: %s", file)


# Create the DAG -----------------------------------------------------------------------
dag = DAG(
    dag_id="Test_Log_IO",
    schedule_interval=None,
    catchup=False,
    default_args={
        "owner": "madison.swain-bowden",
        "depends_on_past": False,
        "start_date": datetime(2021, 5, 4),
    },
)


with dag:
    PythonOperator(
        task_id="log_io",
        python_callable=log_io,
    )

Logging that occurs when run on Airflow (task subsequently fails):

[2021-05-25 11:27:08,080] {logging_mixin.py:104} INFO - Running <TaskInstance: Test_Log_IO.log_io 2021-05-25T18:25:17.679660+00:00 [running]> on host Madisons-MacBook-Pro
[2021-05-25 11:27:08,137] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=madison.swain-bowden
AIRFLOW_CTX_DAG_ID=Test_Log_IO
AIRFLOW_CTX_TASK_ID=log_io
AIRFLOW_CTX_EXECUTION_DATE=2021-05-25T18:25:17.679660+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-05-25T18:25:17.679660+00:00
[2021-05-25 11:27:08,138] {taskinstance.py:1481} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1137, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/operators/python.py", line 150, in execute
    return_value = self.execute_callable()
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/operators/python.py", line 161, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/Users/madison/git/airflow-dags/ookla/dags/Test_Log_IO/log_io.py", line 13, in log_io
    log.info("File: %s", file)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 1446, in info
    self._log(INFO, msg, args, **kwargs)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 1599, in handle
    self.callHandlers(record)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 1661, in callHandlers
    hdlr.handle(record)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 948, in handle
    rv = self.filter(record)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/logging/__init__.py", line 806, in filter
    result = f.filter(record)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/utils/log/secrets_masker.py", line 157, in filter
    record.__dict__[k] = self.redact(v)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/utils/log/secrets_masker.py", line 203, in redact
    return tuple(self.redact(subval) for subval in item)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/utils/log/secrets_masker.py", line 203, in <genexpr>
    return tuple(self.redact(subval) for subval in item)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/utils/log/secrets_masker.py", line 205, in redact
    return list(self.redact(subval) for subval in item)
  File "/Users/madison/programs/anaconda3/envs/ookla-airflow/lib/python3.9/site-packages/airflow/utils/log/secrets_masker.py", line 205, in <genexpr>
    return list(self.redact(subval) for subval in item)
io.UnsupportedOperation: not readable
[2021-05-25 11:27:08,145] {taskinstance.py:1524} INFO - Marking task as FAILED. dag_id=Test_Log_IO, task_id=log_io, execution_date=20210525T182517, start_date=20210525T182707, end_date=20210525T182708
[2021-05-25 11:27:08,197] {local_task_job.py:151} INFO - Task exited with return code 1

Anything else we need to know:

If I set the value defined here to False, the task completes successfully and the line is logged appropriately: https://github.com/apache/airflow/blob/2.1.0/airflow/cli/commands/task_command.py#L205

Example output (when set to False):

[2021-05-25 11:48:54,185] {logging_mixin.py:104} INFO - Running <TaskInstance: Test_Log_IO.log_io 2021-05-25T18:48:45.911082+00:00 [running]> on host Madisons-MacBook-Pro
[2021-05-25 11:48:54,262] {taskinstance.py:1280} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=madison.swain-bowden
AIRFLOW_CTX_DAG_ID=Test_Log_IO
AIRFLOW_CTX_TASK_ID=log_io
AIRFLOW_CTX_EXECUTION_DATE=2021-05-25T18:48:45.911082+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-05-25T18:48:45.911082+00:00
[2021-05-25 11:48:54,264] {log_io.py:13} INFO - File: <_io.TextIOWrapper name='/tmp/foo' mode='w' encoding='UTF-8'>
[2021-05-25 11:48:54,264] {python.py:151} INFO - Done. Returned value was: None
[2021-05-25 11:48:54,274] {taskinstance.py:1184} INFO - Marking task as SUCCESS. dag_id=Test_Log_IO, task_id=log_io, execution_date=20210525T184845, start_date=20210525T184854, end_date=20210525T184854
[2021-05-25 11:48:54,305] {taskinstance.py:1245} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2021-05-25 11:48:54,339] {local_task_job.py:151} INFO - Task exited with return code 0

Unfortunately the logging that caused this problem for me originally is being done by a third party library, so I can't alter the way this works on our end.

@madison-ookla madison-ookla added the kind:bug This is a clearly a bug label May 25, 2021
@mik-laj
Copy link
Member

mik-laj commented May 26, 2021

Cc: @ashb

@ashb ashb self-assigned this May 27, 2021
@ashb ashb added this to the Airflow 2.1.1 milestone May 27, 2021
@ashb
Copy link
Member

ashb commented May 27, 2021

@madison-ookla A temporary work around if you need it set https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#hide-sensitive-var-conn-fields to false

@madison-ookla
Copy link
Contributor Author

@ashb unfortunately that setting seems to have no effect. If I set that to False, the logging still fails. I believe it's due to this line: https://github.com/apache/airflow/blob/2.1.0/airflow/cli/commands/task_command.py#L205

@ashb
Copy link
Member

ashb commented May 27, 2021

Hmm curious, be setting that of should have disabled the redacting.

I'll look at what is going on there too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants