-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mask secrets in stdout for airflow tasks test ...
CLI command (#17476)
#21281
Mask secrets in stdout for airflow tasks test ...
CLI command (#17476)
#21281
Conversation
…he#17476) Add a context manager to secrets_masker inside of which stdout is captured, redacted according to the filters on 'airflow.task' logger and then spit back into stdout. Filters stdout that doesn't go through logger according to airflow.task logger's filters.
def __exit__(self, *args): | ||
""" | ||
On exit from the context, close the `contextlib.redirect_stdout` context to stop capturing lines | ||
from stdout, then route all lines one by one through self.redirect_to. | ||
|
||
""" | ||
self._redirector.__exit__(*args) | ||
for log in self.stdout_lines: | ||
sys.stdout.write(log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this means all of the task’s output will be buffered, and only spit out in one go when the task is finished, which seems wrong. I would call stdout.write
in write
directly after redaction instead.
This can be significantly simplified to
class StdoutRedactor:
@contextlib.contextmanager
@classmethod
def enable(cls):
with contextlib.redirect_stdout(cls()):
yield
def write(self, content):
sys.stdout.write(redact(content))
def flush(self):
sys.stdout.flush()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is that every time it writes to stdout
you’ll need to exit the internal redirect context manager because otherwise it’ll get stuck in an infinite recursive loop. I thought about doing that but since __exit__
expects a few arguments that only exist during the external context manager wrapper’s exit function. Do you have any thoughts about being able to disable and reenable the internal redirect context manager within the ‘write’ function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it’ll get stuck in an infinite recursive loop
Ah right. sys.__stdout__
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find! I'll implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised a problem, sys.stdout
may have already been patched before we try to patch it (can happen if the user runs Airflow with custom interpreter startup code), in which case sys.__stdout__
would point to the wrong, unpatched stream.
Something like this is needed:
@dataclasses.dataclass()
class StdoutRedactor:
stdout: TextIO
@contextlib.contextmanager
@classmethod
def enable(cls):
with contextlib.redirect_stdout(cls(sys.stdout)):
yield
def write(self, content):
self.stdout.write(redact(content))
def flush(self):
self.stdout.flush()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to make sure that I'm understanding all of this right, we're basically saving a copy of sys.stdout from before it gets redirected, and then writing to that which bypasses the redirect context that we've already defined before that write occurs? And the reason that we use the dataclass decorator is just to simplify the initialization of self.stdout and save a few lines since we don't need to use __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that’s my understanding.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
We should finish this. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
The problem with
airflow tasks test ...
is that anystdout
that comes from a task inside an operator or other callable does not go through a logger. It runs in the main process. This results in secrets that are printed tostdout
to not be masked according to theairflow.task
logger.I wanted a way to capture and filter stdout without having to route everything through a secondary logger. I decided a context manager that filters
stdout
would be the best way to go about this. The context manager redacts all secrets according to the filters on theairflow.task
logger.Closes #17476