-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logging error #19816
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
Bad workaround patch /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/secrets_masker.py
|
The redact function needs to have a hard recursion limit. |
The redact function already has a hard recursion limit (by default 5) in secrets_masker.py Setting AIRFLOW__CORE__HIDE_SENSITIVE_VAR_CONN_FIELDS to "False" should avoid executing this buggy function, but it doesn't work either :-(
|
I've seen this happen as well, it happens when the Since the failing item itself is embedded into the warning-log, it will trigger the same issue again and it repeats forever. airflow/airflow/utils/log/secrets_masker.py Line 215 in 968952b
|
AAAAARGH ! Really nice find @hterik - would you like to attempt to fix it ? I think one of the solutions would be to pass current depth to the log() call with some custom name and have the log pass it as starting depth to redact method? But possibly we could find a nicer solution :) |
When redact warning log on "unredactable" item is printed, the log entered an infinite recursion, because the item was attempted to be redacted again in the log. This PR converts the item to str() - in the worst case the str converstion will fail and raise exception - but this will be about right - but it will not attempt to redact the item again. Fixws: apache#19816
When redact warning log on "unredactable" item is printed, the log entered an infinite recursion, because the item was attempted to be redacted again in the log. This PR converts the item to str() - in the worst case the str converstion will fail and raise exception - but this will be about right - but it will not attempt to redact the item again. Fixes: apache#19816
* Fix infinite recursion on redact log When redact warning log on "unredactable" item is printed, the log entered an infinite recursion, because the item was attempted to be redacted again in the log. This PR converts the item to str() - in the worst case the str converstion will fail and raise exception - but this will be about right - but it will not attempt to redact the item again. Fixes: #19816 * Update airflow/utils/log/secrets_masker.py
* Fix infinite recursion on redact log When redact warning log on "unredactable" item is printed, the log entered an infinite recursion, because the item was attempted to be redacted again in the log. This PR converts the item to str() - in the worst case the str converstion will fail and raise exception - but this will be about right - but it will not attempt to redact the item again. Fixes: #19816 * Update airflow/utils/log/secrets_masker.py (cherry picked from commit f441833)
Just wanted to add some notes about getting the same error on the latest MWAA setup on Airflow 2.2.2. Tasks were failing instantly once they were executed with the same stack overflow error! The problem finally was that LOG LEVEL! We were deploying it through Terraform and the mwaa module allows you to use log-level configuration up to DEBUG. However, MWAA UI seems not to like them and doesn't display the DEBUG selection. As soon as we changed it to INFO+ the tasks started running normally. I hope this saves someone all the frustration that I went through while searching for the cause of a fresh cluster not working UPDATE |
@mitsos1os You just saved my life. Thanks for sharing. Been stuck on this for two days just because of DEBUG level. |
Glad I was helpful! |
We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: apache#19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ```
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: #19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: #19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: #19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: apache/airflow#19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> GitOrigin-RevId: 6e8f646bf9aa071154069bfcace22e53b4d35574
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: apache/airflow#19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> GitOrigin-RevId: 6e8f646bf9aa071154069bfcace22e53b4d35574
* Fix for infinite recursion due to secrets_masker We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth. We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: apache/airflow#19816 (comment) Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`. I noticed this issue while working with a DAG that calls Airflow's DB cleanup function. Example DAG: ``` from datetime import datetime from airflow import DAG from airflow.models import Variable from airflow.operators.python import PythonOperator class MyStringClass(str): def __init__(self, required_arg): pass def fail(task_instance): # make sure the `SecretsMasker` has a replacer Variable.set(key="secret", value="secret_value") Variable.get("secret") # trigger the infinite recursion task_instance.log.info("%s", MyStringClass("secret_value")) with DAG( dag_id="secrets_masker_recursion", start_date=datetime(2023, 9, 26), ): PythonOperator(task_id="fail", python_callable=fail) ``` * Improve error message --------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> GitOrigin-RevId: 6e8f646bf9aa071154069bfcace22e53b4d35574
Apache Airflow version
2.2.2 (latest released)
Operating System
Linux-5.4.0-1056-azure-x86_64-with-glibc2.2.5
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
kubernetes 1.21.2 (AKS)
What happened
DAG using logging class produce stack overflow and abort task. It seems like a bug in /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/secrets_masker.py
ENV:
AIRFLOW__CORE__HIDE_SENSITIVE_VAR_CONN_FIELDS="False"
Dag Code fragment:
log_stream = StringIO()
logging.basicConfig(stream=log_stream, level=logging.DEBUG)
logger = logging.getLogger("airflow.task")
logger.addHandler(logging.StreamHandler(log_stream))
...
logger.info("------------------------------ ------------------- ------------------------------")
Log in worker pod:
What you expected to happen
No response
How to reproduce
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: