-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Description
Apache Airflow Provider(s)
http
Versions of Apache Airflow Providers
apache-airflow-providers-http==5.3.0
Apache Airflow version
2.11.0
Operating System
debian bookworm
Deployment
Official Apache Airflow Helm Chart
Deployment details
not relevant
What happened
A HttpHook was instantiated: hook = HttpHook(...)
A value for extra_options was created dynamically, based on some condition.
In this case, it looked like:
extra_options_val = {"proxies": {"http": "redacted", "https": "redacted"}}
hook.run(..., extra_options=extra_options_val))
This runs OK.
A second call is made (to get a second page of results):
hook.run(..., extra_options=extra_options_val))
This fails (because the "proxies" value was popped out of the extra_options_val dict on the first call to hook.run().
What should happen:
The extra_options dict passed in to run() should not be modified.
HttpHook.run() should make a deepcopy of extra_options, or HttpHook._configure_session_from_extra() should not remove values from the extra_options parameter.
What you think should happen instead
The extra_options dict passed in to run() should not be modified.
HttpHook.run() should make a deepcopy of extra_options, or HttpHook._configure_session_from_extra() should not remove values from the extra_options parameter.
How to reproduce
from copy import deepcopy
import airflow
from airflow.providers.standard.operators.python import PythonOperator
from airflow.providers.http.hooks.http import HttpHook
def http_hook_test():
extra_options = {"verify_ssl": False}
original_extra_options = deepcopy(extra_options)
# jsonplaceholder Connection has
# * Host = https://jsonplaceholder.typicode.com
# * Extra = {"foo": "bar"}
# Note: some Extra value is necessary, otherwise the extra_options passed to run are not used, see https://github.com/apache/airflow/issues/51685
http_hook = HttpHook(method='GET', http_conn_id='jsonplaceholder')
http_hook.run(
endpoint=f"/posts/1",
extra_options=extra_options,
)
assert extra_options == original_extra_options, f"extra_options: {extra_options}, original_extra_options: {original_extra_options}"
with airflow.DAG(
dag_id='http_hook_test_dag',
schedule_interval=None,
start_date=airflow.utils.dates.days_ago(1),
catchup=False,
) as dag:
test_http_hook = PythonOperator(
task_id='test_http_hook',
python_callable=http_hook_test,
)
Anything else
Modifying a mutable value passed in should, imho, be avoided in framework libraries. Sometimes it can make sense, for example in update_these_values(values_to_update). But in general, the caller of a framework library function will not expect it to modified, unless the function documentation explicitly states that it will be modified.
This also seems to apply to the Airflow 3.0.2 HttpHook.
The provided reproducible example uses verify_ssl instead of proxies in extra_options because it's easier to clearly reproduce, but the same behaviour is seen with proxies.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct