Description
Apache Airflow version: 1.10.10
Environment:
- Cloud provider or hardware configuration: AWS EKS
- OS (e.g. from /etc/os-release): python:3.7-stretch docker image
- Install tools:
pip install --constraint https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt "apache-airflow[gcp,google_auth]==1.10.10"
What happened:
httplib2 throws an exception when GoogleDriveHook is extended to create a resumable upload. The library dependencies are pinned by requirements-python3.7.txt.
What you expected to happen:
GoogleCloudBaseHook should exclude the 308 response code from httplib2's list of redirect status codes. I believe this is the same issue reported in googleapis/google-api-python-client#803 and fixed in googleapis/google-api-python-client#813.
Since we are already using set_user_agent from googleapiclient.http, is it possible to use build_http from the same module instead of creating the client directly from httplib2?
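For illustration, here is a minimal sketch of the kind of change being requested. Google's resumable upload protocol answers intermediate chunks with a 308 status that carries no Location header, so an httplib2 version that treats 308 as a redirect fails on it. The helper name `exclude_308_from_redirects` is hypothetical and not part of Airflow. It assumes the `redirect_codes` attribute that newer httplib2 releases expose on `Http` objects, and silently does nothing on older releases that lack it.

```python
def exclude_308_from_redirects(http):
    """Stop an httplib2-style client from treating 308 as a redirect.

    `http` is expected to be an httplib2.Http instance (assumption).
    Newer httplib2 releases expose `redirect_codes`; older ones do not,
    in which case the object is returned unchanged.
    """
    try:
        # Build a new set without 308 rather than mutating the original,
        # which may be a shared frozenset.
        http.redirect_codes = set(http.redirect_codes) - {308}
    except AttributeError:
        # Older httplib2: no redirect_codes attribute, nothing to patch.
        pass
    return http


# Hypothetical usage inside a hook, assuming httplib2 is installed:
#   import httplib2
#   http = exclude_308_from_redirects(httplib2.Http())
```

Calling `googleapiclient.http.build_http()` instead of constructing `httplib2.Http()` directly should give the same result if the installed google-api-python-client includes the fix referenced above.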
How to reproduce it:
Running this code:
from airflow.contrib.hooks.gdrive_hook import GoogleDriveHook
from googleapiclient.http import MediaFileUpload

hook = GoogleDriveHook(gcp_conn_id="google_cloud_default")
service = hook.get_conn()
# Set chunksize to 1MB to test resumable upload.
media = MediaFileUpload('/tmp/file_larger_than_1mb', chunksize=1024*1024, resumable=True)
file_metadata = {"name": "file_larger_than_1mb"}
service.files().create(body=file_metadata, media_body=media, fields="id").execute()
raises the following exception:
[2020-05-11 15:54:43,012] {taskinstance.py:1145} ERROR - Redirected but the response is missing a Location: header.
Traceback (most recent call last):
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
return_value = self.execute_callable()
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/Users/waiyan/airflow/dags/repo/utils/dag_factory.py", line 166, in upload_gdrive_and_count
s3_object, gdrive_file_request
File "/Users/waiyan/airflow/dags/repo/common/utils/s3_to_gdrive.py", line 49, in upload_from_s3_to_gdrive_and_count
google_drive_file_response = gdrive_hook.upload(local_file, google_drive_file)
File "/Users/waiyan/airflow/dags/repo/common/hooks/gcp_gdrive_hook.py", line 180, in upload
.execute(num_retries=self.num_retries)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 862, in execute
_, body = self.next_chunk(http=http, num_retries=num_retries)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 1037, in next_chunk
self.resumable_uri, method="PUT", body=data, headers=headers
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/google_auth_httplib2.py", line 198, in request
uri, method, body=body, headers=request_headers, **kwargs)
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 1813, in new_request
connection_type=connection_type,
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/httplib2/__init__.py", line 1991, in request
cachekey,
File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/httplib2/__init__.py", line 1690, in _request
content,
httplib2.RedirectMissingLocation: Redirected but the response is missing a Location: header.
Temporary workaround:
Downgrading to httplib2==0.15.0 works as a temporary workaround.