GoogleCloudBaseHook does not exclude 308 response code #8810

@waiyan1612

Apache Airflow version: 1.10.10

Environment:

  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g. from /etc/os-release): python:3.7-stretch docker image
  • Install tools: pip install --constraint https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt apache-airflow[gcp, google_auth]==1.10.10

What happened:

httplib2 raises an exception when GoogleDriveHook is extended to perform a resumable upload. The library dependencies are based on requirements-python3.7.txt.

What you expected to happen:

GoogleCloudBaseHook should exclude 308 response code from httplib2's list of redirect status codes. I believe this is the same issue reported by googleapis/google-api-python-client#803 and fixed in googleapis/google-api-python-client#813.

Since we are already using set_user_agent from googleapiclient.http, would it be possible to use build_http from the same module instead of creating the httplib2.Http object directly?
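For illustration, here is a minimal sketch of what excluding 308 could look like for any httplib2.Http-like object. The helper name exclude_308_from_redirects is hypothetical and is not part of Airflow, httplib2, or google-api-python-client. It assumes the client exposes a redirect_codes attribute, as newer httplib2 releases appear to do.

```python
def exclude_308_from_redirects(http):
    """Stop an httplib2.Http-like object from treating 308 as a redirect.

    Hypothetical helper for illustration only. Google's resumable upload
    protocol answers an incomplete chunk with 308 and no Location header,
    so a client that follows 308 as a redirect fails on that response.
    """
    try:
        # Assumption: `redirect_codes` is a set-like attribute on the client.
        http.redirect_codes = frozenset(http.redirect_codes) - {308}
    except AttributeError:
        # Assumption: a client without `redirect_codes` needs no change.
        pass
    return http
```

The hook could then wrap its client as exclude_308_from_redirects(httplib2.Http()) before authorizing it. Calling build_http from googleapiclient.http should make such a helper unnecessary if it already applies the fix from googleapis/google-api-python-client#813.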

How to reproduce it:

Running this code

from googleapiclient.http import MediaFileUpload
from airflow.contrib.hooks.gdrive_hook import GoogleDriveHook

hook = GoogleDriveHook(gcp_conn_id="google_cloud_default")
service = hook.get_conn()

# Set chunksize to 1MB to force a multi-chunk (resumable) upload.
media = MediaFileUpload('/tmp/file_larger_than_1mb', chunksize=1024*1024, resumable=True)
file_metadata = {"name": "file_larger_than_1mb"}
service.files().create(body=file_metadata, media_body=media, fields="id").execute()

raises the following exception:

[2020-05-11 15:54:43,012] {taskinstance.py:1145} ERROR - Redirected but the response is missing a Location: header.
Traceback (most recent call last):
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/Users/waiyan/airflow/dags/repo/utils/dag_factory.py", line 166, in upload_gdrive_and_count
    s3_object, gdrive_file_request
  File "/Users/waiyan/airflow/dags/repo/common/utils/s3_to_gdrive.py", line 49, in upload_from_s3_to_gdrive_and_count
    google_drive_file_response = gdrive_hook.upload(local_file, google_drive_file)
  File "/Users/waiyan/airflow/dags/repo/common/hooks/gcp_gdrive_hook.py", line 180, in upload
    .execute(num_retries=self.num_retries)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 862, in execute
    _, body = self.next_chunk(http=http, num_retries=num_retries)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 1037, in next_chunk
    self.resumable_uri, method="PUT", body=data, headers=headers
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/google_auth_httplib2.py", line 198, in request
    uri, method, body=body, headers=request_headers, **kwargs)
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/googleapiclient/http.py", line 1813, in new_request
    connection_type=connection_type,
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/httplib2/__init__.py", line 1991, in request
    cachekey,
  File "/anaconda3/envs/airflow_1_10_10/lib/python3.7/site-packages/httplib2/__init__.py", line 1690, in _request
    content,
httplib2.RedirectMissingLocation: Redirected but the response is missing a Location: header.

Temporary workaround:

Downgrading to httplib2==0.15.0 works as a temporary workaround.
