
BUG: SSL handshake error with Python 3.10 and Pandas read_csv for URLs #47189

Closed
3 tasks done
turnerm opened this issue Jun 1, 2022 · 5 comments
Labels
Closing Candidate (may be closeable, needs more eyeballs), IO CSV (read_csv, to_csv), Python 3.10

Comments

@turnerm
turnerm commented Jun 1, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
url = ("https://iridl.ldeo.columbia.edu/"
       "SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
       ".global/.T/last/subgrid/0./add/T/"
       "table%3A/1/%3Atable/.csv")
pd.read_csv(url)

Issue Description

With Python 3.10, reading the CHIRPS rainfall data csv file from the URL in the provided example fails with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/lib/python3.10/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.10/ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1070, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/turnerm/sync/pa-aa-toolbox/run_chirps.py", line 21, in <module>
    df = pd.read_csv(url)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 927, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 582, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1421, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1707, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 672, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 336, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
  File "/home/turnerm/sync/pa-aa-toolbox/venv/lib/python3.10/site-packages/pandas/io/common.py", line 239, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)>

This error is not present in Python 3.6-3.9. I suspect it is due to the increased security for default TLS settings in Python 3.10. A workaround I found based on this SO post:

import ssl
from urllib.request import urlopen

import pandas as pd

url = ("https://iridl.ldeo.columbia.edu/"
       "SOURCES/.UCSB/.CHIRPS/.v2p0/.monthly/"
       ".global/.T/last/subgrid/0./add/T/"
       "table%3A/1/%3Atable/.csv")

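# Relax the cipher list to OpenSSL's "DEFAULT" set. This weakens security
# relative to the Python 3.10 defaults.
context = ssl.create_default_context()
context.set_ciphers("DEFAULT")
# Open the URL with the relaxed context and hand the response to pandas.
result = urlopen(url, context=context)
df = pd.read_csv(result)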
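
To see what the workaround above actually changes, one option is to compare the cipher suites enabled by Python's default context with those enabled after `set_ciphers("DEFAULT")`. This is only a sketch using the standard `ssl` module; the helper name `cipher_names` is made up for illustration, and the result depends on the Python and OpenSSL builds installed (on some systems the two lists may be identical).

```python
import ssl


def cipher_names(context: ssl.SSLContext) -> set:
    """Return the names of the cipher suites enabled on the given context."""
    return {cipher["name"] for cipher in context.get_ciphers()}


# Context as Python configures it by default.
default_context = ssl.create_default_context()

# Context configured the same way as in the workaround.
relaxed_context = ssl.create_default_context()
relaxed_context.set_ciphers("DEFAULT")

# Cipher suites that are available only after relaxing the cipher list.
re_enabled = sorted(cipher_names(relaxed_context) - cipher_names(default_context))

print(f"{len(re_enabled)} cipher suites enabled only by the workaround:")
for name in re_enabled:
    print(" ", name)
```

If the server only offers suites from the printed list, that would be consistent with the handshake failing under the stricter defaults, but this check alone does not prove it.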

Expected Behavior

The csv should be read correctly into a dataframe, and should look like:

       Time
0  Apr 2022

(Note that this dataset is not completely static; the date may eventually change, but the format should stay similar.)

Installed Versions

INSTALLED VERSIONS

commit : 3bf2cb1
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-41-generic
Version : #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0.dev0+849.g3bf2cb1b2
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : 2022.3.0
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

turnerm added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jun 1, 2022
@simonjayhawkins
Member

Thanks @turnerm for the report.

> A workaround I found based on this SO post:

from that post...

> To make connections possible again it is necessary to use weaker security settings.

I'm no security expert, but that can only be a bad thing?

> Expected Behavior
>
> The csv should be read correctly into a dataframe, and should look like:

I don't think pandas should implement any workarounds that weaken security, so removing the bug label and labelling as won't fix and closing candidate to see what others think.

simonjayhawkins added the IO CSV (read_csv, to_csv) and Closing Candidate (may be closeable, needs more eyeballs) labels and removed the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jun 2, 2022
simonjayhawkins added this to the No action milestone Jun 2, 2022
@twoertwein
Member

twoertwein commented Jun 2, 2022

I'm using python 3.10.4 on Fedora 36. I have no issues reading this CSV file!

On an older machine with python 3.10, I get the reported SSL error. I assume the certificates/openssl installed on that machine might be too old.

Edit: according to SSL Labs, the site supports TLS 1.2. I assume that your OpenSSL installation does not support 1.2.

@turnerm
Author

turnerm commented Jun 2, 2022

I'm a bit out of my depth at this point, but in case it's useful information, I do believe my machine supports 1.2:

$ openssl ciphers -v | awk '{print $2}' | sort | uniq
SSLv3
TLSv1
TLSv1.2
TLSv1.3

I first encountered this error on GitHub actions, also when running on Ubuntu 20.04 (same as my machine).
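
The `openssl` command reports on the system binary, which is not necessarily the same OpenSSL library that the Python interpreter is linked against. As a complementary check, the standard `ssl` module can report what Python itself sees. This is a minimal sketch; the function name `tls_support_report` is hypothetical.

```python
import ssl


def tls_support_report() -> dict:
    """Summarize the OpenSSL build that this Python interpreter is linked against."""
    context = ssl.create_default_context()
    return {
        # Version string of the OpenSSL library loaded by the ssl module.
        "openssl_version": ssl.OPENSSL_VERSION,
        # Whether that library was built with TLS 1.2 / TLS 1.3 support.
        "has_tls_1_2": ssl.HAS_TLSv1_2,
        "has_tls_1_3": ssl.HAS_TLSv1_3,
        # Lowest protocol version the default context will negotiate.
        "default_minimum_version": context.minimum_version.name,
    }


report = tls_support_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this on a machine that works and on one that fails may help show whether the difference lies in the OpenSSL build or elsewhere.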

@twoertwein
Member

> I'm a bit out of my depth at this point

Me too :) The older machine on which I get the error also seems to support TLS 1.2.

I think the issue is related to the python/openssl installation - unfortunately, I don't know what is wrong (I would assume it works when you upgrade to Ubuntu 22.04/Fedora 36). Pandas simply uses urllib (and fsspec) to open URLs. If you believe that this is not an issue with the python/openssl installation, please feel free to open an issue at urllib.
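
Since pandas delegates to urllib here, the failure can be checked without pandas at all, which is also the form an upstream report would need. The sketch below assumes only the standard library; `probe` is a hypothetical helper, and it only catches errors that urllib wraps in `URLError`, as in the traceback in the original report.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, context: Optional[ssl.SSLContext] = None) -> Optional[str]:
    """Try to open the URL with urllib alone.

    Returns None on success, or a description of the underlying error
    (for example the wrapped ssl.SSLError) on failure.
    """
    try:
        with urllib.request.urlopen(url, context=context, timeout=30) as response:
            response.read(1)  # Force at least one byte to be transferred.
    except urllib.error.URLError as err:
        # URLError.reason holds the original exception or message.
        return repr(err.reason)
    return None


# Example usage (needs network access, so it is left commented out):
# print(probe("https://iridl.ldeo.columbia.edu/"))
```

If `probe` reports the same `SSLV3_ALERT_HANDSHAKE_FAILURE` for the URL from the report, the problem is reproducible outside pandas.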

@phofl
Member

phofl commented Jul 3, 2022

Closing as won't fix on our side

4 participants