Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "./pandas_bug.py", line 7, in fetch_file
    pandas.read_html(url)
  File "/usr/lib/python3.6/site-packages/pandas/io/html.py", line 904, in read_html
    keep_default_na=keep_default_na)
  File "/usr/lib/python3.6/site-packages/pandas/io/html.py", line 731, in _parse
    parser = _parser_dispatch(flav)
  File "/usr/lib/python3.6/site-packages/pandas/io/html.py", line 691, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Problem description
read_html() doesn't appear to be thread-safe. This specific issue seems to be caused by _IMPORTS in html.py being set to True too early: a second thread can enter _parser_dispatch and raise an exception while the first thread hasn't yet finished the import check.
I have written a potential fix and will open a PR shortly.
Expected Output
No exception should be thrown since lxml is installed and the program works fine without multi-threading.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.3-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None