read_html failing in many tests: AssertionError: No tables found #4214

Closed
yarikoptic opened this issue Jul 11, 2013 · 43 comments · Fixed by #4232 or #4257
Labels
Testing pandas testing functions or related to the test suite

Comments

@yarikoptic
Contributor

typical error:

======================================================================
FAIL: test_banklist (pandas.io.tests.test_html.TestReadHtmlBase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.11.0+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/tests/test_html.py", line 114, in test_banklist
    attrs={'id': 'table'})
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.11.0+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/tests/test_html.py", line 67, in run_read_html
    return read_html(*args, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.11.0+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/html.py", line 900, in read_html
    attrs)
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.11.0+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/html.py", line 769, in _parse
    raise retained
AssertionError: No tables found
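For context on the failing call: the `attrs={'id': 'table'}` argument asks `read_html` for only those `<table>` elements whose `id` is `table`, and "No tables found" means the parser produced no match at all. The following is a standard-library illustration of that matching idea using Python 3's `html.parser`. It is not pandas' code and not one of its lxml/bs4/html5lib backends; the `TableFinder` class, the `count_matching_tables` helper and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> start tag (illustration only)."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased
        if tag == "table":
            self.tables.append(dict(attrs))


def count_matching_tables(html, attrs=None):
    """Hypothetical helper: count <table> elements whose attributes include `attrs`."""
    finder = TableFinder()
    finder.feed(html)
    finder.close()
    wanted = attrs or {}
    # Each table contributes True (1) only if every requested attribute matches
    return sum(
        all(table.get(key) == value for key, value in wanted.items())
        for table in finder.tables
    )


sample = """
<html><body>
  <table id="table"><tr><td>Bank A</td></tr></table>
  <table class="other"><tr><td>ignored</td></tr></table>
</body></html>
"""

print(count_matching_tables(sample))                   # 2
print(count_matching_tables(sample, {"id": "table"}))  # 1
print(count_matching_tables(sample, {"id": "nope"}))   # 0
```

A count of zero for a document that visibly contains matching tables is roughly the symptom reported here: the broken backend saw no `<table>` elements at all.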
@yarikoptic
Contributor Author

I guess this is another related failure:

======================================================================
FAIL: pandas.io.tests.test_html.test_bs4_finds_tables
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.11.0+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/tests/test_html.py", line 453, in test_bs4_finds_tables
    assert get_elements_from_url(filepath, 'table')
AssertionError

@jreback
Contributor

jreback commented Jul 11, 2013

can you show ci/print_versions.py?

@yarikoptic
Contributor Author

Here is what I have ATM (without running the build with the adjusted
matplotlib backend and PYTHONPATH... which shouldn't matter -- I just want
to make sure I am not missing anything):

$> ci/print_versions.py

INSTALLED VERSIONS

Python: 2.7.5.final.0
OS: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64
LC_ALL: None
LANG: en_US

Cython: 0.19
Numpy: 1.7.1
Scipy: 0.10.1
statsmodels: 0.4.2
patsy: Not installed
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2012c
bottleneck: Not installed
PyTables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.1.1rc2
openpyxl: 1.6.1
xlrd: 0.6.1
xlwt: 0.7.4
sqlalchemy: 0.7.9
lxml: 3.2.0
bs4: 4.2.0
html5lib: 0.95-dev
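Because this thread turns on the three HTML parser versions at the bottom of that report, here is a minimal sketch of a narrower check limited to those packages. It is not `ci/print_versions.py`. It assumes Python 3.8+ for `importlib.metadata` and reads versions from installed distribution metadata (bs4 is published as `beautifulsoup4`). The `parser_versions` name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Display name -> distribution name on PyPI (bs4 ships as "beautifulsoup4")
HTML_BACKENDS = {"lxml": "lxml", "bs4": "beautifulsoup4", "html5lib": "html5lib"}


def parser_versions(backends=HTML_BACKENDS):
    """Return {display_name: installed version string or 'Not installed'}."""
    report = {}
    for name, dist in backends.items():
        try:
            report[name] = version(dist)
        except PackageNotFoundError:
            report[name] = "Not installed"
    return report


for name, ver in parser_versions().items():
    print(f"{name}: {ver}")
```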

Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

@jreback
Contributor

jreback commented Jul 11, 2013

@cpcloud ?

@cpcloud
Member

cpcloud commented Jul 11, 2013

@yarikoptic this is master correct?

@cpcloud
Member

cpcloud commented Jul 11, 2013

7097368 should have fixed these issues

it looks like you're using an older version of html5lib. i believe 1.0b2 is out.

also, check out the reading html gotchas

@yarikoptic
Contributor Author

yes -- master 0.11.0+git43-g7b2eaa4

thanks for the fix -- I will check it out

html5lib -- well, I have the freshest non-beta release ;) thanks for the note though!

@cpcloud
Member

cpcloud commented Jul 11, 2013

i'll try this on a vagrant box

the commit i referenced above is already in master

@cpcloud
Member

cpcloud commented Jul 11, 2013

@yarikoptic is it of any significance that your version says 0.11.0?

$ git describe
v0.12.0rc1-43-g7b2eaa4

@cpcloud
Member

cpcloud commented Jul 11, 2013

@yarikoptic there are no guarantees that your bs4 version will work...

see the optional dependencies docs

also your lxml version might need to be upgraded.

i would say: leave html5lib alone for now, that should be fine, but change your bs4 from 4.2.0 to either 4.2.1, 4.1.3 or 4.0.2, then run the tests. if lxml still fails, upgrade it to 3.2.1

if none of this works then i'll mark it as a bug

@cpcloud
Member

cpcloud commented Jul 11, 2013

there's a release of bs4 that really shouldn't have existed, i believe it is 4.2.0.

@cpcloud
Member

cpcloud commented Jul 11, 2013

fyi travis installs bs4 4.0.2 and that works... i'm not sure how different ubuntu 12.04 is from debian sid, but i imagine they have a lot in common; i'm sure you know much more than i do about this

@jreback
Contributor

jreback commented Jul 11, 2013

I think @cpcloud should set up a version dependency hotline, kind of like AA :)

@cpcloud
Member

cpcloud commented Jul 11, 2013

Hello, this is read_html-one-one, what's your emergency?

@yarikoptic
Contributor Author

ah -- thanks -- it should indeed have been 0.12.0~rc1+git... as the Debian-side
version -- fixed in my debian branch.
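The version strings in this thread suggest a simple mapping from `git describe` output (`v0.12.0rc1-43-g7b2eaa4`) to the Debian-side version (`0.12.0~rc1+git43-g7b2eaa4`, as in the later build path). Below is a sketch of that mapping. The rule is inferred from these examples only, and the `describe_to_debian` helper is hypothetical, not the actual Debian packaging tooling.

```python
import re

# git describe output: v<base>[<pre-release>]-<commits since tag>-g<short hash>
DESCRIBE_RE = re.compile(
    r"^v?(?P<base>\d+(?:\.\d+)*)(?P<pre>(?:rc|a|b)\d+)?-(?P<n>\d+)-(?P<hash>g[0-9a-f]+)$"
)


def describe_to_debian(describe):
    """Map e.g. 'v0.12.0rc1-43-g7b2eaa4' to '0.12.0~rc1+git43-g7b2eaa4'."""
    m = DESCRIBE_RE.match(describe.strip())
    if m is None:
        raise ValueError(f"unrecognized git describe output: {describe!r}")
    version = m.group("base")
    if m.group("pre"):
        # '~' makes Debian sort the pre-release before the final release
        version += "~" + m.group("pre")
    return f"{version}+git{m.group('n')}-{m.group('hash')}"


print(describe_to_debian("v0.12.0rc1-43-g7b2eaa4"))  # 0.12.0~rc1+git43-g7b2eaa4
```

The `~` matters because Debian's version comparison sorts `0.12.0~rc1` before `0.12.0`, so the release candidate is not treated as newer than the final release.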

@cpcloud
Member

cpcloud commented Jul 11, 2013

excellent!

@cpcloud
Member

cpcloud commented Jul 11, 2013

does that mean bs4==4.2.0 works for you?

@yarikoptic
Contributor Author

yikes -- good to know... well at least in current Debian stable we have
4.1.0 -- is that good enough? ;-)

@cpcloud
Member

cpcloud commented Jul 11, 2013

that should be ok, as long as it passes for you. does this also fix #4212? if so please close both, thanks for the report 😄

@cpcloud
Member

cpcloud commented Jul 11, 2013

also the other issues you've raised today might be fixed now that your version is correct

@yarikoptic
Contributor Author

sorry -- I guess my mental pipelining broke -- what should have fixed my issue? a downgrade of bs4 to < 4.2.0?
the commit you mentioned, 7097368, has been in master for a while, so it was already included in the version I built -- or am I wrong?

@cpcloud
Member

cpcloud commented Jul 11, 2013

The bs4 downgrade should be the fix.

@yarikoptic
Contributor Author

ok -- will give it a try. but as for correcting the reports, it would be better if pandas internally switched between backends or skipped the tests when a broken bs4 is present; otherwise it becomes impractical for me to build across all the debian/ubuntu releases that ship the broken bs4
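Here is a minimal sketch of the skip idea, assuming Python 3.8+ and the standard library's `unittest` (the pandas suite at the time ran under nose, as the tracebacks show). `KNOWN_BAD_BS4`, `installed_bs4_version`, `bs4_is_usable` and the test class are hypothetical names, and the test body is a placeholder. Later in this thread 4.2.0 is reported to work on Arch Linux but fail on Debian, so skipping on the exact version is a conservative choice.

```python
import unittest
from importlib.metadata import PackageNotFoundError, version

# Versions reported broken in this thread; an assumption to extend as needed
KNOWN_BAD_BS4 = {"4.2.0"}


def installed_bs4_version():
    """Return the installed beautifulsoup4 version, or None if it is absent."""
    try:
        return version("beautifulsoup4")
    except PackageNotFoundError:
        return None


def bs4_is_usable(ver):
    """True when bs4 is installed and not on the known-bad list."""
    return ver is not None and ver not in KNOWN_BAD_BS4


BS4_VERSION = installed_bs4_version()
skip_if_bad_bs4 = unittest.skipUnless(
    bs4_is_usable(BS4_VERSION),
    f"bs4 missing or known-broken (found {BS4_VERSION})",
)


class TestReadHtmlBs4(unittest.TestCase):
    @skip_if_bad_bs4
    def test_bs4_finds_tables(self):
        # Placeholder: a real test would parse an HTML fixture with bs4 here
        self.assertTrue(bs4_is_usable(BS4_VERSION))
```

Run it with `python -m unittest <module>`. With bs4 absent or at 4.2.0, the test is reported as skipped rather than failed, which keeps distribution builds green without hiding real regressions on good versions.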

@yarikoptic
Contributor Author

btw -- what about bs4 4.2.1 -- should that be "good enough"?

@cpcloud
Copy link
Member

cpcloud commented Jul 11, 2013

Re bs4 skip: good idea. I'll add a warning to the code as well, since iirc 4.0.2 worked fine on my Arch Linux box.

4.2.1 should be fine.

@cpcloud
Member

cpcloud commented Jul 11, 2013

I mean 4.2.0 worked fine.

@yarikoptic
Contributor Author

... so bs4 4.2.0 is not at fault? (anyways -- rebuilding/testing now with 4.2.1 installed)

@cpcloud
Copy link
Member

cpcloud commented Jul 11, 2013

no, bs4 4.2.0 is at fault

@yarikoptic
Contributor Author

then what is "I mean 4.2.0 worked fine." about? ;) or is it at fault only in some deployments?

@cpcloud
Copy link
Member

cpcloud commented Jul 11, 2013

works fine on my arch linux machine, but fails on debian distros, not sure exactly why

@cpcloud
Member

cpcloud commented Jul 11, 2013

let me check that though...

@yarikoptic
Contributor Author

wow -- with 4.2.1 only #4215 is left failing:

FAIL: test_invalid_colormap (pandas.tests.test_graphics.TestDataFrameGroupByPlots)

Traceback (most recent call last):
  File "/home/yoh/deb/gits/pkg-exppsy/build-area/pandas-0.12.0~rc1+git43-g7b2eaa4/debian/tmp/usr/lib/python2.7/dist-packages/pandas/tests/test_graphics.py", line 995, in test_invalid_colormap
    self.assertRaises(ValueError, df.plot, colormap='invalid_colormap')
AssertionError: ValueError not raised


Ran 3622 tests in 743.502s

FAILED (SKIP=83, failures=1)

@cpcloud
Member

cpcloud commented Jul 11, 2013

that's good news! not sure where that is coming from... i'll see what i can do about it

@jreback
Contributor

jreback commented Jul 12, 2013

@cpcloud ok... are you going to add a skip test / warning message if 4.2.0 is installed? (maybe just raise in read_html: hey user, your 4.2.0 is broken, this won't work...)

@cpcloud
Member

cpcloud commented Jul 13, 2013

i'm actually going to raise if no tables are found and bs4==4.2.0, since there's nothing else to fall back on after that. i've cooked up a fairly informative error message
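Here is a sketch of that check under stated assumptions. The names `BROKEN_BS4_VERSIONS`, `no_tables_message` and `check_tables_found` are hypothetical, the choice of `ValueError` is ours, and the message text is not the one that went into pandas. The suggested versions come from the earlier advice in this thread.

```python
# Versions reported broken in this thread (assumption: exact-match only)
BROKEN_BS4_VERSIONS = {"4.2.0"}


def no_tables_message(bs4_version=None):
    """Build the error text; bs4_version is None when BeautifulSoup was not used."""
    message = "No tables found"
    if bs4_version in BROKEN_BS4_VERSIONS:
        message += (
            f"; beautifulsoup4 {bs4_version} is known to miss tables on some"
            " systems, so try 4.2.1, 4.1.3 or 4.0.2"
        )
    return message


def check_tables_found(tables, bs4_version=None):
    """Return `tables` unchanged, or raise once every parser has come up empty."""
    if not tables:
        raise ValueError(no_tables_message(bs4_version))
    return tables
```

Keeping the message builder separate from the raising check makes the hint easy to test without triggering an exception.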

@cpcloud
Member

cpcloud commented Jul 13, 2013

hard to test for this...we don't have an extensive list of OSes that work with bs4

@yarikoptic
Contributor Author

with 0.12.0~rc1+git79-g50eff60 and bs4 4.2.0 installed (on sparc)

======================================================================
FAIL: pandas.io.tests.test_html.test_bs4_finds_tables
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/deb/gits/pkg-exppsy/pandas/debian/tmp/usr/lib/python2.7/dist-packages/pandas/io/tests/test_html.py", line 472, in test_bs4_finds_tables
    assert get_elements_from_url(filepath, 'table')
AssertionError

@yarikoptic
Contributor Author

ah -- only just now noticed that it is closed -- let me know if I need to open a new one

@cpcloud
Member

cpcloud commented Jul 16, 2013

4.2.0 won't work.

@yarikoptic
Contributor Author

then the test should be skipped?

@cpcloud
Member

cpcloud commented Jul 16, 2013

R u sure u have master?

@yarikoptic
Contributor Author

as I mentioned, I had 50eff60, which was master a few hours back; now there are some new changes:

$> git log --pretty=oneline 0.12.0~rc1+git79-g50eff60..origin/master
1e69dade1309b494e43f26f9831326c2ce63f7de Merge pull request #4248 from cpcloud/mpl-1.1.1-build
5be1e5a3f27e43c8b8cae57fed684e62587580fb Merge pull request #4243 from kjordahl/master
5b908a5a1d801f67b78d179b934f860bffe37533 BLD: use mpl 1.1.1 in python 2.7 production travis build
53b7a74fd756cfd2e6fca58326a577558ab97909 Merge pull request #4247 from jreback/series_dups
bbcfd929205ef79c00a403050f107c5e05e0b300 ENH: implement non-unique indexing in series (GH4246)
56e6b173f1c94162bfe5b7dfacb58062011dc96b DOC: Fix typos in CONTRIBUTING.md

@cpcloud
Member

cpcloud commented Jul 16, 2013

i'll open a straw man issue for your most recent find -- will be knocked down by the other html test fixes pr...

@yarikoptic thanks for all the reports!
