KeyError: u'href' when scraping prices #90

Closed
rheasman opened this issue Oct 11, 2017 · 4 comments

@rheasman

OS:

Ubuntu 17.04

Steps I went through to install kicost:

pip install -I kicost

Version installed:

KiCost 0.1.39

Error generated:

Traceback (most recent call last):
  File "/usr/local/bin/kicost", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/kicost/__main__.py", line 177, in main
    scrape_retries=args.retries)
  File "/usr/local/lib/python2.7/dist-packages/kicost/kicost.py", line 161, in kicost
    id, url, part_num, price_tiers, qty_avail = result.get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
KeyError: u'href'

File I ran it on (using "kicost -i filename"):

LVBoard.xml.gz

The file is gzipped because GitHub wouldn't let me upload an XML file.

@hildogjr
Owner

hildogjr commented Oct 13, 2017

I ran kicost -i LVBoard.xml -w -s here. I used -s because my Internet connection is not very good: it serializes the web scraping, whereas the KiCost default is 30 parallel scrapes.
In my case the error was in tme.py (see also #76). To work around it by skipping the TME distributor's web page, add --exclude dist tme or -e dist tme to the end of the kicost command.
Let us know if that works.

@adamheinrich
Contributor

adamheinrich commented Oct 13, 2017

Hi, it looks like the bug is in the TME code I wrote, sorry about that. According to the log from a run with the -s and -d flags, the problem is with the C0603C104M5RACTU part:

tme ['C37', 'C40']
Found product table for C0603C104M5RACTU from tme
Traceback (most recent call last):
  File "/usr/local/bin/kicost", line 11, in <module>
    load_entry_point('kicost==0.1.39', 'console_scripts', 'kicost')()
  File "/usr/local/lib/python2.7/dist-packages/kicost/__main__.py", line 177, in main
    scrape_retries=args.retries)
  File "/usr/local/lib/python2.7/dist-packages/kicost/kicost.py", line 130, in kicost
    id, url, part_num, price_tiers, qty_avail = scrape_part(args)
  File "/usr/local/lib/python2.7/dist-packages/kicost/kicost.py", line 1219, in scrape_part
    html_tree, url[d] = get_part_html_tree(part, d, dist_module.get_part_html_tree, local_part_html, scrape_retries, scrape_logger)
  File "/usr/local/lib/python2.7/dist-packages/kicost/kicost.py", line 1176, in get_part_html_tree
    return get_html_tree_func(dist, part.fields[key], extra_search_terms, local_part_html=local_part_html, scrape_retries=scrape_retries)
  File "/usr/local/lib/python2.7/dist-packages/kicost/distributors/tme/tme.py", line 216, in get_part_html_tree
    if (not l['href'].startswith('./katalog')) and l.text == match:
  File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 997, in __getitem__
    return self.attrs[key]
KeyError: u'href'

TME's HTML tree for this part contains a link with no href attribute (see the attached image), so my access to l['href'] crashes the script. I'll try to fix it.

[Image: tme_a_nohref]
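
To make the failure mode concrete, here is a minimal sketch that uses only Python's standard library, not KiCost's actual scraping code. The HTML snippet and the AnchorCollector class are made up for illustration; the second link imitates the href-less anchor described above. Since the traceback shows bs4's __getitem__ doing a plain self.attrs[key] lookup, an ordinary dict of attributes should fail in the same way:

```python
from html.parser import HTMLParser

# Hypothetical snippet: the second <a> has no href, like the link on TME's page.
SAMPLE_HTML = """
<a href="./katalog/capacitors">Capacitors</a>
<a name="anchor-only">C0603C104M5RACTU</a>
<a href="/details/C0603C104M5RACTU">C0603C104M5RACTU</a>
"""


class AnchorCollector(HTMLParser):
    """Collect the attribute dict of every <a> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


collector = AnchorCollector()
collector.feed(SAMPLE_HTML)

for attrs in collector.anchors:
    try:
        href = attrs["href"]  # same kind of lookup as l['href']
    except KeyError as err:
        print("missing attribute:", err)
    else:
        print("href:", href)

# Anchors that would trigger the KeyError.
missing = [a for a in collector.anchors if "href" not in a]
```

Listing the anchors that lack an href this way may help when checking whether other distributor pages contain similar links.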

@adamheinrich
Contributor

adamheinrich commented Oct 13, 2017

It looks like I should have used something like l.get('href', '') instead of l['href']. It would probably be better to use it in other places as well; I'm working on a PR that does that.

@hildogjr
Owner

hildogjr commented Oct 13, 2017

@adamheinrich, you are right.
After changing line 216 to use l.get('href', '').startswith('./katalog'), I get no more errors.
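
For reference, the changed lookup can be sketched as follows. This is not the actual tme.py code: the HTML, the match value, and the find_product_links helper are hypothetical, and only the condition mirrors line 216 from the traceback. It assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the middle link has no href attribute.
SAMPLE_HTML = """
<a href="./katalog/capacitors">C0603C104M5RACTU</a>
<a>C0603C104M5RACTU</a>
<a href="/details/C0603C104M5RACTU">C0603C104M5RACTU</a>
"""


def find_product_links(html, match):
    """Return hrefs of non-catalog links whose text equals match (illustrative)."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for l in soup.find_all("a"):
        # l['href'] would raise KeyError on the second link;
        # l.get('href', '') falls back to an empty string instead.
        href = l.get("href", "")
        if (not href.startswith("./katalog")) and l.text == match:
            found.append(href)
    return found


links = find_product_links(SAMPLE_HTML, "C0603C104M5RACTU")
```

In this sketch the href-less link still passes the condition, because an empty string does not start with './katalog', so links also contains an empty string. Whether that matters depends on what the caller does with the result.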

xesscorp added a commit that referenced this issue Nov 28, 2017
Prevent KeyError when accessing HTML tree (Issue #90)
hildogjr added the bug label ("Bugs that impacts on main KiCost functionality.") on Dec 28, 2017