octopart integration #314

xesscorp · 2018-09-03T00:34:08Z

I've created a new branch called "octopart". It gets part information using only the Octopart API (no web scraping). It works for the half-dozen XML input files I've tried it on. It currently has these problems:

Price breaks for Digi-Key parts always seem to be for reeled quantities (3000, 6000, etc). The cut-tape, small-quantity pricing is not returned. I'm not sure if there is an option in the API for enabling this.
RS Components always seems to return an empty price list.
It only runs under Python 2.7 because it uses urllib. (This should be easy to fix.)

All the changes I made are in the kicost.py file. This may not be the best place for it, but it makes the changes obvious.

I also noticed that there is a large delay (several seconds) in kicost before the part data is fetched from Octopart (which takes less than a second). This seems to be caused by another parallel scrape that checks the distributor websites. If so, I guess we could remove that since Octopart is being used. (I can't say for sure: this is new code I've never seen before.)

romain145 · 2018-09-03T14:45:53Z

Python 3 splits urllib into two separate modules for parsing and requests, urllib.parse and urllib.request:

diff --git a/kicost/kicost.py b/kicost/kicost.py
index df6e24c..0f99c90 100644
--- a/kicost/kicost.py
+++ b/kicost/kicost.py
@@ -362,14 +362,15 @@ def kicost(in_file, eda_tool_name, out_filename,
         logger.log(DEBUG_OVERVIEW, '# Getting part data from Octopart...')

         import json
-        import urllib
+        import urllib.parse
+        import urllib.request

         dist_xlate = {'Digi-Key':'digikey', 'Mouser':'mouser', 'Newark':'newark', 'Farnell':'farnell', 'RS Components':'rs', 'TME':'tme'}

         def get_part_info(query, parts):
-            url = 'http://octopart.com/api/v3/parts/match?queries=%s' % urllib.quote(json.dumps(query))
+            url = 'http://octopart.com/api/v3/parts/match?queries=%s' % urllib.parse.quote(json.dumps(query))
             url += '&apikey=96df69ba'
-            results = json.loads(urllib.urlopen(url).read())['results']
+            results = json.loads(urllib.request.urlopen(url).read())['results']
             for result in results:
                 i = int(result['reference'])
                 for item in result['items']:

Hope that helps.
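
A single code base can also cover both interpreters with a guarded import instead of a hard switch to the Python 3 module names. The following is a minimal sketch, not the code in the octopart branch: the endpoint is the one shown in the diff, while `build_match_url`, the `mpn` query field, and the placeholder key are assumptions made for illustration.

```python
import json

try:  # Python 3 location of quote()
    from urllib.parse import quote
except ImportError:  # Python 2 location of quote()
    from urllib import quote

# Endpoint copied from the diff above.
MATCH_URL = 'http://octopart.com/api/v3/parts/match'


def build_match_url(queries, apikey):
    """Return the parts/match URL for a list of query dicts (hypothetical helper)."""
    url = MATCH_URL + '?queries=%s' % quote(json.dumps(queries))
    url += '&apikey=' + quote(apikey)
    return url


# 'mpn' is assumed here as the query field; 'YOUR_API_KEY' is a placeholder.
example_url = build_match_url(
    [{'mpn': 'CL21C090CBANNNC', 'reference': '0'}], 'YOUR_API_KEY')
```

Only the URL is built here; fetching it is left to whichever HTTP client the project settles on.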

xesscorp · 2018-09-03T17:24:05Z

Thanks for the tip, @romain145 . I ended up replacing the urllib stuff with the requests library. Python 2/3 compatibility has been restored.

I also removed the parallel scan of the distributor websites. It doesn't appear to have affected the operation of kicost. Timing on a small number of examples shows a speedup of around 50x over the web-scraping version.

romain145 · 2018-09-03T18:22:06Z

I confirm. One of my projects with 50+ references takes 3.8s! That is a massive improvement.

Digi-Key price breaks seem to work OK for non-reeled items, cf. this fuse: 0215004.HXP.
The RS price list does return pricing, but in the wrong currency: cf. this connector, 61900311121, whose pricing is reported in USD instead of GBP (maybe a config problem on my side?).
It runs using Python 3.6.5.

Attached XML BOM.
rboard.xml.txt

Is the API key supposed to be the same for all KiCost users?

xesscorp · 2018-09-03T20:51:39Z

Regarding the API key: it's unique to an application. So I have registered KiCost with Octopart and that's the assigned key. Individual users do not need their own key. (This is something I did not realize before.)

romain145 · 2018-09-03T21:40:17Z

For some reason, the distributors' part numbers have a prefix, such as "2401_855-" or "2401_81-", that is probably prepended by Octopart (the links are trackers that point to Octopart as well). Not sure what the prefix means though...
So the BOM copy/paste import on the distributor's website doesn't work.

xesscorp · 2018-09-03T21:49:35Z

I just looked at my spreadsheets and they also have the prepended number. It looks like it's a unique code for each distributor. It may be safe to strip off the initial digits and the underscore to get the actual part number.
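
The stripping idea can be sketched as below. This is only an illustration of "remove the initial digits and the underscore"; the function name is hypothetical, the sample part number is made up, and whether the remainder is always a valid distributor part number is not established in this thread.

```python
import re

# Leading run of digits followed by one underscore, e.g. "2401_".
_OCTOPART_PREFIX = re.compile(r'^\d+_')


def strip_octopart_prefix(part_num):
    """Drop a leading '<digits>_' prefix if present (hypothetical helper)."""
    return _OCTOPART_PREFIX.sub('', part_num, count=1)


stripped = strip_octopart_prefix('2401_855-EXAMPLE')    # made-up part number
untouched = strip_octopart_prefix('1276-2558-2-ND')     # no prefix to remove
```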

anderwm · 2018-09-04T15:45:15Z

This is pretty awesome.
Although it worked, I got this traceback:

Traceback (most recent call last):
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 885, in __del__
    self.close()
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 1090, in close
    self._decr_instances(self)
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 454, in _decr_instances
    cls.monitor.exit()
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

I believe the special distributor part number is how Octopart is making money (and also why the API is free to use). They must get some royalty whenever somebody buys a part with their tracking part numbers. It's probably in our best interest to use their numbers if we can in the cart building process so it will remain free.
You are right, the Digikey quantities/packaging are screwed up (it always gives me the reel when one is available).

romain145 · 2018-09-04T16:15:26Z

@anderwm I've had the same error recently, see #313 (comment)

Then the BOM should probably be filled in using the Octopart BOM tool. After clicking on "Create a New BOM", there is a "paste" link on the left for pasting the data from the spreadsheet, but even then it seems confused by the prefix and does not recognise the MPN.

anderwm · 2018-09-04T17:10:30Z

It looks like that special number is just used by Octopart to build a link to Digikey/whoever with them as the referrer, so that is how they are making money. To get at the real digikey part number there is a field called "sku" under each "offer". It may be good enough to just include Octopart's link in the spreadsheet the way you are doing, but have the anchor be the actual sku. The way Kicost works for me I never visit the links that come out of it. I just use the comma separated text to start a cart/BOM at all the distributors I need. I'm not sure if there is a way for Octopart to make money on that or not...or even if we should care.
It appears that each packaging option comes in a separate "offer" in the response from Octopart. This could be nice as I have times I would like to buy all cut tape/reel and other times when I don't care. In any event, there are multiple offers from Digikey and if you don't take the "cut tape" one or "tube" one you might not get the quantities you would expect.

xesscorp · 2018-09-04T18:05:17Z

Thanks, @anderwm. Using the sku field solves the problem with ordering from the distributors.

Thanks also for pointing out that the Octopart response includes multiple offers. By collecting the price breaks for all the offers from Digi-Key, I was able to build a complete price table.
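
Collecting the price breaks across offers could look roughly like this. It is a sketch under assumptions, not the code in the branch: each offer is assumed to carry a `seller` dict with a `name`, and a `prices` dict mapping a currency code to `[quantity, price]` pairs; the function name and the sample data are invented for illustration.

```python
def merge_price_breaks(offers, seller, currency='USD'):
    """Merge the price breaks of every offer from one seller into one table.

    When two offers quote the same quantity, the lower price is kept.
    Returns a list of (quantity, unit_price) tuples sorted by quantity.
    """
    table = {}
    for offer in offers:
        if offer.get('seller', {}).get('name') != seller:
            continue
        for qty, price in offer.get('prices', {}).get(currency, []):
            price = float(price)
            if qty not in table or price < table[qty]:
                table[qty] = price
    return sorted(table.items())


# Invented sample data: one cut-tape offer, one reel offer, one other seller.
sample_offers = [
    {'seller': {'name': 'Digi-Key'}, 'sku': 'EXAMPLE-1-ND',
     'prices': {'USD': [[1, '0.10'], [10, '0.08']]}},
    {'seller': {'name': 'Digi-Key'}, 'sku': 'EXAMPLE-2-ND',
     'prices': {'USD': [[3000, '0.02'], [6000, '0.018']]}},
    {'seller': {'name': 'Mouser'}, 'sku': 'EXAMPLE-M',
     'prices': {'USD': [[1, '0.09']]}},
]
digikey_table = merge_price_breaks(sample_offers, 'Digi-Key')
```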

anderwm · 2018-09-04T18:28:51Z

No problem. We still need some way to pick which package type to use as the sku, though (as Cat# in datasheet). If not, you wind up with a bunch of reels in your cart that you have to change over. We could use the amount being purchased, or default to cut tape, or pass in something from the command line I guess.

xesscorp · 2018-09-04T19:10:46Z

I'll try using the SKU from the offer with the smallest quantities in the list. If you order enough parts to consume a reel, I think they might automatically apply reel pricing.

hildogjr · 2018-09-04T19:30:53Z

At Digikey, at least, you are advised to do that (and to order at the next price break) when checking out the cart.

xesscorp · 2018-09-04T19:40:55Z

Let's try it that way, then.

anderwm · 2018-09-04T20:35:20Z

So possibly it is a Me problem then. But most of the Digikey part numbers are coming up as reel for me in the Cat#. For instance, CL21C090CBANNNC shows in the spreadsheet as Digikey Cat# 1276-2558-2-ND and the -2 means reel.
I assumed that it was just because that happened to be the first or last offer that you came across from Digikey. I guess you have to keep track of which sku corresponds to the lowest minimum quantity, but that isn't the part number showing up for me.

Edit: I read your earlier comment as describing what you were already doing. If what I am describing is not implemented yet, disregard this comment.

xesscorp · 2018-09-04T20:36:26Z

Right, I haven't fixed the SKU problem.

anderwm · 2018-09-04T20:37:36Z

sorry...got it

xesscorp · 2018-09-04T23:16:07Z

KiCost now selects the SKU, available quantity, and web page from the cut-tape version. I'm distinguishing the cut-tape part by looking for the offer with the smallest quantity difference between the first two entries in its pricing table. This seems to work after my extensive testing on a single example. Each part offer also has a "packaging" field that could be used if that gives more accurate results.
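
The selection heuristic described above can be sketched as follows. The offer layout (`prices` keyed by currency, holding `[quantity, price]` pairs) and the function name are assumptions, and skipping offers with fewer than two price breaks is a choice made for this sketch rather than something stated in the thread.

```python
def pick_cut_tape_offer(offers, currency='USD'):
    """Return the offer whose first two price-break quantities are closest.

    Offers with fewer than two breaks cannot be compared and are skipped.
    Returns None when no offer qualifies.
    """
    best, best_step = None, None
    for offer in offers:
        breaks = sorted(
            int(qty) for qty, _ in offer.get('prices', {}).get(currency, []))
        if len(breaks) < 2:
            continue
        step = breaks[1] - breaks[0]
        if best_step is None or step < best_step:
            best, best_step = offer, step
    return best


# Invented sample data: cut tape steps 1 -> 10, the reel steps 3000 -> 6000.
candidate_offers = [
    {'sku': 'EXAMPLE-2-ND', 'prices': {'USD': [[3000, '0.02'], [6000, '0.018']]}},
    {'sku': 'EXAMPLE-1-ND', 'prices': {'USD': [[1, '0.10'], [10, '0.08']]}},
    {'sku': 'EXAMPLE-6-ND', 'prices': {'USD': [[1, '0.10']]}},
]
chosen = pick_cut_tape_offer(candidate_offers)
```

Checking the "packaging" field mentioned above would be an alternative if the quantity-step heuristic turns out to misfire.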

anderwm · 2018-09-05T16:22:22Z

This worked quite well for me and the couple of BOMs I tested, and is crazy fast.

Does this version make it trivial to add distributors to show in the spreadsheet? As in, if I want to add Arrow is it just a matter of adding it to the distributor_dict somewhere?

hildogjr · 2018-09-05T16:41:50Z

We could set a road map:

1. Finish the main Octopart tests;
2. Move the code into distributors/octopart.py;
   a. Rework the distributor_dict dict() if necessary;
   b. Fix spreadsheet.py if any bug is found;
   c. Then add more distributors here (add Arrow, ... Other distributors #116) using the Octopart API (should be easy at this point, just some definitions at the octopart class() initialization);
   d. Add currency conversion using CurrencyConverter(), already present but without updated rates, or another web service; Specify spreadsheet currency #65
3. Merge with the main branch (we could set KiCost v1.0 here?);
   a. Rename the scrape file routines to *_webscrape (e.g. digikey_webscrape.py, mouser_webscrape.py);
   b. Use the web scrape only on complete failure of the APIs (or at least keep it for history and future decisions);
   c. It may be possible to create the other APIs (e.g. digikey.py);
   d. Use these new APIs in preference to Octopart (e.g. if a Digi-Key API gets created).

I like the distributors and eda_tools folder structure, but I think a sub-folder per distributor inside distributors/* is too much when each one holds just one file, although it could be the most Pythonic way.

Or octopart_api.py could even be used, so that digikey_webscrape.py and digikey_api.py would be easy to deal with in distributors/__init__.py.

Any opinions on this?

anderwm · 2018-09-05T17:13:34Z

Sounds good...I stole the Newark code and added Arrow just to see if it really was that easy. It seems to be:
https://github.com/anderwm/KiCost/tree/octopart

hildogjr · 2018-09-05T17:18:01Z

@anderwm, is your arrow/arrow.py scrape module functional, or was it just created to update the distributor_dict?

anderwm · 2018-09-05T17:18:51Z

just to update the distributor_dict...it is all just the Newark code with find/replace on newark->arrow

hildogjr · 2018-09-05T17:22:06Z

Nice, so it will be easy to expand Octopart.
If @xesscorp agrees, we could direct the effort to this road map (because it is ugly to replicate useless code just to update the dict()).
And @mmmaisel could give us some tips on the class organization / hierarchy.

xesscorp · 2018-09-05T18:19:10Z

I'll try to move the octopart code to distributors/octopart.py tonight.

Do we just want to add Arrow to the list of distributors? Anyone else?

I don't have any experience with the currency conversion code, so I'll leave that to someone else.

I don't know about using the web scraping stuff in case the Octopart (or other) APIs fail. The web scraping routines go bad over time as the distributors change the structure of their web pages. If the APIs ever fail, then the web scrapers will have probably aged to the point where they'll fail as soon as we try them. It seems like a waste of effort to keep the web scrapers up-to-date and to also build the fail-over code. In the event the APIs ever fail (or their access is restricted), I would rather just go back and re-enable the web scrapers and make whatever changes are necessary at that time. (In addition, I think we've all agreed that we're never going to get the Mouser web scraper working again.)

As for making interfaces to individual distributor APIs like Digi-Key, I don't see the point unless somebody really wants to do that. It's just more code and messing with API keys without giving us any new features or better part data than Octopart already provides (I think).

hildogjr · 2018-09-05T22:18:52Z

OK @xesscorp, after you migrate the code to octopart.py and make the changes to distributor_dict (changing the fields if necessary, because some of them are no longer needed), I will code something for #65.
Please modify kicost_gui.py if necessary (it reads some of the dict() above).

mmmaisel · 2018-09-06T15:09:25Z

@hildogjr I would create a new distributor module for the octopart API stuff in distributors/octopart.py.
The existing class structure / virtual methods from distributor.py should do fine for this. It may only need a new method for selecting the distributors which shall be retrieved.

xesscorp · 2018-09-07T02:45:54Z

I pushed a new version to the octopart branch. The new code is encapsulated in the octopart.py file in the distributors directory. I also added Arrow as a distributor and stripped as much from the code as possible while still allowing it to run. Turns out we need almost nothing in those distributor files. I didn't remove any code from the original set of distributors.

Eventually, it might be best to create a PartsList object that handles part grouping and information procurement. Then the query_octopart function could be rolled into that as a method.

hildogjr · 2018-09-07T13:58:06Z

Agreed, @mmmaisel.
If you have some time to have a look... (I really like the way you organize the classes: easy to control and maintain.)
@xesscorp, could we then remove the other distributors from the octopart branch? (Just to create and test a functional version. For now we keep them in the main branch.)

hildogjr · 2018-09-07T16:33:35Z

A proposal for a new organization of the dict() struct. The old one:

distributor_dict.update({
        'arrow': {
            'octopart_name': 'Arrow Electronics, Inc.',
            'module': 'arrow',  # The directory name containing this file.
            'scrape': 'web',     # Allowable values: 'web' or 'local'.
            'label': 'Arrow',  # Distributor label used in spreadsheet columns.
            'order_cols': ['part_num', 'purch', 'refs'],  # Sort-order for online orders.
            'order_delimiter': ',',  # Delimiter for online orders.
            # Formatting for distributor header in worksheet.
            'wrk_hdr_format': {
                'font_size': 14,
                'font_color': 'white',
                'bold': True,
                'align': 'center',
                'valign': 'vcenter',
                'bg_color': '#000000'  # Arrow black.
            },
        }
})

The initialization could live in octopart.py, since the arrow folder has no structural function in KiCost. So:

distributor_dict.update({
        'arrow': {
            'module': {'name': 'octopart', 'octopart_name': 'Arrow Electronics, Inc.', 'type': 'api'} ,     # Allowable values: 'scrape', 'api' or 'local'. (older 'web' make no sense)
            'order' : {'cols': ['part_num', 'purch', 'refs'], 'order_delimiter': ','}, # Sort-order for online orders & delimiter.
            # Formatting for distributor header in worksheet.
            'wrk_hdr_format': {
                'label': 'Arrow',  # Distributor label used in spreadsheet columns.
                'font_size': 14, 'font_color': 'white', 'bold': True,
                'align': 'center', 'valign': 'vcenter', 'bg_color': '#000000'  # Arrow black.
            },
        }
})

In my experience with the whole KiCost code base, this will provide better integration and a cleaner separation of the information.
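
To make the mapping between the two layouts concrete, here is a small sketch that converts an old-style entry into the proposed one. The helper name `to_new_layout` is hypothetical and nothing like it is claimed to exist in KiCost; the keys are copied from the two dict listings above.

```python
def to_new_layout(old):
    """Convert one old-style distributor entry to the proposed layout."""
    hdr = dict(old['wrk_hdr_format'])   # copy so the input is not modified
    hdr['label'] = old['label']         # the label moves into the header format
    return {
        'module': {
            'name': 'octopart',
            'octopart_name': old['octopart_name'],
            'type': 'api',              # 'scrape', 'api' or 'local'
        },
        'order': {
            'cols': list(old['order_cols']),
            'order_delimiter': old['order_delimiter'],
        },
        'wrk_hdr_format': hdr,
    }


old_arrow = {
    'octopart_name': 'Arrow Electronics, Inc.',
    'module': 'arrow',
    'scrape': 'web',
    'label': 'Arrow',
    'order_cols': ['part_num', 'purch', 'refs'],
    'order_delimiter': ',',
    'wrk_hdr_format': {
        'font_size': 14, 'font_color': 'white', 'bold': True,
        'align': 'center', 'valign': 'vcenter', 'bg_color': '#000000',
    },
}
new_arrow = to_new_layout(old_arrow)
```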

hildogjr · 2018-09-18T23:26:27Z

@mmmaisel, do you think it is necessary / good to keep the fake_browser for the API requests? I am planning how to implement #65 and #315 using the Currency_Converter package, which is already present in the KiCost installation, but I will need some download_file method in the web routines.
I am thinking about the best place in the code to create this.

@xesscorp, could we release this new KiCost before the next steps?

mmmaisel · 2018-09-19T16:51:20Z

@hildogjr I think fake_browser is obsolete now as there is no need to fake some browser (including state/cookie tracking) for APIs.
However, I think a free-standing function which sets up some things (like a user-agent) prior to the request call may be useful.

If you remove fake_browser, I think all functions except dist_init_distributor_dict from "local.py", the scrape_part and get_part_html_tree functions from "distributor.py" and the dynamic module loader from __init__.py can be removed as well.
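
A free-standing set-up function of the kind suggested above might look like the sketch below. It uses only the standard library; the function name and the user-agent string are placeholders chosen for illustration, not values taken from KiCost.

```python
try:  # Python 3
    from urllib.request import Request
except ImportError:  # Python 2
    from urllib2 import Request

# Placeholder value; the real string would be chosen by the project.
USER_AGENT = 'KiCost-example/0.0'


def make_request(url, extra_headers=None):
    """Build a Request with a user-agent set, without any browser state."""
    headers = {'User-Agent': USER_AGENT}
    headers.update(extra_headers or {})
    return Request(url, headers=headers)


api_request = make_request('http://octopart.com/api/v3/parts/match')
```

The returned object can be handed to `urlopen`; no cookies or session state are tracked, which is the point of dropping fake_browser.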

hildogjr · 2018-09-20T00:44:20Z

More fixes and clean-up done.

Fixed some issues in the local distributor template (it is marked as #TODO because the string configuration should be shared with the eda_tools package);
Implemented Specify spreadsheet currency #65 and Convert currency of local distributor #315 (currency conversion). It needs some improvement (which I have planned), but I am still missing the definitions and the distributor class standardization, plus a change to present the correct currency symbol on the spreadsheet.

hildogjr · 2018-09-24T00:21:37Z

Fixed the presentation of #65 and #315 in the spreadsheet using the babel package.

hildogjr · 2018-09-25T01:00:39Z

I think this is fully functional. Do you all agree? (Ready to release after some more documentation checks?)
See https://forum.kicad.info/t/kicost-not-finding-components-still-working-in-september-2018/12741/2
Almost all scrape modules now show some problem.

I am moving forward and have created the
https://github.com/xesscorp/KiCost/upload/octopart_class
branch with the next steps of evolution:

New class format for distributors;
EDA tools also become classes;
Each distributor / EDA will be a file and not a folder/submodule;
Better installation, keeping the AUTHOR.rst file in it (used by the GUI) and Create desktop icon when install GUI #151.

xesscorp · 2018-09-26T12:55:59Z

I assume the octopart branch is ready to release. The only real change to the documentation that I can think of would be to remove these options that no longer apply:

-s
-np
--rt
--throttling_delay

Then move the current master branch to something like "web_scraping" and make the octopart branch into the master. Then release it on PyPi.

hildogjr · 2018-09-26T13:12:38Z

@xesscorp, I already removed these from the documentation, but could you double-check?

Something more: since March 2018, PyPI has changed how it interprets the documentation file format. I had to reconfigure it to use *.MD instead of *.RST (our RST doesn't pass the compliance check), which made the page https://pypi.org/project/kicost/ look bad.

xesscorp · 2018-09-26T19:19:26Z

I looked at the documentation. Seems OK except the "-np" option is still listed in the "Command-Line Options" section. Also, the "Parallel Web Scraping" section needs to be removed.

In regards to the README.RST and HISTORY.RST files, you can easily convert those into markdown files using pandoc:

pandoc readme.rst -o readme.md
pandoc history.rst -o history.md

The problem is you still need to keep the RST files because the documentation uses those to generate the manual in the docs directory using Sphinx unless you translate the doc files into markdown as well.

hildogjr · 2018-09-27T01:39:00Z

Removed those from the docs. Also changed some 'scrape' words to 'scrape/query'.

Could we use pandoc on the manual/site files and migrate them for good?

hildogjr · 2018-09-30T13:39:06Z

We need to release the new octopart version (now already old; octopart_class will take over as the development branch, see #320).
The number of users complaining about scrape problems is growing; at least 4 contacted me last week.

@xesscorp, could you fix the RST files?

xesscorp · 2018-10-01T01:45:20Z

I added readme.md and history.md to the repository. The readme.rst and history.rst files are still there. You'll need to regenerate the markdown versions whenever you modify the RST files.
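
Since the markdown copies have to be regenerated after every RST change, the two pandoc calls quoted earlier could be wrapped in a small script. This is a sketch that assumes pandoc is on the PATH and that the files sit in the current directory; the function names are made up for illustration.

```python
import subprocess


def pandoc_commands(names=('readme', 'history')):
    """Return one pandoc argument list per document, RST in and markdown out."""
    return [['pandoc', name + '.rst', '-o', name + '.md'] for name in names]


def regenerate(names=('readme', 'history')):
    """Run pandoc for each document; raises CalledProcessError on failure."""
    for cmd in pandoc_commands(names):
        subprocess.check_call(cmd)


planned = pandoc_commands()
```

Calling `regenerate()` before each release would keep the two formats in step.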

hildogjr · 2018-10-03T15:42:31Z

Released as 1.0.0.
There are issues I have already seen, some of them fixed in octopart_class (which will not be released yet because it may need further checking by others here).
Starting to merge the branch and organize the repo.

hildogjr · 2018-10-07T23:33:56Z

Re-opened because KiCost needs to handle, and warn about, the "HTTP Status Codes" described at https://octopart.com/api/docs/v3/overview
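
One possible shape for that handling is sketched below. The mapping uses general HTTP status semantics only; it is not taken from the Octopart documentation, and the function and logger names are hypothetical.

```python
import logging

logger = logging.getLogger('kicost_example')  # hypothetical logger name


def check_status(status_code):
    """Return True for a 2xx response; otherwise log a warning and return False."""
    if 200 <= status_code < 300:
        return True
    if status_code in (401, 403):
        reason = 'the API key was rejected or lacks permission'
    elif status_code == 429:
        reason = 'too many requests; the rate limit was reached'
    elif status_code >= 500:
        reason = 'server-side error; try again later'
    else:
        reason = 'unexpected response'
    logger.warning('Octopart query failed with HTTP %d: %s', status_code, reason)
    return False


ok = check_status(200)
rejected = check_status(403)
```

The caller would pass in the status code of whatever response object its HTTP client returns and skip parsing the body when the result is False.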

hildogjr · 2018-10-23T11:49:36Z

The fix for #331 will have to cover this enhancement.

This was referenced Sep 15, 2018

Version 0.1.45 (not released) fails with SyntaxError on Py2.7 #267

Closed

Components with manf# and digikey# may create empty groups (crashing spreadsheet creation) #304

Closed

hildogjr added a commit that referenced this issue Sep 15, 2018

Simplefied dict structure #314

4248a71

hildogjr added a commit that referenced this issue Sep 20, 2018

Clean up on #314

485a3e7

hildogjr added a commit that referenced this issue Sep 20, 2018

Clean up on #314

19a3500

hildogjr added a commit that referenced this issue Sep 20, 2018

Clean up on #314

dce0542

hildogjr added a commit that referenced this issue Sep 20, 2018

#314

beade6b

hildogjr added a commit that referenced this issue Sep 21, 2018

Fix local template copy error #314

2fe3610

hildogjr added a commit that referenced this issue Sep 23, 2018

Spreadsheet currency presentation #314

68539a0

hildogjr added a commit that referenced this issue Sep 25, 2018

#314 next class formats

28c2cae

hildogjr added a commit that referenced this issue Sep 27, 2018

#314 doc

79cea93

hildogjr mentioned this issue Sep 28, 2018

Octopart (and EDAs) class format #320

Closed

hildogjr closed this as completed Oct 3, 2018

hildogjr reopened this Oct 7, 2018

hildogjr mentioned this issue Oct 21, 2018

Was octpart apikey invalidated? #331

Closed

hildogjr closed this as completed Oct 23, 2018
