octopart integration #314

Closed
xesscorp opened this issue Sep 3, 2018 · 54 comments
Labels
feature-request New features requested.

@xesscorp
Collaborator

xesscorp commented Sep 3, 2018

I've created a new branch called "octopart". It gets part information using only the Octopart API (no web scraping). It works for the half-dozen XML input files I've tried it on. It currently has these problems:

  1. Price breaks for Digi-Key parts always seem to be for reeled quantities (3000, 6000, etc). The cut-tape, small-quantity pricing is not returned. I'm not sure if there is an option in the API for enabling this.
  2. RS Components always seems to return an empty price list.
  3. It only runs under Python 2.7 because it uses the Python 2 urllib interface (urllib.quote, urllib.urlopen). (This should be easy to fix.)

All the changes I made are in the kicost.py file. This may not be the best place for it, but it makes the changes obvious.

I also noticed a large delay (several seconds) in KiCost before the part data is fetched from Octopart (the fetch itself takes less than a second). It seems to be caused by another parallel scrape that checks the distributor websites. If so, I guess we could remove it, since Octopart is being used. (I can't say for sure: this is new code I've never seen before.)

@romain145

Python 3 splits urllib into separate submodules for parsing and requests (urllib.parse and urllib.request):

diff --git a/kicost/kicost.py b/kicost/kicost.py
index df6e24c..0f99c90 100644
--- a/kicost/kicost.py
+++ b/kicost/kicost.py
@@ -362,14 +362,15 @@ def kicost(in_file, eda_tool_name, out_filename,
         logger.log(DEBUG_OVERVIEW, '# Getting part data from Octopart...')

         import json
-        import urllib
+        import urllib.parse
+        import urllib.request

         dist_xlate = {'Digi-Key':'digikey', 'Mouser':'mouser', 'Newark':'newark', 'Farnell':'farnell', 'RS Components':'rs', 'TME':'tme'}

         def get_part_info(query, parts):
-            url = 'http://octopart.com/api/v3/parts/match?queries=%s' % urllib.quote(json.dumps(query))
+            url = 'http://octopart.com/api/v3/parts/match?queries=%s' % urllib.parse.quote(json.dumps(query))
             url += '&apikey=96df69ba'
-            results = json.loads(urllib.urlopen(url).read())['results']
+            results = json.loads(urllib.request.urlopen(url).read())['results']
             for result in results:
                 i = int(result['reference'])
                 for item in result['items']:

Hope that helps.
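A minimal sketch of a single code path for Python 2 and 3: try the Python 3 submodules first and fall back to the Python 2 names. It only builds the `parts/match` URL the same way the diff does (URL-quoted JSON in `queries`, plus `apikey`); the `build_match_url` helper, the placeholder key, and the sample query are illustrative assumptions, and the actual `urlopen` call is left out.

```python
import json

try:
    # Python 3: quote() and urlopen() live in separate submodules.
    from urllib.parse import quote
    from urllib.request import urlopen  # noqa: F401 (needed for the real request)
except ImportError:
    # Python 2: both live in the top-level urllib module.
    from urllib import quote, urlopen  # noqa: F401

MATCH_URL = 'http://octopart.com/api/v3/parts/match'


def build_match_url(query, apikey):
    """Return a parts/match URL for a list of query dicts.

    Each query dict is assumed to carry a 'reference' string so each result
    can be mapped back to its part, as the diff does with
    int(result['reference']).
    """
    return '%s?queries=%s&apikey=%s' % (MATCH_URL, quote(json.dumps(query)), apikey)


# Hypothetical single-part query; the key is a placeholder.
print(build_match_url([{'mpn': 'EXAMPLE-MPN', 'reference': '0'}], 'YOUR_API_KEY'))
```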

@xesscorp
Collaborator Author

xesscorp commented Sep 3, 2018

Thanks for the tip, @romain145. I ended up replacing the urllib calls with the requests library. Python 2/3 compatibility has been restored.

I also removed the parallel scan of the distributor websites. It doesn't appear to have affected the operation of kicost. Timing on a small number of examples shows a speedup of around 50x over the web-scraping version.

@romain145

I can confirm. One of my projects with 50+ references takes 3.8 s! That is a massive improvement.

  1. Digi-Key price breaks seem to work OK for non-reeled items, e.g. this fuse: 0215004.HXP.
  2. RS returns a price list, but in the wrong currency: e.g. this connector, 61900311121, reports pricing in USD instead of GBP (maybe a config problem on my side?).
  3. It runs under Python 3.6.5.

Attached XML BOM.
rboard.xml.txt

Is the API key supposed to be the same for all KiCost users?

@xesscorp
Collaborator Author

xesscorp commented Sep 3, 2018

Regarding the API key: it's unique to an application. So I have registered KiCost with Octopart and that's the assigned key. Individual users do not need their own key. (This is something I did not realize before.)

@romain145

For some reason, the distributor part numbers have a prefix, such as "2401_855-" or "2401_81-", that is probably added by Octopart (the links are trackers that point to Octopart as well). I'm not sure what the prefix means, though.
So the BOM copy/paste import on the distributor's website doesn't work.

@xesscorp
Collaborator Author

xesscorp commented Sep 3, 2018

I just looked at my spreadsheets and they also have the prepended number. It looks like it's a unique code for each distributor. It may be safe to strip off the initial digits and the underscore to get the actual part number.
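A minimal sketch of that stripping, assuming the tracking code is always a run of digits followed by an underscore at the very start (as in "2401_855-" and "2401_81-" above). The function name and the sample numbers are hypothetical; the thread later switches to the offer's `sku` field, which avoids the prefix altogether.

```python
import re

# Assumption: the Octopart tracking code is leading digits plus an underscore.
_TRACKING_PREFIX = re.compile(r'^\d+_')


def strip_tracking_prefix(part_num):
    """Remove a leading 'NNNN_' tracking code; other part numbers pass through."""
    return _TRACKING_PREFIX.sub('', part_num, count=1)


print(strip_tracking_prefix('2401_855-EXAMPLE'))  # 855-EXAMPLE
print(strip_tracking_prefix('855-EXAMPLE'))       # 855-EXAMPLE (unchanged)
```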

@anderwm

anderwm commented Sep 4, 2018

  1. This is pretty awesome.
  2. Although it worked, I got this traceback:

Traceback (most recent call last):
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 885, in __del__
    self.close()
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 1090, in close
    self._decr_instances(self)
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_tqdm.py", line 454, in _decr_instances
    cls.monitor.exit()
  File "/home/andersonm/kicad5_pro/ve_kicost_oct/lib/python3.6/site-packages/tqdm-4.25.0-py3.6.egg/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

  3. I believe the special distributor part number is how Octopart is making money (and also why the API is free to use). They must get some royalty whenever somebody buys a part with their tracking part numbers. It's probably in our best interest to use their numbers in the cart-building process if we can, so the API will remain free.
  4. You are right, the Digi-Key quantities/packaging are screwed up (always giving me reels when available).

@romain145

@anderwm I've had the same error recently, see #313 (comment)

  1. Then the BOM should probably be filled in with the Octopart BOM tool: click "Create a New BOM", then use the "paste" link on the left to paste the data from the spreadsheet. But even then, it seems confused by the prefix and does not recognise the MPN.

@anderwm

anderwm commented Sep 4, 2018

  1. It looks like that special number is just used by Octopart to build a link to Digi-Key/whoever with them as the referrer, so that is how they are making money. To get the real Digi-Key part number, there is a field called "sku" under each "offer". It may be good enough to just include Octopart's link in the spreadsheet the way you are doing, but have the anchor be the actual SKU. The way KiCost works for me, I never visit the links that come out of it. I just use the comma-separated text to start a cart/BOM at all the distributors I need. I'm not sure if there is a way for Octopart to make money on that or not... or even if we should care.
  2. It appears that each packaging option comes in a separate "offer" in the response from Octopart. This could be nice, as there are times I would like to buy all cut tape/reel and other times when I don't care. In any event, there are multiple offers from Digi-Key, and if you don't take the "cut tape" or "tube" one, you might not get the quantities you would expect.

@xesscorp
Collaborator Author

xesscorp commented Sep 4, 2018

Thanks, @anderwm. Using the sku field solves the problem with ordering from the distributors.

Thanks also for pointing out that the Octopart response includes multiple offers. By collecting the price breaks for all the offers from Digi-Key, I was able to build a complete price table.
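A sketch of that merge, assuming each offer follows the Octopart v3 layout as we understand it: a `seller` dict with a `name`, and a `prices` dict mapping a currency code to a list of `[quantity, price_string]` breaks. The helper name and the sample offers are hypothetical. Where two offers share a break quantity, this version keeps the cheaper unit price; that tie-break rule is our choice, not something stated in the thread.

```python
def merge_price_breaks(offers, seller='Digi-Key', currency='USD'):
    """Combine the price breaks of every offer from one seller.

    Returns a list of (quantity, unit_price) tuples sorted by quantity.
    """
    table = {}
    for offer in offers:
        if offer.get('seller', {}).get('name') != seller:
            continue  # Skip other distributors.
        for qty, price in offer.get('prices', {}).get(currency, []):
            price = float(price)
            # Keep the lowest unit price seen for each break quantity.
            if qty not in table or price < table[qty]:
                table[qty] = price
    return sorted(table.items())


offers = [  # Hypothetical data shaped like Octopart v3 offers.
    {'seller': {'name': 'Digi-Key'}, 'packaging': 'Tape & Reel',
     'prices': {'USD': [[3000, '0.02'], [6000, '0.018']]}},
    {'seller': {'name': 'Digi-Key'}, 'packaging': 'Cut Tape',
     'prices': {'USD': [[1, '0.10'], [10, '0.07']]}},
    {'seller': {'name': 'Mouser'}, 'prices': {'USD': [[1, '0.09']]}},
]
print(merge_price_breaks(offers))
# [(1, 0.1), (10, 0.07), (3000, 0.02), (6000, 0.018)]
```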

@anderwm

anderwm commented Sep 4, 2018

No problem. We still need some way to pick which package type to use for the SKU, though (shown as the Cat# in the spreadsheet). Otherwise, you wind up with a bunch of reels in your cart that you have to change over. We could use the amount being purchased, default to cut tape, or pass in something from the command line, I guess.

@xesscorp
Collaborator Author

xesscorp commented Sep 4, 2018

I'll try using the SKU from the offer with the smallest quantities in the list. If you order enough parts to consume a reel, I think they might automatically apply reel pricing.
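One way to express "the offer with the smallest quantities", assuming the same v3-style offer dicts (`seller.name`, `sku`, and `prices` as currency to `[quantity, price]` breaks): pick the seller's offer whose minimum break quantity is lowest and use its `sku`. The function name and the SKU strings are placeholders, not real part numbers.

```python
def smallest_qty_sku(offers, seller='Digi-Key', currency='USD'):
    """Return the SKU of the seller's offer with the lowest minimum break quantity.

    Returns None if the seller has no offer priced in `currency`.
    """
    candidates = [o for o in offers
                  if o.get('seller', {}).get('name') == seller
                  and o.get('prices', {}).get(currency)]
    if not candidates:
        return None
    # Compare offers by the smallest quantity in each price list.
    best = min(candidates,
               key=lambda o: min(qty for qty, _ in o['prices'][currency]))
    return best.get('sku')


offers = [  # Hypothetical offers; the SKUs are placeholders.
    {'seller': {'name': 'Digi-Key'}, 'sku': 'REEL-SKU',
     'prices': {'USD': [[3000, '0.02']]}},
    {'seller': {'name': 'Digi-Key'}, 'sku': 'CUT-TAPE-SKU',
     'prices': {'USD': [[1, '0.10'], [10, '0.07']]}},
]
print(smallest_qty_sku(offers))  # CUT-TAPE-SKU
```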

@hildogjr
Owner

hildogjr commented Sep 4, 2018

At Digi-Key, at least, you are advised of that (and to order up to the next price break) when you check out the cart.

@xesscorp
Collaborator Author

xesscorp commented Sep 4, 2018

Let's try it that way, then.

@anderwm

anderwm commented Sep 4, 2018

So possibly it is a me problem, then. But most of the Digi-Key part numbers are coming up as reels for me in the Cat#. For instance, CL21C090CBANNNC shows in the spreadsheet as Digi-Key Cat# 1276-2558-2-ND, and the -2 means reel.
I assumed that it was just because that happened to be the first or last offer you came across from Digi-Key. I guess you have to keep track of which SKU corresponds to the lowest minimum quantity, but that isn't the part number showing up for me.

Edit: I read your earlier comment as saying that this is what you were already doing. If what I am describing is not implemented yet, disregard this comment.

@xesscorp
Collaborator Author

xesscorp commented Sep 4, 2018

Right, I haven't fixed the SKU problem.

@anderwm

anderwm commented Sep 4, 2018

Sorry... got it.

@xesscorp
Collaborator Author

xesscorp commented Sep 4, 2018

KiCost now selects the SKU, available quantity, and web page from the cut-tape version. I'm distinguishing the cut-tape part by looking for the offer with the smallest quantity difference between the first two entries in its pricing table. This seems to work after my extensive testing on a single example. Each part offer also has a "packaging" field that could be used if that gives more accurate results.
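A sketch of that heuristic, under the same assumptions about the offer dicts. It also checks the `packaging` field first, as suggested; the exact string "Cut Tape" is an assumption about what that field contains. Offers with fewer than two breaks are treated as having an infinite gap, so they are only chosen if nothing better exists.

```python
def pick_cut_tape_offer(offers, seller='Digi-Key', currency='USD'):
    """Guess which of a seller's offers is the cut-tape one."""
    mine = [o for o in offers if o.get('seller', {}).get('name') == seller]
    if not mine:
        return None

    # Prefer an explicit packaging label when present (assumed value).
    for offer in mine:
        if (offer.get('packaging') or '').strip().lower() == 'cut tape':
            return offer

    def first_gap(offer):
        """Difference between the first two break quantities."""
        qtys = sorted(qty for qty, _ in offer.get('prices', {}).get(currency, []))
        return qtys[1] - qtys[0] if len(qtys) >= 2 else float('inf')

    # Fall back to the heuristic: smallest gap wins (1->10 beats 3000->6000).
    return min(mine, key=first_gap)


offers = [  # Hypothetical offers with no packaging labels.
    {'seller': {'name': 'Digi-Key'}, 'sku': 'REEL-SKU',
     'prices': {'USD': [[3000, '0.02'], [6000, '0.018']]}},
    {'seller': {'name': 'Digi-Key'}, 'sku': 'CUT-TAPE-SKU',
     'prices': {'USD': [[1, '0.10'], [10, '0.07']]}},
]
print(pick_cut_tape_offer(offers)['sku'])  # CUT-TAPE-SKU
```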

@anderwm

anderwm commented Sep 5, 2018

This worked quite well for me and the couple of BOMs I tested, and is crazy fast.

Does this version make it trivial to add distributors to show in the spreadsheet? As in, if I want to add Arrow is it just a matter of adding it to the distributor_dict somewhere?

@hildogjr
Owner

hildogjr commented Sep 5, 2018

We could set a road map:

  1. Finish the main Octopart tests;
  2. Move the code into distributors/octopart.py;
    a. Refactor the distributor_dict dict() if necessary;
    b. Fix spreadsheet.py if any bugs are found;
    c. Then add more distributors (Arrow, ...; see Other distributors #116) using the Octopart API (this should be easy at this point, just some definitions in the octopart class() initialization);
    d. Add currency conversion using CurrencyConverter() (already present, but without updated rates) or another web service (Specify spreadsheet currency #65);
  3. Merge with the main branch (we could call this KiCost v1.0?);
    a. Rename the scrape routine files to *_webscrape (e.g. digikey_webscrape.py, mouser_webscrape.py);
    b. Use web scraping only if the APIs fail completely (or at least keep it for history and future decisions);
    c. Possibly create modules for other APIs (e.g. digikey.py);
    d. Prefer these new APIs over Octopart (e.g. if a Digi-Key API module is created).

I like the distributors and eda_tools folder structure, but I think a subfolder inside distributors/* just to hold one file is too much, although it could be the most Pythonic way.

Or we could even use octopart_api.py; then digikey_webscrape.py and digikey_api.py would be easy to handle in distributors/__init__.py.

Any opinions on this?

@anderwm

anderwm commented Sep 5, 2018

Sounds good. I stole the Newark code and added Arrow just to see if it really was that easy. It seems to be:
https://github.com/anderwm/KiCost/tree/octopart

@hildogjr
Owner

hildogjr commented Sep 5, 2018

@anderwm, is your arrow/arrow.py scrape module functional, or was it just created to update the distributor_dict?

@anderwm

anderwm commented Sep 5, 2018

Just to update the distributor_dict. It is all just the Newark code with a find/replace of newark->arrow.

@hildogjr
Owner

hildogjr commented Sep 5, 2018

Nice, so it will be easy to expand the Octopart support.
If @xesscorp agrees, we could direct our effort to this road map (it is ugly to replicate useless code just to update the dict()).
And @mmmaisel could give us some tips on the class organization / hierarchy.

@xesscorp
Collaborator Author

xesscorp commented Sep 5, 2018

I'll try to move the octopart code to distributors/octopart.py tonight.

Do we just want to add Arrow to the list of distributors? Anyone else?

I don't have any experience with the currency conversion code, so I'll leave that to someone else.

I don't know about using the web scraping stuff in case the Octopart (or other) APIs fail. The web scraping routines go bad over time as the distributors change the structure of their web pages. If the APIs ever fail, then the web scrapers will have probably aged to the point where they'll fail as soon as we try them. It seems like a waste of effort to keep the web scrapers up-to-date and to also build the fail-over code. In the event the APIs ever fail (or their access is restricted), I would rather just go back and re-enable the web scrapers and make whatever changes are necessary at that time. (In addition, I think we've all agreed that we're never going to get the Mouser web scraper working again.)

As for making interfaces to individual distributor APIs like Digi-Key, I don't see the point unless somebody really wants to do that. It's just more code and messing with API keys without giving us any new features or better part data than Octopart already provides (I think).

@hildogjr
Owner

hildogjr commented Sep 5, 2018

OK @xesscorp. After you migrate the code to octopart.py and change distributor_dict (even changing the fields if necessary, since some of them are no longer needed), I will code something for #65.
Please modify kicost_gui.py if necessary (it reads some of the dict() above).

@mmmaisel
Contributor

mmmaisel commented Sep 6, 2018

@hildogjr I would create a new distributor module for the octopart API stuff in distributors/octopart.py.
The existing class structure / virtual methods from distributor.py should do fine for this. It may only need a new method for selecting the distributors which shall be retrieved.

@xesscorp
Collaborator Author

xesscorp commented Sep 7, 2018

I pushed a new version to the octopart branch. The new code is encapsulated in the octopart.py file in the distributors directory. I also added Arrow as a distributor and stripped as much from the code as possible while still allowing it to run. Turns out we need almost nothing in those distributor files. I didn't remove any code from the original set of distributors.

Eventually, it might be best to create a PartsList object that handles part grouping and information procurement. Then the query_octopart function could be rolled into that as a method.
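A rough sketch of what such a `PartsList` might look like. Everything here is hypothetical: the class, the `(reference, mpn)` input format, and the injected `fetch` callable, which stands in for `query_octopart` so the grouping logic can be exercised without network access.

```python
from collections import OrderedDict


class PartsList(object):
    """Groups parts by manufacturer part number and procures their data."""

    def __init__(self, parts):
        # parts: iterable of (reference, mpn) pairs, e.g. ('R1', 'MPN-A').
        self.groups = OrderedDict()
        for ref, mpn in parts:
            self.groups.setdefault(mpn, []).append(ref)
        self.info = {}

    def procure(self, fetch):
        """Fetch data once per unique MPN; fetch(mpns) must return {mpn: info}."""
        self.info = fetch(list(self.groups))
        return self.info


parts = PartsList([('R1', 'MPN-A'), ('C1', 'MPN-B'), ('R2', 'MPN-A')])
print(dict(parts.groups))  # {'MPN-A': ['R1', 'R2'], 'MPN-B': ['C1']}
print(parts.procure(lambda mpns: {m: 'stub data' for m in mpns}))
```

Injecting the fetch function keeps the grouping testable and would let a different backend (another API, or a web scraper) be swapped in without touching the class.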

@hildogjr
Owner

hildogjr commented Sep 7, 2018

Agreed @mmmaisel.
If you have some time to take a look... (I really like the way you organize the classes: easy to control and maintain.)
@xesscorp, could we remove the other distributors from the octopart branch? (Just to create and test a functional version; for now we keep them in the main branch.)

@hildogjr
Owner

hildogjr commented Sep 7, 2018

A proposal for a new organization of the dict() struct. The old one:

distributor_dict.update({
        'arrow': {
            'octopart_name': 'Arrow Electronics, Inc.',
            'module': 'arrow',  # The directory name containing this file.
            'scrape': 'web',     # Allowable values: 'web' or 'local'.
            'label': 'Arrow',  # Distributor label used in spreadsheet columns.
            'order_cols': ['part_num', 'purch', 'refs'],  # Sort-order for online orders.
            'order_delimiter': ',',  # Delimiter for online orders.
            # Formatting for distributor header in worksheet.
            'wrk_hdr_format': {
                'font_size': 14,
                'font_color': 'white',
                'bold': True,
                'align': 'center',
                'valign': 'vcenter',
                'bg_color': '#000000'  # Arrow black.
            },
        }
})

The initialization could live in octopart.py, since the arrow folder has no functional role in KiCost. So:

distributor_dict.update({
        'arrow': {
            'module': {'name': 'octopart', 'octopart_name': 'Arrow Electronics, Inc.', 'type': 'api'},     # Allowable values: 'scrape', 'api' or 'local'. (The older 'web' makes no sense.)
            'order' : {'cols': ['part_num', 'purch', 'refs'], 'order_delimiter': ','}, # Sort-order for online orders & delimiter.
            # Formatting for distributor header in worksheet.
            'wrk_hdr_format': {
                'label': 'Arrow',  # Distributor label used in spreadsheet columns.
                'font_size': 14, 'font_color': 'white', 'bold': True,
                'align': 'center', 'valign': 'vcenter', 'bg_color': '#000000'  # Arrow black.
            },
        }
})

From my experience with the whole KiCost codebase, this will provide better integration and separation of the information.
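With the proposed layout, adding an Octopart-backed distributor could reduce to one helper call. The `octopart_distributor` helper below is hypothetical; it simply reproduces the proposed Arrow entry, with the label, Octopart seller name, and header colour as parameters.

```python
def octopart_distributor(label, octopart_name, bg_color,
                         order_cols=('part_num', 'purch', 'refs'), delimiter=','):
    """Build one distributor_dict entry in the proposed nested layout."""
    return {
        'module': {'name': 'octopart', 'octopart_name': octopart_name, 'type': 'api'},
        'order': {'cols': list(order_cols), 'order_delimiter': delimiter},
        # Formatting for the distributor header in the worksheet.
        'wrk_hdr_format': {
            'label': label,  # Distributor label used in spreadsheet columns.
            'font_size': 14, 'font_color': 'white', 'bold': True,
            'align': 'center', 'valign': 'vcenter', 'bg_color': bg_color,
        },
    }


distributor_dict = {}
distributor_dict.update({
    'arrow': octopart_distributor('Arrow', 'Arrow Electronics, Inc.', '#000000'),
})
```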

@hildogjr
Owner

hildogjr commented Sep 18, 2018

@mmmaisel, do you think it is necessary / good to keep fake_browser for the API requests? I am planning how to implement #65 and #315 using the Currency_Converter package already present in the KiCost installation, but I will need a download_file method in the web routines.
I am thinking about the best place in the code to create it.

@xesscorp, could we release this new KiCost before the next steps?

@mmmaisel
Contributor

@hildogjr I think fake_browser is obsolete now as there is no need to fake some browser (including state/cookie tracking) for APIs.
However, I think a free-standing function that sets up some things (like a user-agent) before the request call may be useful.
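A minimal sketch of such a free-standing helper using only the standard library (with a Python 2 fallback). The `make_request`/`fetch` names and the default user-agent string are assumptions, not existing KiCost code.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

DEFAULT_USER_AGENT = 'KiCost'  # Hypothetical value; pick whatever identifies the tool.


def make_request(url, user_agent=DEFAULT_USER_AGENT, extra_headers=None):
    """Build a Request with a User-Agent (plus any extra headers) already set."""
    headers = {'User-Agent': user_agent}
    headers.update(extra_headers or {})
    return Request(url, headers=headers)


def fetch(url, timeout=10, **kwargs):
    """Perform the GET and return the raw response body."""
    response = urlopen(make_request(url, **kwargs), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Keeping request construction separate from the network call means the headers can be checked without touching the network, and a download_file-style routine could reuse `fetch` directly.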

If you remove fake_browser, I think all functions except dist_init_distributor_dict from "local.py", the scrape_part and get_part_html_tree functions from "distributor.py" and the dynamic module loader from __init__.py can be removed as well.

@hildogjr
Owner

hildogjr commented Sep 20, 2018

More fixes and clean-up done.

  1. Fixed some issues in the local distributor template (marked as #TODO because the string configuration should be shared with the eda_tools package);
  2. Implemented Specify spreadsheet currency #65 and Convert currency of local distributor #315 (currency conversion). It needs some improvement (which I have planned), but it still lacks the definitions, the distributor class standardization, and a change to show the correct currency symbol in the spreadsheet.

hildogjr added a commit that referenced this issue Sep 20, 2018
hildogjr added a commit that referenced this issue Sep 21, 2018
hildogjr added a commit that referenced this issue Sep 23, 2018
@hildogjr
Owner

Fixed the presentation of #65 and #315 in the spreadsheet using the babel package.

@hildogjr
Owner

hildogjr commented Sep 25, 2018

I think this is fully functional. Do you all agree? (Ready to release after some more documentation checks?)
See https://forum.kicad.info/t/kicost-not-finding-components-still-working-in-september-2018/12741/2
Almost all scrape modules have some problem.

I am moving forward and have created the
https://github.com/xesscorp/KiCost/upload/octopart_class
branch with the next steps of evolution:

  1. New class format for distributors;
  2. EDA tools also become classes;
  3. Each distributor / EDA tool will be a file and not a folder/submodule;
  4. Better installation, keeping the AUTHOR.rst file in it (used by the GUI), and Create desktop icon when install GUI #151.

hildogjr added a commit that referenced this issue Sep 25, 2018
@xesscorp
Collaborator Author

I assume the octopart branch is ready to release. The only real change to the documentation that I can think of would be to remove these options that no longer apply:

-s
-np
--rt
--throttling_delay

Then move the current master branch to something like "web_scraping" and make the octopart branch the master. Then release it on PyPI.

@hildogjr
Owner

@xesscorp, I already removed these from the documentation, but could you double-check?

One more thing: since March 2018, PyPI has changed how it interprets the description file. I had to switch the style to *.MD instead of *.RST (our RST doesn't pass the compliance check), which made the page https://pypi.org/project/kicost/ look bad.

@xesscorp
Collaborator Author

I looked at the documentation. Seems OK except the "-np" option is still listed in the "Command-Line Options" section. Also, the "Parallel Web Scraping" section needs to be removed.

In regards to the README.RST and HISTORY.RST files, you can easily convert those into markdown files using pandoc:

pandoc readme.rst -o readme.md
pandoc history.rst -o history.md

The problem is that you still need to keep the RST files, because the documentation uses them to generate the manual in the docs directory with Sphinx, unless you translate the doc files into Markdown as well.
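If the Markdown copy is what PyPI should render, setuptools can be told so through the `long_description_content_type` argument to `setup()`. A minimal sketch; the helper names and the `readme.md` path are assumptions, and any extension other than `.md`/`.markdown` is treated as reStructuredText.

```python
import io
import os


def content_type_for(path):
    """Map a README file name to the content type PyPI expects."""
    ext = os.path.splitext(path)[1].lower()
    return 'text/markdown' if ext in ('.md', '.markdown') else 'text/x-rst'


def long_description_kwargs(path='readme.md'):
    """Read the README and return the matching setup() keyword arguments."""
    with io.open(path, encoding='utf-8') as f:
        text = f.read()
    return {'long_description': text,
            'long_description_content_type': content_type_for(path)}

# In setup.py: setup(name='kicost', ..., **long_description_kwargs('readme.md'))
```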

hildogjr added a commit that referenced this issue Sep 27, 2018
@hildogjr
Owner

hildogjr commented Sep 27, 2018

Made the removals in the docs. Also changed some 'scrape' words to 'scrape/query'.

Could we use pandoc on the manual/site files and migrate them for good?

@hildogjr
Owner

We need to release the new octopart version (already old now; octopart_class will become the development branch, see #320).
The number of users complaining about scrape problems is growing; at least 4 contacted me last week.

@xesscorp, could you fix the RST files?

@xesscorp
Collaborator Author

xesscorp commented Oct 1, 2018

I added readme.md and history.md to the repository. The readme.rst and history.rst files are still there. You'll need to regenerate the markdown versions whenever you modify the RST files.

@hildogjr
Owner

hildogjr commented Oct 3, 2018

Released as 1.0.0
There are issues I have already seen, with some fixes in octopart_class (which will not be released yet because they may need checking by others here).
Starting to merge the branch and organize the repo.

@hildogjr hildogjr closed this as completed Oct 3, 2018
@hildogjr
Owner

hildogjr commented Oct 7, 2018

Re-opened because KiCost needs to handle, and warn about, the "HTTP Status Codes" described at https://octopart.com/api/docs/v3/overview
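A sketch of how such status handling could look. The policy is our assumption, not Octopart's documented one: 2xx is success, 429 (rate limited) and 5xx are logged as warnings so the run can continue with missing data, and any other code raises. `check_octopart_status` and `OctopartError` are hypothetical names; the reason phrases come from the standard library's `responses` table.

```python
import logging

try:
    from http.client import responses  # Python 3
except ImportError:
    from httplib import responses  # Python 2

logger = logging.getLogger('kicost')


class OctopartError(Exception):
    """Raised for API errors that retrying will not fix (e.g. a bad API key)."""


def check_octopart_status(code):
    """Return True on success, False after warning on a transient error; raise otherwise."""
    if 200 <= code < 300:
        return True
    msg = 'Octopart API returned HTTP %d (%s)' % (code, responses.get(code, 'Unknown'))
    if code == 429 or code >= 500:
        # Assumed transient: warn and let the caller continue without this data.
        logger.warning('%s; part data may be incomplete, try again later.', msg)
        return False
    raise OctopartError(msg)
```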

@hildogjr
Owner

The fix for #331 will have to cover this enhancement.

5 participants