Merge pull request #104 from Crinibus/dev
Refactor tech-scraper
Crinibus authored Oct 11, 2020
2 parents db1177b + 0b57488 commit 1de66bf
Showing 11 changed files with 52 additions and 1,718 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ fakta_scraper/.vscode/settings.json
.vscode/settings.json
tech_scraping/__pycache__/scraping.cpython-37.pyc
.vscode/launch.json
+tech_scraper/.vscode/settings.json
17 changes: 15 additions & 2 deletions README.md
@@ -18,7 +18,20 @@ First make sure you have the modules, run this in the terminal:
<br/>

# Tech scraper <a name="tech-scraper"></a>
-The tech scraper can scrape prices on products from Komplett.dk, Proshop.dk, Computersalg.dk, Elgiganten.dk, AvXperten.dk, Av-Cables.dk, Amazon.com, eBay.com, Power.dk, Expert.dk, MM-Vision.dk, Coolshop.dk and Sharkgaming.dk
+The tech scraper can scrape prices on products from:
+- [Komplett.dk](https://www.komplett.dk/)
+- [Proshop.dk](https://www.proshop.dk/)
+- [Computersalg.dk](https://www.computersalg.dk/)
+- [Elgiganten.dk](https://www.elgiganten.dk/)
+- [AvXperten.dk](https://www.avxperten.dk/)
+- [Av-Cables.dk](https://www.av-cables.dk/)
+- [Amazon.com](https://www.amazon.com/)
+- [eBay.com](https://www.ebay.com/)
+- [Power.dk](https://www.power.dk/)
+- [Expert.dk](https://www.expert.dk/)
+- [MM-Vision.dk](https://www.mm-vision.dk/)
+- [Coolshop.dk](https://www.coolshop.dk/)
+- [Sharkgaming.dk](https://www.sharkgaming.dk/)

## Scrape products <a name="scrape-products"></a>
To scrape prices of products run this in the terminal:
Expand Down Expand Up @@ -46,7 +59,7 @@ This adds the category (if new) and the product to the records.json file, and ad

**OBS**: The category can only be one word, so add a underscore instead of a space if needed.<br/>
**OBS**: The url must have the "https://www." part.<br/>
-**OBS**: When using Amazon links, delete the part of the url after the last forward slash (/).<br/>
+**OBS**: When using Amazon links, delete everything after and including this "ref=sr".<br/>
For example the link: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/ref=sr_1_2?dchild=1&qid=1601488833&s=computers-intl-ship&sr=1-2<br/>
Should be: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/<br/>
**OBS**: When using eBay links, delete everything after and including this "?_trkparms="<br/>
20 changes: 17 additions & 3 deletions tech_scraping/README.md → tech_scraper/README.md
@@ -1,10 +1,9 @@
-**The tech scraper can scrape prices on products from Komplett.dk, Proshop.dk, Computersalg.dk, Elgiganten.dk, AvXperten.dk, Av-Cables.dk, Amazon.com, eBay.com, Power.dk, Expert.dk, MM-Vision.dk, Coolshop.dk and Sharkgaming.dk**
-
# Table of contents
- [First setup](#first-setup)
- [Start from scratch](#start-scratch)
- [Scrape products](#scrape-products)
- [Adding products](#adding-products)
+- [Links to scrape from](#links-to-scrape-from)
- [Optional arguments](#optional-arguments)

<br/>
Expand Down Expand Up @@ -40,13 +39,28 @@ This adds the category (if new) and the product to the records.json file, and ad

**OBS**: The category can only be one word, so add a underscore instead of a space if needed.<br/>
**OBS**: The url must have the "https://www." part.<br/>
-**OBS**: When using Amazon links, delete the part of the url after the last forward slash (/).<br/>
+**OBS**: When using Amazon links, delete everything after and including this "ref=sr".<br/>
For example the link: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/ref=sr_1_2?dchild=1&qid=1601488833&s=computers-intl-ship&sr=1-2<br/>
Should be: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/<br/>
**OBS**: When using eBay links, delete everything after and including this "?_trkparms="<br/>
For example the link: https://www.ebay.com/itm/Samsung-Galaxy-Note-20-Ultra-256GB-12GB-RAM-SM-N986B-DS-FACTORY-UNLOCKED-6-9/193625604205?_trkparms=aid%3D111001%26algo%3DREC.SEED%26ao%3D1%26asc%3D225074%26meid%3Dd6c93f1458884e65bcc434e38f6f303c%26pid%3D100970%26rk%3D8%26rkt%3D8%26mehot%3Dpp%26sd%3D402319206529%26itm%3D193625604205%26pmt%3D0%26noa%3D1%26pg%3D2380057%26brand%3DSamsung&_trksid=p2380057.c100970.m5481&_trkparms=pageci%3A6ffa204c-042b-11eb-baa4-3a1cc2bb9aea%7Cparentrq%3Ae60676341740a4d6b1579293fff1b710%7Ciid%3A1<br/>
Should be: https://www.ebay.com/itm/Samsung-Galaxy-Note-20-Ultra-256GB-12GB-RAM-SM-N986B-DS-FACTORY-UNLOCKED-6-9/193625604205
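The link-trimming rules described in these notes can be sketched as a small helper. This is only an illustration and not part of the commit: the function name `clean_product_url` is hypothetical, and it assumes the markers appear literally as `ref=sr` (Amazon) and `?_trkparms=` (eBay), as in the examples shown here.

```python
def clean_product_url(url: str) -> str:
    """Trim tracking suffixes from Amazon and eBay links (hypothetical helper)."""
    if "amazon.com" in url:
        # Amazon: delete everything after and including "ref=sr"
        marker = "ref=sr"
    elif "ebay.com" in url:
        # eBay: delete everything after and including "?_trkparms="
        marker = "?_trkparms="
    else:
        # Other shops: leave the link untouched
        return url

    index = url.find(marker)
    # If the marker is absent, the link is assumed to be clean already
    return url if index == -1 else url[:index]


amazon_link = (
    "https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/"
    "ref=sr_1_2?dchild=1&qid=1601488833&s=computers-intl-ship&sr=1-2"
)
print(clean_product_url(amazon_link))
```

Only the first occurrence of the marker matters, so an eBay link that repeats `_trkparms` later in the query string is still cut at the first `?_trkparms=`.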

+### Links to scrape from <a name="links-to-scrape-from"></a>
+The tech scraper can scrape prices on products from:
+- [Komplett.dk](https://www.komplett.dk/)
+- [Proshop.dk](https://www.proshop.dk/)
+- [Computersalg.dk](https://www.computersalg.dk/)
+- [Elgiganten.dk](https://www.elgiganten.dk/)
+- [AvXperten.dk](https://www.avxperten.dk/)
+- [Av-Cables.dk](https://www.av-cables.dk/)
+- [Amazon.com](https://www.amazon.com/)
+- [eBay.com](https://www.ebay.com/)
+- [Power.dk](https://www.power.dk/)
+- [Expert.dk](https://www.expert.dk/)
+- [MM-Vision.dk](https://www.mm-vision.dk/)
+- [Coolshop.dk](https://www.coolshop.dk/)
+- [Sharkgaming.dk](https://www.sharkgaming.dk/)

### Optional arguments <a name="optional-arguments"></a>
There is some optional arguments you can use when running add_product.py, these are:
6 changes: 3 additions & 3 deletions tech_scraping/add_product.py → tech_scraper/add_product.py
@@ -3,7 +3,7 @@
import requests
from bs4 import BeautifulSoup
import json
-from scraping import change_name, change_æøå, domains
+from scraper import change_name, change_æøå, domains
import argparse


@@ -331,8 +331,8 @@ def add_to_scraper(category, link, url_domain):
     """Add line to scraping.py, so scraping.py can scrape the new product."""
     domain = find_domain(url_domain)
 
-    with open('scraping.py', 'a+') as python_file:
-        python_file.write(f' {domain}(\'{category}\', \'{link}\')\n')
+    with open('scrape_links.py', 'a+') as python_file:
+        python_file.write(f'scraper.{domain}(\'{category}\', \'{link}\')\n')
         print(f'{category}\n{link}')
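To make the effect of this change concrete, the sketch below reproduces the line that `add_to_scraper` now appends to `scrape_links.py`. It is an illustration under assumptions, not code from the commit: `build_scrape_line` is a hypothetical helper, and `domain` is assumed to be the name of a scraper class such as `Proshop`, as returned by `find_domain`.

```python
def build_scrape_line(domain: str, category: str, link: str) -> str:
    """Format the call that gets appended to scrape_links.py (hypothetical helper)."""
    # The "scraper." prefix is needed because scrape_links.py does "import scraper"
    # instead of living inside scraper.py itself.
    return f"scraper.{domain}('{category}', '{link}')\n"


line = build_scrape_line(
    "Proshop",
    "ssd",
    "https://www.proshop.dk/SSD/Corsair-Force-MP600-NVMe-Gen4-M2-1TB/2779161",
)
print(line, end="")
```

Because the generated lines are now top-level statements in a separate module, they no longer need the leading indentation that the old version wrote so the calls would sit under `if __name__ == '__main__':` in the scraper file.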


File renamed without changes.
1 change: 1 addition & 0 deletions tech_scraper/records.json
@@ -0,0 +1 @@
+{}
File renamed without changes.
3 changes: 3 additions & 0 deletions tech_scraper/scrape_links.py
@@ -0,0 +1,3 @@
+import scraper
+
+# Your links is added here
38 changes: 12 additions & 26 deletions tech_scraping/scraper.py → tech_scraper/scraper.py
@@ -226,17 +226,16 @@ def change_name(name):

 def change_æøå(name):
     """Change the letters æ, ø and å to international letters to avoid unicode and return the new name."""
-    new_name = ''
-    for letter in name:
-        if letter in 'æøå':
-            if letter == 'æ':
-                letter = 'ae'
-            elif letter == 'ø':
-                letter = 'oe'
-            elif letter == 'å':
-                letter = 'aa'
-        new_name += letter
-    return new_name
+    replace_letters = {
+        "æ": "ae",
+        "ø": "oe",
+        "å": "aa"
+    }
+
+    for letter in replace_letters:
+        name = name.replace(letter, replace_letters[letter])
+
+    return name
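The dictionary-based replacement introduced here can be tried in isolation. The sketch below mirrors the new logic under a hypothetical ASCII name, `replace_danish_letters`, and adds a `str.translate` variant for comparison; that variant is an alternative technique and is not used in the commit.

```python
REPLACE_LETTERS = {"æ": "ae", "ø": "oe", "å": "aa"}


def replace_danish_letters(name: str) -> str:
    """Replace æ, ø and å one mapping at a time, as in the refactored function."""
    for letter, replacement in REPLACE_LETTERS.items():
        name = name.replace(letter, replacement)
    return name


# Alternative (not in the commit): build a translation table once and
# replace all three letters in a single pass over the string.
TRANSLATION_TABLE = str.maketrans(REPLACE_LETTERS)


def replace_danish_letters_translate(name: str) -> str:
    return name.translate(TRANSLATION_TABLE)


print(replace_danish_letters("bærbar"))
print(replace_danish_letters_translate("rødgrød på"))
```

Both variants only handle the lowercase letters, which matches the mapping in the diff; uppercase Æ, Ø and Å pass through unchanged.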


class Komplett(Scraper):
@@ -287,7 +286,7 @@ def get_info(self):
 class Amazon(Scraper):
     def get_info(self):
         self.name = self.html_soup.find('span', id='productTitle').text.strip().lower()
-        self.price = self.html_soup.find('span', id='priceblock_ourprice').text.replace('$', '')
+        self.price = self.html_soup.find('span', id='priceblock_ourprice').text.replace('$', '').split('.')[0].replace(',', '')
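The extended string chain for the Amazon price can be checked without fetching a page. The sketch below applies the same three steps to sample price strings; `parse_amazon_price` is a hypothetical name, and the sample values are made up for illustration.

```python
def parse_amazon_price(price_text: str) -> str:
    """Apply the same steps as the new Amazon.get_info line (hypothetical helper)."""
    without_currency = price_text.replace('$', '')   # "$1,299.99" -> "1,299.99"
    whole_part = without_currency.split('.')[0]      # "1,299.99"  -> "1,299"
    return whole_part.replace(',', '')               # "1,299"     -> "1299"


print(parse_amazon_price("$1,299.99"))
print(parse_amazon_price("$699.00"))
```

Note that the cents are dropped rather than rounded, and the result is still a string, exactly as in the diff.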


class eBay(Scraper):
@@ -348,17 +347,4 @@ def get_info(self):


 if __name__ == '__main__':
-    Komplett('ssd', 'https://www.komplett.dk/product/1133452/hardware/lagring/harddiskssd/ssd-m2/corsair-force-series-mp600-1tb-m2-ssd#')
-    Proshop('gpu', 'https://www.proshop.dk/Grafikkort/ASUS-GeForce-RTX-2080-Ti-ROG-STRIX-OC-11GB-GDDR6-RAM-Grafikkort/2679518')
-    Proshop('ssd', 'https://www.proshop.dk/SSD/Corsair-Force-MP600-NVMe-Gen4-M2-1TB/2779161')
-    Proshop('gpu', 'https://www.proshop.dk/Grafikkort/ASUS-Radeon-RX-5700-XT-ROG-STRIX-OC-8GB-GDDR6-RAM-Grafikkort/2792486')
-    Komplett('gpu', 'https://www.komplett.dk/product/1168436/hardware/pc-komponenter/grafikkort/asus-geforce-rtx-3080-rog-strix-oc')
-    Proshop('gpu', 'https://www.proshop.dk/Grafikkort/ASUS-GeForce-RTX-3080-ROG-STRIX-OC-10GB-GDDR6X-RAM-Grafikkort/2876859')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1161770/gaming/gaming-pc/baerbar/msi-gl65-leopard-156-fhd-144-hz#')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1159920/gaming/gaming-pc/baerbar/asus-rog-strix-g-g712lw-173-fhd-144-hz#')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1155433/gaming/gaming-pc/baerbar/asus-rog-zephyrus-g15-ga502iv-156-fhd-240-hz')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1157681/gaming/gaming-pc/baerbar/acer-nitro-5-an515-55-156-fhd-144-hz')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1159645/gaming/gaming-pc/baerbar/lenovo-legion-5i-156-fhd-120-hz')
-    Elgiganten('baerbar', 'https://www.elgiganten.dk/product/gaming/gaming-laptop/157466/asus-tuf-gaming-a15-fx506-15-6-gaming-computer-sort')
-    Komplett('baerbar', 'https://www.komplett.dk/product/1159926/gaming/gaming-pc/baerbar/asus-tuf-gaming-a15-fx506iv-156-fhd-144-hz#')
-    Proshop('baerbar', 'https://www.proshop.dk/Baerbar/ASUS-TUF-Gaming-A17-FX706IU-H7209T/2877154')
+    print('If you want to scrape your products, then run "scrape_links.py" instead of this file')
File renamed without changes.