Completely refactor code #129

Merged Apr 9, 2021 · 92 commits

Commits (all by Crinibus):
- 77bc4bd · Update .gitignore · Mar 27, 2021
- 28be1c7 · Delete fakta_scraper · Mar 27, 2021
- cf1a4d5 · Rename folder "tech_scraper" to "scraper" and update README · Mar 27, 2021
- 1db2e3e · Delete README inside folder "scraper" · Mar 27, 2021
- 4565310 · Delete requirements.txt in folder "scraper" · Mar 27, 2021
- 0f3474c · Update .gitignore · Mar 27, 2021
- 42c84d3 · Beginning of refactor code · Mar 29, 2021
- be81ec5 · Rename scraper/const.py to scraper/constants.py · Mar 30, 2021
- 5e8fb6a · Update README.md · Mar 30, 2021
- 1f19227 · Update arguments.py · Mar 30, 2021
- 573fe42 · Update domains.py · Mar 30, 2021
- d595e02 · Update domains.py · Mar 30, 2021
- f94ee59 · Update README.md · Mar 30, 2021
- 9a47b8d · Delete logfile.log · Mar 30, 2021
- 4a55118 · Update filemanager.py · Mar 30, 2021
- 1af7055 · Update scrape.py · Mar 30, 2021
- 7a69182 · Update __init__.py · Mar 30, 2021
- 51ef1b3 · Update main.py · Mar 30, 2021
- 04c96a9 · Update arguments.py · Mar 30, 2021
- 01f91cf · Create scraper/logging.ini and import logging.config in main.py · Mar 30, 2021
- 454d8d9 · Change how logging is in scraper/scrape.py · Mar 30, 2021
- 7604566 · Update scrape.py · Mar 31, 2021
- 205063f · Add two arguments: --reset and --hard-reset · Mar 31, 2021
- 94f2941 · Add a new argument: --add and update function validate_arguments · Mar 31, 2021
- 6a5d41d · Update add_product.py · Mar 31, 2021
- 88ac22e · Update __init__.py · Mar 31, 2021
- d7d4a0c · Change imports, update "main" function and add function "reset" · Mar 31, 2021
- e24864f · delete 'from scraper import Scraper' · Mar 31, 2021
- d024884 · Make method Scraper.request_url static · Mar 31, 2021
- 4fef5e3 · Delete class Logger · Mar 31, 2021
- 0ed3df2 · Fix method Format.get_user_product_name · Mar 31, 2021
- 8fe06a3 · Save data with added product before calling new_product.save_info method · Mar 31, 2021
- 6d0c029 · Add 'from .filemanager import Filemanager' to scraper/__init__.py · Mar 31, 2021
- c5beb56 · Update function 'reset' and create function 'hard_reset' · Mar 31, 2021
- 06eb05d · Add comment · Mar 31, 2021
- b9e67a1 · Fix issue where logging happened twice · Mar 31, 2021
- 4aadd7f · Fix issue when adding new product with existing category · Mar 31, 2021
- 8ad48b4 · Create functions "add_product_to_records" and "add_product_to_csv" · Mar 31, 2021
- 4122eab · Add pandas to requirements.txt · Apr 1, 2021
- 4c840fb · Update filemanager.py · Apr 1, 2021
- 7729d59 · Update add_product.py · Apr 1, 2021
- 79a5ac7 · Update logging.ini · Apr 1, 2021
- 45c1ec2 · Update main.py · Apr 2, 2021
- 46c441b · Move dataclass "Info" from scraper/domains.py to scraper/format.py · Apr 2, 2021
- de6e18e · Update domains.py · Apr 2, 2021
- 599505e · Path to scraper/logfile.log is now absolute · Apr 2, 2021
- 49636b0 · Update filemanager.py · Apr 2, 2021
- 97292bd · Update main.py · Apr 2, 2021
- f067b34 · Black formatter on scraper/add_product.py · Apr 2, 2021
- 2d45a87 · Update add_product.py · Apr 2, 2021
- ceba12a · Setup simple logging for adding product and class Scraper · Apr 2, 2021
- 88ba7ce · Update logging.ini · Apr 2, 2021
- 10e17e9 · Update domains.py · Apr 2, 2021
- 5e3717c · Update scrape.py · Apr 2, 2021
- 1ca6fda · Update scrape.py · Apr 2, 2021
- de71121 · Update scrape.py · Apr 2, 2021
- aca3bfd · Update scrape.py · Apr 2, 2021
- 11de4ab · Update format.py · Apr 2, 2021
- c03cae6 · Update main.py · Apr 3, 2021
- c8669c5 · Update main.py · Apr 3, 2021
- b45f884 · Update main.py · Apr 3, 2021
- b96c391 · Delete domain arguments as not needed anymore · Apr 3, 2021
- fed7ec2 · Update main.py · Apr 3, 2021
- c5d9a42 · Update main.py · Apr 3, 2021
- 6400151 · Add argument "--threads" · Apr 5, 2021
- 11ce9be · Add function "scrape_with_threads" · Apr 5, 2021
- 0a9c9a8 · Update add_product.py · Apr 5, 2021
- ff25f0d · Update add_product.py · Apr 5, 2021
- fc3fdb3 · Update domains.py · Apr 6, 2021
- 2b472a2 · Update domains.py · Apr 6, 2021
- 63ef52c · Update format.py · Apr 6, 2021
- c61b633 · Update scrape.py · Apr 6, 2021
- 130e01c · Update scrape.py · Apr 6, 2021
- 4b65fc3 · Update scrape.py · Apr 6, 2021
- 26a91ce · Update main.py · Apr 6, 2021
- 4b7d908 · Add 4 new arguments · Apr 7, 2021
- ba2effb · Update constants.py · Apr 7, 2021
- 8d7b3bf · Update visualize.py · Apr 7, 2021
- c134901 · Update __init__.py · Apr 7, 2021
- decaa2c · Update main.py · Apr 7, 2021
- 7705ed0 · Format with Black · Apr 7, 2021
- d121e9b · Format with Black · Apr 8, 2021
- 88c1cfa · Update filemanager.py · Apr 8, 2021
- 3cb41ac · Update format.py · Apr 8, 2021
- 5a8b7bc · Delete import of logging module · Apr 8, 2021
- 991cd6d · Update domains.py · Apr 8, 2021
- d2e6939 · Add metavar to arguments "--visualize-category" and "--visualize-id" · Apr 8, 2021
- d70b336 · Add field "currency" in "info" about a product in records.json · Apr 8, 2021
- 59d1cae · Set level of logging to INFO · Apr 8, 2021
- 6409dff · Delete data in products.csv and records.json · Apr 8, 2021
- 172858a · Update README.md · Apr 8, 2021
- 110c560 · Add __author__ to scraper/__init__.py · Apr 8, 2021
14 changes: 2 additions & 12 deletions .gitignore

@@ -1,12 +1,2 @@
-
-komplett_scraping/komplett_scraping.exe
-tech_scraping/.vscode/settings.json
-fakta_scraper/geckodriver.log
-fakta_scraper/.vscode/settings.json
-
-
-.vscode/settings.json
-tech_scraping/__pycache__/scraping.cpython-37.pyc
-.vscode/launch.json
-tech_scraper/.vscode/settings.json
-tech_scraper/__pycache__/*
+.vscode/
+__pycache__/
213 changes: 89 additions & 124 deletions README.md

@@ -1,25 +1,33 @@
# Table of contents
- [Intro](#intro)
- [Contributing](#contributing)
- [First setup](#first-setup)
-- [Tech scraper](#tech-scraper)
-    - [Scrape products](#scrape-products)
-    - [Start from scratch](#start-scratch)
-    - [Adding products](#adding-products)
-    - [Optional arguments](#optional-arguments)
-    - [User settings](#user-settings)
-    - [Visualize data](#visualize-data)
-    - [Command examples](#command-examples)
-    - [Available flags](#available-flags)
-- [Fakta scraper](#fakta-scraper)
-    - [Scrape discounts](#scrape-discounts)
+- [Start from scratch](#start-scratch)
+- [Scrape products](#scrape-products)
+- [Adding products](#adding-products)
+    - [Links to scrape from](#links-to-scrape-from)
+    - [Optional arguments](#optional-arguments)
+- [User settings](#user-settings)
+- [Visualize data](#visualize-data)
+    - [Command examples](#command-examples)
+    - [Available flags](#available-flags)

<br/>


## Intro <a name="intro"></a>
With this program you can easily scrape and track prices on products at multiple websites. <br/>
This program can also visualize the price over time of the products being tracked. That can be helpful if you want to buy a product in the future and want to know if a discount might be around the corner.

<br/>


## Contributing <a name="contributing"></a>
Feel free to fork the project and create a pull request with new features or refactoring of the code. Also feel free to open issues with problems or suggestions for new features.

<br/>


## First setup <a name="first-setup"></a>
Clone this repository and move into the repository:
```
(…)
```

@@ -36,179 +44,136 @@
```
pip install -r requirements.txt
```

<br/>

-# Tech scraper <a name="tech-scraper"></a>
-The tech scraper can scrape prices on products from:
-- [Komplett.dk](https://www.komplett.dk/)
-- [Proshop.dk](https://www.proshop.dk/)
-- [Computersalg.dk](https://www.computersalg.dk/)
-- [Elgiganten.dk](https://www.elgiganten.dk/)
-- [AvXperten.dk](https://www.avxperten.dk/)
-- [Av-Cables.dk](https://www.av-cables.dk/)
-- [Amazon.com](https://www.amazon.com/)
-- [eBay.com](https://www.ebay.com/)
-- [Power.dk](https://www.power.dk/)
-- [Expert.dk](https://www.expert.dk/)
-- [MM-Vision.dk](https://www.mm-vision.dk/)
-- [Coolshop.dk](https://www.coolshop.dk/)
-- [Sharkgaming.dk](https://www.sharkgaming.dk/)

+## Start from scratch <a name="start-scratch"></a>
+If you want to start from scratch with no data in the records.json file, then just run the following command:
+```
+python3 main.py --hard-reset
+```
+
+Then just add products as described [here](#add-products).
+
+<br/>
+
+If you just want to reset your data for each product (delete all datapoints inside each product), run this command:
+```
+python3 main.py --reset
+```
+This deletes all the data inside each product, such as id, url and dates with prices.

<br/>
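As a rough illustration of the two reset modes added in this PR, the ```reset``` and ```hard_reset``` functions (named in the commit list above) might look something like the sketch below; the records.json layout and field names here are assumptions, not the repository's actual code:
```
import json

RECORDS_FILE = "records.json"

def reset() -> None:
    """Keep every product, but wipe its info fields and datapoints."""
    with open(RECORDS_FILE, "r", encoding="utf-8") as file:
        records = json.load(file)

    # Assumed layout: {category: {product: {website: {"info": ..., "datapoints": ...}}}}
    for category in records.values():
        for product in category.values():
            for website in product.values():
                website["info"] = {}
                website["datapoints"] = []

    with open(RECORDS_FILE, "w", encoding="utf-8") as file:
        json.dump(records, file, indent=2)

def hard_reset() -> None:
    """Start from scratch: empty records.json entirely."""
    with open(RECORDS_FILE, "w", encoding="utf-8") as file:
        json.dump({}, file)
```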


## Scrape products <a name="scrape-products"></a>
To scrape prices of products run this in the terminal:
```
-python3 scrape_links.py
+python3 main.py -s
```

+To scrape with threads, run the same command but with the ```--threads``` argument:
+```
+python3 main.py -s --threads
+```

-## Start from scratch <a name="start-scratch"></a>
-If you want to start from scratch with no data in the records.json file, then just delete all the content in records.json apart from two curly brackets:
-```
-{}
-```
-Then delete the lines under the last if-statement in scraper.py.
-
-Then just add products like described [here](#add-products).

<br/>
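The ```scrape_with_threads``` function behind the ```--threads``` argument is not shown in this diff; a minimal sketch of how such threaded scraping might look follows. The ```Scraper``` class exists per the commits above, but its constructor signature and the ```scrape_info``` method name are assumptions:
```
import threading

from scraper import Scraper  # the package refactored in this PR

def scrape_with_threads(products: list) -> None:
    """Scrape all products in parallel, one thread per product."""
    # Each item in products is assumed to be a (category, url) pair
    scrapers = [Scraper(category, url) for category, url in products]

    threads = [threading.Thread(target=s.scrape_info) for s in scrapers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every scrape to finish
```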


## Add products <a name="add-products"></a>
Before scraping a new product, run a similar line to this:
```
-python3 add_product.py <category> <url>
+python3 main.py -a -c <category> -u <url>
```

e.g.
```
-python3 add_product.py gpu https://www.komplett.dk/product/1135037/hardware/pc-komponenter/grafikkort/msi-geforce-rtx-2080-super-gaming-x-trio
+python3 main.py -a -c vr -u https://www.komplett.dk/product/1168594/gaming/spiludstyr/vr/vr-briller/oculus-quest-2-vr-briller
```

-This adds the category (if new) and the product to the records.json file, and adds a line at the end of the scraper.py file so the script can scrape price of the new product.
+This adds the category (if new) and the product to the records.json file, and adds a line at the end of the products.csv file so the script can scrape the price of the new product.

**OBS**: The category can only be one word, so use an underscore instead of a space if needed.<br/>
-**OBS**: The url must have the "https://www." part.<br/>
+**OBS**: The url must have the "https://" part.<br/>
+**OBS**: If an error occurs when adding a product, it might be because the url has a "&" in it. If so, put quotation marks around the url; this should solve the problem. If it doesn't, submit an issue.<br/>
**OBS**: When using Amazon links, delete everything after and including "ref=sr".<br/>
For example the link: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/ref=sr_1_2?dchild=1&qid=1601488833&s=computers-intl-ship&sr=1-2<br/>
Should be: https://www.amazon.com/NVIDIA-GEFORCE-RTX-2080-Founders/dp/B07HWMDDMK/<br/>
**OBS**: When using eBay links, delete everything after and including "?_trkparms="<br/>
For example the link: https://www.ebay.com/itm/Samsung-Galaxy-Note-20-Ultra-256GB-12GB-RAM-SM-N986B-DS-FACTORY-UNLOCKED-6-9/193625604205?_trkparms=aid%3D111001%26algo%3DREC.SEED%26ao%3D1%26asc%3D225074%26meid%3Dd6c93f1458884e65bcc434e38f6f303c%26pid%3D100970%26rk%3D8%26rkt%3D8%26mehot%3Dpp%26sd%3D402319206529%26itm%3D193625604205%26pmt%3D0%26noa%3D1%26pg%3D2380057%26brand%3DSamsung&_trksid=p2380057.c100970.m5481&_trkparms=pageci%3A6ffa204c-042b-11eb-baa4-3a1cc2bb9aea%7Cparentrq%3Ae60676341740a4d6b1579293fff1b710%7Ciid%3A1<br/>
Should be: https://www.ebay.com/itm/Samsung-Galaxy-Note-20-Ultra-256GB-12GB-RAM-SM-N986B-DS-FACTORY-UNLOCKED-6-9/193625604205
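Per the commit list above, adding a product is split into ```add_product_to_records``` and ```add_product_to_csv```. A rough sketch of the idea, with the records.json layout, field names and function signatures all as assumptions:
```
import csv
import json

def add_product_to_records(category: str, product_name: str, website: str) -> None:
    """Add the category (if new) and an empty entry for the product."""
    with open("records.json", "r", encoding="utf-8") as file:
        records = json.load(file)

    # Assumed layout: {category: {product: {website: {"info": ..., "datapoints": ...}}}}
    product = records.setdefault(category, {}).setdefault(product_name, {})
    product[website] = {"info": {}, "datapoints": []}

    with open("records.json", "w", encoding="utf-8") as file:
        json.dump(records, file, indent=2)

def add_product_to_csv(category: str, url: str) -> None:
    """Append the new product as a line at the end of products.csv."""
    with open("products.csv", "a", newline="", encoding="utf-8") as file:
        csv.writer(file).writerow([category, url])
```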



### Optional arguments <a name="optional-arguments"></a>
-There is some optional arguments you can use when running add_product.py, these are:
-- --komplett
-- --proshop
-- --computersalg
-- --elgiganten
-- --avxperten
-- --avcables
-- --amazon
-- --ebay
-- --power
-- --expert
-- --mmvision
-- --coolshop
-- --sharkgaming
-
-When using one or more of "domain" arguments, only the chosen domains gets added to records.json under the product name.

<br/>

+### Links to scrape from <a name="links-to-scrape-from"></a>
+This scraper can (so far) scrape prices on products from:
+- [Komplett.dk](https://www.komplett.dk/)
+- [Proshop.dk](https://www.proshop.dk/)
+- [Computersalg.dk](https://www.computersalg.dk/)
+- [Elgiganten.dk](https://www.elgiganten.dk/)
+- [AvXperten.dk](https://www.avxperten.dk/)
+- [Av-Cables.dk](https://www.av-cables.dk/)
+- [Amazon.com](https://www.amazon.com/)
+- [eBay.com](https://www.ebay.com/)
+- [Power.dk](https://www.power.dk/)
+- [Expert.dk](https://www.expert.dk/)
+- [MM-Vision.dk](https://www.mm-vision.dk/)
+- [Coolshop.dk](https://www.coolshop.dk/)
+- [Sharkgaming.dk](https://www.sharkgaming.dk/)

<br/>

-## User settings <a name="user-settings"></a>
-See the [README in tech_scraper](./tech_scraper/README.md#user-settings)

+## User settings <a name="user-settings"></a>
+User settings can be added and changed in the file settings.ini.
+
+Right now there is only one category of user settings, "ChangeName". Under this category you can change how the script renames products, so similar products are placed under the same product in the records.json file.
+
+When adding a new setting under the category "ChangeName" in settings.ini, there must be a line with ```key<n>``` and a line with ```value<n>```, where ```<n>``` is the "link" between keywords and valuewords. E.g. ```value3``` is the value for ```key3```.
+
+In ```key<n>``` you set the keywords (separated by commas) that the product name must contain for it to be changed to what ```value<n>``` is equal to. For example, if the user settings are the following:
+```
+[ChangeName]
+key1 = asus,3080,rog,strix,oc
+value1 = asus geforce rtx 3080 rog strix oc
+```
+then if a product name has all of the words in ```key1```, it gets changed to what ```value1``` is.
<br/>
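A minimal sketch of the matching rule described above, assuming the keywords are stored lower-case; the function name is hypothetical:
```
import configparser

def apply_change_name(product_name: str) -> str:
    """Return value<n> if the product name contains every keyword in key<n>."""
    config = configparser.ConfigParser()
    config.read("settings.ini")
    section = config["ChangeName"]

    for option in section:
        if not option.startswith("key"):
            continue
        keywords = section[option].split(",")
        # All keywords must appear in the product name for the rename to apply
        if all(keyword in product_name.lower() for keyword in keywords):
            return section["value" + option[3:]]  # key3 -> value3
    return product_name
```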


## Visualize data <a name="visualize-data"></a>
-To visualize your data run the "visualize_data.py" script with some arguments.
-
-See all available flags [here](#available-flags).
+To visualize your data, just run main.py with the ```-v``` or ```--visualize``` argument and then specify which products you want visualized. These are your options:

-By using the ```--all``` flag you will get graphs for all products in records.json.
-By using the ```--partnum``` or ```-p``` and specify a partnumber you will only get the graph for the specified partnumber. You can specify multiple partnumbers just by adding multiple ```--partnum``` or ```-p``` flags.
-By using the ```--category``` or ```-c``` you will get graphs for all the product in the specified category. You can specify multiple categories just by adding multiple ```--category``` or ```-c``` flags.
+- ```-va``` or ```--visualize-all``` to visualize all your products
+- ```-vc [<category> [<category> ...]]``` or ```--visualize-category [<category> [<category> ...]]``` to visualize all products in one or more categories
+- ```-id [<id> [<id> ...]]``` or ```--visualize-id [<id> [<id> ...]]``` to visualize one or more products with the specified id(s)

### Command examples <a name="command-examples"></a>
**Show graphs for all products**

To show graphs for all products, run the following command:
```
-python3 visualize_data.py --all
+python3 main.py -v -va
```

**Show graph(s) for specific products**

-To show a graph for only one product, run the following command where ```<partnumber>``` is the partnumber of the product you want a graph for:
+To show a graph for only one product, run the following command where ```<id>``` is the id of the product you want a graph for:
```
-python3 visualize_data.py --partnum <partnumber>
+python3 main.py -v -id <id>
```

-For multiple products, just add another flag, like so:
+For multiple products, just add another id, like so:
```
-python3 visualize_data.py --partnum <partnumber> --partnum <partnumber>
+python3 main.py -v -id <id> <id>
```

-You can also just use the short flag name, like so:
-```
-python3 visualize_data.py -p <partnumber>
-```

**Show graphs for products in one or more categories**

To show graphs for all products in one category, run the following command where ```<category>``` is the category you want graphs from:
```
-python3 visualize_data.py --category <category>
+python3 main.py -v -vc <category>
```

For multiple categories, just add another category, like so:
```
-python3 visualize_data.py --category <category> --category <category>
+python3 main.py -v -vc <category> <category>
```

-You can also just use the short flag name, like so:
-```
-python3 visualize_data.py -c <category>
-```

### Available flags <a name="available-flags"></a>
-When running visualize_data.py you must use atleast one flag, the available flags are:
-- --all
-- --partnum <partnum>
-- -p <partnum>
-- --category <category>
-- -c <category>


<br/>
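For reference, the new visualize flags could be declared with argparse roughly like this. The flag names and metavars match the commits and usage above, but the help texts and the parser description are assumptions:
```
import argparse

parser = argparse.ArgumentParser(description="Price scraper")

parser.add_argument("-v", "--visualize", action="store_true",
                    help="visualize your products")
parser.add_argument("-va", "--visualize-all", action="store_true",
                    help="visualize all products")
parser.add_argument("-vc", "--visualize-category", nargs="*", metavar="category",
                    help="visualize all products in the given categories")
parser.add_argument("-id", "--visualize-id", nargs="*", metavar="id",
                    help="visualize the products with the given ids")

args = parser.parse_args()
print(args.visualize_all, args.visualize_category, args.visualize_id)
```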

-# Fakta scraper <a name="fakta-scraper"></a>
-The Fakta scraper can scrape this week's discounts. <br/>
-**OBS: The Fakta scraper cannot run on Linux as it uses the Firefox webdriver, which is a .exe file.**
-
-## Scrape discounts <a name="scrape-discounts"></a>
-For now you can only search for keywords and get the discounts that match the keywords.
-To scrape discounts on, for example, Kellogg products, just add the keyword "Kellogg" as an argument when running the fakta_scraper.py script:
-```
-python3 fakta_scraper.py kellogg
-```
-You can search for multiple keywords by just adding them as arguments, as such:
-```
-python fakta_scraper.py <keyword_1> <keyword_2> <keyword_3>
-```
-The discounts are printed in the terminal.