
1cbyc Web Scraper

1cbyc Web Scraper is a Python-based tool designed to collect data from websites. It uses the requests and BeautifulSoup libraries to retrieve and parse web pages, and stores the extracted data in an SQLite database.
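Here is a minimal sketch of the parse-and-store half of that flow. It is not the project's actual code: the function names, the choice of `<p>` elements as the "texts" to extract, and the `scraped_data` table schema are all assumptions for illustration. It needs `beautifulsoup4` installed; in the real tool the HTML would come from something like `requests.get(url).text`.

```python
import sqlite3

from bs4 import BeautifulSoup


def extract_texts(html: str) -> list[str]:
    """Return the stripped text of every non-empty <p> element (hypothetical tag choice)."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text(strip=True) for p in soup.find_all("p"))
    return [text for text in texts if text]


def store_texts(conn: sqlite3.Connection, url: str, texts: list[str]) -> None:
    """Insert one row per extracted text into a hypothetical scraped_data table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_data ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO scraped_data (url, content) VALUES (?, ?)",
        [(url, text) for text in texts],
    )
    conn.commit()


if __name__ == "__main__":
    # Stand-in HTML; a real run would download it first.
    html = "<html><body><p>First</p><p> Second </p><div>skip</div></body></html>"
    conn = sqlite3.connect(":memory:")
    store_texts(conn, "http://example.com/page/1", extract_texts(html))
    print(conn.execute("SELECT url, content FROM scraped_data").fetchall())
```

Storing one row per text is what makes each piece of scraped content individually queryable later.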

Features

can scrape multiple web pages
can handle pagination
can store scraped data in an SQLite database (support for more databases coming soon)
can mimic a web browser by setting custom request headers
(not to brag, but) I added a way to classify the scraped data as individual text entries
(since I have lazy pals) I added read_db.py, a way to read the SQLite file without having to set up SQLite tools on your machine
more features coming soon
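Mimicking a browser mostly comes down to sending browser-like request headers. The project itself uses requests, where the same dictionary would be passed as `requests.get(url, headers=HEADERS)`; this sketch swaps in the standard library's `urllib.request` so it runs without extra packages. The header values and the helper names are assumptions, not what the tool actually sends.

```python
import urllib.request

# Hypothetical browser-like headers; the values the project really uses may differ.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Wrap a URL in a Request object that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with the custom headers and decode it as text (makes a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so `User-Agent` is looked up afterwards as `User-agent`.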

For Installation

Just clone my repository:

git clone https://github.com/1cbyc/1cbyc-web-scraper.git
cd 1cbyc-web-scraper

Then install the required packages:

pip install -r requirements.txt

This is how to use it

Simply set the base_url and num_pages variables in main.py to the target website's base URL and the number of pages you want to scrape:

base_url = 'http://nsisong.com/page/'  # you can replace with the actual base URL
num_pages = 5  # adjust the number of pages to scrape based on the target website
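Assuming the page number is simply appended to base_url and numbering starts at 1 (an assumption; check how the target site actually numbers its pages), the pages to visit could be generated like this. `page_urls` is a hypothetical helper, not a function from main.py.

```python
def page_urls(base_url: str, num_pages: int, start: int = 1) -> list[str]:
    """Build the page URLs to scrape: base_url + '1', base_url + '2', and so on."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return [f"{base_url}{page}" for page in range(start, start + num_pages)]


base_url = 'http://nsisong.com/page/'
num_pages = 5

for url in page_urls(base_url, num_pages):
    print(url)  # http://nsisong.com/page/1 through http://nsisong.com/page/5
```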

Then run the scraper:

python main.py

Check the console output to follow the progress and results of the scrape.
To view the scraped data, you can also use the print_all_data function provided in scraper/database.py:

from scraper.database import print_all_data
print_all_data()
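If you're curious what a print-everything helper over SQLite can look like, here is a standard-library sketch. It is hypothetical and not the code in scraper/database.py: the `print_all_rows` name, the `data/scraped_data.db` path and the `scraped_data` table name are all assumptions.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

DB_PATH = Path("data") / "scraped_data.db"  # assumed location; adjust to your setup


def print_all_rows(db_path: Path = DB_PATH, table: str = "scraped_data") -> list[tuple]:
    """Print and return every row of `table` (hypothetical table name)."""
    # Table names cannot be bound as ? parameters, so only accept plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    # closing() makes sure the connection is closed once the rows are fetched.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    for row in rows:
        print(row)
    return rows
```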

I added a shortcut for reading the data (the read_db.py file in this project), but I don't think I should advocate for shortcuts. So, just do this:

Get a DB browser for SQLite, at least

Just download and install DB Browser for SQLite:

Go to the DB Browser for SQLite website.
Then download and install the version for your operating system.

Then, open the SQLite database file:

Open DB Browser for SQLite.
Click "Open Database" and navigate to the data directory.
Select the .db file you want to inspect and click "Open".

Go through the data:

Use the "Browse Data" tab to view the contents of the data table.
You can also run SQL queries in the "Execute SQL" tab.
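If you'd rather stay in the terminal, Python's built-in sqlite3 module can cover what the "Browse Data" and "Execute SQL" tabs do. This is a sketch; `list_tables` and `run_query` are hypothetical helpers, and nothing here is committed, so treat it as read-only.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def run_query(db_path: str, sql: str, params: tuple = ()) -> tuple[list[str], list[tuple]]:
    """Run one read-only query and return (column names, rows), like the Execute SQL tab."""
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(sql, params)
        # cursor.description is None for statements that return no columns.
        columns = [col[0] for col in cursor.description] if cursor.description else []
        return columns, cursor.fetchall()
```

For example, `run_query("data/scraped_data.db", "SELECT * FROM scraped_data LIMIT 10")`, with the path and table name adjusted to whatever the scraper actually creates.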

All in all, you can open the scraped_data.db file in an SQLite browser to inspect the data instead of using my shitty method.

As for Contributing

To be honest, I want you guys to fork this repository, make improvements, and submit pull requests; that way we'd probably get a v2.1 release faster. Suggest new features too, and I promise to work on them (if they make sense).

PLEASE USE IT FOR EDUCATIONAL PURPOSES O!
