- A headline scraper for the G1 news site: for each story it collects the headline, link, date and associated image.
- The project uses SQLAlchemy for database access and FastAPI as its web framework. It scrapes news from the G1 website, saves the results to a CSV file, and then checks whether each story already exists in the database before inserting the new ones.
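The CSV-then-database step described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it uses an in-memory `sqlite3` database as a stand-in for MySQL and plain SQL instead of SQLAlchemy. The `news` table, its columns, and the helpers `write_csv` and `load_new_rows` are hypothetical, and it assumes a story's link is what identifies a duplicate.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout: one row per scraped story, using the four
# fields the README lists (headline, link, date, image).
FIELDS = ["headline", "link", "date", "image"]


def write_csv(rows: list[dict], stream) -> None:
    """Write scraped rows to a CSV stream (a file object or io.StringIO)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def load_new_rows(stream, conn: sqlite3.Connection) -> int:
    """Insert CSV rows whose link is not yet stored; return how many were added."""
    added = 0
    for row in csv.DictReader(stream):
        # Assumption: the link uniquely identifies a story.
        exists = conn.execute(
            "SELECT 1 FROM news WHERE link = ?", (row["link"],)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO news (headline, link, date, image) VALUES (?, ?, ?, ?)",
                (row["headline"], row["link"], row["date"], row["image"]),
            )
            added += 1
    conn.commit()
    return added


# Usage: loading the same CSV twice only inserts the story once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (headline TEXT, link TEXT PRIMARY KEY, date TEXT, image TEXT)")
buf = io.StringIO()
write_csv([{"headline": "Example", "link": "https://g1.globo.com/example",
            "date": "2024-01-01", "image": "https://example.com/a.jpg"}], buf)
buf.seek(0)
print(load_new_rows(buf, conn))  # prints 1
buf.seek(0)
print(load_new_rows(buf, conn))  # prints 0
```

Checking with a `SELECT` before each `INSERT` mirrors the behavior described above; a unique constraint on the link column would additionally let the database itself reject duplicates.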
- Clone this repository:
$ git clone https://github.com/slocksert/g1_WebScrapper.git
- Install the dependencies with Poetry and activate its virtual environment:
$ poetry install
- To run the MySQL database with Docker Compose, first create a .env file that defines these variables:
  - MYSQL_HOST
  - MYSQL_ROOT_PASSWORD
  - MYSQL_DATABASE
  - MYSQL_PORT
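Before starting the containers, it can help to confirm the .env file is complete. The stdlib-only sketch below parses simple `KEY=VALUE` lines and builds a SQLAlchemy-style connection URL from the four variables. The `root` user and the `pymysql` driver in the URL are assumptions, as are the helper names `parse_env` and `mysql_url`; the project may load its settings differently.

```python
from urllib.parse import quote_plus

REQUIRED = ("MYSQL_HOST", "MYSQL_ROOT_PASSWORD", "MYSQL_DATABASE", "MYSQL_PORT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes naively; no escape handling.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def mysql_url(env: dict[str, str]) -> str:
    """Build a connection URL; the root user and pymysql driver are assumptions."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing .env variables: {', '.join(missing)}")
    # Percent-encode the password so characters like '@' do not break the URL.
    password = quote_plus(env["MYSQL_ROOT_PASSWORD"])
    return (
        f"mysql+pymysql://root:{password}@{env['MYSQL_HOST']}:"
        f"{env['MYSQL_PORT']}/{env['MYSQL_DATABASE']}"
    )


sample = "MYSQL_HOST=localhost\nMYSQL_ROOT_PASSWORD=s3cret@1\nMYSQL_DATABASE=g1\nMYSQL_PORT=3306\n"
print(mysql_url(parse_env(sample)))  # mysql+pymysql://root:s3cret%401@localhost:3306/g1
```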
- Start the MySQL container with Docker Compose:
$ docker compose up
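MySQL can take a few seconds after `docker compose up` before it accepts connections, so starting the scraper immediately may fail. A hypothetical helper like `wait_for_port` below polls the configured port until a TCP connection succeeds. An open port is only a rough readiness signal, and the host and port to pass (for example `localhost` and 3306) are assumptions that depend on your .env values and the compose port mapping.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Succeeds as soon as something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # back off briefly before retrying
```

For example, `wait_for_port("localhost", 3306)` could be called before the scraper opens its database connection.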
- Start the WebScraper:
$ python3 app/main.py
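For each story, the scraper has to pull out the four fields listed at the top of this README. The sketch below shows that extraction step with the standard library's `html.parser`. It is not the project's implementation: the markup and the class names `news-card`, `news-link`, `news-date` and `news-image` are hypothetical placeholders rather than G1's real HTML, and the project may use a different parsing library.

```python
from html.parser import HTMLParser
from typing import Optional


class HeadlineParser(HTMLParser):
    """Collect headline, link, date and image from hypothetical news-card markup.

    Assumes flat, non-nested cards; the class names are placeholders.
    """

    def __init__(self) -> None:
        super().__init__()
        self.items = []  # one dict per news card
        self._current: Optional[dict] = None  # card currently being filled in
        self._field: Optional[str] = None  # text field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "news-card" in classes:
            self._current = {"headline": "", "link": "", "date": "", "image": ""}
            self.items.append(self._current)
        elif self._current is None:
            return  # ignore everything before the first card
        elif tag == "a" and "news-link" in classes:
            self._current["link"] = attrs.get("href") or ""
            self._field = "headline"  # the link text is the headline
        elif "news-date" in classes:
            self._field = "date"
        elif tag == "img" and "news-image" in classes:
            self._current["image"] = attrs.get("src") or ""

    def handle_endtag(self, tag):
        self._field = None  # stop collecting text when any element closes

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()
```

Usage: create a `HeadlineParser`, call `feed()` with the page HTML, then read the collected dictionaries from `items`.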