A Python bot that performs ETL (extract, transform, load) on personal favorite lists from gamer.com.tw/acgbox.
- Everything is set up on my Arch Linux VM
- Python 3.8; the interpreter and all dependencies are managed via pyenv + pipenv
- PostgreSQL 15
Migration from MySQL to PostgreSQL
- Find Last Updated Date of a Web Page
- TritonHo/RDBMS course
- pandas to_sql method with if_exists='append': implement an update method that only adds new ACG collection entries?
- implement a method to find the last page?
- research or implement a function to convert html div tags to a table
- check which files podman keeps on disk when the MySQL container is not running
- implement a CRUD API/query method for MySQL
- implement function load_data
- implement function modfy_data with advanced string replacement in pandas.DataFrame
- refactor parts of the code into class acgbox_crawler(object)
- podman-compose up with docker-selenium
- WARN[0011] aardvark-dns binary not found, container dns will not be enabled
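The open question above about using `to_sql` with `if_exists='append'` to add only new entries could be approached roughly as follows. This is a minimal sketch, not the project's actual code: the function `append_new_rows`, the table name `acg_collection`, and the key column `acg_id` are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

import pandas as pd
from pandas.io.sql import DatabaseError


def append_new_rows(df, table, key, conn):
    """Append only the rows of df whose key is not yet stored in table.

    Returns the number of rows that were actually written.
    """
    try:
        # Keys already present in the database (hypothetical schema).
        existing = pd.read_sql_query(f"SELECT {key} FROM {table}", conn)[key].tolist()
    except DatabaseError:
        # Assumption: the query fails only because the table does not exist yet.
        existing = []
    new_rows = df[~df[key].isin(existing)]
    if not new_rows.empty:
        new_rows.to_sql(table, conn, if_exists="append", index=False)
    return len(new_rows)


# SQLite in memory is used here purely as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
first = pd.DataFrame({"acg_id": [1, 2], "title": ["A", "B"]})
second = pd.DataFrame({"acg_id": [2, 3], "title": ["B", "C"]})

added_first = append_new_rows(first, "acg_collection", "acg_id", conn)
added_second = append_new_rows(second, "acg_collection", "acg_id", conn)
stored = pd.read_sql_query(
    "SELECT acg_id FROM acg_collection ORDER BY acg_id", conn
)
```

Filtering in pandas before calling `to_sql` keeps the sketch database-agnostic; with PostgreSQL, a unique constraint combined with `INSERT ... ON CONFLICT DO NOTHING` would be another option to consider.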
#setup on Arch Linux
#update package databases
sudo pacman -Syy
#install podman
sudo pacman -S podman
#podman-docker
sudo pacman -S podman-docker
#podman-compose
sudo pacman -S podman-compose
#fuse-overlayfs
sudo pacman -S fuse-overlayfs
#if podman fails with: /usr/lib/libc.so.6: version `GLIBC_2.38' not found (required by podman)
#upgrade all packages
sudo pacman -Syu
#check podman
podman --version
#create a MySQL container with podman-compose
cd db_settingup/
#check out the db_settingup.md
#after setting up: usage with your Python projects
#Spawns a shell within the virtualenv.
pipenv shell
#if no packages installed
pipenv install
#add some Packages
pipenv install diagrams
pipenv install "psycopg[binary,pool]"
pipenv install requests
pipenv install beautifulsoup4
pipenv install pandas
pipenv install lxml
pipenv install SQLAlchemy
pipenv install PyYAML
pipenv install pymysql
pipenv install fake-useragent
pipenv install user_agent
pipenv install tornado
#generate requirements.txt from Pipfile.lock
pipenv requirements > requirements.txt
#Be careful about which directory you execute from! XD
#Test
pipenv shell
cd src
python main.py
#time a simple command or give resource usage
time python main.py
# real 2m55.699s
# user 0m3.973s
# sys 0m0.858s
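#podman: install aardvark-dns so container DNS is enabled (addresses the "aardvark-dns binary not found" warning)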
sudo pacman -S aardvark-dns
#docker/podman with selenium
cd docker-selenium
#podman
podman-compose up -d
podman-compose stop
#docker
docker-compose up -d
- html generator
- html parser
- Automation test
- selenium
- Scrapy
- Playwright/python: enables reliable end-to-end testing for modern web apps.
- MechanicalSoup
== We're Using GitHub Under Protest ==
This project is currently hosted on GitHub. This is not ideal; GitHub is a proprietary, trade-secret system that is not Free and Open Source Software (FOSS). We are deeply concerned about using a proprietary system like GitHub to develop our FOSS project. We have an open {bug ticket, mailing list thread, etc.} where the project contributors are actively discussing how we can move away from GitHub in the long term. We urge you to read about the Give up GitHub campaign from the Software Freedom Conservancy to understand some of the reasons why GitHub is not a good place to host FOSS projects.
If you are a contributor who personally has already quit using GitHub, please check this resource for how to send us contributions without using GitHub directly.
Any use of this project's code by GitHub Copilot, past or present, is done without our permission. We do not consent to GitHub's use of this project's code in Copilot.