A webtoon/manga host and an accompanying scraper that uploads to your ToonPlex server.
I got tired of having to update my adblock constantly, and the sites themselves sometimes go down or load really slowly. (I'd still recommend using those sites, but this project is good if you'd like to maintain your own collection.)
Now you can scrape manhwa/manga/manhua automagically using the scripts provided in `scrapers/suite`, which upload the scraped content to the server; you can then read it in your browser, so that you control the content.
This project comes with 3 distinct codebases:
- `server`: the Express.js web backend that hosts and serves the webtoon content
- `client`: the React.js web frontend that interacts with the backend to display webtoon content
- `scraper`: the scraper terminal application (ideally run on a PC able to run Chromium) that scrapes the webtoon at a given URL and uploads it to the backend
The design choice to run scrapers on the end user's machine (as opposed to a background process on the server) is motivated by two things: scraping via an automated browser on a legitimate user device evades bot detection systems like Cloudflare, and requiring the user to initiate each scrape themselves prevents massive spam against the unsuspecting original host.
A major downside to this system is that, since each webtoon site hosts content in a slightly different manner, a dedicated scraper script (which can break at any moment with a slight modification to the site) must be customized extensively for each particular site. Thus, I will initially provide only one scraper script, for one of the larger libraries of "free" (read: possibly pirated) webtoons. I will add more as I ... discover new content. The general scraper module's IPC calls are documented below. If you would like to add a scraper script, feel free to submit a PR.
Ideally, you should have:
- A PostgreSQL database
- A reverse proxy
- Enough storage for your content
Simply clone this repository and install dependencies under each codebase using `npm install`. Build the client by running `npx vite build` in the `client` folder, and initialize the database by running `npx prisma migrate deploy` and `npx prisma generate` in the `server` folder. The server and scraper are both started with `node .` in their respective folders.
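For example, starting from the repository root (assuming the codebase folders are named `server`, `client`, and `scraper` as above):

```sh
# install dependencies under each codebase
(cd server && npm install)
(cd client && npm install)
(cd scraper && npm install)

# build the client
(cd client && npx vite build)

# initialize the database
(cd server && npx prisma migrate deploy && npx prisma generate)

# start the server and the scraper (each in its own terminal)
(cd server && node .)
(cd scraper && node .)
```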
The server and scraper both require a `.env` file to be present in their respective directories:
The server requires:
- `DB_URL`: the URL of the Postgres database
- `JWTSECRET`: the secret for JWT verification
- `PORT`: the port to run the server on
- `NODE_ENV`: `production` (optional)
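A sample `server/.env` might look like this (all values are placeholders):

```
DB_URL=postgresql://user:password@localhost:5432/toonplex
JWTSECRET=replace-with-a-long-random-secret
PORT=8080
NODE_ENV=production
```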
The scraper requires:
- `HOST`: the host of the server (include scheme and origin, e.g. `http://127.0.0.1:8080`)
- `DOWNLOADPATH`: the path to download images to (e.g. `./tmp`)
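And a sample `scraper/.env`, using the example values above:

```
HOST=http://127.0.0.1:8080
DOWNLOADPATH=./tmp
```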
Note: if your scraper is getting flagged by bot detection software, turn off headless mode and monitor the scrape. After this, you should be able to use headless mode again for a while.
- Create a script file under `scrapers/suite`
- Inside `/scrapers/suite/index.js`, create a new object entry containing the name, file name, and targeted host for the scraper (a sketch of such an entry appears after the table below)
- Write your script! The scraper will automatically fork your script as a child process after matching the host. The script should listen to / dispatch the following IPC messages:
| Message | Description |
|---|---|
| `{command: "scrape"}` | Dispatched by the main scraper; the script should initiate the scrape upon receiving this message |
| `{event: "log", data: string}` | Send this message for the main scraper to write to the console under the process pid; recommended over `console.log` |
| `{event: "toon", data: {title: string, summary: string, alttitle: string, authors: array, artists: array, genres: array, tags: array, slug: string, ext: string}}` | Send this message with the data as soon as the toon details have been scraped |
| `{event: "cover", data: {path: string}}` | Send this message to queue upload of the cover image for this toon |
| `{event: "chapter", data: {name: string, date: string, order: number, pages: number, ext: string}}` | Send this message to create a record for the chapter |
| `{event: "image", data: {path: string, page: number, chapter: number}}` | Send this message to queue upload of a content image for a specific chapter's page |
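A registry entry in `/scrapers/suite/index.js` might look like the following sketch (the exact field names are assumptions based on the description above):

```js
// /scrapers/suite/index.js (sketch; field names are assumptions)
module.exports = [
  {
    name: "Example Scraper",   // display name for the scraper
    file: "example.js",        // script file under scrapers/suite
    host: "example-toons.com", // host matched against the target URL
  },
];
```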
The scraper will process requests in a queue, since some requests are dependent on the successful completion of others (e.g. cover image upload depends on a toon record existing first).
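Putting it together, a minimal script skeleton wiring up these IPC messages might look like this (a sketch: the metadata values and paths are placeholders, and the site-specific scraping logic is omitted):

```js
// scrapers/suite/example.js: a minimal skeleton of the IPC protocol above.

function log(message) {
  // Preferred over console.log so the main scraper can tag output with the pid.
  process.send({ event: "log", data: message });
}

process.on("message", async (msg) => {
  if (msg.command !== "scrape") return;

  log("starting scrape");

  // Report the toon's metadata as soon as it has been scraped.
  process.send({
    event: "toon",
    data: {
      title: "Example Toon",
      summary: "An example summary.",
      alttitle: "",
      authors: ["Author Name"],
      artists: ["Artist Name"],
      genres: ["Action"],
      tags: ["example"],
      slug: "example-toon",
      ext: "jpg",
    },
  });

  // Queue the cover image upload (processed after the toon record exists).
  process.send({ event: "cover", data: { path: "./tmp/cover.jpg" } });

  // Create a chapter record, then queue its page images.
  process.send({
    event: "chapter",
    data: { name: "Chapter 1", date: "2024-01-01", order: 1, pages: 1, ext: "jpg" },
  });
  process.send({
    event: "image",
    data: { path: "./tmp/ch1-p1.jpg", page: 1, chapter: 1 },
  });
});
```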