ChatGPT Friendly Crawl

Prerequisites

Ensure Python 3.11 or later is installed. Dependencies can be installed via:

pip install aiohttp pyppeteer

Usage

Before running the crawler, set these environment variables:

  • CHATGPT_CRAWL_VAR_START_URL: Starting URL for the crawl.
  • CHATGPT_CRAWL_VAR_DEPTH: Maximum crawl depth.
  • CHATGPT_CRAWL_VAR_MAX_PAGES: Maximum number of pages to fetch.
export CHATGPT_CRAWL_VAR_START_URL=$target_url && \
export CHATGPT_CRAWL_VAR_DEPTH=$depth_number && \
export CHATGPT_CRAWL_VAR_MAX_PAGES=$max_pages_number && \
python ./chatgpt_crawl.py

For example:

export CHATGPT_CRAWL_VAR_START_URL=https://www.google.com && \
export CHATGPT_CRAWL_VAR_DEPTH=2 && \
export CHATGPT_CRAWL_VAR_MAX_PAGES=100 && \
python ./chatgpt_crawl.py
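
A minimal sketch of how the script is assumed to read this configuration is shown below. The variable names match the ones documented above; the load_config helper, parsing, and defaults are illustrative and not taken from chatgpt_crawl.py.

import os

# Illustrative configuration loader -- shows how the environment variables
# documented above are assumed to be consumed. Not code from chatgpt_crawl.py.
def load_config():
    start_url = os.environ["CHATGPT_CRAWL_VAR_START_URL"]                  # starting URL for the crawl
    depth = int(os.environ.get("CHATGPT_CRAWL_VAR_DEPTH", "2"))            # maximum crawl depth
    max_pages = int(os.environ.get("CHATGPT_CRAWL_VAR_MAX_PAGES", "100"))  # page budget
    return start_url, depth, max_pages

if __name__ == "__main__":
    start_url, depth, max_pages = load_config()
    print(f"Crawling {start_url} to depth {depth}, fetching at most {max_pages} pages")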

Benefits of Using the https://r.jina.ai API

Routing page fetches through the https://r.jina.ai Reader API offloads rendering and content extraction: prefixing a target URL with https://r.jina.ai/ returns the page as clean, LLM-friendly text, which improves scalability and reliability without the overhead of maintaining your own headless-browser infrastructure.
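
As a rough illustration, fetching a page through the Reader endpoint only requires prefixing the target URL with https://r.jina.ai/. The async helper below is a sketch built on aiohttp, not code taken from this repository.

import asyncio
import aiohttp

# Sketch: retrieve a page as clean, LLM-friendly text via the r.jina.ai
# Reader endpoint by prefixing the target URL. Illustrative only.
async def fetch_readable(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(f"https://r.jina.ai/{url}") as resp:
            resp.raise_for_status()
            return await resp.text()

if __name__ == "__main__":
    text = asyncio.run(fetch_readable("https://www.google.com"))
    print(text[:500])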

Wrap-up

This "ChatGPT Friendly Crawl" combines modern async patterns with a robust API to streamline data collection, making it an efficient tool for scalable web scraping.
