A fast and efficient web crawler built with Bun and TypeScript that can crawl websites and perform Google-like search on the crawled content.
- 🚀 Fast crawling using Bun's high-performance runtime
- 🔍 Built-in search functionality to find pages by title
- 📊 Detailed reporting with page counts and performance metrics
- 🎯 Configurable page limit to control crawl size
- 📁 JSON export of crawled data for further analysis
- 🧪 Comprehensive tests ensuring reliability
Make sure you have Bun installed on your system.
# Clone the repository
git clone https://github.com/thetanav/crawler.git
cd crawler
# Install dependencies
bun install
Crawl a website and save the results:
bun run index.ts https://example.com
This will crawl up to 100 pages by default and generate:
- report.json: Detailed crawl data
- titles.json: Title-to-URL mapping
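The schema of these files is not documented here, so the following is only a minimal sketch of how a title-to-URL mapping might be derived from crawled pages. The `CrawledPage` shape and the `buildTitleMap` helper are hypothetical names chosen for illustration, not the project's actual code.

```typescript
// Hypothetical page record; the real report.json schema may differ.
interface CrawledPage {
  url: string;
  title: string;
}

// Build a title-to-URL mapping, keeping the first URL seen for each title.
function buildTitleMap(pages: CrawledPage[]): Map<string, string> {
  const titles = new Map<string, string>();
  for (const page of pages) {
    if (!titles.has(page.title)) {
      titles.set(page.title, page.url);
    }
  }
  return titles;
}

// Illustrative data only; not real crawl output.
const samplePages: CrawledPage[] = [
  { url: "https://example.com/", title: "Home" },
  { url: "https://example.com/about", title: "About" },
  { url: "https://example.com/index.html", title: "Home" },
];

const titleMap = buildTitleMap(samplePages);
titleMap.forEach((url, title) => {
  console.log(`${title} -> ${url}`);
});
```

Keeping the first URL per title is one possible policy; a real implementation might instead store every URL that shares a title.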
Limit the number of pages to crawl:
bun run index.ts https://example.com --limit=50
Crawl a site and immediately search for content:
bun run index.ts https://example.com "search query"
Example output:
🔍 Search results for "search query":
────────────────────────────────────────────────────────────
1. Page Title One
📎 https://example.com/page1
(Found 5 times)
2. Page Title Two
📎 https://example.com/page2
(Found 3 times)
────────────────────────────────────────────────────────────
Total: 2 results
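The invocations above take a URL, an optional `--limit=N` flag, and an optional quoted search query. As a rough sketch of how such arguments could be parsed (the `parseCliArgs` function and `CliOptions` type are hypothetical, and the real `index.ts` may handle arguments differently):

```typescript
interface CliOptions {
  url: string;
  limit: number;
  query?: string;
}

// Default taken from the README: up to 100 pages per crawl.
const DEFAULT_LIMIT = 100;

// Parse arguments such as: <url> [--limit=N] ["search query"]
function parseCliArgs(args: string[]): CliOptions {
  let limit = DEFAULT_LIMIT;
  const positional: string[] = [];

  for (const arg of args) {
    if (arg.startsWith("--limit=")) {
      const value = Number.parseInt(arg.slice("--limit=".length), 10);
      if (!Number.isInteger(value) || value <= 0) {
        throw new Error(`Invalid --limit value: ${arg}`);
      }
      limit = value;
    } else {
      positional.push(arg);
    }
  }

  const [url, query] = positional;
  if (!url) {
    throw new Error("Usage: bun run index.ts <url> [--limit=N] [query]");
  }
  return { url, limit, query };
}

// In a real script the input would be the process arguments after the script name.
console.log(parseCliArgs(["https://example.com", "--limit=50"]));
console.log(parseCliArgs(["https://example.com", "search query"]));
```

Treating the first positional argument as the URL and the second as the query mirrors the usage shown above; anything stricter, such as URL validation, is left out of this sketch.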
Start the web server to access the minimal, Google-inspired search interface:
bun run server
Open http://localhost:3000 in your browser.
Home Page:
- Clean, minimalist design
- Quick search box
- One-click site crawl button
- Instant feedback on crawl status
Search Results:
- Fast, real-time search results
- Google-like result cards with titles, URLs, and snippets
- Clickable links to visit found pages
- Mention count for relevance
- Mobile-responsive design
Live Crawling:
- Enter any website URL
- Crawl up to 100 pages instantly
- Results available immediately for searching
Search previously crawled data from the command line:
bun search.ts "your search query"
This requires that you have already run the crawler to generate report.json.
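To illustrate how a search over previously crawled data could rank pages by mention count, here is a minimal sketch. It assumes each page record carries `url`, `title`, and `text` fields, which is a guess at the report.json shape rather than its documented schema, and `countMentions` and `searchPages` are hypothetical helpers.

```typescript
// Hypothetical page record; the real report.json schema may differ.
interface CrawledPage {
  url: string;
  title: string;
  text: string;
}

interface SearchResult {
  title: string;
  url: string;
  mentions: number;
}

// Count non-overlapping occurrences of needle in haystack.
function countMentions(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

// Case-insensitive search over title and text, most mentions first.
function searchPages(pages: CrawledPage[], query: string): SearchResult[] {
  const needle = query.trim().toLowerCase();
  return pages
    .map((page) => ({
      title: page.title,
      url: page.url,
      mentions: countMentions(`${page.title} ${page.text}`.toLowerCase(), needle),
    }))
    .filter((result) => result.mentions > 0)
    .sort((a, b) => b.mentions - a.mentions);
}

// Illustrative in-memory data; a real script would load report.json instead.
const crawled: CrawledPage[] = [
  { url: "https://example.com/ts", title: "TypeScript Notes", text: "Types help. Bun runs TypeScript directly." },
  { url: "https://example.com/bun", title: "Bun Guide", text: "Bun is fast. Install Bun with one command." },
  { url: "https://example.com/about", title: "About", text: "Contact us." },
];

for (const result of searchPages(crawled, "bun")) {
  console.log(`${result.title} (${result.url}): found ${result.mentions} times`);
}
```

Plain substring counting is the simplest possible relevance signal; the actual search may tokenize or weight titles differently.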
The web UI is designed with simplicity in mind - inspired by Google's minimalist search interface:
- Clean & Fast - No clutter, just search and results
- Intuitive - Instantly familiar to anyone who's used a search engine
- Mobile-First - Responsive design works great on any device
- Minimal Dependencies - Built with Vite + React for blazing-fast performance
To rebuild the Vite app after making changes:
bun run ui:build
To run the Vite dev server with hot reloading:
bun run ui:dev
Then access it at http://localhost:5173
The web UI is built with Vite + React and communicates with the backend server via two APIs:
- POST /api/crawl - Starts a new crawl
  - Sends domain URL and page limit
  - Returns success/error status and page count
- GET /api/search - Searches crawled pages
  - Sends search query as URL parameter
  - Returns matching pages sorted by relevance
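A client for these two endpoints might look roughly like the sketch below. The endpoint paths come from the list above, but the JSON field names (`domain`, `limit`) and the query parameter name (`q`) are assumptions, since the README does not specify them; `crawlCall`, `searchCall`, and `send` are hypothetical helpers.

```typescript
// A plain description of an HTTP call, so it can be inspected without a server.
interface HttpCall {
  url: string;
  init: {
    method: string;
    headers?: Record<string, string>;
    body?: string;
  };
}

// POST /api/crawl: body field names are assumed, not documented.
function crawlCall(baseUrl: string, domain: string, limit: number): HttpCall {
  return {
    url: `${baseUrl}/api/crawl`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ domain, limit }),
    },
  };
}

// GET /api/search: the parameter name "q" is assumed, not documented.
function searchCall(baseUrl: string, query: string): HttpCall {
  return {
    url: `${baseUrl}/api/search?q=${encodeURIComponent(query)}`,
    init: { method: "GET" },
  };
}

// Sends a call with fetch; only works while the server is running.
async function send(call: HttpCall): Promise<unknown> {
  const response = await fetch(call.url, call.init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

const base = "http://localhost:3000";
console.log(crawlCall(base, "https://example.com", 50));
console.log(searchCall(base, "search query").url);
```

Separating request construction from `send` keeps the example inspectable offline; with the server running, `await send(searchCall(base, "search query"))` would perform the actual request.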
The server automatically caches crawled data in memory, so subsequent searches run without re-crawling the site.
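One simple way to picture this kind of in-memory caching is a map from crawled domain to its pages. The sketch below is an assumption about the general approach, not the server's actual implementation; `CrawlCache` and its methods are hypothetical.

```typescript
// Hypothetical page record; the server's real data shape may differ.
interface CrawledPage {
  url: string;
  title: string;
}

// Holds crawl results in memory, keyed by the domain that was crawled.
class CrawlCache {
  private pagesByDomain = new Map<string, CrawledPage[]>();

  // Store (or replace) the pages for a domain after a crawl finishes.
  store(domain: string, pages: CrawledPage[]): void {
    this.pagesByDomain.set(domain, pages);
  }

  // Report whether a domain has already been crawled.
  has(domain: string): boolean {
    return this.pagesByDomain.has(domain);
  }

  // Return every cached page across all domains, for searching.
  allPages(): CrawledPage[] {
    const all: CrawledPage[] = [];
    this.pagesByDomain.forEach((pages) => {
      all.push(...pages);
    });
    return all;
  }
}

const cache = new CrawlCache();
cache.store("https://example.com", [
  { url: "https://example.com/", title: "Home" },
  { url: "https://example.com/about", title: "About" },
]);
console.log(cache.has("https://example.com"), cache.allPages().length);
```

A cache like this is lost when the process exits, which matches the description of data being held in memory rather than persisted.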
- Bun: Runtime and package manager
- TypeScript: Type safety
- JSDOM: HTML parsing and DOM manipulation
- React: UI framework
- Vite: Build tool and dev server
This project is private and not licensed for public use.
_~~~ Crawled boot.dev ~~~_
