8 changes: 8 additions & 0 deletions .lenv
@@ -0,0 +1,8 @@
# web-capture default configuration
# This file uses Links Notation format (key: value)

# Server port
PORT: 3000

# Browser engine (puppeteer or playwright)
BROWSER_ENGINE: puppeteer
2 changes: 1 addition & 1 deletion ARCHITECTURE.md
@@ -515,7 +515,7 @@ RUN yarn install --frozen-lockfile

COPY . .
EXPOSE 3000
-ENTRYPOINT ["node", "src/index.js"]
+ENTRYPOINT ["node", "bin/web-capture.js", "--serve"]
```

### Docker Compose
2 changes: 1 addition & 1 deletion Dockerfile
@@ -42,4 +42,4 @@ COPY . .

EXPOSE 3000

-ENTRYPOINT ["node", "src/index.js"]
+ENTRYPOINT ["node", "bin/web-capture.js", "--serve"]
131 changes: 124 additions & 7 deletions README.md
@@ -2,7 +2,37 @@

<img width="1824" alt="Screenshot 2025-05-12 at 3 49 32 AM" src="https://github.com/user-attachments/assets/cbf63dec-7dcd-40e7-9d5d-eddc49fe6169" />

-A microservice to fetch URLs and render them as:
+A CLI and microservice to fetch URLs and render them as:

- **HTML**: Rendered page content
- **Markdown**: Converted from HTML
- **PNG screenshot**: Full page capture

## Quick Start

### CLI Usage

```bash
# Install globally
npm install -g web-capture

# Capture a URL as HTML (output to stdout)
web-capture https://example.com

# Capture as Markdown and save to file
web-capture https://example.com --format markdown --output page.md

# Take a screenshot
web-capture https://example.com --format png --output screenshot.png

# Start as API server
web-capture --serve

# Start server on custom port
web-capture --serve --port 8080
```

### API Endpoints (Server Mode)

- **HTML**: GET /html?url=<URL>
- **Markdown**: GET /markdown?url=<URL>
@@ -16,6 +46,51 @@ npm install
yarn install
```

## CLI Reference

### Server Mode

Start the API server:

```bash
web-capture --serve [--port <port>]
```

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--serve` | `-s` | Start as HTTP API server | - |
| `--port` | `-p` | Port to listen on | 3000 (or PORT env) |

### Capture Mode

Capture a URL directly:

```bash
web-capture <url> [options]
```

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--format` | `-f` | Output format: `html`, `markdown`/`md`, `image`/`png` | `html` |
| `--output` | `-o` | Output file path | stdout (text) or auto-generated (images) |
| `--engine` | `-e` | Browser engine: `puppeteer`, `playwright` | `puppeteer` (or BROWSER_ENGINE env) |

### Examples

```bash
# Capture HTML to stdout
web-capture https://example.com

# Capture Markdown to file
web-capture https://example.com -f markdown -o page.md

# Take screenshot with Playwright engine
web-capture https://example.com -f png -e playwright -o screenshot.png

# Pipe HTML to another command
web-capture https://example.com | grep "title"
```

## Available Commands

### Development
@@ -101,24 +176,66 @@ curl http://localhost:3000/image?url=https://example.com > screenshot.png
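# Quote the URL: an unquoted & would make the shell run curl in the background and drop the engine parameter
curl "http://localhost:3000/image?url=https://example.com&engine=playwright" > screenshot.png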
```
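
When calling the server from code rather than from a shell, building the query string programmatically sidesteps quoting and escaping problems. The following is a minimal sketch, assuming the server runs on `http://localhost:3000` and decodes percent-encoded query parameters; `buildCaptureUrl` is a hypothetical helper, not part of web-capture.

```javascript
// Hypothetical helper (not part of web-capture): builds a request URL
// for one of the server endpoints such as /html, /markdown, or /image.
function buildCaptureUrl(baseUrl, endpoint, targetUrl, engine) {
  const url = new URL(endpoint, baseUrl);
  // searchParams.set percent-encodes the target URL for us.
  url.searchParams.set('url', targetUrl);
  if (engine) {
    // Optional engine query parameter, as in the curl example above.
    url.searchParams.set('engine', engine);
  }
  return url.toString();
}

console.log(
  buildCaptureUrl('http://localhost:3000', '/image', 'https://example.com', 'playwright')
);
// prints "http://localhost:3000/image?url=https%3A%2F%2Fexample.com&engine=playwright"
```

The resulting string can be passed to any HTTP client, for example the built-in `fetch` in recent Node.js versions.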

## Configuration

web-capture uses [lino-arguments](https://github.com/link-foundation/lino-arguments) for unified configuration management. Configuration values are resolved with the following priority (highest to lowest):

1. **CLI arguments**: `--port 8080`
2. **Environment variables**: `PORT=8080`
3. **Custom configuration file**: `--configuration path/to/custom.lenv`
4. **Default .lenv file**: `.lenv` in the project root
5. **Built-in defaults**
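
The priority order above amounts to a "first defined value wins" lookup across the five sources. The sketch below illustrates that idea only; it is not the lino-arguments implementation, and `resolveOption` and the sample source objects are hypothetical.

```javascript
// Hypothetical sketch of the priority order above; not the lino-arguments API.
// Each source is a plain object; the first source that defines the key wins.
function resolveOption(key, sources) {
  for (const source of sources) {
    const value = source[key];
    if (value !== undefined && value !== '') {
      return value;
    }
  }
  return undefined;
}

// Sources listed from highest to lowest priority.
const port = resolveOption('PORT', [
  {},               // 1. CLI arguments (none given in this example)
  { PORT: '8080' }, // 2. environment variables
  {},               // 3. custom configuration file (none given)
  { PORT: '3000' }, // 4. default .lenv file
  { PORT: '3000' }, // 5. built-in defaults
]);

console.log(port); // prints "8080"
```

Here the environment variable overrides the `.lenv` value because no CLI argument was supplied.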

### Configuration File (.lenv)

Create a `.lenv` file in your project root using Links Notation format:

```lenv
# Server configuration
PORT: 3000

# Browser engine (puppeteer or playwright)
BROWSER_ENGINE: puppeteer
```
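
To make the file format concrete, here is a minimal sketch that reads only the flat `KEY: value` lines and `#` comments shown above. It is an assumption-laden simplification: full Links Notation is richer than this, and `parseFlatLenv` is a hypothetical helper, not how web-capture or lino-arguments actually loads the file.

```javascript
// Simplified, hypothetical parser for flat `KEY: value` lines only.
// Assumption: handles just the subset of Links Notation shown above.
function parseFlatLenv(text) {
  const config = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === '' || line.startsWith('#')) continue;
    const separator = line.indexOf(':');
    if (separator === -1) continue; // not a key/value line
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    config[key] = value;
  }
  return config;
}

const sample = [
  '# Server configuration',
  'PORT: 3000',
  '',
  '# Browser engine (puppeteer or playwright)',
  'BROWSER_ENGINE: puppeteer',
].join('\n');

console.log(parseFlatLenv(sample));
// prints { PORT: '3000', BROWSER_ENGINE: 'puppeteer' }
```

Values are kept as strings in this sketch; converting `PORT` to a number is left to the caller.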

### Using Custom Configuration Files

Specify a custom configuration file path:

```bash
web-capture --serve --configuration /path/to/custom.lenv
```

### Environment Variables

All configuration options support environment variables:

```bash
# Set port via environment variable
export PORT=8080
web-capture --serve

# Set browser engine
export BROWSER_ENGINE=playwright
web-capture https://example.com --format png
```

## Browser Engine Support

The service supports both **Puppeteer** and **Playwright** browser engines:

- **Puppeteer**: Default engine, mature and well-tested
- **Playwright**: Alternative engine with similar capabilities

-You can choose the engine using the `engine` query parameter or by setting the `BROWSER_ENGINE` environment variable.
+You can choose the engine using:
- CLI argument: `--engine playwright`
- Environment variable: `BROWSER_ENGINE=playwright`
- Configuration file: `BROWSER_ENGINE: playwright` in `.lenv`

**Supported engine values:**
- `puppeteer` or `pptr` - Use Puppeteer
- `playwright` or `pw` - Use Playwright
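
The aliases listed above can be mapped to canonical engine names with a small lookup. This is a sketch under stated assumptions: `normalizeEngine` is a hypothetical name, and the case-insensitive matching and the error thrown for unknown values are choices made for this example, not documented web-capture behavior.

```javascript
// Hypothetical sketch: maps the documented aliases to canonical engine names.
const ENGINE_ALIASES = new Map([
  ['puppeteer', 'puppeteer'],
  ['pptr', 'puppeteer'],
  ['playwright', 'playwright'],
  ['pw', 'playwright'],
]);

function normalizeEngine(value, fallback = 'puppeteer') {
  // Fall back to the default engine when nothing was specified.
  if (value === undefined || value === null || value === '') {
    return fallback;
  }
  // Assumption: aliases are matched case-insensitively.
  const engine = ENGINE_ALIASES.get(String(value).trim().toLowerCase());
  if (engine === undefined) {
    // Assumption: unknown values are rejected rather than silently ignored.
    throw new Error(`Unsupported browser engine: ${value}`);
  }
  return engine;
}

console.log(normalizeEngine('pw'));   // prints "playwright"
console.log(normalizeEngine('pptr')); // prints "puppeteer"
console.log(normalizeEngine());       // prints "puppeteer"
```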

**Environment Variable:**
```bash
export BROWSER_ENGINE=playwright
```

## Development

The service is built with: