A sophisticated TypeScript-based Playwright service for downloading images from VSCO using browser automation. Built with a modular service architecture.
- Smart Authentication: Automatic login to VSCO with fallback to manual login
- Profile Scraping: Extracts images exclusively from VSCO gallery, avoiding profile avatars
- Enhanced Metadata Extraction: Extracts upload dates, available image sizes, and srcset from individual VSCO pages
- High-Resolution Downloads: Downloads original quality images (up to 2048px) instead of thumbnails
- Concurrent Downloads: Parallel processing with configurable concurrency (1-10 workers)
- Intelligent Batching: Smart batch processing with rate limiting to respect VSCO's terms
- Browser Pool Management: Efficient browser context pooling for optimal resource usage
- Performance Optimized: Up to 10x faster downloads with concurrent processing
- Retry Logic: Robust retry mechanism with exponential backoff per worker
- Progress Tracking: Real-time progress reporting across all concurrent workers
- Selective Downloads: Skip already downloaded images automatically (with metadata extraction)
- Rich Manifest Generation: Creates comprehensive VSCO manifest with detailed metadata
- Debug Mode: Visual debugging with DevTools support
- Highly Configurable: Extensive configuration options via CLI or environment
- Modular Architecture: Service-oriented design with dependency injection
- TypeScript: Full type safety with comprehensive type definitions
- Hot Reload: Development mode with instant TypeScript compilation
- Testing: Playwright test framework integration
- Rate Limiting: Intelligent throttling to maintain VSCO compliance
- Node.js 18+
- A VSCO account (optional but recommended for authentication)
- VSCO username(s) you want to download images from
- Navigate to the tool directory:
cd playwright-vsco-downloader
- Install dependencies:
pnpm install
- Install Playwright browsers:
pnpm run install-browsers
- Set up environment variables:
cp .env.example .env
# Edit .env with your credentials
# Download from a VSCO profile (headless, default concurrency of 3 workers)
pnpm run download --username photographer123
# Or use as positional argument
pnpm run download photographer123
# Development mode with hot reload
pnpm run download:dev --username photographer123
# Download with visible browser (useful for debugging/manual login)
pnpm run download --username photographer123 --no-headless
# Support for VSCO URLs
pnpm run download --username "https://vsco.co/photographer123"
# Check environment setup
pnpm run download -- check
# Download with custom settings
pnpm run download --username photographer123 --timeout 45000 --retries 5 --limit 10
# Limit number of images downloaded
pnpm run download --username photographer123 --limit 50
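Because the commands above accept either a bare username or a full profile URL, some normalization has to happen before scraping. The following is a minimal sketch of how that could look; `normalizeVscoUsername` is a hypothetical helper, not taken from the tool's source, and the allowed character set is an assumption.

```typescript
// Hypothetical helper: illustrates accepting "photographer123" or
// "https://vsco.co/photographer123" and returning the bare username.
function normalizeVscoUsername(input: string): string {
  const trimmed = input.trim();
  // Pull the first path segment out of a vsco.co profile URL, if one was given.
  const urlMatch = /^(?:https?:\/\/)?(?:www\.)?vsco\.co\/([^\/?#]+)/i.exec(trimmed);
  const candidate = (urlMatch ? urlMatch[1] : trimmed).replace(/^@/, "");
  // Assumed character set; VSCO's real username rules may differ.
  if (!/^[A-Za-z0-9._-]+$/.test(candidate)) {
    throw new Error(`Invalid VSCO username: ${input}`);
  }
  return candidate;
}
```

Both `normalizeVscoUsername("photographer123")` and `normalizeVscoUsername("https://vsco.co/photographer123")` return the same bare username under these assumptions.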
The tool supports concurrent downloads for significantly improved performance:
# Default concurrent mode (3 workers)
pnpm run download --username photographer123
# High-performance mode (10 concurrent workers)
pnpm run download --username photographer123 --concurrency 10
# Conservative mode (5 workers with larger batches)
pnpm run download --username photographer123 --concurrency 5 --batch-size 8
# Custom batch processing
pnpm run download \
--username photographer123 \
--concurrency 4 \
--batch-size 3 \
--delay-between-batches 2000
# Disable batching (process all at once with concurrency limit)
pnpm run download --username photographer123 --concurrency 5 --no-batching
# Sequential mode (1 worker)
pnpm run download --username photographer123 --concurrency 1
| Mode | Workers | Typical Speed | Best For |
| --- | --- | --- | --- |
| Sequential | 1 | 1x | Very conservative, debugging |
| Default Concurrent | 3 | ~3x faster | Balanced performance & stability |
| High Performance | 5-7 | ~5-7x faster | Large downloads, good connection |
| Maximum | 10 | ~10x faster | Powerful systems, excellent connection |
- -c, --concurrency <number>: Maximum concurrent downloads (1-10, default: 3)
- --batch-size <number>: Images per batch (default: same as concurrency)
- --delay-between-batches <number>: Delay between batches in milliseconds (default: 1000)
- --no-batching: Disable batch processing, process all images at once with concurrency limit
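To illustrate what "all at once with a concurrency limit" can mean in practice, here is a minimal worker-pool sketch. It is not the tool's actual implementation; `mapWithConcurrency` and the `Outcome` type are hypothetical names chosen for this example.

```typescript
// Result of one task: failures are captured so one bad item
// does not abort the other workers.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `task` over every item with at most `limit` tasks in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<Outcome<R>[]> {
  const outcomes: Outcome<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers
  const workerCount = Math.max(1, Math.min(limit, items.length));

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting anything
      try {
        outcomes[index] = { ok: true, value: await task(items[index], index) };
      } catch (error) {
        outcomes[index] = { ok: false, error };
      }
    }
  }

  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcomes;
}
```

Each worker pulls the next item as soon as it finishes the previous one, so there is no pause between groups of items, unlike batched mode.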
- Concurrency: 3 workers (optimal balance of speed vs. resource usage)
- Batching: Enabled by default to respect VSCO rate limits
- Batch Size: Equals concurrency setting (e.g., 3 workers = 3 images per batch)
- Delays: 1 second between batches to prevent rate limiting
- Browser Pool Management: Reuses browser contexts efficiently
- Context Lifecycle: Automatic cleanup of stale browser contexts
- Error Isolation: Failed downloads don't affect other workers
- Resource Optimization: Memory-efficient context pooling
- Rate Limit Compliance: Smart batching prevents VSCO server overload
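The batching defaults described above (batch size equal to concurrency, a fixed delay between batches, failures isolated per item) can be sketched as follows. This is an illustration under those stated defaults, not the tool's code; `toBatches`, `processInBatches`, and `Settled` are hypothetical names.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Splits items into consecutive groups of at most `size` elements.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  const step = Math.max(1, Math.floor(size));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += step) {
    batches.push(items.slice(i, i + step));
  }
  return batches;
}

// Runs each batch in parallel, then waits before starting the next one.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  delayBetweenBatchesMs: number,
  task: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = [];
  const batches = toBatches(items, batchSize);
  for (let b = 0; b < batches.length; b++) {
    const settled = await Promise.all(
      batches[b].map((item) =>
        task(item).then(
          (value): Settled<R> => ({ ok: true, value }),
          (error): Settled<R> => ({ ok: false, error }),
        ),
      ),
    );
    results.push(...settled);
    // No delay is needed after the final batch.
    if (b < batches.length - 1) await pause(delayBetweenBatchesMs);
  }
  return results;
}
```

With 7 images and a batch size of 3, this produces batches of 3, 3, and 1, with two pauses in between.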
For Small Downloads (< 20 images):
pnpm run download --username photographer123 --concurrency 3
For Medium Downloads (20-100 images):
pnpm run download --username photographer123 --concurrency 5 --batch-size 4
For Large Downloads (100+ images):
pnpm run download --username photographer123 --concurrency 7 --batch-size 5 --delay-between-batches 1500
For Maximum Speed (powerful systems):
pnpm run download --username photographer123 --concurrency 10 --no-batching
For Debugging/Conservative:
pnpm run download --username photographer123 --concurrency 1 --no-headless --debug
pnpm run download [username] [options]
# or
pnpm run download --username <username> [options]
Options:
- -u, --username <username>: VSCO username to scrape (required)
- -h, --headless: Run in headless mode (default: true)
- --no-headless: Run with visible browser
- -d, --debug: Enable debug mode with DevTools
- -t, --timeout <number>: Timeout in milliseconds (default: 30000)
- -r, --retries <number>: Number of retry attempts (default: 3)
- -l, --limit <number>: Limit the number of images to download
- --download-dir <path>: Download directory
- --dry-run: Show what would be downloaded without downloading
Concurrency & Performance Options:
- -c, --concurrency <number>: Maximum concurrent downloads (1-10, default: 3)
- --batch-size <number>: Images per batch (default: same as concurrency)
- --delay-between-batches <number>: Delay between batches in milliseconds (default: 1000)
- --no-batching: Disable batch processing (process all at once with concurrency limit)
pnpm run download -- check
Validates your environment configuration and shows the status of required/optional environment variables.
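As a rough idea of what such a check involves, the sketch below reports which variables are set. The variable names are the ones listed in this README's .env example; treating every one of them as optional is an assumption, and `checkEnvironment` is a hypothetical function rather than the tool's EnvironmentChecker.

```typescript
interface EnvStatus {
  name: string;
  required: boolean;
  present: boolean;
}

// Names taken from the .env example in this README; the `required`
// flags are assumptions (the example marks all of them as optional).
const KNOWN_VARIABLES: ReadonlyArray<{ name: string; required: boolean }> = [
  { name: "VSCO_EMAIL", required: false },
  { name: "VSCO_PASSWORD", required: false },
  { name: "PLAYWRIGHT_TIMEOUT", required: false },
  { name: "PLAYWRIGHT_RETRIES", required: false },
  { name: "PLAYWRIGHT_LIMIT", required: false },
  { name: "PLAYWRIGHT_HEADLESS", required: false },
  { name: "PLAYWRIGHT_DEBUG", required: false },
];

// Reports, for each known variable, whether it has a non-empty value.
function checkEnvironment(env: Record<string, string | undefined>): EnvStatus[] {
  return KNOWN_VARIABLES.map(({ name, required }) => {
    const value = env[name];
    return { name, required, present: value !== undefined && value.trim() !== "" };
  });
}
```

In a Node.js process this would typically be called as `checkEnvironment(process.env)` and the result printed as a table.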
# Development mode (uses tsx for hot reload)
pnpm run download:dev
# Build TypeScript to JavaScript
pnpm run build
# Build in watch mode (rebuilds on file changes)
pnpm run build:watch
# Type checking without compilation
pnpm run type-check
# Clean build artifacts
pnpm run clean
The tool is built with a modular TypeScript architecture featuring:
- Service-Oriented Design: Each major functionality is encapsulated in a dedicated service class
- Dependency Injection: Services are injected through constructors for better testability
- Type Safety: Comprehensive TypeScript types ensure runtime reliability
- Command Pattern: CLI commands are implemented as separate classes with a common interface
- Configuration Management: Centralized configuration with environment variable support
- Error Boundaries: Structured error handling with detailed logging and recovery mechanisms
The tool can be used programmatically:
import { PlaywrightImageDownloader, Config } from "./src/index.js";
// Simple usage
const downloader = new PlaywrightImageDownloader({
headless: false,
debug: true,
timeout: 45000,
});
const result = await downloader.run();
console.log("Downloaded:", result.stats.downloaded, "images");
// Advanced usage with individual services
import {
BrowserManager,
ManifestService,
DownloadService,
FileSystemService,
StatsTracker, // assumed to be exported from the public entry point
} from "./src/index.js";
const config = new Config({ timeout: 60000 });
const browserManager = new BrowserManager(config);
const fsService = new FileSystemService(config);
const manifestService = new ManifestService(config, fsService);
// DownloadService needs a stats tracker; a no-argument constructor is assumed here
const statsTracker = new StatsTracker();
const downloadService = new DownloadService(config, browserManager, fsService, statsTracker);
// Custom workflow implementation
await browserManager.initialize();
const { imageEntries } = await manifestService.getProcessedImageEntries();
// ... custom processing
await browserManager.cleanup();
Create a .env file based on .env.example:
# Optional (enables auto-login)
VSCO_EMAIL=your_email@example.com
VSCO_PASSWORD=your_password_here
# Optional configuration overrides
PLAYWRIGHT_TIMEOUT=30000
PLAYWRIGHT_RETRIES=3
PLAYWRIGHT_LIMIT=0
PLAYWRIGHT_HEADLESS=true
PLAYWRIGHT_DEBUG=false
All environment variables can be overridden via command line:
pnpm run download \
--timeout 45000 \
--retries 2 \
--no-headless \
--debug
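The precedence described here (built-in defaults, then environment variables, then command-line flags) can be sketched as a small merge. The default values match the ones listed in this README; `resolveSettings` and the parsing helpers are hypothetical and only illustrate the ordering, not the tool's Config class.

```typescript
interface DownloaderSettings {
  timeout: number;
  retries: number;
  limit: number;
  headless: boolean;
  debug: boolean;
}

const DEFAULTS: DownloaderSettings = {
  timeout: 30000,
  retries: 3,
  limit: 0,
  headless: true,
  debug: false,
};

function parseNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

function parseBoolean(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return undefined; // unrecognized values fall through to the lower layer
}

// Values set in `override` win; anything left undefined keeps the base value.
function merge(base: DownloaderSettings, override: Partial<DownloaderSettings>): DownloaderSettings {
  return {
    timeout: override.timeout ?? base.timeout,
    retries: override.retries ?? base.retries,
    limit: override.limit ?? base.limit,
    headless: override.headless ?? base.headless,
    debug: override.debug ?? base.debug,
  };
}

function resolveSettings(
  env: Record<string, string | undefined>,
  cli: Partial<DownloaderSettings>,
): DownloaderSettings {
  const fromEnv: Partial<DownloaderSettings> = {
    timeout: parseNumber(env.PLAYWRIGHT_TIMEOUT),
    retries: parseNumber(env.PLAYWRIGHT_RETRIES),
    limit: parseNumber(env.PLAYWRIGHT_LIMIT),
    headless: parseBoolean(env.PLAYWRIGHT_HEADLESS),
    debug: parseBoolean(env.PLAYWRIGHT_DEBUG),
  };
  // Precedence: defaults < environment < CLI.
  return merge(merge(DEFAULTS, fromEnv), cli);
}
```

For example, with `PLAYWRIGHT_TIMEOUT=45000` in the environment and `--timeout 60000` on the command line, the resolved timeout is 60000.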
The tool supports multiple authentication approaches:
Set VSCO_EMAIL and VSCO_PASSWORD in your .env file. The tool will automatically log in to VSCO.
If credentials are not provided, you can:
- Run with --no-headless to see the browser
- The tool will pause and wait for you to log in manually
- Press Enter in the terminal when login is complete
The tool can work without authentication, but download quality and availability may be limited.
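The three approaches above amount to a small decision: use credentials if both are present, otherwise offer manual login when a browser window is visible, otherwise continue unauthenticated. A minimal sketch of that decision follows; `chooseAuthMode` is a hypothetical name, and the rule that manual login requires a visible browser is inferred from the --no-headless instruction rather than confirmed by the source.

```typescript
type AuthMode = "automatic" | "manual" | "anonymous";

function chooseAuthMode(
  env: Record<string, string | undefined>,
  headless: boolean,
): AuthMode {
  // Automatic login needs both credentials from the .env file.
  const hasCredentials = Boolean(env.VSCO_EMAIL && env.VSCO_PASSWORD);
  if (hasCredentials) return "automatic";
  // Assumption: manual login is only possible when the browser is visible.
  return headless ? "anonymous" : "manual";
}
```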
playwright-vsco-downloader/
├── src/                              # TypeScript source files
│   ├── index.ts                      # Public API entry point
│   ├── cli.ts                        # CLI application entry point
│   ├── PlaywrightImageDownloader.ts  # Main orchestrator class
│   ├── api/
│   │   └── VscoAPIService.ts         # VSCO API integration
│   ├── auth/
│   │   └── AuthenticationService.ts  # VSCO authentication logic
│   ├── browser/
│   │   └── BrowserManager.ts         # Playwright browser lifecycle
│   ├── cli/
│   │   ├── CliApplication.ts         # CLI framework and routing
│   │   ├── EnvironmentChecker.ts     # Environment validation
│   │   ├── OptionParser.ts           # Command line argument parsing
│   │   ├── OutputFormatter.ts        # Console output formatting
│   │   └── commands/                 # Individual CLI commands
│   ├── config/
│   │   └── Config.ts                 # Configuration management
│   ├── download/
│   │   └── DownloadService.ts        # Image download logic
│   ├── fs/
│   │   └── FileSystemService.ts      # File operations
│   ├── manifest/
│   │   └── ManifestService.ts        # Manifest processing
│   ├── stats/
│   │   └── StatsTracker.ts           # Progress and metrics tracking
│   └── types/
│       └── index.ts                  # TypeScript type definitions
├── dist/                             # Compiled JavaScript output
├── tsconfig.json                     # TypeScript configuration
├── playwright.config.js              # Playwright test configuration
├── package.json                      # Dependencies and scripts
├── .env.example                      # Environment template
├── .gitignore                        # Git ignore rules
└── README.md                         # This file
Since VSCO doesn't allow direct scraping, this tool uses a combination of:
- VSCO API Integration: Fetches public metadata when available
- Browser Automation: Navigates to VSCO pages to extract image URLs
- Authenticated Sessions: Uses login sessions to access higher quality images
- Rate Limiting: Batched processing with delays between batches to respect VSCO's terms
flowchart TD
A[CLI Entry Point] --> B[Config Manager]
B --> C[Browser Manager]
C --> D[Authentication Service]
D --> E[VSCO Profile Scraper]
E --> F[Download Service]
F --> G[FileSystem Service]
G --> H[Stats Tracker]
B --> I[Environment Variables]
B --> J[CLI Arguments]
E --> K[Profile Metadata]
E --> L[Image URLs & Basic Metadata]
F --> M[Enhanced Metadata Extraction]
M --> N[Individual VSCO Pages]
N --> O[Upload Dates]
N --> P[Available Sizes]
N --> Q[Srcset Data]
F --> R[High-Res Image Download]
R --> S[Image Files]
G --> T[VSCO Manifest JSON]
T --> U[Comprehensive Metadata]
H --> V[Progress Reporting]
H --> W[Statistics]
sequenceDiagram
participant CLI as CLI Application
participant Config as Configuration
participant Browser as Browser Manager
participant Auth as Authentication
participant Scraper as Profile Scraper
participant Download as Download Service
participant Meta as Metadata Extractor
participant FS as FileSystem Service
CLI->>Config: Load configuration
CLI->>Browser: Initialize browser
Browser->>Auth: Authenticate with VSCO
Auth-->>Browser: Login successful
CLI->>Scraper: Scrape profile
Scraper->>Browser: Navigate to profile
Browser-->>Scraper: Profile page loaded
Scraper->>Scraper: Extract gallery images only
Note over Scraper: Uses UserProfileGallery selector
Note over Scraper: Avoids profile avatars
loop For each image
Scraper->>Download: Process image
Download->>FS: Check if exists
alt File exists
Download->>Meta: Extract metadata anyway
Meta->>Browser: Navigate to image page
Browser-->>Meta: Individual page loaded
Meta->>Meta: Extract upload date
Meta->>Meta: Parse srcset sizes
Meta-->>Download: Enhanced metadata
Download-->>CLI: Skip with metadata
else File doesn't exist
Download->>Meta: Extract metadata
Meta->>Browser: Navigate to image page
Browser-->>Meta: Individual page loaded
Meta->>Meta: Extract upload date
Meta->>Meta: Parse srcset sizes
Meta->>Meta: Get high-res URL (2048px)
Meta-->>Download: Enhanced metadata + URL
Download->>Browser: Download high-res image
Browser-->>Download: Image downloaded
Download->>FS: Save image file
end
end
CLI->>FS: Generate VSCO manifest
FS->>FS: Include all enhanced metadata
Note over FS: upload_date, available_sizes, srcset
Note over FS: No bio, follower_count, likes
FS-->>CLI: Manifest created
CLI->>Browser: Cleanup
CLI->>CLI: Report statistics
- Initialization: Launches a Chromium browser instance with optimized settings
- Username Validation: Cleans and validates the provided VSCO username
- Authentication: Handles login to VSCO (automatic or manual)
- Profile Scraping:
  - Navigates to the VSCO profile page
  - Uses the [data-testid="UserProfileGallery"] selector to target only gallery images
  - Avoids profile pictures and avatars
  - Loads image grid with infinite scroll handling
  - Extracts image URLs and basic metadata from each image
- Enhanced Metadata Extraction:
  - For each image (including existing files), navigates to individual VSCO page
  - Extracts upload date from <time datetime> elements
  - Parses the srcset attribute to get all available image sizes
  - Obtains high-resolution image URLs (up to 2048px)
- Image Processing:
  - Checks if image already exists locally
  - Downloads high-resolution images with proper naming
  - Applies enhanced metadata to all images (downloaded or skipped)
- Retry Logic: Automatically retries failed downloads with exponential backoff
- VSCO Manifest Creation: Generates comprehensive manifest with enhanced metadata
- Cleanup: Closes browser and reports comprehensive statistics
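The retry step in the list above can be illustrated with a generic exponential backoff wrapper. This is a sketch under assumed parameters, not the tool's implementation: `withRetry`, `backoffDelay`, and the base and maximum delays are hypothetical, and "retries" is interpreted as the number of additional attempts after the first one.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first try
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
}

// Delay doubles with each failed attempt: base, 2x base, 4x base, ... capped at max.
function backoffDelay(attempt: number, options: RetryOptions): number {
  return Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** attempt);
}

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < options.retries) {
        const delay = backoffDelay(attempt, options);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With `retries: 3` and `baseDelayMs: 500`, a download that keeps failing is attempted four times, waiting 500 ms, 1000 ms, and 2000 ms between attempts.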
The tool generates a comprehensive VSCO manifest with enhanced metadata:
{
"generated_at": "2025-08-28T10:02:26.556Z",
"version": "1.0.0-vsco",
"source": "vsco_profile_scraper",
"download_method": "playwright",
"profile": {
"username": "photographer123",
"display_name": "Amazing Photographer",
"profile_url": "https://vsco.co/photographer123",
"total_images_found": 45
},
"images": {
"photographer123/60f6042f56ee7b3727dc1786": {
"local_path": "/public/images/vsco/photographer123/60f6042f56ee7b3727dc1786.jpg",
"filename": "60f6042f56ee7b3727dc1786.jpg",
"file_size_bytes": 1472503,
"downloaded_at": "2025-08-28T10:02:26.557Z",
"skipped": false,
"vsco_image_id": "60f6042f56ee7b3727dc1786",
"vsco_image_url": "https://vsco.co/photographer123/media/60f6042f56ee7b3727dc1786",
"vsco_profile_url": "https://vsco.co/photographer123",
"direct_image_url": "https://im.vsco.co/aws-us-west-2/.../vsco60f604326085a.jpg?w=2048&dpr=1",
"thumbnail_url": "//im.vsco.co/aws-us-west-2/.../vsco60f604326085a.jpg?w=480",
"width_px": 2048,
"height_px": 1536,
"dimensions": "2048 x 1536",
"author": "Amazing Photographer",
"vsco_username": "photographer123",
"upload_date": "2021-07-19T23:01:06.386Z",
"available_sizes": [
"300w", "480w", "600w", "640w", "960w", "1136w", "1200w"
],
"srcset": "//im.vsco.co/.../vsco60f604326085a.jpg?w=300 300w, //im.vsco.co/.../vsco60f604326085a.jpg?w=480 480w, //im.vsco.co/.../vsco60f604326085a.jpg?w=600 600w, //im.vsco.co/.../vsco60f604326085a.jpg?w=640 640w, //im.vsco.co/.../vsco60f604326085a.jpg?w=960 960w, //im.vsco.co/.../vsco60f604326085a.jpg?w=1136 1136w, //im.vsco.co/.../vsco60f604326085a.jpg?w=1200 1200w",
"download_method": "playwright",
"extracted_at": "2025-08-28T10:02:26.557Z"
}
},
"stats": {
"total_images": 1,
"downloaded": 1,
"failed": 0,
"skipped": 0,
"success_rate": 100,
"duration_seconds": 0
}
}
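For consumers of this file, a partial type description can make the structure easier to work with. The interfaces below cover only a subset of the fields shown in the example, and treating `upload_date` as optional is an assumption (extraction could fail for some images); `newestFirst` is a hypothetical helper.

```typescript
// Partial, illustrative types based on the example manifest above.
interface ManifestImage {
  filename: string;
  skipped: boolean;
  width_px: number;
  height_px: number;
  upload_date?: string; // ISO 8601 timestamp; assumed optional
}

interface VscoManifest {
  profile: { username: string; total_images_found: number };
  images: Record<string, ManifestImage>;
}

// Returns images sorted by upload date, newest first. ISO 8601 UTC
// timestamps in the same format sort correctly as plain strings;
// images without a date end up last.
function newestFirst(manifest: VscoManifest): ManifestImage[] {
  return Object.keys(manifest.images)
    .map((key) => manifest.images[key])
    .sort((a, b) => (b.upload_date ?? "").localeCompare(a.upload_date ?? ""));
}
```

A manifest loaded with `JSON.parse` can be passed to `newestFirst` after casting it to `VscoManifest`.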
Profile Data:
- Clean profile structure without unnecessary fields
- Focuses on essential information: username, display name, profile URL, and image count
Image Metadata:
- Upload Date: Extracted from <time datetime> elements on individual VSCO pages
- Available Sizes: Parsed from the srcset attribute, showing all resolution options
- Srcset Data: Complete responsive image data for different screen densities
- High-Resolution URLs: Direct links to highest quality images (up to 2048px)
- File Information: Local paths, sizes, and download timestamps
- VSCO Integration: Direct links to original VSCO pages and profiles
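Deriving the available sizes and the largest candidate from a srcset string can be sketched as follows. This is a simplified parser, not the tool's code: it handles only width descriptors such as 480w, splits on commas (so it would misread URLs that contain commas), and assumes protocol-relative URLs should be upgraded to https. The function names are hypothetical.

```typescript
interface SrcsetCandidate {
  url: string;
  width: number;
}

// Parses "url 300w, url 480w, ..." into structured candidates.
function parseSrcset(srcset: string): SrcsetCandidate[] {
  const candidates: SrcsetCandidate[] = [];
  for (const part of srcset.split(",")) {
    const [url, descriptor] = part.trim().split(/\s+/);
    const match = /^(\d+)w$/.exec(descriptor ?? "");
    if (!url || !match) continue; // skip empty or non-width entries
    candidates.push({
      // Protocol-relative URLs ("//host/path") get an explicit scheme.
      url: url.startsWith("//") ? `https:${url}` : url,
      width: Number(match[1]),
    });
  }
  return candidates;
}

// Produces the "300w", "480w", ... list stored as available_sizes.
function availableSizes(candidates: readonly SrcsetCandidate[]): string[] {
  return candidates.map((c) => `${c.width}w`);
}

// Picks the widest candidate, or undefined when there are none.
function largestCandidate(candidates: readonly SrcsetCandidate[]): SrcsetCandidate | undefined {
  return candidates.reduce<SrcsetCandidate | undefined>(
    (best, c) => (!best || c.width > best.width ? c : best),
    undefined,
  );
}
```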
The tool provides comprehensive output including:
- Real-time download progress
- Success/failure statistics
- File sizes and storage usage
- Detailed error reporting for failed downloads
- Local manifest creation
Example output:
Starting downloads for 5 images...
[1/5] Processing: user1/abc123
Navigating to: https://vsco.co/user1/gallery/abc123
Extracted image URL: https://image-aws-us-west-2.vsco.co/...
Downloaded: user1_abc123.jpg (2.34 MB)
Skipped (exists): user2_def456.jpg
Download Results:
Successfully downloaded: 4
Skipped (already exist): 1
Failed to download: 0
Total time: 45.2s
Storage used: 12.67 MB
"Could not extract image URL from VSCO page"
- The page layout may have changed
- Try running with --no-headless --debug to inspect the page
- Check if manual login is required
"Timeout waiting for download"
- Increase timeout: --timeout 60000
- Check internet connection
- Some images may be larger and take longer
"Login failed"
- Verify credentials in .env file
- Try manual login with --no-headless
- Check if 2FA is enabled (not currently supported)
"Browser crashes"
- Ensure sufficient system resources
- Update Playwright browsers: pnpm run install-browsers
Run with debug flags for troubleshooting:
pnpm run download --no-headless --debug
This will:
- Show the browser window
- Open the browser DevTools
- Provide detailed error information
pnpm run download \
--username photographer123 \
--download-dir /path/to/download/folder
Add this to your main project's package.json:
{
"scripts": {
"download:images:vsco": "cd tools/playwright-vsco-downloader && pnpm run download"
}
}
Then run from your main project:
pnpm run download:images:vsco
- Concurrent Processing: Supports concurrent downloads (1-10 workers) with intelligent batching for optimal performance
- Rate Limiting: Smart batching and delays respect VSCO's terms of service while maximizing download speed
- Resource Management: Browser context pooling optimizes memory usage and prevents resource exhaustion
- Authentication: While authentication is optional, it may provide access to higher quality images
- API Usage: Uses VSCO's public endpoints and browser automation, not private APIs
- Respect Terms: Always respect VSCO's terms of service and the rights of image creators
- Quality: Image quality depends on what VSCO serves publicly vs. to authenticated users
- System Resources: Higher concurrency settings require more system resources (RAM, CPU, network)
- API Access: VSCO has a limited public API, so more browser automation is required
- Authentication: Different login flow and credential handling
- Rate Limiting: More conservative approach to respect VSCO's terms
- URL Structure: Different URL patterns and image ID extraction
- Metadata: Different metadata structure and availability
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with VSCO content
- Submit a pull request
This tool is part of the main project and follows the same license terms.
This tool is designed for legitimate use cases such as:
- Downloading your own VSCO content
- Archiving images you have permission to download
- Educational and research purposes
Please ensure you:
- Respect VSCO's terms of service
- Respect image creators' rights and copyrights
- Use the tool responsibly and ethically
- Do not use for mass scraping or commercial exploitation
The developers are not responsible for misuse of this tool.