Feature: Add opt-in telemetry system to help improve Crawl4AI stability through anonymous crash reporting #1420

ntohidi · 2025-08-20T08:53:17Z

Summary

Add opt-in telemetry system to help improve Crawl4AI stability through anonymous crash reporting. This implementation provides a privacy-first, provider-agnostic telemetry infrastructure that captures only exception information without any PII, URLs, or crawled content.

The telemetry system is designed to be completely transparent and user-controlled, with opt-in behavior for CLI/library usage and opt-out for Docker deployments.
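
As a rough illustration of the opt-in/opt-out split described above, the decision could be sketched as follows. This is not the code in this PR: the function name, its parameters, and the precedence order (environment variable first, then stored consent, then the per-environment default) are assumptions made for the example. Only the `CRAWL4AI_TELEMETRY=0` override comes from the PR text.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(
    environment: str,
    stored_consent: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether crash reports may be sent (illustrative sketch only)."""
    # Assumption: an explicit CRAWL4AI_TELEMETRY=0 wins in every environment.
    if env.get("CRAWL4AI_TELEMETRY") == "0":
        return False
    # Assumption: a stored user decision overrides the environment default.
    if stored_consent is not None:
        return stored_consent
    # Defaults from the PR text: Docker is opt-out (on), CLI/library is opt-in (off).
    return environment == "docker"
```

With no stored decision, `telemetry_enabled("cli", None, {})` stays off while `telemetry_enabled("docker", None, {})` is on, which mirrors the two defaults named in the summary.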

Fix: #1409

List of files changed and why

New files:

crawl4ai/telemetry/__init__.py - Main telemetry module with manager, decorators, and public API
crawl4ai/telemetry/base.py - Provider interface and base classes for extensibility
crawl4ai/telemetry/config.py - Configuration management and persistence
crawl4ai/telemetry/consent.py - User consent handling for different environments
crawl4ai/telemetry/environment.py - Runtime environment detection (CLI, Docker, Jupyter)
crawl4ai/telemetry/providers/sentry.py - Sentry provider implementation
tests/telemetry/test_telemetry.py - Comprehensive test suite (15 test cases)
docs/md_v2/core/telemetry.md - Complete telemetry documentation
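
For readers unfamiliar with runtime environment detection, the kind of check that environment.py is described as performing might look like the sketch below. The heuristics (the `/.dockerenv` marker file, and `ipykernel` or `google.colab` appearing in loaded modules) are common conventions, not taken from this PR, and the function name and return values are hypothetical.

```python
import sys
from pathlib import Path
from typing import Mapping


def detect_environment(
    modules: Mapping[str, object] = sys.modules,
    dockerenv: Path = Path("/.dockerenv"),
) -> str:
    """Return 'docker', 'jupyter', or 'cli' using common heuristics (sketch)."""
    # Docker creates /.dockerenv inside containers; treat its presence as a hint.
    if dockerenv.exists():
        return "docker"
    # Jupyter kernels load ipykernel; Colab additionally loads google.colab.
    if "google.colab" in modules or "ipykernel" in modules:
        return "jupyter"
    # Anything else is treated as plain CLI/library usage.
    return "cli"
```

Both inputs are parameters so the function can be exercised in tests without actually running inside a container or a notebook.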

Modified files:

crawl4ai/cli.py - Added telemetry CLI commands (enable/disable/status)
crawl4ai/async_webcrawler.py - Integrated telemetry decorators for exception capture
deploy/docker/server.py - Added Docker telemetry initialization
deploy/docker/requirements.txt - Added sentry-sdk dependency
pyproject.toml - Added optional telemetry dependencies
mkdocs.yml - Added telemetry documentation to navigation
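
To make the provider-agnostic design and the optional dependency concrete, here is a minimal sketch of a provider interface with a no-op fallback. The name `TelemetryProvider` appears in the commit message; everything else (the `send_exception` method, `NullProvider`, `InMemoryProvider`, `choose_provider`) is hypothetical and only illustrates the pattern, not the actual contents of base.py.

```python
import importlib
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class TelemetryProvider(ABC):
    """Provider interface; the method name here is an assumption."""

    @abstractmethod
    def send_exception(
        self, exc: BaseException, context: Optional[Dict[str, Any]] = None
    ) -> bool:
        """Report one exception; return True if it was accepted."""


class NullProvider(TelemetryProvider):
    """Fallback that drops every report when no backend is available."""

    def send_exception(self, exc, context=None) -> bool:
        return False


class InMemoryProvider(TelemetryProvider):
    """Test double that records reports instead of sending them anywhere."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def send_exception(self, exc, context=None) -> bool:
        self.events.append({"type": type(exc).__name__, "context": dict(context or {})})
        return True


def backend_available(module_name: str = "sentry_sdk") -> bool:
    """Return True if the optional backend module can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def choose_provider(
    preferred: TelemetryProvider, module_name: str = "sentry_sdk"
) -> TelemetryProvider:
    """Use the preferred provider only if its backend module is importable."""
    return preferred if backend_available(module_name) else NullProvider()
```

Keeping the fallback behind the same interface means calling code never needs to check whether sentry-sdk is installed.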

How Has This Been Tested?

Unit Tests: Created comprehensive test suite with 15 test cases covering:
- Configuration persistence
- Environment detection (CLI, Docker, Jupyter)
- Consent management flows
- Exception capture functionality
- Singleton pattern verification
- Public API functions
Integration Testing:
- Tested AsyncWebCrawler exception capture with real crawl operations
- Verified CLI commands work correctly (crwl telemetry enable/disable/status)
- Tested Docker server telemetry initialization
- Verified async and sync decorators capture exceptions properly
Manual Testing:
- Tested interactive consent prompt in CLI
- Verified config persistence in ~/.crawl4ai/config.json
- Tested environment variable overrides (CRAWL4AI_TELEMETRY=0)
- Confirmed graceful degradation when sentry-sdk not installed

All tests pass successfully with pytest tests/telemetry/test_telemetry.py
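
The config persistence checked above could be approximated as follows. The path `~/.crawl4ai/config.json` comes from the PR; the key layout (`"telemetry": {"enabled": ...}`) and the function names are assumptions for illustration and may differ from what config.py actually writes.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Path taken from the PR description.
DEFAULT_CONFIG_PATH = Path.home() / ".crawl4ai" / "config.json"


def load_config(path: Path = DEFAULT_CONFIG_PATH) -> Dict[str, Any]:
    """Read the config file, returning {} if it is missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def save_consent(enabled: bool, path: Path = DEFAULT_CONFIG_PATH) -> None:
    """Persist the consent decision while keeping unrelated config keys."""
    config = load_config(path)
    config["telemetry"] = {"enabled": enabled}  # hypothetical key layout
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Passing an explicit `path` (for example one inside a temporary directory) keeps tests from touching the real home directory.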

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added/updated unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

…tability improvement

Implement a privacy-first, provider-agnostic telemetry system to help improve Crawl4AI stability through anonymous crash reporting. The system is designed with user privacy as the top priority, collecting only exception information without any PII, URLs, or crawled content.

Architecture & Design:
- Provider-agnostic architecture with base TelemetryProvider interface
- Sentry as the initial provider implementation with easy extensibility
- Separate handling for sync and async code paths
- Environment-aware behavior (CLI, Docker, Jupyter/Colab)

Key Features:
- Opt-in by default for CLI/library usage with interactive consent prompt
- Opt-out by default for Docker/API server (enabled unless CRAWL4AI_TELEMETRY=0)
- Jupyter/Colab support with widget-based consent (fallback to code snippets)
- Persistent consent storage in ~/.crawl4ai/config.json
- Optional email collection for critical issue follow-up

CLI Integration:
- `crwl telemetry enable [--email <email>] [--once]` - Enable telemetry
- `crwl telemetry disable` - Disable telemetry
- `crwl telemetry status` - Check current status

Python API:
- Decorators: @telemetry_decorator, @async_telemetry_decorator
- Context managers: telemetry_context(), async_telemetry_context()
- Manual capture: capture_exception(exc, context)
- Control: telemetry.enable(), telemetry.disable(), telemetry.status()

Privacy Safeguards:
- No URL collection
- No request/response data
- No authentication tokens or cookies
- No crawled content
- Automatic sanitization of sensitive fields
- Local consent storage only

Testing:
- Comprehensive test suite with 15 test cases
- Coverage for all environments and consent flows
- Mock providers for testing without external dependencies

Documentation:
- Detailed documentation in docs/md_v2/core/telemetry.md
- Added to mkdocs navigation under Core section
- Privacy commitment and FAQ included
- Examples for all usage patterns

Installation:
- Optional dependency: pip install crawl4ai[telemetry]
- Graceful degradation if sentry-sdk not installed
- Added to pyproject.toml optional dependencies
- Docker requirements updated

Integration Points:
- AsyncWebCrawler: Automatic exception capture in arun() and aprocess_html()
- Docker server: Automatic initialization with environment control
- Global exception handler for uncaught exceptions (CLI only)

This implementation provides valuable error insights to improve Crawl4AI while maintaining complete transparency and user control over data collection.
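
The sync/async decorators and the sanitization step listed in this commit message can be pictured with a sketch like the one below. The names `telemetry_decorator`, `async_telemetry_decorator`, and `capture_exception(exc, context)` are taken from the commit message, but the bodies here are assumptions, not the PR's implementation. The `SENSITIVE_KEYS` set and the in-memory `captured` list are hypothetical stand-ins for the real field list and provider.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# Hypothetical set of context keys treated as sensitive.
SENSITIVE_KEYS = {"url", "cookies", "headers", "token", "password", "html"}

# Stand-in for a real provider: reports are simply collected in memory.
captured: List[Dict[str, Any]] = []


def sanitize(context: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Drop any context entry whose key is in the sensitive set."""
    return {k: v for k, v in (context or {}).items() if k.lower() not in SENSITIVE_KEYS}


def capture_exception(exc: BaseException, context: Optional[Dict[str, Any]] = None) -> None:
    """Record the exception type and sanitized context; never raise."""
    try:
        captured.append({"type": type(exc).__name__, "context": sanitize(context)})
    except Exception:
        pass  # telemetry failures must not affect the caller


def telemetry_decorator(func: Callable) -> Callable:
    """Capture exceptions from a sync function, then re-raise them unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            capture_exception(exc, {"function": func.__name__})
            raise
    return wrapper


def async_telemetry_decorator(func: Callable) -> Callable:
    """Same as telemetry_decorator, but awaits a coroutine function."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            capture_exception(exc, {"function": func.__name__})
            raise
    return wrapper
```

Re-raising after capture keeps the decorated function's behavior identical for callers, so telemetry only observes failures and never swallows them.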

coderabbitai · 2025-08-20T08:53:23Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



ntohidi added 2 commits August 18, 2025 14:20

docs: update Docker instructions to use the latest release tag

9447054

ntohidi requested a review from unclecode August 20, 2025 08:53

ntohidi linked an issue Aug 20, 2025 that may be closed by this pull request

Telemetry & Error Reporting (Library, API Server, Notebooks) #1409

Open

feat(tests): Implement comprehensive testing framework for telemetry …

d48d382

…system
