
Conversation


@vdusek vdusek commented Aug 19, 2025

Description

  • Increase the statistics log table width so that the default maximum number of retries fits in the retry histogram row.
  • Add a format_duration util function for human-readable duration values in the stats.

Issues

Demonstration

import asyncio

from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
from crawlee.errors import SessionError


async def main() -> None:
    # Use the table format to demonstrate the widened statistics log table.
    crawler = ParselCrawler(statistics_log_format='table')

    @crawler.router.default_handler
    async def request_handler(_: ParselCrawlingContext) -> None:
        # Fail every request with a SessionError to exercise the retry path.
        raise SessionError('Blah blah blah')

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())

Log:

$ python run_crawler.py 
[ParselCrawler] INFO  Crawled 0/8292 pages, 0 failed requests, desired concurrency 1.
[ParselCrawler] INFO  Current request statistics:
┌───────────────────────────────┬────────┐
│ requests_finished             │ 0      │
│ requests_failed               │ 0      │
│ retry_histogram               │ [0]    │
│ request_avg_failed_duration   │ None   │
│ request_avg_finished_duration │ None   │
│ requests_finished_per_minute  │ 0      │
│ requests_failed_per_minute    │ 0      │
│ request_total_duration        │ 0s     │
│ requests_total                │ 0      │
│ crawler_runtime               │ 14.1ms │
└───────────────────────────────┴────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] WARN  Encountered "crawlee.errors.SessionError: Blah blah blah", rotating session and retrying...
[ParselCrawler] ERROR Request to https://crawlee.dev failed and reached maximum retries
 Traceback (most recent call last):
  File "/home/vdusek/Projects/crawlee-python/src/crawlee/crawlers/_basic/_basic_crawler.py", line 1312, in __run_task_function
    await self._run_request_handler(context=context)
  File "/home/vdusek/Projects/crawlee-python/src/crawlee/crawlers/_basic/_basic_crawler.py", line 1407, in _run_request_handler
    await wait_for(
    ...<5 lines>...
    )
  File "/home/vdusek/Projects/crawlee-python/src/crawlee/_utils/wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vdusek/.local/share/uv/python/cpython-3.13.0-linux-x86_64-gnu/lib/python3.13/asyncio/tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/vdusek/Projects/crawlee-python/src/crawlee/crawlers/_basic/_context_pipeline.py", line 114, in __call__
    await final_context_consumer(cast('TCrawlingContext', crawling_context))
  File "/home/vdusek/Projects/crawlee-python/src/crawlee/router.py", line 98, in __call__
    return await self._default_handler(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vdusek/Projects/crawlee-python/run_crawler.py", line 12, in request_handler
    raise SessionError('Blah blah blah')
crawlee.errors.SessionError: Blah blah blah
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[ParselCrawler] INFO  Error analysis: total_errors=1 unique_errors=1
[ParselCrawler] INFO  Final request statistics:
┌───────────────────────────────┬────────────────────────────────┐
│ requests_finished             │ 0                              │
│ requests_failed               │ 1                              │
│ retry_histogram               │ [0, 0, 0, 0, 0, 0, 0, 0, 0, 1] │
│ request_avg_failed_duration   │ 607.8ms                        │
│ request_avg_finished_duration │ None                           │
│ requests_finished_per_minute  │ 0                              │
│ requests_failed_per_minute    │ 8                              │
│ request_total_duration        │ 607.8ms                        │
│ requests_total                │ 1                              │
│ crawler_runtime               │ 7.31s                          │
└───────────────────────────────┴────────────────────────────────┘
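Note how the final table above is wide enough for the 10-bucket retry histogram `[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]`, which matches the default maximum of 9 retries. A minimal sketch of adaptive-width table rendering in this style (hypothetical, not the actual Crawlee implementation):

```python
def render_stats_table(stats: dict[str, object]) -> str:
    """Render key-value stats as a box-drawing table; column widths adapt to content (sketch)."""
    rows = [(key, str(value)) for key, value in stats.items()]
    key_w = max(len(k) for k, _ in rows)
    val_w = max(len(v) for _, v in rows)
    top = f'┌{"─" * (key_w + 2)}┬{"─" * (val_w + 2)}┐'
    bottom = f'└{"─" * (key_w + 2)}┴{"─" * (val_w + 2)}┘'
    body = [f'│ {k.ljust(key_w)} │ {v.ljust(val_w)} │' for k, v in rows]
    return '\n'.join([top, *body, bottom])
```

Sizing the value column from the longest rendered value ensures a full retry histogram never gets truncated.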

Checklist

  • CI passed

@vdusek vdusek added this to the 121st sprint - Tooling team milestone Aug 19, 2025
@vdusek vdusek requested a review from Pijukatel August 19, 2025 14:32
@vdusek vdusek self-assigned this Aug 19, 2025
@vdusek vdusek added the t-tooling Issues with this label are in the ownership of the tooling team. label Aug 19, 2025
@github-actions github-actions bot added the tested Temporary label used only programmatically for some analytics. label Aug 19, 2025
@vdusek vdusek merged commit 1eb6da5 into master Aug 19, 2025
19 checks passed
@vdusek vdusek deleted the improve-stats-logging branch August 19, 2025 15:03


Development

Successfully merging this pull request may close these issues:

  • Improve crawler stats logging
