Closed
Labels: enhancement (New feature or request), t-tooling (Issues with this label are in the ownership of the tooling team)
Description
Exceptions logged by crawlers should not contain irrelevant stack traces.
Context: Crawlers can log exceptions and continue running, for example a TimeoutError raised in the request handler function. Many of these exceptions carry stack frames from framework code that are completely irrelevant to the end user. This clutters the exception info and makes the logs less readable.
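For illustration, a request handler that sleeps past the configured timeout is enough to produce such a log. A minimal sketch, assuming the BasicCrawler API with its request_handler_timeout option (the import paths and URL here are illustrative):

import asyncio
from datetime import timedelta

from crawlee.crawlers import BasicCrawler, BasicCrawlingContext

crawler = BasicCrawler(
    # Assumption: the handler timeout that produces the log message below.
    request_handler_timeout=timedelta(seconds=1),
)

@crawler.router.default_handler
async def default_handler(context: BasicCrawlingContext) -> None:
    # Sleeps longer than the timeout, so the crawler logs a TimeoutError
    # and continues with the remaining requests.
    await asyncio.sleep(5)

asyncio.run(crawler.run(['https://example.com']))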
Example of a cluttered log:
[crawlee.crawlers._basic._basic_crawler] ERROR Request failed and reached maximum retries
Traceback (most recent call last):
  File ".../repos/crawlee-python/src/crawlee/crawlers/_basic/_context_pipeline.py", line 82, in __call__
    await final_context_consumer(cast('TCrawlingContext', crawling_context))
  File ".../repos/crawlee-python/src/crawlee/router.py", line 98, in __call__
    return await self._default_handler(context)
  File ".../repos/crawlee-python/tests/unit/crawlers/_basic/test_basic_crawler.py", line 1301, in default_handler
    await asyncio.sleep(5)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 605, in sleep
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../repos/crawlee-python/src/crawlee/_utils/wait.py", line 37, in wait_for
    return await asyncio.wait_for(operation(), timeout.total_seconds())
  File "/usr/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../repos/crawlee-python/src/crawlee/crawlers/_basic/_basic_crawler.py", line 1112, in __run_task_function
    await self._run_request_handler(context=context)
  File ".../repos/crawlee-python/src/crawlee/crawlers/_basic/_basic_crawler.py", line 1209, in _run_request_handler
    await wait_for(
  File ".../repos/crawlee-python/src/crawlee/_utils/wait.py", line 39, in wait_for
    raise asyncio.TimeoutError(timeout_message) from ex
asyncio.exceptions.TimeoutError: Request handler timed out after 1.0 seconds
Example of a focused log with only the relevant information:
[crawlee.crawlers._basic._basic_crawler] ERROR Request failed and reached maximum retries
asyncio.exceptions.TimeoutError: Request handler timed out after 1.0 seconds
Request handler was interrupted at:
  File ".../repos/crawlee-python/tests/unit/crawlers/_basic/test_basic_crawler.py", line 1301, in default_handler
    await asyncio.sleep(5)
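One possible direction, sketched here with the standard traceback module rather than any existing Crawlee helper: walk the exception chain, drop frames whose file path points into the framework or the stdlib, and print only what remains after the exception summary. The FRAMEWORK_PATHS filter and the function names below are hypothetical:

import traceback

# Assumption: path substrings that identify framework/stdlib frames to drop.
FRAMEWORK_PATHS = ('/crawlee/', '/asyncio/')

def _user_frames(exc: BaseException | None) -> list[traceback.FrameSummary]:
    """Collect user-code frames from the whole exception chain."""
    frames: list[traceback.FrameSummary] = []
    seen: set[int] = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        frames += [
            f for f in traceback.extract_tb(exc.__traceback__)
            if not any(p in f.filename for p in FRAMEWORK_PATHS)
        ]
        # Follow both explicit ("raise ... from") and implicit chaining.
        exc = exc.__cause__ or exc.__context__
    return frames

def format_focused(exc: BaseException) -> str:
    """Render the focused form: exception summary plus user frames only."""
    summary = ''.join(traceback.format_exception_only(type(exc), exc)).rstrip()
    frames = _user_frames(exc)
    if not frames:
        return summary
    stack = ''.join(traceback.StackSummary.from_list(frames).format()).rstrip()
    return f'{summary}\nRequest handler was interrupted at:\n{stack}'

Applied to the chained TimeoutError above, this would keep only the default_handler frame from the test file, matching the focused output.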