---
id: error-handling
title: Error handling
description: How to handle errors that occur during web crawling.
---

import ApiLink from '@site/src/components/ApiLink';
import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import HandleProxyError from '!!raw-loader!roa-loader!./code_examples/error_handling/handle_proxy_error.py';
import ChangeHandleErrorStatus from '!!raw-loader!roa-loader!./code_examples/error_handling/change_handle_error_status.py';
import DisableRetry from '!!raw-loader!roa-loader!./code_examples/error_handling/disable_retry.py';

This guide demonstrates techniques for handling common errors encountered during web crawling operations.

## Handling proxy errors

Low-quality proxies can cause problems even with high settings for `max_request_retries` and `max_session_rotations` in `BasicCrawlerOptions`. If you can't get your data because of proxy errors, you might want to try again. You can do this using `failed_request_handler`:

<RunnableCodeBlock className="language-python" language="python">
    {HandleProxyError}
</RunnableCodeBlock>

You can use this same approach when testing different proxy providers. To keep the process under control, you can count proxy errors and stop the crawler once there are too many, as shown in the sketch below.
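For illustration, here is a minimal sketch of that idea. It assumes `ProxyError` is available in `crawlee.errors`, that the `@crawler.failed_request_handler` decorator and `crawler.stop()` exist in your Crawlee version, and it uses a made-up error threshold and start URL, so verify the details against your installed version:

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.errors import ProxyError


async def main() -> None:
    # Made-up threshold: stop the whole crawl after this many proxy failures.
    max_proxy_errors = 10
    proxy_errors = 0

    crawler = HttpCrawler(
        max_request_retries=3,
        max_session_rotations=5,
    )

    @crawler.router.default_handler
    async def default_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    @crawler.failed_request_handler
    async def failed_handler(context: HttpCrawlingContext, error: Exception) -> None:
        nonlocal proxy_errors

        # Count only failures caused by the proxy, not e.g. parsing errors.
        if isinstance(error, ProxyError):
            proxy_errors += 1
            context.log.warning(f'Proxy error #{proxy_errors} for {context.request.url}')

            # Assumption: crawler.stop() gracefully ends the crawl.
            if proxy_errors >= max_proxy_errors:
                crawler.stop()

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```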

## Changing how error status codes are handled

By default, when a `Session` receives a status code like 401, 403, or 429, Crawlee treats it as blocked, retires the `Session`, and switches to a new one. This might not be what you want, especially when working with authentication. You can learn more in the Session management guide.

Here's an example of how to change this behavior:

<RunnableCodeBlock className="language-python" language="python">
    {ChangeHandleErrorStatus}
</RunnableCodeBlock>
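As a rough sketch of what such a change can look like: the parameter names below, `ignore_http_error_status_codes` on the crawler and `blocked_status_codes` in the session settings, are assumptions about the current Crawlee for Python API, so check them against your installed version before relying on them:

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.sessions import SessionPool


async def main() -> None:
    crawler = HttpCrawler(
        # Assumption: 403 responses are not raised as errors, so they reach
        # the request handler instead of triggering a retry.
        ignore_http_error_status_codes=[403],
        # Assumption: new sessions treat no status code as "blocked", so a
        # 401/403/429 response does not retire the session.
        session_pool=SessionPool(
            create_session_settings={'blocked_status_codes': []},
        ),
    )

    @crawler.router.default_handler
    async def default_handler(context: HttpCrawlingContext) -> None:
        # With the settings above, a 403 ends up here and the session is kept,
        # which is useful when you want to re-authenticate and try again.
        status = context.http_response.status_code
        context.log.info(f'Status {status} for {context.request.url}')

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```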

## Turning off retries for non-network errors

Sometimes you might get unexpected errors when parsing data, for example when a website has an unusual structure. Crawlee normally retries the request according to your `max_request_retries` setting, but sometimes you don't want that.

Here's how to turn off retries for non-network errors using `error_handler`, which runs before Crawlee tries again:

<RunnableCodeBlock className="language-python" language="python">
    {DisableRetry}
</RunnableCodeBlock>
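A minimal sketch of that pattern might look like the following. It assumes `SessionError` in `crawlee.errors` represents network/session-related failures and that setting `no_retry` on the request prevents further retries; treat both as assumptions to verify against your Crawlee version:

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.errors import SessionError


async def main() -> None:
    crawler = HttpCrawler(max_request_retries=3)

    @crawler.router.default_handler
    async def default_handler(context: HttpCrawlingContext) -> None:
        # Simulate a parsing error caused by an unexpected page structure.
        raise ValueError('Unexpected page structure')

    @crawler.error_handler
    async def retry_decider(context: HttpCrawlingContext, error: Exception) -> None:
        # Keep retrying network/session problems, but give up immediately
        # on anything else, such as the parsing error above.
        if not isinstance(error, SessionError):
            context.request.no_retry = True

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```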