This issue was moved to a discussion.

An exception occurred during handling of failed request #2746

Closed
1 task
HJK181 opened this issue Nov 14, 2024 · 0 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.


HJK181 commented Nov 14, 2024

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/http (HttpCrawler)

Issue description

My HttpCrawler fails with the following error:

DEBUG HttpCrawler: Crawled 18/547 pages, 0 failed requests, desired concurrency 2.
ERROR HttpCrawler: An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. 
  Handling request failure of https://cc-stage-shopware.kleerly.de/epson/stylus/ (3FHkwFLmzy6kAAL) timed out after 300 seconds.
      at Timeout._onTimeout (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:64:68)
      at listOnTimeout (node:internal/timers:581:17)
      at process.processTimers (node:internal/timers:519:7)
ERROR HttpCrawler:AutoscaledPool: runTaskFunction failed.
  Handling request failure of https://cc-stage-shopware.kleerly.de/epson/stylus/ (3FHkwFLmzy6kAAL) timed out after 300 seconds.
      at Timeout._onTimeout (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:64:68)
      at listOnTimeout (node:internal/timers:581:17)
      at process.processTimers (node:internal/timers:519:7)

The only errors logged before this happened were of the following kind (19 occurrences according to SDK_CRAWLER_STATISTICS_6.json):

ERROR Request failed with Detected a session error, rotating session... 
Proxy responded with 500 Unable to connect: 0 bytes.

Below is the first 100 bytes of the proxy response body:

Code sample

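// Assumed import; the original sample did not show where HttpCrawler and log come from.
import { HttpCrawler, log } from "crawlee";

// options, proxyConfiguration, config and urls are defined elsewhere (not shown).
const crawler = new HttpCrawler(
  {
    maxConcurrency: 2,
    maxRequestsPerMinute: 180,
    ...options,
    proxyConfiguration,
    useSessionPool: true,
    persistCookiesPerSession: true,
    additionalMimeTypes: ["text/plain", "application/pdf"],
    // Store the URL and status code of every handled page.
    async requestHandler({ pushData, request, response }) {
      await pushData({
        url: request.url,
        statusCode: response.statusCode,
      });
    },
    // Store failed requests too; the status code falls back to 0 when there is no response.
    async failedRequestHandler({ pushData, request, response }) {
      log.error(`Request for URL "${request.url}" failed.`);
      await pushData({
        url: request.url,
        statusCode: response?.statusCode ?? 0,
      });
    },
    // Log the error, then sleep with exponential backoff plus jitter
    // (between 0.5x and 1.5x of 2^retryCount seconds).
    async errorHandler({ request }, { message }) {
      log.error(`Request failed with ${message}`);
      if (!request.noRetry) {
        const baseWaitTime = Math.pow(2, request.retryCount) * 1000;
        const jitter = baseWaitTime * (Math.random() - 0.5);
        const waitTime = baseWaitTime + jitter;
        await new Promise((resolve) => setTimeout(resolve, waitTime));
      }
    },
  },
  config,
);
crawler.run(urls);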
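
For reference, the backoff arithmetic from errorHandler above can be pulled out into a standalone sketch to see how long the handler sleeps for a given retryCount. The helper names backoffBounds and firstRetryCountThatCanExceed are hypothetical and not part of Crawlee. The 300 second figure is taken from the log above. Whether the sleep inside errorHandler counts against that timeout is an assumption here, not something the log confirms.

```javascript
// Hypothetical helper: bounds of the wait computed in errorHandler above.
function backoffBounds(retryCount) {
  // Same base as the handler: 2^retryCount seconds, in milliseconds.
  const baseWaitTime = Math.pow(2, retryCount) * 1000;
  // jitter = baseWaitTime * (Math.random() - 0.5) with Math.random() in [0, 1),
  // so the total wait lies in [0.5 * base, 1.5 * base).
  return { minMs: baseWaitTime * 0.5, maxMs: baseWaitTime * 1.5 };
}

// Timeout reported in the log above ("timed out after 300 seconds").
const TIMEOUT_MS = 300 * 1000;

// Hypothetical helper: smallest retryCount whose wait can exceed the timeout.
function firstRetryCountThatCanExceed(timeoutMs) {
  let retryCount = 0;
  // maxMs is an exclusive upper bound, so maxMs <= timeoutMs means the wait stays below it.
  while (backoffBounds(retryCount).maxMs <= timeoutMs) {
    retryCount++;
  }
  return retryCount;
}

// Print the wait range for the first few retry counts.
for (let retryCount = 0; retryCount <= 10; retryCount++) {
  const { minMs, maxMs } = backoffBounds(retryCount);
  console.log(`retryCount=${retryCount}: ${minMs / 1000}s to ${maxMs / 1000}s`);
}
console.log(`wait can exceed the timeout from retryCount=${firstRetryCountThatCanExceed(TIMEOUT_MS)}`);
```

Under these assumptions, a single sleep stays below 300 seconds up to retryCount 7 (at most 192 seconds) and can exceed it from retryCount 8 (up to 384 seconds).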

Package version

crawlee@3.11.5

Node.js version

v20.18.0

Operating system

Docker image based on apify/actor-node-playwright-chrome

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@HJK181 HJK181 added the bug Something isn't working. label Nov 14, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 14, 2024
@apify apify locked and limited conversation to collaborators Nov 14, 2024
@B4nan B4nan converted this issue into discussion #2747 Nov 14, 2024
