Handle unprocessed requests in batch_add_requests #456

Closed
Pijukatel opened this issue Apr 16, 2025 · 2 comments · Fixed by apify/crawlee-python#1159
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

Pijukatel commented Apr 16, 2025

In `_request_queue_client.py` (both the async and sync clients) there is a `batch_add_requests` method. This method calls the Apify API, which can in rare cases return some requests as unprocessed, for example due to rate limiting under high load. Currently those unprocessed requests do not appear to be handled in any way and are simply ignored.

Maybe the method should at least retry adding the unprocessed requests in some final batch, perhaps with some backoff?

(Some context apify/crawlee-python#1155 (comment))
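As a rough illustration of the suggestion, a retry loop over the unprocessed requests could look like the sketch below. All names here (`StubQueueClient`, `add_requests_with_retry`, the response keys) are hypothetical stand-ins, not the actual Apify client API:

```python
import asyncio

class StubQueueClient:
    """Stand-in for the API client; fails to process one request on the first call."""

    def __init__(self):
        self.calls = 0
        self.processed = []

    async def batch_add_requests(self, requests):
        self.calls += 1
        if self.calls == 1:
            # First call: the last request comes back as unprocessed
            # (e.g. due to rate limiting under high load).
            self.processed.extend(requests[:-1])
            return {'processedRequests': requests[:-1],
                    'unprocessedRequests': requests[-1:]}
        self.processed.extend(requests)
        return {'processedRequests': requests, 'unprocessedRequests': []}

async def add_requests_with_retry(client, requests, max_retries=3, base_delay=0.01):
    """Keep re-submitting unprocessed requests until none remain or retries run out."""
    attempt = 0
    remaining = requests
    while remaining and attempt <= max_retries:
        if attempt > 0:
            await asyncio.sleep(base_delay * attempt)  # linearly increasing backoff
        response = await client.batch_add_requests(remaining)
        remaining = response['unprocessedRequests']
        attempt += 1
    return remaining  # anything still unprocessed after all retries
```

The same shape would also work with exponential backoff by replacing the delay calculation; the key point is only that the unprocessed requests are fed back into the API call rather than dropped.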

Pijukatel added the enhancement and t-tooling labels on Apr 16, 2025
B4nan commented Apr 16, 2025

IIRC we deal with this in crawlee, at least in the JS version.

Pijukatel replied:

> IIRC we deal with this in crawlee, at least in the JS version.

I do not think it is done in the Python version. There we assume it just works and only print the response. The response can contain info about unprocessed requests, but we do not handle it there.

@janbuchar I might have missed some handling on some other level, so please correct me if I am wrong.

Pijukatel added a commit to apify/crawlee-python that referenced this issue Apr 23, 2025
### Description
Adds retry for unprocessed requests in `add_requests_batched`.
The retry recursively calls `_process_batch`, which initially works on the
full request batch and then on batches of unprocessed requests until the
retry limit is reached or all requests are processed. Each retry happens
after a delay that increases linearly with each attempt.

Unprocessed requests are not counted in `request_queue.get_total_count`.
Adds a test.

### Issues

- Closes: [Handle unprocessed requests in
batch_add_requests](apify/apify-sdk-python#456)
Mantisus pushed a commit to Mantisus/crawlee-python that referenced this issue Apr 24, 2025