Hi, this error shows up frequently and I don't know how to handle it. Besides wasting memory, it's really annoying, and it might be slowing down the scrape.
Here I am using multiple contexts with one page each, but the same thing happened when I used multiple pages with one context.
I even create a new context for each page and close it in both parse and errback. (The code is below the error.)
I only allow requests for HTML, so maybe the library is trying to handle other requests after I close the page in parse. I haven't looked into the source code, though, so I have no idea.
Can anyone help me?
[asyncio] ERROR: Exception in callback AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...been closed')>) at /usr/local/lib/python3.9/dist-packages/pyee/asyncio.py:65
handle: <Handle AsyncIOEventEmitter._emit_run.<locals>.callback(<Task finishe...been closed')>) at /usr/local/lib/python3.9/dist-packages/pyee/asyncio.py:65>
Traceback (most recent call last):
File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.9/dist-packages/pyee/asyncio.py", line 71, in callback
self.emit("error", exc)
File "/usr/local/lib/python3.9/dist-packages/pyee/base.py", line 179, in emit
self._emit_handle_potential_error(event, args[0] if args else None)
File "/usr/local/lib/python3.9/dist-packages/pyee/base.py", line 139, in _emit_handle_potential_error
raise error
File "/usr/local/lib/python3.9/dist-packages/scrapy_playwright/handler.py", line 606, in _log_request
referrer = await request.header_value("referer")
File "/usr/local/lib/python3.9/dist-packages/playwright/async_api/_generated.py", line 381, in header_value
return mapping.from_maybe_impl(await self._impl_obj.header_value(name=name))
File "/usr/local/lib/python3.9/dist-packages/playwright/_impl/_network.py", line 232, in header_value
return (await self._actual_headers()).get(name)
File "/usr/local/lib/python3.9/dist-packages/playwright/_impl/_network.py", line 240, in _actual_headers
headers = await self._channel.send("rawRequestHeaders")
File "/usr/local/lib/python3.9/dist-packages/playwright/_impl/_connection.py", line 61, in send
return await self._connection.wrap_api_call(
File "/usr/local/lib/python3.9/dist-packages/playwright/_impl/_connection.py", line 461, in wrap_api_call
return await cb()
File "/usr/local/lib/python3.9/dist-packages/playwright/_impl/_connection.py", line 96, in inner_send
result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed
This happens when there are scheduled Playwright page callbacks (created via page.on()) that have not yet been processed when you close the context. Their calls to page coroutines (like the request.header_value("referer") call in scrapy-playwright's default request callback, _log_request, seen in the traceback above) then fail with this error.
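To make that race concrete, here is a toy asyncio model. It is not Playwright or scrapy-playwright code: `ToyPage`, its `emit` and `close(drain=...)` methods, and `run` are hypothetical, and `header_value` only mimics the call that fails in the traceback. When an event fires, its handlers are scheduled as tasks, so if the object is closed before they get a turn, their awaited calls fail.

```python
import asyncio


class ToyPage:
    """Hypothetical stand-in for a page; NOT Playwright's implementation."""

    def __init__(self):
        self.closed = False
        self.handlers = []
        self.pending = set()  # handler tasks scheduled but not yet finished

    def on(self, handler):
        self.handlers.append(handler)

    def emit(self, event):
        # Like an asyncio event emitter: handlers are scheduled as tasks and
        # run on a later loop iteration, not at emit() time.
        loop = asyncio.get_running_loop()
        for handler in self.handlers:
            task = loop.create_task(handler(event))
            self.pending.add(task)
            task.add_done_callback(self.pending.discard)

    async def header_value(self, name):
        # Mimics the failing call in the traceback: unusable after close.
        if self.closed:
            raise RuntimeError("Target page, context or browser has been closed")
        return "https://example.org/"

    async def close(self, drain=False):
        if drain:
            # Let already-scheduled handlers finish before closing.
            await asyncio.gather(*self.pending, return_exceptions=True)
        self.closed = True


async def run(drain):
    page = ToyPage()
    errors = []

    async def log_request(event):
        # Plays the role of a request-logging callback registered via page.on().
        try:
            await page.header_value("referer")
        except RuntimeError as exc:
            errors.append(str(exc))

    page.on(log_request)
    page.emit("request")  # e.g. a late telemetry request
    await page.close(drain=drain)
    # Wait for whatever is still pending so every handler has run.
    await asyncio.gather(*page.pending, return_exceptions=True)
    return errors


if __name__ == "__main__":
    print(asyncio.run(run(drain=False)))  # one "has been closed" error
    print(asyncio.run(run(drain=True)))   # no errors
```

With `drain=False` the handler only runs after `close()` and records the error; with `drain=True` it finishes first. This is an assumption-laden analogy: it does not claim Playwright's `page.close()` works this way internally, only that letting pending callbacks settle before tearing things down avoids this class of error.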
Closing the page before closing the context should allow Playwright to unravel the callbacks first:
await page.close()
await page.context.close()
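A minimal sketch of that ordering as a reusable helper. `close_page_then_context` is a hypothetical name, not part of scrapy-playwright; it only assumes `page` behaves like `playwright.async_api.Page` (an awaitable `close()` and a `context` attribute whose `close()` is also awaitable). It keeps a reference to the context before closing the page, so the second step does not rely on the page object after it has been closed.

```python
async def close_page_then_context(page):
    """Close a page first, then the browser context that owns it.

    Hypothetical helper; `page` is assumed to behave like
    playwright.async_api.Page.
    """
    context = page.context  # keep the reference before the page goes away
    await page.close()      # page first, so pending page.on() callbacks can settle
    await context.close()   # then tear down the context itself
```

In a spider whose requests set `"playwright_include_page": True` in `meta`, the helper could be awaited from `parse` with `response.meta["playwright_page"]` and from the errback with `failure.request.meta["playwright_page"]`.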
@elacuesta This commonly happens on pages with telemetry; for example, pages on amazon.com keep making regular requests to unagi.amazon.com after the initial page has been returned. Maybe the example code in the "Closing a context during a crawl" README section could be adjusted accordingly?
Here is the code: