Skip to content
This repository has been archived by the owner on May 8, 2020. It is now read-only.

Websocket connection is lost on some websites (ConnectionClosed) #62

Open
hartym opened this issue Apr 11, 2018 · 31 comments
Open

Websocket connection is lost on some websites (ConnectionClosed) #62

hartym opened this issue Apr 11, 2018 · 31 comments
Labels

Comments

@hartym
Copy link
Contributor

hartym commented Apr 11, 2018

I have a script that basically open a tab, plays with a website, close the tab. Loop.

On my laptop (OSX), it works just fine. Now when I run that in a container (either with /dev/shm disabled, or with --shm-size=2g), it does work, but at some point the websocket connection is lost and an exception happens in pyppeteer.connection.Connection, in _async_send() (websocket.ConnectionClosed I think).

[EDIT 04/12] this is not container related, I can have the same error happen on my local laptop. For some reason, it happens less, but still happens.

Error:

websockets.exceptions.ConnectionClosed  WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason
ERR.:0054:asyncio: Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /env/lib/python3.6/site-packages/pyppeteer/connection.py:61> exception=ConnectionClosed('WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason',)>
│ Traceback (most recent call last):
│   File "/env/lib/python3.6/site-packages/pyppeteer/connection.py", line 64, in _async_send
│     await self.connection.send(msg)
│   File "/env/lib/python3.6/site-packages/websockets/protocol.py", line 334, in send
│     yield from self.ensure_open()
│   File "/env/lib/python3.6/site-packages/websockets/protocol.py", line 470, in ensure_open
│     raise ConnectionClosed(self.close_code, self.close_reason)
└ websockets.exceptions.ConnectionClosed  WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

I tried to add the debug info of asyncio and look at the logs, and it seems that at some point, for no reason that I can think of, there is indeed a OP_CLOSE frame that goes through write_frame() coroutine in the ws protocol. I'm a bit lacking knowledge on asyncio, and can't really find a way to trace it back to the reason of this close.

Then, the next async_send() complains, with an uncaught error.

    async def _async_send(self, msg: str) -> None:
        while not self._connected:
            await asyncio.sleep(self._delay)
        await self.connection.send(msg)

I tried to modify the above code to catch the connection error aroud the send() call, and try to reconnect blindly (ok that was a bit naive, but who knows), but that did not do the trick. Probably because I do it at a random time, without knowing any state.

Note that the browser process does not die, and I can reproduce it more systematically if I enclose the browser.goto call with a asyncio.wait_for with small timeout (small enough so that it can't goto for real). I guess the later is a bad idea, and I stopped trying to do it, but still it happens at complete random time even without. Low or high goto timeouts, 50 tabs or only 1, 10 seconds after running the script or 30 minute after, etc. and never on the same URL.

I don't know what kind of information I can provide to help, or what kind of tools I can use to debug and fix that. The asyncio's "hey, here is your empty stack trace, have fun" way of helping me is giving me a hard time.

@hartym hartym changed the title In linux container, the websocket connection is randomly lost after 1 to 30 minutes of browsing, causing an uncaught execption Websocket connection is randomly lost after 1 to 30 minutes of browsing, causing an uncaught execption Apr 12, 2018
@hartym
Copy link
Contributor Author

hartym commented Apr 12, 2018

Update:

Still strugling in tracing that but it looks like it happens because page.close() is awaited too soon after the page.goto(...) went timeout.

I reduce greatly the amount of errors (like, to about 0 if not 0, will confirm in a few hours) either if:

  • I slow down the machine that runs the script. Giving only 0.2 CPU instead of 4 to 8 like I was trying before, and I don't see the error happen anymore
  • I await an asyncio.sleep(1) before awaiting page.close().

I suspect there is a race condition between some callback that happens on goto timeout and page.close()

[edit] The "slowness fixes the bug" hypothesis is erroneous. There is something related to the websites opened that causes this. The reason why it's so "random" is that I use a message queue (rabbitmq) that will reschedule any cancelled task at same spot they were before. So for long time, no buuggy site, no error. If one buggy site comes to amqp head, then each time it kills a worker another worker will get it, and die too. Loop that. Funny, right?

@miyakogi
Copy link
Owner

Great thanks for detail information and suggestion.
I don't have good idea to fix this soon, but I will check page.goto's timeout sequence.

@hartym hartym changed the title Websocket connection is randomly lost after 1 to 30 minutes of browsing, causing an uncaught execption Websocket connection is randomly lost (websockets.exceptions.ConnectionClosed uncaught in _async_send after OP_CLOSE) Apr 13, 2018
@hartym hartym changed the title Websocket connection is randomly lost (websockets.exceptions.ConnectionClosed uncaught in _async_send after OP_CLOSE) Websocket connection is randomly lost (ConnectionClosed) Apr 13, 2018
@hartym
Copy link
Contributor Author

hartym commented Apr 13, 2018

@miyakogi Thanks for the reply.

I can confirm for sure that it is not docker related, I just spent some time installing my code on a machine without containers, and the error happens exactly the same way.

Trying to create a minimal code file for reproduction (even if it's happening at random, it's 100% reproductible, if one waits enough time, which can be 10 seconds sometime)

The code to reproduce is not much more than what is the hello world of pyppeteer, btw.

@hartym
Copy link
Contributor Author

hartym commented Apr 13, 2018

Related websocket log :

DEBG:0087:websockets.protocol: client > Frame(fin=True, opcode=8, data=b'\x03\xe8', rsv1=False, rsv2=False, rsv3=False)
DEBG:0087:websockets.protocol: client - eof_received()
DEBG:0087:websockets.protocol: client ! failing WebSocket connection: 1006
DEBG:0087:websockets.protocol: client - connection_lost(None)
DEBG:0087:websockets.protocol: client x closing TCP connection

Looks like it shows the "send" with OP_CLOSE opcode is actually coming from the pyppeteer (but again, I'm not sure I understand everything here).

@hartym
Copy link
Contributor Author

hartym commented Apr 13, 2018

@miyakogi Unlike everything I believed until then, it seems the error happens because of something the remote site has. Can't understand why, but I can reproduce it 100% (whatever the platform is) using this website (probably found in the 90s):

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    try:
        await page.goto('http://www.kunstenknipwerk.com/', timeout=10000)
    finally:
        await page.close()

asyncio.get_event_loop().run_until_complete(main())

The randomness of the error for me then comes from the fact I use a queue that never contains the same thing.

@hartym
Copy link
Contributor Author

hartym commented Apr 13, 2018

Note that the equivalent puppeteer code on the same website works as expected and does not close the websocket connection:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  let page = await browser.newPage();
  try {
    await page.goto('http://www.kunstenknipwerk.com/', {timeout: 10000});
  }
  catch (e) {
    console.error(e)
  }
  finally {
    await page.close();
  }
  console.log('done');
})();

@hartym
Copy link
Contributor Author

hartym commented Apr 13, 2018

Here are some other examples of urls that produce this bug:

https://www.iu.edu/
http://www.teletape.info/

One challenge is that not all urls reproduce the bug on all origins. I suspect websites to be slightly different depending on the origin (think of an ad server serving an ad that change on each request ...) and although the urls I pasted here seems to reproduce the bug 100% on my machines, some urls behave differently and only bug on one or a subset of machines ...

@hartym hartym changed the title Websocket connection is randomly lost (ConnectionClosed) Websocket connection is lost on some websites (ConnectionClosed) Apr 13, 2018
@miyakogi
Copy link
Owner

Thank you for minimal reproducible code.
This problem is reproduced by the code.
I will try to fix it, but may take some time...

@miyakogi miyakogi added the bug label Apr 15, 2018
@miyakogi
Copy link
Owner

After merging your PR (#64), this problem does not reproduce on my machine.
How about on your environment?

@hartym
Copy link
Contributor Author

hartym commented Apr 15, 2018

@miyakogi That's quite unexpected, should be completely unrelated, but on my local box it looks like the it works for the urls I listed here. Unfortunately, I upgraded a few of my instances to use dev version of pyppeteer, and I still see the error happening on other pages. I'm going to put in place a simple logging infrastructure for the dubious urls, which is a bit tricky as I don't know which tab is the culprit when it happens. But my "stupid" strategy on this is (a.k.a "post somewhere all urls that are currently processing when the error happen and try them one by one in a single tab spider") should help me find urls that still have this problem.

@hartym
Copy link
Contributor Author

hartym commented Apr 15, 2018

Oh and thanks for looking into it, I understand it may take some time as I already spent around 1 full week struggling with this, really hard to trace (but you for sure know better the internals of pyppeteer so it may be easier for you).

@hartym
Copy link
Contributor Author

hartym commented Apr 18, 2018

So, update here.

Indeed it looks like the patches "mostly" fixes the problem.

I still have it on some websites (but not 100% reproductible, looks like the behavior is a bit different from my local network than from the spiders networks).

The thing is, if an error happening in some handler causes the browser ws connection to crash, then there is probably something in pyppeteer that we can do to catch all errors and at least log the original error. It's basically non debuggable if it just says "wops, connection closed (1006)", but it's a matter of seconds to fix it if it says "cannot do X on NoneType". Do you have a good idea on where this code could go? I can work on a patch, but I'm not confident enough with the pyppeteer codebase yet to think of the best place to catch everything.

Thanks.

@debugtalk
Copy link

I have the the same problem. Has this been fixed yet?

@hartym
Copy link
Contributor Author

hartym commented Jun 9, 2018

As far as I know, it is not fixed. I still have the problem from time to time even if using develop makes it happen way less often. @miyakogi any idea on this regard ?

@RunningToTheEdgeOfTheWorld

I got this problem too...

@caglarsayin
Copy link

caglarsayin commented Oct 22, 2018

Same problem

@Qianlitp
Copy link

Qianlitp commented Nov 6, 2018

I got the exception at random time when I use multiprocess and open many pages.

Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state',)>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 528, in transfer_data
    msg = yield from self.read_message()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 580, in read_message
    frame = yield from self.read_data_frame(max_size=self.max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 645, in read_data_frame
    frame = yield from self.read_frame(max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 710, in read_frame
    extensions=self.extensions,
  File "/opt/python3/lib/python3.6/site-packages/websockets/framing.py", line 100, in read
    data = yield from reader(2)
  File "/opt/python3/lib/python3.6/asyncio/streams.py", line 672, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 361, in send
    yield from self.ensure_open()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 501, in ensure_open
    self.close_code, self.close_reason) from self.transfer_data_exc
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 79, in _async_send
    await self.dispose()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 170, in dispose
    await self._on_close()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 153, in _on_close
    f'Protocol error {cb.method}: Target closed.',  # type: ignore
asyncio.base_futures.InvalidStateError: invalid state```

@RunningToTheEdgeOfTheWorld

I found some hints. It may be not websockets problem.
Some web socket server will close your connection if you not send ping message, so check your connect server's rule may help.
And even though send ping message, some server will close your connection after 24 hours

@nurettin
Copy link

nurettin commented Nov 7, 2018

I have an interesting setup. When I run without pipenv, I get no problems. When I run with pipenv, I get connection closed within a few seconds of opening target website. Any way I can assist in finding the culprit?

@nurettin
Copy link

nurettin commented Nov 7, 2018

It appears pipenv has websockets 7.0 and the normal userspace site-packages has websockets 6.0, it may be because of this difference.

@nurettin
Copy link

nurettin commented Nov 7, 2018

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@Qianlitp
Copy link

Qianlitp commented Nov 8, 2018

@RunningToTheEdgeOfTheWorld I agree with you. I found that chrome navigated some unstable urls would cost a lot of time and the exception appeared frequently. May be pyppeteer should keep the connection by sending ping message.

@taewookim
Copy link

taewookim commented Jan 16, 2019

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin Dumb question, how do you do this?

@nurettin
Copy link

nurettin commented Jan 17, 2019

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin how do you do this?

You edit your Pipfile or requirements.txt manually and replace websockets==7.0 with 6.0

But I recommend monkey patching for now see #160

@taewookim
Copy link

duh... I didnt realize websockets was a pip package.
thanks

@taewookim
Copy link

Don't mean to hijack this thread, but i wasn't sure if this was related..
Is there a way to force websocket connections to terminate on pyppeteer? (the issue here)

ivanistheone added a commit to learningequality/sushi-chef-edraak that referenced this issue Feb 4, 2019
  - Need to wait for page to fully load in order to get component url
  - Pyeterr has problem with latest websockets==7.0 , see miyakogi/pyppeteer#62 (comment)
  - pinned websockets to 6.0 and to make it work
@ingmferrer
Copy link

Bump this. It needs to be resolved in master and the solution is to downgrade websockets.

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin how do you do this?

You edit your Pipfile or requirements.txt manually and replace websockets==7.0 with 6.0

But I recommend monkey patching for now see #160

byt3bl33d3r added a commit to byt3bl33d3r/WitnessMe that referenced this issue Nov 3, 2019
- Applied path from miyakogi/pyppeteer#160 in
order to fix miyakogi/pyppeteer#62 which made
the websocket connection to Chromium close after ~20s

- Reworked logic to use a producer/consumer pattern
@byt3bl33d3r
Copy link

This issue seems to be still a problem on some websites even after applying the proposed patches for the ping_timeout argument and/or downgrading to websockets 6.0.

@Chestermozhao
Copy link

Does this problem be fixed? I got same problem, and I used uvicorn which need websockets>=8.0...

@sjoerd222888
Copy link

Still problems with newest websocket package. Does somebody has an idea what goes wrong? Would be nice to open an issue for the websocket package.

@Mattwmaster58
Copy link

No idea what's wrong, I think it has something to do with the websockets dependancy. With 6.0, there are no issues, with 8.1, issues left and right

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests