Handling "Error -3 while decompressing data: invalid distance too far" error #665

Closed
bsergean opened this issue Aug 26, 2019 · 11 comments


@bsergean

bsergean commented Aug 26, 2019

Problem

I am getting this error quite often in production, raised in decode in websockets.extensions.permessage_deflate: Error -3 while decompressing data: invalid distance too far

stacktrace

My permessage-deflate settings are:

PerMessageDeflate(remote_no_context_takeover=False, local_no_context_takeover=False, remote_max_window_bits=15, local_max_window_bits=15)

The data to be decompressed is relatively small: [26, 92, 39, 167, 24, 89, 90, 32, 15, 212, 140, 158, 156, 50, 122, 114, 202].
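With these settings, both sides keep their compression context between messages (context takeover), so a compressed message may refer back to bytes from earlier messages. The Python sketch below uses only the standard zlib module, not websockets itself, to show one way to reproduce this kind of error: if the receiver's decompressor no longer holds the history the sender assumed, inflating a later message fails with an "invalid distance" error. The message content and helper names are hypothetical, and this is one possible cause, not a diagnosis of the C++ client.

```python
import zlib

# RFC 7692: the sender strips this 4-byte tail from every compressed message
# and the receiver appends it back before inflating.
TAIL = b"\x00\x00\xff\xff"


def compress_messages(messages, no_context_takeover=False):
    """Compress messages as raw DEFLATE with a 2**15-byte window (wbits=-15)."""
    encoder = zlib.compressobj(wbits=-15)
    payloads = []
    for message in messages:
        # Z_SYNC_FLUSH ends the output on a byte boundary with the TAIL marker.
        data = encoder.compress(message) + encoder.flush(zlib.Z_SYNC_FLUSH)
        payloads.append(data[: -len(TAIL)])
        if no_context_takeover:
            encoder = zlib.compressobj(wbits=-15)  # forget previous messages
    return payloads


def decompress_messages(payloads, reset_between_messages=False):
    """Inflate payloads; optionally simulate a receiver that loses its window."""
    decoder = zlib.decompressobj(wbits=-15)
    messages = []
    for payload in payloads:
        if reset_between_messages:
            decoder = zlib.decompressobj(wbits=-15)
        messages.append(decoder.decompress(payload + TAIL))
    return messages


# Hypothetical message, sent twice on the same connection.
msg = b'{"action": "publish", "channel": "hypothetical", "body": {"id": 42}}'

payloads = compress_messages([msg, msg])
print(decompress_messages(payloads) == [msg, msg])  # shared context: True

try:
    decompress_messages(payloads, reset_between_messages=True)
except zlib.error as exc:
    # The second payload points back into the first message, which a fresh
    # decoder has never seen, so zlib rejects the back-reference distance.
    print(exc)

# With no_context_takeover, every message can be decoded on its own.
independent = compress_messages([msg, msg], no_context_takeover=True)
print(decompress_messages(independent, reset_between_messages=True) == [msg, msg])
```

Note that the reverse mismatch (sender resets, receiver keeps its window) does not fail, because the decoder simply never needs the extra history.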

I am getting a lot of these errors and I'm not sure how to fix them, or even how to troubleshoot them. Some threads online suggest that upgrading zlib fixes this. I suspect the bug is more likely to be in my C++ WebSocket client code.

The client code talking to my websockets-based server, cobra (https://github.com/machinezone/cobra), is written in C++. Here are the zlib bits.

There is a thread about a window size of 8 being bogus. I don't think this is my problem because I am setting

Ideas

  1. I am thinking about setting a User-Agent that reports the OS version and the zlib version in use, to see whether this is a platform-specific bug.
  2. Having the server advertise a smaller max_window_bits (14 instead of 15, or 13, etc.).
  3. Should an exception be caught in the websockets transfer_data code? Should websockets have a dedicated 'DeflateDecodingError'? All the messages contain a platform id. I don't currently have what I'd call a 'global' asyncio exception handler; maybe I need one to handle this case.
  4. Making sure that my library (IXWebSocket) is Autobahn-compliant. I've looked at your compliance subfolder, but I couldn't find anything related to the compression (permessage-deflate) extension there.
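For idea 1, here is a minimal sketch of what such a User-Agent could carry, written in Python for illustration. The header layout, the helper names, and the library/version strings are hypothetical; platform and zlib.ZLIB_RUNTIME_VERSION are standard-library attributes, and a C++ client could get the equivalent runtime value from zlib's zlibVersion().

```python
import platform
import re
import zlib


def build_user_agent(library, version):
    """Hypothetical layout: library/version (OS release; arch) zlib/x.y.z."""
    return "{}/{} ({} {}; {}) zlib/{}".format(
        library,
        version,
        platform.system(),
        platform.release(),
        platform.machine(),
        zlib.ZLIB_RUNTIME_VERSION,  # version of the zlib actually loaded
    )


def parse_zlib_version(user_agent):
    """Server side: extract the zlib version so errors can be grouped by it."""
    match = re.search(r"zlib/(\S+)", user_agent)
    return match.group(1) if match else None


ua = build_user_agent("IXWebSocket", "1.0")
print(ua)
print(parse_zlib_version(ua))
```

Reporting the runtime version rather than the compile-time one matters if the client can be linked against a system zlib that differs from the headers it was built with.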

Thanks for any hints.

Some random links:

@aaugustin
Member

Hmm, that's weiiiiird.

Indeed, I don't think this is related to the window size 8 / 9 issues. Since zlib supports at most 15 window bits (a 32 KiB window), both sides are using the largest window size possible here.

There are a few weird things in the WebSocket per-message deflate spec. It may be easier to read the websockets implementation. From a quick look, the code you linked to seems to do the right thing, except it doesn't handle fragmentation.
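Regarding fragmentation: RFC 7692 sets RSV1 only on the first frame of a compressed message, and the receiver appends the 0x00 0x00 0xff 0xff tail once, after the last fragment, not after every frame. A minimal Python sketch of that rule with the standard zlib module, assuming the fragments are already available in order; the function name and the sample message are hypothetical:

```python
import zlib

TAIL = b"\x00\x00\xff\xff"


def decode_fragmented_message(decoder, fragments):
    """Inflate one compressed message that arrived as several frames.

    The same decoder consumes every fragment in order, and the tail is
    appended only to the payload of the final fragment.
    """
    parts = []
    last = len(fragments) - 1
    for index, fragment in enumerate(fragments):
        data = fragment + TAIL if index == last else fragment
        parts.append(decoder.decompress(data))
    return b"".join(parts)


# Build a compressed message and split it into three hypothetical frames.
message = b"hello fragmented world " * 50
encoder = zlib.compressobj(wbits=-15)
payload = (encoder.compress(message) + encoder.flush(zlib.Z_SYNC_FLUSH))[: -len(TAIL)]
n = len(payload)
fragments = [payload[: n // 3], payload[n // 3 : 2 * n // 3], payload[2 * n // 3 :]]

decoder = zlib.decompressobj(wbits=-15)
print(decode_fragmented_message(decoder, fragments) == message)  # True
```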

Dumping the OS and zlib versions would certainly be interesting.

I don't expect a smaller window size to change much, but why not try it.

I think the exception is clear enough and probably rare enough not to warrant special handling.

One way to debug this further would be to dump raw messages. You'll have to patch websockets for this :-( Then, when an error occurs, you have the full sequence of messages that triggered it.
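A rough sketch of the record-and-replay idea, kept outside websockets so it runs on its own (a real patch would hook into the extension's decode path). The class and function names are hypothetical. Because context takeover lets a message reference up to 32 KiB of earlier output, the recording starts at the beginning of the connection instead of keeping only the last few messages:

```python
import zlib

TAIL = b"\x00\x00\xff\xff"


class RecordingDecoder:
    """Hypothetical wrapper: inflates one connection's payloads (context
    takeover enabled) and keeps every raw payload it has seen."""

    def __init__(self):
        self.decoder = zlib.decompressobj(wbits=-15)
        self.raw_payloads = []  # grows for the lifetime of the connection

    def decode(self, payload):
        self.raw_payloads.append(payload)
        return self.decoder.decompress(payload + TAIL)

    def dump(self):
        """Hex-encode the recorded payloads for a log line or a bug report."""
        return [raw.hex() for raw in self.raw_payloads]


def replay(hex_payloads):
    """Re-inflate a dumped sequence offline; return the index of the first
    payload zlib rejects, or None if the whole sequence decodes."""
    decoder = zlib.decompressobj(wbits=-15)
    for index, hex_payload in enumerate(hex_payloads):
        try:
            decoder.decompress(bytes.fromhex(hex_payload) + TAIL)
        except zlib.error:
            return index
    return None


# Usage: record two valid messages, then hit a corrupt one.
recorder = RecordingDecoder()
encoder = zlib.compressobj(wbits=-15)
for text in (b"first message", b"second message"):
    data = encoder.compress(text) + encoder.flush(zlib.Z_SYNC_FLUSH)
    recorder.decode(data[: -len(TAIL)])

dumped = []
try:
    recorder.decode(b"\xff")  # deliberately invalid: reserved block type
except zlib.error:
    dumped = recorder.dump()  # log or save this for offline replay

print(replay(dumped))  # 2
```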

Finally, websockets enables compression by default if the other endpoint supports it; that's why it doesn't do anything special for autobahn compliance tests.

@bsergean
Author

bsergean commented Aug 27, 2019 via email

@bsergean
Author

I have figured out how to work with the autobahn test-suite (by looking at a couple of clients including yours), and I have found that many tests are failing. I'll try to fix all those and see if it helps.

@aaugustin
Member

I don't know what else I can do here, so I'm going to close this issue. I'm sorry I couldn't help.

@bsergean
Author

Sure, makes sense. It's a tough one.

If I ever figure it out, I'll report back here on what the problem was. My library is Autobahn-compliant now, and I'm reporting the user agent, which I'm trying to advertise and capture in Sentry, to see whether the problem is platform-specific.

@aliqandil

aliqandil commented Mar 20, 2022

I recently ran into the same problem. It's in production, so I can't really tell how to reproduce it.
I'll just leave my error log here, in case it's helpful:

data transfer failed
Traceback (most recent call last):
  File "/srv/.env/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 944, in transfer_data
    message = await self.read_message()
  File "/srv/.env/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1013, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/srv/.env/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1088, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/srv/.env/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 1143, in read_frame
    frame = await Frame.read(
  File "/srv/.env/lib/python3.10/site-packages/websockets/legacy/framing.py", line 109, in read
    new_frame = extension.decode(new_frame, max_size=max_size)
  File "/srv/.env/lib/python3.10/site-packages/websockets/extensions/permessage_deflate.py", line 128, in decode
    data = self.decoder.decompress(data, max_length)
zlib.error: Error -3 while decompressing data: invalid distance too far back

For a bit more context, I use a parent process to create an inheritable socket and then pass it to multiple child processes, each running its own server. This error appears about once every minute. I'm also facing some other problems and missing features, but I don't think they are related to this issue, so I won't mention them until I'm sure it's not something I'm doing wrong.

@aaugustin
Member

This came up again; let's try to debug it.

If you'd like to help, see #1160 (comment) for debugging instructions.

@bsergean
Author

Thanks. I don't have easy access to the project that triggered the problem (I can't deploy changes), but I can bump websockets to 10.3 in the open-source project that was using your library and ask for the new version to be deployed.

One bug we noticed was that a C++ WebSocket client was serializing an int as hex, which Python handled fine but other libraries did not; it could be that. One of our sysadmins also thought this bug could be a load balancer problem.

@bsergean
Author

FYI, this was the C++ bug.

@aaugustin
Member

Eventually I just transformed this exception into a ProtocolError, which is less noisy (i.e. not logged at the ERROR level).
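For anyone handling this in their own stack, the general pattern is to treat a zlib.error during decompression as a protocol violation by the peer rather than an internal failure. A minimal Python sketch, assuming a local ProtocolError class (a stand-in, not websockets' own exception), a hypothetical logger name, and the RFC 6455 close code 1002 (protocol error):

```python
import logging
import zlib

TAIL = b"\x00\x00\xff\xff"
CLOSE_PROTOCOL_ERROR = 1002  # RFC 6455 close code: protocol error

logger = logging.getLogger("example.permessage_deflate")  # hypothetical name


class ProtocolError(Exception):
    """Local stand-in for a WebSocket library's protocol-error exception."""


def decode_payload(decoder, payload):
    try:
        return decoder.decompress(payload + TAIL)
    except zlib.error as exc:
        # Corrupt compressed data is the peer's fault, not a server bug.
        raise ProtocolError("invalid compressed data: {}".format(exc)) from exc


def receive(decoder, payload):
    """Return (message, close_code); close_code is None when all went well."""
    try:
        return decode_payload(decoder, payload), None
    except ProtocolError as exc:
        logger.debug("closing connection: %s", exc)  # below ERROR level
        return None, CLOSE_PROTOCOL_ERROR


encoder = zlib.compressobj(wbits=-15)
good = (encoder.compress(b"hello") + encoder.flush(zlib.Z_SYNC_FLUSH))[: -len(TAIL)]
print(receive(zlib.decompressobj(wbits=-15), good))     # (b'hello', None)
print(receive(zlib.decompressobj(wbits=-15), b"\xff"))  # (None, 1002)
```

A decoder that has raised should not be reused: its state is broken, so the connection is closed rather than resynchronized.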

@bsergean
Author

bsergean commented Oct 11, 2022 via email
