DotNetty.Codecs.CorruptedFrameException: negative pre-adjustment length field #3879

Closed
leonelcm opened this issue Aug 7, 2019 · 7 comments · Fixed by #4252

Comments

@leonelcm

leonelcm commented Aug 7, 2019

- Which Akka.NET version you are using
Akka.NET: 1.3.13
DotNetty: 0.6.0

- On which platform you are using Akka.NET
Windows

- A list of steps to reproduce the issue, or a gist or GitHub repo that can easily be used to reproduce your case.
The following exceptions were generated last night while we were processing messages:

  • Error caught channel "[::ffff:165.33.24.232]:8910"->"[::ffff:165.34.44.22]:35690"
    {
      "Depth": 0,
      "ClassName": "DotNetty.Codecs.CorruptedFrameException",
      "Message": "negative pre-adjustment length field: -160",
      "Source": "DotNetty.Codecs",
      "StackTraceString": " at DotNetty.Codecs.LengthFieldBasedFrameDecoder.Decode(IChannelHandlerContext context, IByteBuffer input)\r\n at DotNetty.Codecs.LengthFieldBasedFrameDecoder.Decode(IChannelHandlerContext context, IByteBuffer input, List`1 output)\r\n at DotNetty.Codecs.ByteToMessageDecoder.CallDecode(IChannelHandlerContext context, IByteBuffer input, List`1 output)\r\n at DotNetty.Codecs.ByteToMessageDecoder.ChannelRead(IChannelHandlerContext context, Object message)\r\n at DotNetty.Transport.Channels.AbstractChannelHandlerContext.InvokeChannelRead(Object msg)",
      "RemoteStackTraceString": null,
      "RemoteStackIndex": 0,
      "ExceptionMethod": {
        "Name": "Decode",
        "AssemblyName": "DotNetty.Codecs",
        "AssemblyVersion": "0.6.0.0",
        "AssemblyCulture": "",
        "ClassName": "DotNetty.Codecs.LengthFieldBasedFrameDecoder",
        "Signature": "System.Object Decode(DotNetty.Transport.Channels.IChannelHandlerContext, DotNetty.Buffers.IByteBuffer)",
        "MemberType": 8
      },
      "HResult": -2146233088,
      "HelpURL": null
    }

  • No response from remote for inbound association. Handshake timed out after [15000] ms

  • Error caught channel "[::ffff:165.33.24.232]:8910"->"[::ffff:165.34.44.22]:53650"
    {
      "Depth": 0,
      "ClassName": "DotNetty.Codecs.TooLongFrameException",
      "Message": "Adjusted frame length exceeds 50000: 721008 - discarded",
      "Source": "DotNetty.Codecs",
      "StackTraceString": " at DotNetty.Codecs.LengthFieldBasedFrameDecoder.Fail(Int64 frameLength)\r\n at DotNetty.Codecs.LengthFieldBasedFrameDecoder.FailIfNecessary(Boolean firstDetectionOfTooLongFrame)\r\n at DotNetty.Codecs.LengthFieldBasedFrameDecoder.Decode(IChannelHandlerContext context, IByteBuffer input)\r\n at DotNetty.Codecs.LengthFieldBasedFrameDecoder.Decode(IChannelHandlerContext context, IByteBuffer input, List`1 output)\r\n at DotNetty.Codecs.ByteToMessageDecoder.CallDecode(IChannelHandlerContext context, IByteBuffer input, List`1 output)\r\n at DotNetty.Codecs.ByteToMessageDecoder.ChannelRead(IChannelHandlerContext context, Object message)\r\n at DotNetty.Transport.Channels.AbstractChannelHandlerContext.InvokeChannelRead(Object msg)",
      "RemoteStackTraceString": null,
      "RemoteStackIndex": 0,
      "ExceptionMethod": {
        "Name": "Fail",
        "AssemblyName": "DotNetty.Codecs",
        "AssemblyVersion": "0.6.0.0",
        "AssemblyCulture": "",
        "ClassName": "DotNetty.Codecs.LengthFieldBasedFrameDecoder",
        "Signature": "Void Fail(Int64)",
        "MemberType": 8
      },
      "HResult": -2146233088,
      "HelpURL": null
    }

Unfortunately I can't reproduce the issue, as this happened in PROD and we process millions of messages a day. Any thoughts on what the cause could be and what the impact might be?

Thank you,
Leonel

@Aaronontheweb
Member

Hi @leonelcm

We've had some other reports similar to this one, and they've been difficult to reproduce; the biggest reason is that we don't even know what the message type is at this stage in the decoding pipeline.

This is happening because of the 4-byte length header we prepend to each message (a standard socket message framing technique); in this instance, that header appears to contain bad data.
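
For readers unfamiliar with the technique, here is a minimal, hedged sketch of length-prefix framing in general - illustrative only, not DotNetty's actual implementation (the class and method names below are made up for the example):

using System;
using System.IO;

// Minimal sketch of length-prefix framing: each message is written as a 4-byte big-endian
// length followed by the payload; the reader first reads the 4-byte header, then reads
// exactly that many payload bytes. A corrupted header is exactly what produces errors like
// "negative pre-adjustment length field".
static class LengthPrefixFraming
{
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        var header = new byte[4];
        header[0] = (byte)(payload.Length >> 24);
        header[1] = (byte)(payload.Length >> 16);
        header[2] = (byte)(payload.Length >> 8);
        header[3] = (byte)payload.Length;
        stream.Write(header, 0, 4);
        stream.Write(payload, 0, payload.Length);
    }

    public static byte[] ReadFrame(Stream stream)
    {
        var header = ReadExact(stream, 4);
        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0)
            throw new InvalidDataException("negative pre-adjustment length field: " + length);
        return ReadExact(stream, length);
    }

    private static byte[] ReadExact(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}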

Here's the line that threw:

https://github.com/Azure/DotNetty/blob/13f009426687afc7c8d60ea653f6bc59d520eb2b/src/DotNetty.Codecs/LengthFieldBasedFrameDecoder.cs#L329-L333

This could have happened because the piece of arithmetic that runs just before it:

int actualLengthFieldOffset = input.ReaderIndex + this.lengthFieldOffset;
long frameLength = this.GetUnadjustedFrameLength(input, actualLengthFieldOffset, this.lengthFieldLength, this.byteOrder);

produced a wrong value. I think what is likely happening in this case is that input.ReaderIndex is wrong at the start of the read - we pool those byte buffers and are supposed to release them and return them to the pool. I suspect there's a race condition of some kind where a buffer can be returned to the pool dirty: when decoding multiple messages, the reader index isn't set correctly at the end of one message and the beginning of the next.
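
To make the suspected failure mode concrete, here is a toy sketch (assumed names, not DotNetty code) of what happens when a pooled buffer's read index isn't reset before reuse: the decoder ends up interpreting arbitrary payload bytes as the 4-byte length header, which can easily come out negative or absurdly large.

using System;

// Toy stand-in for a pooled buffer; the real DotNetty IByteBuffer is far richer.
class PooledBuffer
{
    public byte[] Bytes = new byte[1024];
    public int ReaderIndex; // should be reset to 0 when the buffer is returned to the pool
}

static class DirtyBufferDemo
{
    // Reads a 4-byte big-endian length starting at the current reader index.
    static int ReadLengthHeader(PooledBuffer buf) =>
        (buf.Bytes[buf.ReaderIndex] << 24) |
        (buf.Bytes[buf.ReaderIndex + 1] << 16) |
        (buf.Bytes[buf.ReaderIndex + 2] << 8) |
         buf.Bytes[buf.ReaderIndex + 3];

    static void Main()
    {
        var buf = new PooledBuffer();

        // First message: header says 8 bytes of payload.
        buf.Bytes[3] = 8;
        Console.WriteLine(ReadLengthHeader(buf)); // 8 - correct

        // Buffer is "returned to the pool" dirty: ReaderIndex left pointing into old data.
        buf.ReaderIndex = 2;
        buf.Bytes[2] = 0xFF; // stale payload byte
        Console.WriteLine(ReadLengthHeader(buf)); // garbage - in this case a negative length
    }
}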

@Aaronontheweb
Member

Going forward, I can address this in one of two ways:

  1. Run some FsCheck tests over DotNetty's decoders using a combination of dirty and clean buffers and a small, in-memory message-processing pipeline that pretty closely simulates what we do with Akka.Remote (see the round-trip property sketch after this list). That might expose an edge case that isn't being covered correctly by this code today. I'd have to patch DotNetty and get a new release out, which might be hard considering that the Microsoft guys who run that project have gone really quiet lately. Worst case, though, I can write my own LengthFrameDecoder and add it to the remoting pipeline without doing any code updates.
  2. Resume my work on getting rid of DotNetty altogether and move towards a pure Akka.Streams TCP transport, something I started but haven't picked back up yet: https://github.com/Aaronontheweb/Akka.Streams.RemoteTransport - the primary goal of that transport is performance, and it's a totally experimental design that I haven't finished yet. But if we got that working, it would also solve this problem.
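
As a rough shape for option 1, here is a hedged sketch of an FsCheck round-trip property - not actual Akka.NET or DotNetty test code. It reuses the hypothetical LengthPrefixFraming sketch from earlier in this thread; a real test would drive DotNetty's LengthFieldBasedFrameDecoder with pooled buffers instead.

using System.IO;
using System.Linq;
using FsCheck;

static class FramingProperties
{
    // Property: any payload survives an encode/decode round trip through the framing codec.
    public static void RoundTripHoldsForAllPayloads()
    {
        Prop.ForAll<byte[]>(payload =>
        {
            payload = payload ?? new byte[0];

            var stream = new MemoryStream();
            LengthPrefixFraming.WriteFrame(stream, payload);
            stream.Position = 0;

            var decoded = LengthPrefixFraming.ReadFrame(stream);
            return decoded.SequenceEqual(payload);
        }).QuickCheckThrowOnFailure();
    }
}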

Aaronontheweb added this to the 1.4.0 milestone Aug 7, 2019
@Aaronontheweb
Member

Wonder if these are related: #3273

@leonelcm
Author

@Aaronontheweb, thank you very much for your feedback. Would the messages be dropped when this issue occurs? What is the impact of these issues?

@Aaronontheweb
Member

I think this is going to be difficult to locate, specifically, without a really good reproduction spec, but I ran into this myself yesterday while working on some tests with Akka.Remote.

I think the issue in this case is that DotNetty, for whatever reason, can occasionally corrupt frames in its decoding scheme (length framing). I also think it has to do with how the input buffers are being reused inside DotNetty.

We're about to embark on a number of issues for Akka.NET v1.4.0 that might render this moot - namely moving off of DotNetty and implementing Artery for Akka.NET TCP.

@Aaronontheweb
Member

After running some detailed tests as part of #4106 I've determined that this is ultimately an issue with the LengthFrameDecoder / LengthFrameEncoder and the DotNetty reference-counting system for pooled memory. It's infrequent, but it does happen somewhat racily. I believe the issue in this case is a dirty write during the encoding process which encodes the frame-length header incorrectly.

Workaround

There is a temporary workaround for this problem which should have a fairly minimal performance impact (according to my numbers):

akka.remote.dot-netty.tcp.enable-pooling = false

This will disable buffer pooling altogether and thus work around the source of this issue.
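
For anyone applying the workaround from code rather than an existing HOCON file, a minimal sketch (the system name and fallback configuration below are placeholders) could look like this:

using Akka.Actor;
using Akka.Configuration;

class Program
{
    static void Main()
    {
        // Hedged sketch: layer the pooling workaround on top of whatever configuration
        // the application already loads. "MySystem" is a placeholder name.
        var workaround = ConfigurationFactory.ParseString(
            "akka.remote.dot-netty.tcp.enable-pooling = false");

        var config = workaround.WithFallback(ConfigurationFactory.Load());

        var system = ActorSystem.Create("MySystem", config);

        // ... run the application, then shut down:
        system.Terminate().Wait();
    }
}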

Solution

The solution is either to fix our use of reference counting (.Retain() and .Release() calls) inside our DotNetty pipeline programmatically, or to simply disable the use of pooling inside the remote transport entirely.
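
For illustration, here is a hedged sketch of the reference-counting discipline in question - not the actual Akka.Remote pipeline code. The rule it demonstrates: whichever handler consumes a pooled IByteBuffer (instead of passing it downstream) must release it exactly once, and must not touch the buffer after the release.

using DotNetty.Buffers;
using DotNetty.Common.Utilities;
using DotNetty.Transport.Channels;

sealed class ConsumingHandler : ChannelHandlerAdapter
{
    public override void ChannelRead(IChannelHandlerContext context, object message)
    {
        if (message is IByteBuffer buffer)
        {
            try
            {
                // Copy out what we need while we still own the pooled buffer.
                var bytes = new byte[buffer.ReadableBytes];
                buffer.ReadBytes(bytes);
                context.FireChannelRead(bytes);
            }
            finally
            {
                // Return the pooled buffer exactly once; double-releasing or forgetting to
                // release is the kind of mistake that leads to dirty reuse from the pool.
                ReferenceCountUtil.Release(buffer);
            }
        }
        else
        {
            context.FireChannelRead(message);
        }
    }
}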

@Aaronontheweb
Member

Included this in Akka.NET v1.3.18: https://github.com/akkadotnet/akka.net/releases/tag/1.3.18
