
Akka.Remote.EndpointException: Error while decoding incoming Akka PDU #3273

Closed
Aaronontheweb opened this issue Jan 16, 2018 · 25 comments · Fixed by #4252

Comments

@Aaronontheweb
Member

Akka.NET v1.3.3 nightlies

Akka.Remote.EndpointException: Error while decoding incoming Akka PDU ---> Google.Protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
   at Google.Protobuf.CodedInputStream.SkipRawBytes(Int32 size)
   at Akka.Remote.Serialization.Proto.Msg.Payload.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Akka.Remote.Serialization.Proto.Msg.RemoteEnvelope.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Akka.Remote.Serialization.Proto.Msg.AckAndEnvelopeContainer.MergeFrom(CodedInputStream input)
   at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, ByteString data)
   at Google.Protobuf.MessageParser`1.ParseFrom(ByteString data)
   at Akka.Remote.Transport.AkkaPduProtobuffCodec.DecodeMessage(ByteString raw, RemoteActorRefProvider provider, Address localAddress)
   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   at Akka.Remote.EndpointReader.<Reading>b__11_1(InboundPayload inbound)
   at lambda_method(Closure , Object , Action`1 , Action`1 , Action`1 )
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)

This happened while the endpointManager actor was decoding a message.
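The inner `InvalidProtocolBufferException` means the parser reached the end of the buffer while it was still inside a field. Here is a minimal Python sketch of that condition. It assumes only the standard protobuf wire rules: a varint tag, then, for wire type 2, a varint length followed by that many bytes. `read_varint` and `read_length_delimited` are illustrative helpers, not Google.Protobuf APIs.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("input ended unexpectedly in the middle of a field")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_length_delimited(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Read a wire-type-2 field body: a varint length, then that many bytes."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        # The declared length runs past the bytes we actually have: either the
        # input was truncated or the length prefix itself is wrong.
        raise ValueError("input ended unexpectedly in the middle of a field")
    return buf[pos:end], end


# Field 1, wire type 2 (tag byte 0x0A), declared length 5, body "hello".
whole = bytes([0x0A, 0x05]) + b"hello"
tag, pos = read_varint(whole, 0)
body, _ = read_length_delimited(whole, pos)
print(tag >> 3, tag & 0x07, body)  # 1 2 b'hello'

truncated = whole[:-2]  # drop the last two bytes of the body
tag, pos = read_varint(truncated, 0)
try:
    read_length_delimited(truncated, pos)
except ValueError as err:
    print(err)
```

The parser raises the same error whether the bytes were cut off in transit or the length prefix was wrong from the start. That is why the message names both causes.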

Have also seen this occur on an endpointReader actor inside the same cluster:

[akka://SICluster/system/endpointManager/endpointWriter-akka.tcp%3A%2F%2FSICluster%4010.0.17.97%3A10050-3442#554962129]
 Message
AssociationError [akka.tcp://SICluster@10.0.16.115:10050] <- akka.tcp://SICluster@10.0.17.97:10050: Error [Error while decoding incoming Akka PDU] [   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   at Akka.Remote.EndpointReader.<Reading>b__11_1(InboundPayload inbound)
   at lambda_method(Closure , Object , Action`1 , Action`1 , Action`1 )
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)]
@Aaronontheweb
Member Author

cc @Horusiath looks like this was an issue with the wire format not being totally compatible between 1.3.2 and 1.3.3 once WeaklyUp is turned on. The cluster had some nodes running the 1.3.3 nightlies and others running 1.3.2 stable.

We had to modify the .proto files to support WeaklyUp, no?

@Horusiath
Contributor

@Aaronontheweb yes, new member status was added. However from what I was reading, it was marked as not breaking binary format compatibility.
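Adding an enum value is usually wire-compatible, and a rough illustration shows why. An enum field is encoded as a plain varint, so a reader built before the new value existed still consumes exactly the same bytes and stays aligned with the rest of the message. The field number 3 and the status numbering below are hypothetical. They were chosen for illustration, not taken from the Akka.NET `.proto` files.

```python
# Hypothetical numbering, for illustration only (not copied from Akka.NET's .proto files).
OLD_STATUS_NAMES = {0: "Joining", 1: "Up", 2: "Leaving", 3: "Exiting", 4: "Down", 5: "Removed"}
NEW_STATUS_VALUE = 6  # a status the old reader has never heard of


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_enum_field(field_number: int, value: int) -> bytes:
    """Enum fields use wire type 0, so the tag is (field_number << 3) | 0."""
    return encode_varint(field_number << 3) + encode_varint(value)


def old_reader(buf: bytes) -> tuple[int, int, str, int]:
    """Decode one enum field the way a reader without the new value would."""
    tag, pos = decode_varint(buf, 0)
    value, pos = decode_varint(buf, pos)
    name = OLD_STATUS_NAMES.get(value, f"<unknown {value}>")
    return tag >> 3, value, name, pos  # pos = number of bytes consumed


print(old_reader(encode_enum_field(3, 1)))                 # (3, 1, 'Up', 2)
print(old_reader(encode_enum_field(3, NEW_STATUS_VALUE)))  # (3, 6, '<unknown 6>', 2)
```

Both encodings are two bytes long, so message framing is unaffected. What an older node *does* with a status it does not recognize is an application-level question that the wire format alone cannot answer.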

@Horusiath Horusiath reopened this Jan 17, 2018
@Aaronontheweb
Member Author

Yeah, that's what I thought upon looking at the changes to the .proto file too. It could also be that this cluster was using @nvivo's custom .NET Core / .NET Desktop interop stuff.

@nvivo
Contributor

nvivo commented Jan 23, 2018

Some more info on this, I noticed this error appearing when having 2 different nightlies running: 1.3.3 beta-475 and another one from a few days ago, probably 472 or 470. After updating all nodes to the same version again, it stopped.

@AndreSteenbergen
Contributor

Could this still be an issue? I am on 1.3.9 and I am seeing this as well. The ActorSystem runs for about an hour (on one system), and then it seems to stop nodes as well.

@AndreSteenbergen
Contributor

This might be similar, or not related at all (enable pooling was set to false, as seen on another similar issue):
[ERROR][11/2/18 11:16:19 AM][Thread 0008][remoting] Error while decoding incoming Akka PDU
Cause: Akka.Remote.EndpointException: Error while decoding incoming Akka PDU ---> Google.Protobuf.InvalidProtocolBufferException: SkipLastField called on an end-group tag, indicating that the corresponding start-group was missing
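`SkipLastField called on an end-group tag` means the parser read a tag whose low three bits (the wire type) were 4, "end group", without first seeing a matching start-group. The following small Python sketch shows how a tag splits into a field number and a wire type. `split_tag` and `check_skippable_tag` are illustrative names, not Google.Protobuf methods.

```python
WIRE_TYPE_NAMES = {0: "varint", 1: "fixed64", 2: "length-delimited",
                   3: "start-group", 4: "end-group", 5: "fixed32"}


def split_tag(tag: int) -> tuple[int, str]:
    """A protobuf tag is (field_number << 3) | wire_type."""
    return tag >> 3, WIRE_TYPE_NAMES.get(tag & 0x07, "invalid")


def check_skippable_tag(tag: int) -> None:
    """Reject tags that cannot begin a field outside of a group."""
    field_number, wire_type = split_tag(tag)
    if wire_type == "end-group":
        raise ValueError(f"end-group tag for field {field_number} "
                         "with no matching start-group")
    if wire_type == "invalid":
        raise ValueError(f"invalid wire type in tag {tag:#04x}")


for tag in (0x08, 0x0A, 0x0C, 0x64):
    try:
        check_skippable_tag(tag)
        print(f"{tag:#04x}", split_tag(tag), "ok")
    except ValueError as err:
        print(f"{tag:#04x}", err)
```

Any byte whose low three bits are `100` (0x0C, 0x14, 0x64 and so on) decodes as an end-group tag. If the parser starts reading at the wrong offset, it can hit one of these purely by chance.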

@Aaronontheweb
Member Author

This might still be an issue. Mind submitting some information about your runtime environment, @AndreSteenbergen? It'd be helpful to know whether it's Linux-specific or not.

@AndreSteenbergen
Contributor

AndreSteenbergen commented Nov 2, 2018

Of course:
Ubuntu 16.04 LTS
Dotnet Core 2.0
Akka cluster 1.3.9 (until today the latest stable)

The problem is that I can't reproduce it reliably: one time it happened after 5 minutes, another time after an hour. I am using MessagePack as the serializer, if that has anything to do with it.

@AndreSteenbergen
Contributor

AndreSteenbergen commented Nov 3, 2018

Could it be connected to gossip messages? I let my cluster sit overnight without giving it any tasks, and I see this in the logs:

On one side (port 4062) I see this:
[ERROR][11/3/18 4:52:07 AM][Thread 0006][[akka://claxe/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fclaxe%4010.0.0.30%3A4060-2/endpointWriter#1261779370]] AssociationError [akka.tcp://claxe@10.0.0.30:4062] <- akka.tcp://claxe@10.0.0.30:4060: Error [While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.] [ at Google.Protobuf.CodedInputStream.RefillBuffer(Boolean mustSucceed)

On the 4060 side I see this:

[ERROR][11/3/18 4:56:08 AM][Thread 0009][remoting] Error while decoding incoming Akka PDU
Cause: Akka.Remote.EndpointException: Error while decoding incoming Akka PDU ---> Google.Protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

@AndreSteenbergen
Contributor

If it is of any help: I migrated from Azure to a VPS, where the dotnet version is 2.0.9 (because of the .NET Core 2.1 issue). Could this also be a DotNetty thing?

@AndreSteenbergen
Contributor

Could it be my own deserializer, not reading to the end of the input?

[ERROR][11/3/18 7:14:53 AM][Thread 0008][remoting] Error while decoding incoming Akka PDU
Cause: Akka.Remote.EndpointException: Error while decoding incoming Akka PDU ---> Google.Protobuf.InvalidProtocolBufferException: Mismatched end-group tag. Started with field 12; ended with field 13
   at Google.Protobuf.CodedInputStream.SkipGroup(UInt32 startGroupTag)
   at Akka.Remote.Serialization.Proto.Msg.ActorRefData.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Akka.Remote.Serialization.Proto.Msg.RemoteEnvelope.MergeFrom(CodedInputStream input)
   at Google.Protobuf.CodedInputStream.ReadMessage(IMessage builder)
   at Akka.Remote.Serialization.Proto.Msg.AckAndEnvelopeContainer.MergeFrom(CodedInputStream input)
   at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, ByteString data)
   at Google.Protobuf.MessageParser`1.ParseFrom(ByteString data)
   at Akka.Remote.Transport.AkkaPduProtobuffCodec.DecodeMessage(ByteString raw, IRemoteActorRefProvider provider, Address localAddress)
   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointReader.TryDecodeMessageAndAck(ByteString pdu)
   at Akka.Remote.EndpointReader.<Reading>b__11_1(InboundPayload inbound)
   at lambda_method(Closure , Object , Action`1 , Action`1 , Action`1 )
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)
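The `Mismatched end-group tag` error comes from skipping a group: the parser saw a start-group tag for field 12 and then an end-group tag for field 13. Below is a minimal Python sketch of group skipping under the standard wire rules. `read_varint` and `skip_group` are illustrative helpers, and the error text is written to mirror the logged message. Groups are a legacy proto2 feature, so if the schemas involved declare no group fields at all, the mere presence of a start-group tag suggests the parser is reading misaligned bytes rather than a real message.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("input ended unexpectedly in the middle of a field")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_group(buf: bytes, pos: int, start_field: int) -> int:
    """Skip fields until the end-group tag for start_field; return the new pos."""
    while True:
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 4:  # end-group
            if field != start_field:
                raise ValueError("Mismatched end-group tag. Started with field "
                                 f"{start_field}; ended with field {field}")
            return pos
        if wire_type == 0:      # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed64
            pos += 8
        elif wire_type == 2:    # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 3:    # nested start-group
            pos = skip_group(buf, pos, field)
        elif wire_type == 5:    # fixed32
            pos += 4
        else:
            raise ValueError(f"invalid wire type {wire_type}")


# 0x63 = start-group for field 12, 0x08 0x05 = field 1 varint 5,
# 0x64 = end-group for field 12, 0x6C = end-group for field 13.
good = bytes([0x63, 0x08, 0x05, 0x64])
bad = bytes([0x63, 0x08, 0x05, 0x6C])
print(skip_group(good, 1, 12))  # 4
try:
    skip_group(bad, 1, 12)
except ValueError as err:
    print(err)  # Mismatched end-group tag. Started with field 12; ended with field 13
```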

@AndreSteenbergen
Contributor

AndreSteenbergen commented Nov 3, 2018

Could this be an issue? I am also running squid on that machine, using an old config from the machine I was migrating away from. It returned constant Forbidden responses: dotnet reported I was on ::1 instead of 127.0.0.1, so I needed the extra ACL. Could this be related? I mostly see these messages from local nodes; maybe it is an IPv6 issue somehow?

@AndreSteenbergen
Contributor

Is this related?

This is on a Lighthouse instance; it's unlikely that any of the messages from the ActorSystem get sent to that instance.

[ERROR][11/29/18 7:34:31 AM][Thread 0009][remoting] While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.
Cause: Google.Protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

Later in the logs I see these kinds of messages. To me it looks like a stream of data where one hiccup results in errors later in the stream, as these errors are fairly constant after a while. When I restart Lighthouse, the errors seem to stop (for a while).

[ERROR][11/29/18 7:39:03 AM][Thread 0007][remoting] SkipLastField called on an end-group tag, indicating that the corresponding start-group was missing

[ERROR][11/29/18 7:35:09 AM][Thread 0007][remoting] Mismatched end-group tag. Started with field 6; ended with field 8
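The pattern described here, where one hiccup leads to errors later in the stream, is what you would expect when a length-prefixed byte stream loses alignment. Below is a Python sketch of that effect. It assumes a 4-byte big-endian length prefix per frame, which is an illustrative assumption and not a statement about the exact DotNetty pipeline Akka.Remote uses. It also uses a maximum frame size of 256,000 bytes, matching the `maximum-frame-size` value quoted in this thread. `frame` and `split_frames` are hypothetical helpers.

```python
import struct
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def split_frames(stream: bytes, max_frame: int = 256_000) -> tuple[list[bytes], Optional[str]]:
    """Cut complete frames out of stream; stop at the first implausible length."""
    frames, pos = [], 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        if length > max_frame:
            return frames, f"frame length {length} exceeds {max_frame} at offset {pos}"
        if pos + 4 + length > len(stream):
            break  # incomplete frame: a real decoder would wait for more bytes
        frames.append(stream[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return frames, None


stream = frame(b"first") + frame(b"second") + frame(b"third")
print(split_frames(stream))   # ([b'first', b'second', b'third'], None)

# One stray byte after the first frame shifts every later frame boundary.
damaged = stream[:9] + b"\x00" + stream[9:]
print(split_frames(damaged))  # ([b'first', b''], 'frame length 108225891 exceeds 256000 at offset 13')
```

The empty frame and the absurd length are both ordinary bytes read at the wrong offset. In a real pipeline, misaligned frames like these would most likely be handed to the protobuf parser and fail with errors similar to the ones logged above, until the connection is reset.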

@AndreSteenbergen
Contributor

I just upgraded to .NET Core 2.2 on Ubuntu 16.04, and this issue came back. It is the Lighthouse service node that throws these messages. I have configured my Lighthouse system not to create any custom actors, so I don't really understand what is going on.

@Aaronontheweb
Member Author

Aaronontheweb commented Feb 18, 2019 via email

@AndreSteenbergen
Contributor

Could this be an issue? I have set a max frame size of 256K

    dot-netty.tcp {
        transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
        applied-adapters = []
        transport-protocol = tcp
        #will be populated with a dynamic host-name at runtime if left uncommented
        hostname = "0.0.0.0"
        public-hostname = "10.0.0.31"
        port = 10160
        maximum-frame-size = 256000b
    }

@Aaronontheweb
Member Author

Aaronontheweb commented Feb 18, 2019 via email

@AndreSteenbergen
Contributor

AndreSteenbergen commented Feb 18, 2019

I am not using Lighthouse from Docker; I compiled it myself. I run it natively on Linux, without Docker.

Google.Protobuf.InvalidProtocolBufferException: Mismatched end-group tag. Started with field 12; ended with field 13

and:

Google.Protobuf.InvalidProtocolBufferException: SkipLastField called on an end-group tag, indicating that the corresponding start-group was missing

@Aaronontheweb
Member Author

Aaronontheweb commented Feb 18, 2019 via email

@AndreSteenbergen
Contributor

AndreSteenbergen commented Feb 18, 2019 via email

@Aaronontheweb
Member Author

Aaronontheweb commented Feb 18, 2019 via email

@AndreSteenbergen
Contributor

Issue resolved (I think)... I checked out Lighthouse from the webcrawler example, with petabridge 0.4-something, which is packaged with Akka.Cluster 1.3.10, not 1.3.11. There were no build errors because Cluster was part of the project, so I had a Cluster version mismatch... Sorry.

@Aaronontheweb
Member Author

Related issue:

Exception thrown: 'Google.Protobuf.InvalidProtocolBufferException' in Google.Protobuf.dll
Additional information: Protocol message contained an invalid tag (zero). occurred

Google.Protobuf.dll!Google.Protobuf.CodedInputStream.ReadTag() + 0x1cd bytes
Akka.Remote.dll!Akka.Remote.Serialization.Proto.Msg.AckAndEnvelopeContainer.MergeFrom(Google.Protobuf.CodedInputStream input) + 0x12f bytes
Google.Protobuf.dll!Google.Protobuf.MessageExtensions.MergeFrom(Google.Protobuf.IMessage message, Google.Protobuf.ByteString data) + 0xa3 bytes
Google.Protobuf.dll!Google.Protobuf.MessageParser<Akka.Remote.Serialization.Proto.Msg.AckAndEnvelopeContainer>.ParseFrom(Google.Protobuf.ByteString data) + 0x84 bytes
Akka.Remote.dll!Akka.Remote.Transport.AkkaPduProtobuffCodec.DecodeMessage(Google.Protobuf.ByteString raw, Akka.Remote.IRemoteActorRefProvider provider, Akka.Actor.Address localAddress) + 0x6c bytes
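`Protocol message contained an invalid tag (zero)` is raised because field numbers start at 1, so a tag of 0 can never be valid. One easy way to produce it is to leave zero bytes after an otherwise valid message, for example by handing a buffer to the parser with a length larger than the data actually written. That scenario is a hypothetical illustration, not a diagnosis of this report. The minimal Python sketch below handles only varint fields, and `parse_varint_fields` is an illustrative name.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("input ended unexpectedly in the middle of a field")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_varint_fields(buf: bytes) -> list[tuple[int, int]]:
    """Parse a message made only of varint fields into (field_number, value) pairs."""
    fields, pos = [], 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        if tag == 0:  # field numbers start at 1, so tag 0 is never valid
            raise ValueError(f"invalid tag (zero) at offset {start}")
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError(f"this sketch only handles varint fields, got wire type {wire_type}")
        value, pos = read_varint(buf, pos)
        fields.append((field_number, value))
    return fields


message = bytes([0x08, 0x96, 0x01])  # field 1 = 150
print(parse_varint_fields(message))  # [(1, 150)]

padded = message + bytes(4)  # the same message followed by four zero bytes
try:
    parse_varint_fields(padded)
except ValueError as err:
    print(err)  # invalid tag (zero) at offset 3
```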

@Aaronontheweb
Member Author

This looks to me like it has to be a message framing issue somewhere further up the food chain, but I'm skeptical about that because we've tested the hell out of the DotNetty message framing sitting in front of it.

The best idea is probably to add some additional Info logging inside the AkkaPduProtobuffCodec class, recording the type of message being deserialized when this happened.
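Here is a Python sketch of that logging idea. The real change would go into the C# `AkkaPduProtobuffCodec`, and `decode_with_diagnostics` is a hypothetical wrapper, not an existing API. It logs a caller-supplied `context` label, which could carry whatever type information is available. It also records the PDU length and a hex preview of the first bytes, so a failing frame can be checked by hand against the wire format.

```python
import logging
import struct
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("pdu-diagnostics")


def decode_with_diagnostics(raw: bytes, decode: Callable[[bytes], T],
                            context: str = "unknown", preview: int = 32) -> T:
    """Run decode(raw); if it fails, log what was being decoded, then re-raise."""
    try:
        return decode(raw)
    except Exception:
        log.info("failed to decode PDU (%s): %d bytes, first bytes: %s",
                 context, len(raw), raw[:preview].hex(" "))
        raise


logging.basicConfig(level=logging.INFO)

# A stand-in decoder that expects exactly four bytes; two bytes make it fail.
try:
    decode_with_diagnostics(b"\x00\x01", lambda b: struct.unpack(">I", b)[0], context="demo")
except struct.error:
    pass  # the failure has been logged together with a hex preview
```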

@Aaronontheweb
Member Author

LOL welp, my fault - my issue was the result of a unit test I wrote intentionally injecting a mal-formed packet into the transport. Disregard my latest comments.

4 participants