Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Akka.Remote - proper logging when remote node abruptly goes down #2460

Closed
maxcherednik opened this issue Jan 19, 2017 · 0 comments
Closed

Akka.Remote - proper logging when remote node abruptly goes down #2460

maxcherednik opened this issue Jan 19, 2017 · 0 comments
Milestone

Comments

@maxcherednik
Copy link
Contributor

When 2 nodes are Associated successfully and one of the nodes goes down abruptly we get this:

2017-01-19 17:01:13.400 [1] [(null)] INFO  Akka.RiskEngine.RiskEngineService - Starting risk engine...
2017-01-19 17:01:13.724 [8] [(null)] INFO  Akka.Event.DummyClassForStringSources - Starting remoting
2017-01-19 17:01:13.830 [5] [(null)] INFO  Akka.Event.DummyClassForStringSources - Remoting started; listening on addresses : [akka.tcp://riskengine@127.0.0.1:4054]
2017-01-19 17:01:13.832 [7] [(null)] INFO  Akka.Event.DummyClassForStringSources - Remoting now listens on addresses: [akka.tcp://riskengine@127.0.0.1:4054]
2017-01-19 17:01:13.863 [8] [(null)] INFO  Akka.Event.DummyClassForStringSources - Cluster Node [akka.tcp://riskengine@127.0.0.1:4054] - Starting up...
2017-01-19 17:01:13.881 [19] [(null)] INFO  Akka.Event.DummyClassForStringSources - Cluster Node [akka.tcp://riskengine@127.0.0.1:4054] - Started up successfully
2017-01-19 17:01:13.887 [1] [(null)] INFO  Akka.RiskEngine.RiskEngineService - Started!
2017-01-19 17:01:14.893 [22] [(null)] INFO  Akka.Cluster.ClusterCoreDaemon - Welcome from [akka.tcp://riskengine@127.0.0.1:4053]
2017-01-19 17:02:39.191 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpServerHandler - Error caught channel [[::ffff:127.0.0.1]:4054->[::ffff:127.0.0.1]:60957](Id=ChannelId(-717517600))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.195 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpClientHandler - Error caught channel [[::ffff:127.0.0.1]:60956->[::ffff:127.0.0.1]:4053](Id=ChannelId(1909747648))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.199 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpClientHandler - Error caught channel [[::ffff:127.0.0.1]:60956->[::ffff:127.0.0.1]:4053](Id=ChannelId(1909747648))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.199 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpServerHandler - Error caught channel [[::ffff:127.0.0.1]:4054->[::ffff:127.0.0.1]:60957](Id=ChannelId(-717517600))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.213 [22] [(null)] WARN  Akka.Remote.ReliableDeliverySupervisor - Association with remote system akka.tcp://riskengine@127.0.0.1:4053 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.ActorCell.<>c__DisplayClass114_0.<Akka.Actor.IUntypedActorContext.Become>b__0(Object m)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]

Proposition:
On the transport level - no need to print the error and the stack trace - it's totally normal that sometimes socket can do this:

System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host

We can propagate this information to the upper level that node is not available because of this: An existing connection was forcibly closed by the remote host

Hence on the higher level we don't need to print a strange exception:

2017-01-19 17:02:39.213 [22] [(null)] WARN  Akka.Remote.ReliableDeliverySupervisor - Association with remote system akka.tcp://riskengine@127.0.0.1:4053 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.ActorCell.<>c__DisplayClass114_0.<Akka.Actor.IUntypedActorContext.Become>b__0(Object m)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]

... cause this is not what happened in reality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants