Set Receive/Send socket timeouts appropriately #317

Merged
merged 1 commit into from Aug 31, 2016

Conversation

Contributor

@JustArchi commented Aug 31, 2016

This is one of the issues I found "by the way" while looking for the real cause of **[this](JustArchiNET/ArchiSteamFarm#318)** bug. This report is not connected with the ASF one in any way.

I've implemented a custom `HeartBeat()` function that executes `await SteamApps.PICSGetProductInfo(0, null);` every minute; if the request times out while `SteamClient.IsConnected` is still true, the function executes `SteamClient.Connect();` in order to initiate a reconnect sooner than SK2 would notice on its own.
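
A minimal sketch of what such a heartbeat could look like (the class wiring, field names and timer hookup here are illustrative assumptions, not the actual ASF code):

```csharp
using System.Threading.Tasks;
using SteamKit2;

// Minimal sketch of the heartbeat described above; wiring is illustrative, not the ASF code.
class HeartBeatExample
{
    readonly SteamClient steamClient;
    readonly SteamApps steamApps;

    public HeartBeatExample(SteamClient client)
    {
        steamClient = client;
        steamApps = client.GetHandler<SteamApps>();
    }

    // Invoked once per minute by a timer (not shown here).
    public async Task HeartBeat()
    {
        try
        {
            // Cheap request; the awaited async job throws if Steam doesn't answer in time.
            await steamApps.PICSGetProductInfo(0, null);
        }
        catch
        {
            // The client still claims to be connected but isn't responding: force a reconnect
            // instead of waiting for SK2 to notice the dead connection on its own.
            if (steamClient.IsConnected)
            {
                steamClient.Connect();
            }
        }
    }
}
```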

This uncovered something weird going on when network stability is not perfect:

```
2016-08-31 09:06:48|INFO|archi|OnDisconnected() Disconnected from Steam!
2016-08-31 09:06:48|INFO|archi|OnDisconnected() Reconnecting...
2016-08-31 09:06:58|INFO|archi|OnConnected() Connected to Steam!
2016-08-31 09:06:58|INFO|archi|OnConnected() Logging in...
2016-08-31 09:07:33|WARN|archi|HeartBeat() Connection to Steam Network lost, reconnecting...
2016-08-31 09:08:33|WARN|archi|HeartBeat() Connection to Steam Network lost, reconnecting...
> (repeat above 15 more times in 1-minute interval)
2016-08-31 09:24:33|WARN|archi|HeartBeat() Connection to Steam Network lost, reconnecting...
2016-08-31 09:24:50|INFO|archi|OnDisconnected() Disconnected from Steam!
2016-08-31 09:24:50|INFO|archi|OnDisconnected() Disconnected from Steam!
> (repeat above 15 more times)
2016-08-31 09:24:50|INFO|archi|OnDisconnected() Disconnected from Steam!
2016-08-31 09:24:50|INFO|archi|OnConnected() Connected to Steam!
2016-08-31 09:24:50|INFO|archi|OnConnected() Logging in...
2016-08-31 09:24:51|INFO|archi|OnLoggedOn() Successfully logged on!
```

This looked weird to me: we sent the last request to the Steam network around ~09:07, then made around ~15 `SteamClient.Connect()` calls, and finally received all callbacks at the same moment, eventually connecting at ~09:25. It's possible that the network was down for a short while, but definitely not for more than half a minute.

We can also analyze that in the nethook messages:

![image](https://cloud.githubusercontent.com/assets/1069029/18123334/b3eac1f8-6f6d-11e6-9379-f63d77a136d4.png)

So what happened: for some reason Steam sent us a `ChannelEncryptRequest` without a previous graceful disconnection (you can notice our heartbeats sent by SK2 a while before). We handled that properly and sent `ClientLogon` in 35901 (09:06:58). We didn't get any response for nearly ~20 minutes, and the next incoming packet is again a `ChannelEncryptRequest` in 35920 (09:24:50).

The SK2 debug log from packet 35894 to 35924 is below:

```
CMClient | <- Recv'd EMsg: ClientClanState (822) (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientPICSProductInfoRequest (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
CMClient | Sent -> EMsg: ClientHeartBeat (Proto: True)
TcpConnection | Socket exception occurred while reading packet: System.IO.IOException: Unable to read data from the transport connection: Connection timed out. ---> System.Net.Sockets.SocketException: Connection timed out
  at System.Net.Sockets.Socket.Receive (System.Byte[] buffer, System.Int32 offset, System.Int32 size, System.Net.Sockets.SocketFlags socketFlags) <0x40eaad20 + 0x001a3> in <5cf7305791db4d5da2dbb66cb115547f>:0 
  at System.Net.Sockets.NetworkStream.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 size) <0x40e8a890 + 0x000c7> in <5cf7305791db4d5da2dbb66cb115547f>:0 
   --- End of inner exception stack trace ---
  at System.Net.Sockets.NetworkStream.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 size) <0x40e8a890 + 0x002b7> in <5cf7305791db4d5da2dbb66cb115547f>:0 
  at System.IO.BinaryReader.ReadBytes (System.Int32 count) [0x00041] in <f24f53297d1d465396b75be74d37e900>:0 
  at SteamKit2.TcpConnection.ReadPacket () <0x40eded20 + 0x000f8> in <22a5bf0f0df5491bb43e46408f01ce2f>:0 
  at SteamKit2.TcpConnection.NetLoop () <0x40eddfe0 + 0x0017b> in <22a5bf0f0df5491bb43e46408f01ce2f>:0 
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connection to 146.66.152.10:27020 cancelled by user
ServerList | Next server candidiate: 146.66.152.10:27020
TcpConnection | Connecting to 146.66.152.10:27020...
TcpConnection | Connected to 146.66.152.10:27020
CMClient | <- Recv'd EMsg: ChannelEncryptRequest (1303) (Proto: False)
CMClient | Got encryption request. Universe: Public Protocol ver: 1
CMClient | Sent -> EMsg: ChannelEncryptResponse (Proto: False)
CMClient | <- Recv'd EMsg: ChannelEncryptResult (1305) (Proto: False)
CMClient | Encryption result: OK
CMClient | Sent -> EMsg: ClientLogon (Proto: True)
CMClient | <- Recv'd EMsg: Multi (1) (Proto: True)
CMClient | <- Recv'd EMsg: ClientServersAvailable (5501) (Proto: True)
```

All cancellations are most likely caused by `SteamClient.Disconnect()` executed by `SteamClient.Connect()` after the client returned from nearly ~20 minutes of death.

(Don't worry about the extra `ClientHeartBeat`s; I have multiple Steam clients running, so those are misleading.)

---

**So what happened?**

I think that such an incident is caused by the infinite timeout set by default on the TCP socket's Receive/Send functions. A UDP connection is automatically considered "dead" after 60 seconds, but with a TCP one we could in theory keep reading a given packet until something interrupts us, probably the OS. That's why the whole SK2 communication blocks if such an incident happens: the exception is thrown nearly ~20 minutes after the TCP connection should in fact already be considered dead.

I'm not yet sure what values we should use; for now I stuck with the user-provided timeout. While it works for me, I ask you to test it yourself and review the change, because I'm not that familiar with the low-level SK2 internals, and it's quite possible that it might introduce unwanted behaviour. Otherwise, I think that this change is alright.
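
For reference, a minimal sketch of the idea behind the change (the variable name and the exact value below are illustrative only, not necessarily what the diff uses):

```csharp
using System;
using System.Net.Sockets;

// Minimal sketch: give the socket finite Receive/Send timeouts instead of the infinite
// default, so a blocking Receive()/Send() on a dead connection fails within a bounded time.
Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

TimeSpan connectionTimeout = TimeSpan.FromSeconds(5); // e.g. the user-provided timeout mentioned above
socket.ReceiveTimeout = (int)connectionTimeout.TotalMilliseconds;
socket.SendTimeout = (int)connectionTimeout.TotalMilliseconds;
```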

It's also possible that the culprit is in fact somewhere else and I got misled - in that case I'd appreciate it if somebody else could look into the issue, because while I'm trying to help, I'm definitely not as experienced with SK2 as you are.
@JustArchi
Contributor Author

Oh yeah, Mono 4.6 is broken as well, so don't expect that one to succeed in its current state: #318

@azuisleet
Member

What does setting Socket timeouts do in this scenario? I see that TcpConnection did indeed throw an exception and disconnect, because your heartbeat function detected IsConnected is false.

The fact that IsConnected is being set to false means that nothing is stalled in Send either, so I don't understand what the timeouts would have changed. This simply appears to be a stall in your callback handler, such as a blocking call.

@JustArchi
Contributor Author

How can it be a blocking call lasting ~15 minutes if all calls have a typical timeout of 10 seconds? All I'm doing is awaiting (asynchronously) the result of PICS, which should throw an exception at most 10 seconds after being called. Heartbeats are executed once per minute and they should result in SteamClient.Connect(), which is nowhere to be found in the log; instead, we have ~15 disconnection callbacks arriving all at once after ~15 minutes of death.

My HeartBeat() function can be found declared here. If it's not the TCP timeout, then something else is blocking SteamClient.Connect(). The question is what - that's what I'm trying to find out. Something is preventing that function call from initiating a reconnect on request.

@JustArchi
Contributor Author

Also, don't get me wrong - I'm simply trying to locate the culprit that is preventing me from forcing a reconnect of SteamClient via a standard Connect() call. You're far more experienced with the SK2 code than me, so perhaps you'll find it easier; I'd appreciate a helping hand, as I'm probably not the only one hitting this issue 👍

@azuisleet
Member

What exception are you catching from SteamApps.PICSGetProductInfo to detect that the connection is dead?

@JustArchi
Contributor Author

Any. The objective of that function is to detect when SteamClient.IsConnected is true but the client is not responding - if I don't receive a valid response, I initiate a reconnect. I added that function as a "workaround" while searching for the real cause of those freezes. Ideally SK2 should detect it itself, but it usually takes it a long while to notice the disconnect, which is why I tried the experimental approach of detecting it earlier and reconnecting before the TCP socket dies. It's not perfect, but it uncovered what I was looking for initially - a blocking call somewhere that stops the whole communication and all function calls.

@azuisleet
Member

It would really be helpful if you could capture a stack trace to confirm the issue, as in while it's in whatever broken state.

What I believe is happening: the exception is thrown correctly, but then TcpConnection tries to Release while netLock is held by a Send call. If this is the case, then SendTimeout on the socket is the correct answer. This would also imply that your callback thread is stalled and therefore you aren't getting any callbacks/IsConnected state doesn't change/etc.
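
In code terms, roughly (a hypothetical illustration; the member names follow this discussion rather than the actual TcpConnection source):

```csharp
using System.Net.Sockets;

// Hypothetical illustration of the suspected stall, not the real TcpConnection internals.
class ConnectionSketch
{
    readonly object netLock = new object();
    readonly Socket socket;

    public ConnectionSketch(Socket s) => socket = s;

    public void Send(byte[] data)
    {
        lock (netLock)
        {
            // With the default (infinite) SendTimeout this call can block for a very long
            // time on a dead connection...
            socket.Send(data);
        }
    }

    public void Release()
    {
        // ...so the cleanup/disconnect path waits here until that Send finally returns,
        // and everything else queued behind netLock stalls with it.
        lock (netLock)
        {
            socket.Close();
        }
    }
}
```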

@azuisleet
Member

But even then I'm not totally convinced, since you're able to call Connect/Disconnect, which also acquire the lock - which is why I still suspect that you have a blocking call on your callback thread.

@JustArchi
Contributor Author

JustArchi commented Aug 31, 2016

> since you're able to call Connect/Disconnect, which also acquire the lock

I wouldn't be so sure - it's entirely possible that all those calls are waiting for the lock. HeartBeat() is executed by a Timer, and the Timer doesn't wait for the previous invocation to complete, so it's entirely possible that I have 15 HeartBeat() calls blocked on SteamClient.Connect() - that would also explain all 15 disconnect callbacks returning after ~15 minutes of stall.
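
To illustrate the fire-and-forget wiring (the scheduling code below is an assumption on my side, not the exact ASF code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative wiring only: HeartBeat() is kicked off once per minute, fire-and-forget.
// The Timer fires on schedule regardless of whether the previous callback finished, so if
// HeartBeat() blocks (e.g. on SteamClient.Connect() waiting on an internal lock), the
// blocked invocations simply pile up - ~15 minutes of stall means ~15 queued calls.
class HeartBeatScheduler
{
    readonly Timer timer;

    public HeartBeatScheduler(Func<Task> heartBeat)
    {
        // The returned Task is intentionally not awaited; a slow or blocked HeartBeat()
        // does not delay the next tick.
        timer = new Timer(state => heartBeat(), null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
    }
}
```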

> This would also imply that your callback thread is stalled and therefore you aren't getting any callbacks/IsConnected state doesn't change/etc.

Possible. There is not much I can do in this case though: I only made sure that I'm running the callback loop as long as KeepRunning is enabled, and I set it to false only in the Stop() function, so I'm 100% sure that the loop is running during the stall. I can't say whether the callback manager itself is stalled or not, though.
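
For context, the loop has roughly this shape (a simplified sketch; only KeepRunning and Stop() are names from my project, the rest of the wiring is illustrative):

```csharp
using System;
using SteamKit2;

// Rough shape of the callback loop described above; wiring simplified for illustration.
class CallbackLoop
{
    readonly CallbackManager callbackManager;
    volatile bool keepRunning = true;

    public CallbackLoop(SteamClient client)
    {
        callbackManager = new CallbackManager(client);
    }

    public void Run()
    {
        while (keepRunning)
        {
            // Pumps pending SK2 callbacks; if a subscribed handler blocks inside this call,
            // no further callbacks (including OnDisconnected) get dispatched.
            callbackManager.RunWaitCallbacks(TimeSpan.FromSeconds(1));
        }
    }

    public void Stop()
    {
        keepRunning = false;
    }
}
```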

> It would really be helpful if you could capture a stack trace to confirm the issue, as in while it's in whatever broken state.

I know; I'm trying very hard to catch it in a debugger, but it's such a rare networking issue that I can't reproduce it reliably. For the last 2-3 weeks I've been trying blindly to find out what is wrong, because these are very short network downtimes/interruptions. I even tested without HeartBeat() previously, and there are only two situations that happen:

  • Either the downtime goes unobserved and the connection remains connected,
  • Or something calls the SK2 client and the connection dies, very often being noticed by SK2 itself only after ~15+ minutes. This is what I'm trying to avoid: if it dies, it should be observed almost immediately, preferably in less than a minute.

In fact, this issue happened for the first time after a week of tests, and I'm running my project 24/7, so this is something like my 3rd try to locate the culprit and fix it. Probably I'll eventually give up and just add proper code that takes a whole process dump when it detects such a thing, but even detecting it programmatically is not easy.

I deeply appreciate your help. I'm currently testing SK2 with my own change from this PR to see whether it helps, and so far it's working, but it'll probably take a month or so to state at least semi-reliably that the issue is gone.

@azuisleet
Member

The DebugLog doesn't have any timestamps, so it's difficult to say when things are happening. If I understand correctly, you're saying all the Connect calls happened at the same time. I'm hoping that the actual Exception was thrown much earlier.

If the case is that we are stalled on a Send call holding back the actual Release call, then I do agree that the infinite timeout is probably to blame, and that most stacks are probably blocked on netLock.

@azuisleet
Member

I'm merging this since there is a reasonable suspicion.

@JustArchi
Contributor Author

I think so too. Indeed, I didn't think of timestamps initially; I added them earlier today after posting this PR, so if it ever happens again I'll have a more precise log.

For now it's a waiting game for the issue to come back to life. TBH I think that even if setting the timeouts doesn't fix the initial issue, it will for sure make SK2 aware of a dead connection sooner, which will allow me to remove that experimental workaround and handle connection status where it should be handled - in SK2. The whole objective is to minimize downtime caused by network issues - if the timeout value is too low, we can always bump it to a more appropriate value, but leaving it infinite is usually a bad choice.

@azuisleet merged commit d9a1da3 into SteamRE:master Aug 31, 2016
@JustArchi deleted the patch-3 branch August 31, 2016 21:23
@JustArchi
Contributor Author

It probably won't interest anybody anymore, but I haven't been able to reproduce this issue since setting those timeouts - either it's a miracle, or indeed something got fixed/improved in a way that it no longer annoys me.

Thanks!
