Syncthing Tray sometimes loses connection to Syncthing and fails to connect again on its own #217

Open
1 of 6 tasks
tomasz1986 opened this issue Dec 10, 2023 · 21 comments
@tomasz1986
Contributor

tomasz1986 commented Dec 10, 2023

Relevant components

  • Standalone tray application (based on Qt Widgets)
  • Plasmoid/applet for Plasma desktop
  • Dolphin integration
  • Command line tool (syncthingctl)
  • Integrated Syncthing instance (libsyncthing)
  • Backend libraries

Environment and versions

  • Versions of syncthingtray, qtutilities and c++utilities: 1.4.10, N/A, N/A
  • Qt version: 6.6.1
  • C++ compiler (name and version): N/A
  • C++ standard library (name and version): N/A
  • Operating system (name and version): Windows 10 x64

Bug description

Recently, I've been experiencing a problem where Syncthing Tray just stops connecting to Syncthing after working with no issues for a few days. The problem does not go away on its own unless I manually press the "Apply connection settings and try to reconnect with the currently selected config" button. After pressing the button, Syncthing Tray does reconnect to Syncthing properly and works fine for a while again.

These are the error logs. "Połączenie odrzucone" means "Connection refused", "Połączenie zakończone" means "Connection closed", and "Nie można zapisać" means "Cannot save". As the logs show, the problem first occurred two days ago and then again today.

[2023-12-08T16:59:03] Unable to request Syncthing status: Połączenie odrzucone
Request URL: https://syncthing:redacted@localhost:8384/rest/system/status
[2023-12-08T16:59:42] Unable to request Syncthing status: Połączenie odrzucone
Request URL: https://syncthing:redacted@localhost:8384/rest/system/status
[2023-12-10T11:34:04] Unable to request Syncthing events: Połączenie zakończone
Request URL: https://syncthing:redacted@localhost:8384/rest/events?since=180267&timeout=60
[2023-12-10T13:46:32] Unable to request Syncthing events: Nie można zapisać
Request URL: https://syncthing:redacted@localhost:8384/rest/events?since=189703&timeout=60
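The event requests in the log above go to Syncthing's long-polling `/rest/events` endpoint, where `since` is the ID of the last event already seen and `timeout` is the number of seconds the server holds the request open. As a rough illustration only (this is not Syncthing Tray's actual code, which is C++/Qt), a client-side poll loop might be sketched as follows. The function names and the `fetch` callable are hypothetical; `fetch` stands in for a real HTTP GET that returns the decoded JSON list of events.

```python
from typing import Callable, Iterable
from urllib.parse import urlencode


def events_url(base_url: str, since: int, timeout_s: int = 60) -> str:
    """Build a long-polling URL shaped like the ones shown in the log."""
    query = urlencode({"since": since, "timeout": timeout_s})
    return f"{base_url.rstrip('/')}/rest/events?{query}"


def poll_events(fetch: Callable[[str], Iterable[dict]], base_url: str,
                since: int = 0, rounds: int = 1) -> int:
    """Poll `rounds` times and return the ID of the last event seen.

    `fetch` is a hypothetical stand-in for the HTTP request; each event
    it returns is expected to carry an integer "id" field.
    """
    for _ in range(rounds):
        for event in fetch(events_url(base_url, since)):
            # Remember the highest ID so the next request only asks
            # for newer events.
            since = max(since, event["id"])
    return since
```

If a request fails, a client like this has to issue a fresh request with the last known `since` value; the issue described here is that this stops happening automatically.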

Steps to reproduce

  1. Leave Syncthing Tray running for a few days.

Expected behavior

Syncthing Tray should work and stay connected to Syncthing as long as Syncthing itself is running.

Screenshots

image

Additional context

Syncthing (v1.27.0 as of today) is started on user logon separately from Syncthing Tray. I have also just noticed that the "supply credentials for HTTP authentication" checkbox was ticked and I have now unticked it, however https://docs.syncthing.net/users/config.html#config-option-gui.sendbasicauthprompt is also enabled, so I think Syncthing Tray should work fine both ways (and it does connect using just the username and password initially).

@tomasz1986 tomasz1986 added the bug label Dec 10, 2023
@Martchus
Owner

Judging by the log, the retry logic is definitely working, as it tries to reconnect. And since clicking the "Apply connection settings and try to reconnect with the currently selected config" button helps, it cannot be Qt's network module either.

So it must be the state of Syncthing Tray's connection handling. On the other hand, this would also be strange because it says "Connection refused" indicating that Syncthing is not even reachable at all.

So I cannot really make sense of the problem right now. That it is only reproducible after a few days makes it of course even harder to debug.

Is that a new problem? I actually haven't changed much recently when it comes to the handling of API requests and events.

Note that you only need the user name and password if Syncthing is behind a reverse proxy that requires basic HTTP auth. For Syncthing itself the API key should be sufficient.

@tomasz1986
Contributor Author

tomasz1986 commented Dec 10, 2023

> Is that a new problem? I actually haven't changed much recently when it comes to the handling of API requests and events.

This actually does look new to me. I don't remember seeing Syncthing Tray disconnecting and disabling itself before. I've only noticed it in the last few days because the icon turned grey. Just for the record, I've noticed the problem on two separate Windows devices so far.

> Note that you only need the user name and password if Syncthing is behind a reverse proxy that requires basic HTTP auth. For Syncthing itself the API key should be sufficient.

Yeah, honestly I don't remember why I used them both. This is an old config, it used to be set like that for a very long time.

@tomasz1986
Contributor Author

tomasz1986 commented Dec 12, 2023

The problem happened again, this time on another device.

[2023-12-03T15:22:50] Unable to request Syncthing events: Temporary network failure.
Request URL: https://xxx.local:8384/rest/events?since=240802&timeout=60
[2023-12-03T17:35:49] Unable to request connections: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/system/connections
[2023-12-03T17:35:56] Unable to request errors: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/system/error
[2023-12-03T17:36:26] Unable to request device statistics: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/stats/device

Not sure if relevant but I can add that I use hostname URLs to connect Syncthing Tray to the devices. In addition, today the issue happened right when the device lost its LAN connection. As soon as the LAN connection was lost, Syncthing Tray also lost its connection to Syncthing and changed the icon colour to grey. However it stayed like that even long after the device itself re-connected to the LAN. Like before, a manual intervention was required to bring it back to life.

@Martchus
Owner

> In addition, today the issue happened right when the device lost its LAN connection. As soon as the LAN connection was lost, Syncthing Tray also lost its connection to Syncthing and changed the icon colour to grey.

This speaks for a bug in Qt's network stack. Maybe a regression in Qt 6.6.1?

> Like before, a manual intervention was required to bring it back to life.

But this, on the other hand, speaks for something having gone stale in Syncthing Tray's internal code. I haven't changed anything except for introducing timeouts (which is just an additional function call before setting up the connection). Still, maybe configuring a transfer timeout (or both timeouts) actually makes things worse in that regard. I personally have mainly used the long-polling timeout but not the transfer timeout (except in my initial testing, which only happened under GNU/Linux).

Note that I haven't been able to reproduce the issue under Windows yet, but I guess my longest session was only around 6 hours. I actually disconnected and re-connected the Wi-Fi a lot today (which should be similar to losing the LAN connection) and couldn't reproduce the issue that way either. On GNU/Linux I had longer sessions but couldn't reproduce the issue there either.

@tomasz1986
Contributor Author

> This speaks for a bug in Qt's network stack. Maybe a regression in Qt 6.6.1?

I have just noticed that the other device where the issue occurred still ran Syncthing Tray 1.4.9 with Qt 6.6.0, so the culprit must be something else.

@tomasz1986
Contributor Author

tomasz1986 commented Dec 15, 2023

I'm now seeing the same issue with yet another device, which is an Android phone running Syncthing, with Syncthing Tray connected to it from a Windows device. This time, the device just keeps disconnecting after 1-2 minutes for no apparent reason. Each time this happens, I can reconnect manually with no problems.

Both the transfer timeout and the long polling interval are set to 60000 ms. At first it looked as if resetting the first one to its default value of "no timeout" stopped the disconnects, but that was a premature conclusion: it still loses connection even with "no timeout". I'm now testing with the long polling interval also reset to its default value of "Syncthing's default with no timeout".

Edit: It seems to stay connected with the long polling interval set to "Syncthing's default with no timeout"!

@Martchus
Owner

OK, so the timeout for the long polling interval prevents the connection from becoming stuck (issue #209) but leads to this issue. Maybe it wasn't the best idea to make it the default now, although I haven't been able to reproduce the issue myself so far. I'm nevertheless still wondering why this happens. If the connection ran into a timeout, it should not say "Connection refused".
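To make the distinction between the two failure modes concrete: at the socket level a refused connection and a timeout are different errors, so a client can tell them apart. The following is a minimal Python sketch of that idea, not the Qt code Syncthing Tray uses; the function name `classify_failure` and the three labels are made up for illustration.

```python
import socket
import urllib.error


def classify_failure(exc: BaseException) -> str:
    """Return "refused", "timeout" or "other" for a failed request."""
    # urllib wraps low-level socket errors in URLError; unwrap the
    # underlying exception when there is one.
    reason = getattr(exc, "reason", None)
    if isinstance(exc, urllib.error.URLError) and isinstance(reason, BaseException):
        exc = reason
    if isinstance(exc, ConnectionRefusedError):
        # Nothing accepted the connection on the target host and port.
        return "refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        # The peer was reachable (or silent) but did not answer in time.
        return "timeout"
    return "other"
```

A log entry of "Connection refused" would correspond to the first branch, which is why it is surprising to see it when only a timeout was configured.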


stale bot commented Feb 15, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 15, 2024
Martchus added a commit that referenced this issue Feb 15, 2024
* This reverts commit becf6e8.
* This timeout might be problematic after all as it might cause
  Syncthing to lose its connection not being able to connect
  again on its own (see
  #217 (comment))
@stale stale bot closed this as completed Feb 22, 2024
@tomasz1986
Contributor Author

I'm still experiencing the issue on my devices, and I'd like to get rid of it now, but I'm a bit confused about the settings.

image

This is what I currently have on most devices. Do you recommend that I set just the Long polling int. to "Syncthing's default with no timeout" and leave the Transfer timeout at 60000 ms?

@Martchus
Owner

I'm not sure what to recommend. I cannot reproduce the issue myself so it is not easy to improve anything on my side. I cannot even bisect what change caused it.

I thought that setting timeouts would generally be a good idea to prevent the connection from getting stuck. However, that might have caused a regression, so I reverted enabling timeouts by default (see 699dcbd). Disabling the long polling interval/timeout is maybe the safest option. If you do that then you might see #209 again, though.

@Martchus Martchus reopened this Feb 23, 2024
@stale stale bot removed the stale label Feb 23, 2024
@xgdgsc

xgdgsc commented Feb 26, 2024

This usually happens on a low-performance device (ARM in my case). After resuming from suspend, there is a short period when CPU usage is at 100% because all the processes are busy. And it would show the connection lost error.

@Martchus
Owner

> And it would show the connection lost error.

That may be expected - unless you have configured a long enough grace period for that alert or unless you have disabled the alert completely. (The grace period is the number of seconds you can configure in the notifications settings.)

This ticket is about re-connects not happening (independently of the alert I assume) despite a re-connect interval being configured for the relevant connection in the connection settings.
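The two settings mentioned here, the alert grace period and the re-connect interval, are independent of each other. A minimal sketch of how such logic could be separated is shown below; it is an assumption-laden illustration in Python, not Syncthing Tray's implementation, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionMonitor:
    grace_period_s: float        # how long a disconnect is tolerated silently
    reconnect_interval_s: float  # how often a re-connect is attempted
    disconnected_since: Optional[float] = None
    last_attempt: Optional[float] = None

    def on_state(self, connected: bool, now: float) -> None:
        """Record a connection state change observed at time `now`."""
        if connected:
            self.disconnected_since = None
            self.last_attempt = None
        elif self.disconnected_since is None:
            self.disconnected_since = now

    def should_alert(self, now: float) -> bool:
        """Alert only once the disconnect has outlasted the grace period."""
        return (self.disconnected_since is not None
                and now - self.disconnected_since >= self.grace_period_s)

    def should_reconnect(self, now: float) -> bool:
        """Return True (and note the attempt) when a re-connect is due."""
        if self.disconnected_since is None:
            return False
        reference = (self.last_attempt if self.last_attempt is not None
                     else self.disconnected_since)
        if now - reference >= self.reconnect_interval_s:
            self.last_attempt = now
            return True
        return False
```

In this sketch, raising the grace period or disabling the alert only changes `should_alert`; re-connect attempts keep being scheduled regardless, which is the behaviour this ticket expects.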

@tomasz1986
Contributor Author

tomasz1986 commented Feb 26, 2024

> Disabling the long polling interval/timeout is maybe the safest option. If you do that then you might see #209 again, though.

Yeah, I see the dilemma here. Not sure what to do then. Maybe enable the long polling interval only on the devices that use hibernation?

What about the transfer timeout though? Is it better to disable or keep it at 60000 ms?

@Martchus
Owner

Martchus commented Feb 26, 2024

> Yeah, I see the dilemma here. Not sure what to do then. Maybe enable the long polling interval only on the devices that use hibernation?

@xgdgsc Now that @tomasz1986 used the word "hibernation" I get the problem you are having. I guess it is true that on hibernation or on standby (or whatever causes all network connections to break) one sees the "Connection lost error" because, well, the connection was in fact lost. For GNU/Linux I actually implemented suppressing those alerts as part of the systemd integration but it hasn't been done yet for Windows. So if you use hibernation/standby very often and are annoyed by the alerts you'll have to disable them completely. Note that this still has nothing to do with your issue where the re-connect doesn't work for some reason (despite being configured). Also note that it could still be that you generally need a higher grace period for this alert on slower devices.

> What about the transfer timeout though? Is it better to disable or keep it at 60000 ms?

It is best to keep the default, which means disabling it, because some HTTP requests can take very long and so far there is no exception for them. (For example, the request for rescanning a folder only completes after the rescan is complete. This is simply how Syncthing's REST API behaves and it also kind of makes sense.) This setting also shouldn't affect whether Syncthing Tray considers itself connected or not (because that is determined via the long polling connection). By the way, if you are in the state where the connection is lost but not recovered, can you still make other requests, like requesting a rescan?
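One way to check from outside Syncthing Tray whether the REST API is still reachable while the tray shows a lost connection is to send an independent request. The sketch below only builds such requests with Python's standard library, using Syncthing's `/rest/system/ping` and `/rest/db/scan` endpoints and the `X-API-Key` header; the function names, the base URL and the folder ID are placeholders chosen for illustration.

```python
import urllib.request
from urllib.parse import urlencode


def ping_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the lightweight ping endpoint."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/rest/system/ping")
    req.add_header("X-API-Key", api_key)
    return req


def rescan_request(base_url: str, api_key: str,
                   folder_id: str) -> urllib.request.Request:
    """Build a POST request asking Syncthing to rescan one folder."""
    query = urlencode({"folder": folder_id})
    url = f"{base_url.rstrip('/')}/rest/db/scan?{query}"
    req = urllib.request.Request(url, method="POST")
    req.add_header("X-API-Key", api_key)
    return req
```

Passing one of these to `urllib.request.urlopen(req, timeout=...)` would send it. For an HTTPS GUI address that uses a self-signed certificate, certificate verification would have to be configured separately, and the rescan request may take a long time to return for the reason described above.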

@ProactiveServices

I've just had this problem recur and noticed that Syncthing wasn't running this time. It seems that it auto-upgraded, and because Syncthing Tray starts Syncthing with the --no-restart argument (both syncthing.exe instances), Syncthing could not automatically restart itself after the upgrade. I checked my configuration and confirmed that the STNOUPGRADE environment variable is unset. This behaviour could be one cause of this bug; it happens much less frequently these days.

https://docs.syncthing.net/users/syncthing.html#cmdoption-no-restart

@Martchus
Owner

Martchus commented Apr 9, 2024

That would be one way to run into this issue. I don't remember why I added --no-restart to the default arguments. Maybe it makes sense to remove it from the defaults.

However, considering the initial ticket description, especially the part

> unless I manually press the "Apply connection settings and try to reconnect with the currently selected config" button. After pressing the button, Syncthing Tray does reconnect to Syncthing properly and works fine for a while again.

I don't think this is what caused the issue in @tomasz1986's case. If Syncthing weren't running anymore, clicking that button wouldn't make a difference either.

@tomasz1986
Contributor Author

Yeah, in my case, Syncthing was still running in the background. I also use my own builds with upgrades disabled, so there is no automatic upgrade and restart business going on either.


stale bot commented Jun 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 18, 2024
@stale stale bot closed this as completed Jun 25, 2024
@Martchus Martchus reopened this Jun 25, 2024
@stale stale bot removed the stale label Jun 25, 2024

stale bot commented Aug 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 24, 2024
@stale stale bot closed this as completed Aug 31, 2024
@Martchus Martchus reopened this Aug 31, 2024
@stale stale bot removed the stale label Aug 31, 2024

stale bot commented Nov 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 4, 2024
@Martchus
Owner

Martchus commented Nov 4, 2024

Is this still happening?

@stale stale bot removed the stale label Nov 4, 2024