Agents are slow to connect if they are not "local" #6682

Closed
mdalacu opened this issue Jan 10, 2025 · 17 comments

mdalacu commented Jan 10, 2025

Describe the bug
Agents are slow to connect if they are not "local".
It takes over 15 minutes for them to connect. Here is an example:

Mesh Server Connection Error [15]
AutoRetry Connect in 339435 milliseconds
Connecting to: wss://mc.mdonline.ro:443/agent.ashx
Network Timeout occurred...
Mesh Server Connection Error [15]
AutoRetry Connect in 299528 milliseconds
Connecting to: wss://mc.mdonline.ro:443/agent.ashx
Connected.

This is from the internal network; from outside it is the same.
After it connects, the agent stays online for days without any problem.
It does not matter whether I restart the agent or not; the delay is the same.
If I put "local" in the .msh file, agents on the internal network connect instantly.
The web server works perfectly from inside and outside.
I am using an apache2 reverse proxy.
It has been happening since day one, about six months ago, but unless I am losing my mind it is getting worse with updates... 😅
Thank you for your support.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
The agent should connect to the server in a short amount of time.

Screenshots
If applicable, add screenshots to help explain your problem.

Server Software (please complete the following information):

  • OS: Ubuntu 24.04.1
  • Virtualization: Proxmox
  • Network: reverse proxy on another machine, apache2
  • Version: 1.1.38
  • Node: included in Ubuntu

Client Device (please complete the following information):

  • Device: Desktop / Laptop / VM
  • OS: Windows / Debian / Ubuntu
  • Network: Local / Remote over WAN
  • Browser: ?
  • MeshCentralRouter Version: [if applicable]

Remote Device (please complete the following information):

  • Device: Desktop / Laptop / VM
  • OS: Windows / Debian / Ubuntu
  • Network: Remote over WAN
  • Current Core Version (if known): Nov 21 2022, 3412752687

Additional context
Add any other context about the problem here.

Your config.json file

{
  "$schema": "https://raw.githubusercontent.com/Ylianst/MeshCentral/master/meshcentral-config-schema.json",
  "__comment1__": "This is a simple configuration file, all values and sections that start with underscore (_) are ignored. Edit a section and remove the _ in front of the name. Refer to the user's guide for details.",
  "__comment2__": "See node_modules/meshcentral/sample-config-advanced.json for a more advanced example.",
  "settings": {
    "cert": "mc.mdonline.ro",
    "_WANonly": true,
    "_LANonly": true,
    "_sessionKey": "MyReallySecretPassword1",
    "_port": 443,
    "_aliasPort": 443,
    "_redirPort": 80,
    "_redirAliasPort": 80,
    "mpsPort": 4433,
    "TrustedProxy": "192.168.Y.Z",
    "agentping": 30
  },
  "domains": {
    "": {
      "_title": "MyServer",
      "_title2": "Servername",
      "_minify": true,
      "certUrl": "https://mc.mdonline.ro",
      "_newAccounts": true,
      "_userNameIsEmail": true
    },
    "AmtManager": {
       "AdminAccounts": [
         { "user": "admin", "pass": "XXXXX" },
         { "user": "admin", "pass": "XXXXX" }
        ]
      }
  },
  "_letsencrypt": {
    "__comment__": "Requires NodeJS 8.x or better, Go to https://letsdebug.net/ first before trying Let's Encrypt.",
    "email": "myemail@mydomain.com",
    "names": "myserver.mydomain.com",
    "skipChallengeVerification": true,
    "production": false
  }
}

mdalacu added the bug label Jan 10, 2025
@DaanSelen

It's actually erroring; normally it connects within seconds of starting the agent. Are there any network measures in place that could slow it down?

mdalacu commented Jan 10, 2025

Hi! No, there are not. At least I can't think of anything.
If I open a browser from any of these devices, the website works perfectly...

mdalacu commented Jan 10, 2025

I see these lines in the Apache error and access logs:

mdalacu@mdweb1:~$ tail -f /var/log/apache2/error.log
[Fri Jan 10 07:39:09.576121 2025] [proxy:error] [pid 44932] (111)Connection refused: AH00957: HTTPS: attempt to connect to 192.168.1.166:443 (192.168.1.166) failed
[Fri Jan 10 07:39:09.576226 2025] [proxy_http:error] [pid 44932] [client 192.168.1.1:57133] AH01114: HTTP: failed to make connection to backend: 192.168.1.166
[Fri Jan 10 07:39:10.054311 2025] [proxy:error] [pid 44398] (111)Connection refused: AH00957: WSS: attempt to connect to 192.168.1.166:443 (*) failed
[Fri Jan 10 07:39:10.054335 2025] [proxy_wstunnel:error] [pid 44398] [client 86.126.10.128:51891] AH02452: failed to make connection to backend: 192.168.1.166
[Fri Jan 10 07:39:11.864779 2025] [proxy:error] [pid 44123] (111)Connection refused: AH00957: WSS: attempt to connect to 192.168.1.166:443 (*) failed
[Fri Jan 10 07:39:11.864839 2025] [proxy_wstunnel:error] [pid 44123] [client 86.126.10.128:39895] AH02452: failed to make connection to backend: 192.168.1.166
[Fri Jan 10 07:39:15.023304 2025] [proxy:error] [pid 44399] (111)Connection refused: AH00957: WSS: attempt to connect to 192.168.1.166:443 (*) failed
[Fri Jan 10 07:39:15.023385 2025] [proxy_wstunnel:error] [pid 44399] [client 87.237.108.35:55940] AH02452: failed to make connection to backend: 192.168.1.166
[Fri Jan 10 07:39:15.375808 2025] [proxy:error] [pid 44930] (111)Connection refused: AH00957: WSS: attempt to connect to 192.168.1.166:443 (*) failed
[Fri Jan 10 07:39:15.375874 2025] [proxy_wstunnel:error] [pid 44930] [client 86.126.10.128:44551] AH02452: failed to make connection to backend: 192.168.1.166
^C
mdalacu@mdweb1:~$ tail -f /var/log/apache2/access.log
192.168.1.1 - - [10/Jan/2025:09:41:41 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:42:02 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
::1 - - [10/Jan/2025:09:42:03 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f (internal dummy connection)"
86.126.10.128 - - [10/Jan/2025:09:41:58 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:42:06 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
86.126.10.128 - - [10/Jan/2025:09:42:18 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:42:27 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
86.126.10.128 - - [10/Jan/2025:09:42:38 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:42:48 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
86.126.10.128 - - [10/Jan/2025:09:42:58 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:43:09 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
86.126.10.128 - - [10/Jan/2025:09:43:18 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:43:30 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
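To see at a glance how often the agent endpoint fails, access log lines like the ones above can be tallied by path and status code. This is a minimal Python sketch, assuming Apache's combined log format as shown; `tally_status` is an illustrative helper name and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache "combined" log line.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def tally_status(lines):
    """Count (path, status) pairs across access-log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("path"), int(m.group("status")))] += 1
    return counts

# Sample lines taken from the access log excerpt above.
sample = [
    '192.168.1.1 - - [10/Jan/2025:09:41:41 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"',
    '::1 - - [10/Jan/2025:09:42:03 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f (internal dummy connection)"',
    '86.126.10.128 - - [10/Jan/2025:09:41:58 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"',
]
counts = tally_status(sample)
for (path, status), n in sorted(counts.items()):
    print(path, status, n)
```

In this excerpt every request to /agent.ashx ends in 502, from both the LAN address and the WAN addresses.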

@DaanSelen

Apache2 reverse proxies are a bit more difficult with MeshCentral. If possible, can you also try with NGINX, just to isolate the problem? For quick deployment you can use NGINX Proxy Manager.

mdalacu commented Jan 10, 2025

Unfortunately I have multiple sites configured on that machine and I can't just switch the reverse proxy... I must fix it in Apache :/
Thanks

si458 (Collaborator) commented Jan 10, 2025

@mdalacu The way the meshagent handles reconnects is that after X milliseconds it will try to reconnect,
then if that doesn't work, it will wait another random number of milliseconds before trying again.
This is why it takes so long for the agents to register again (we are looking into whether we can change this behaviour later this year).

Your logs show that your Apache is struggling to connect to your MeshCentral server:
[Fri Jan 10 07:39:09.576121 2025] [proxy:error] [pid 44932] (111)Connection refused: AH00957: HTTPS: attempt to connect to 192.168.1.166:443 (192.168.1.166) failed
So this is why your agents aren't connecting.

The OS you have says Ubuntu,
so you can check what port MeshCentral is running on by using lsof -Pni and looking for meshcentral,
OR use journalctl -xeu meshcentral.service and check what the last logs say about which port it's running on.

Then you can determine which ports are being used.
IF MeshCentral isn't listening on port 443 for HTTPS, set the port it does use in config.json and set aliasPort: 443.
This tells remote agents to connect back to you on port 443 and NOT the port you specified.
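The "Connection refused" entries mean that nothing accepted the TCP connection on the backend address at that moment. A quick way to confirm this from the reverse proxy machine is a plain TCP connect test. This is a minimal Python sketch; `port_is_open` is a hypothetical helper, and the address in the usage line is a placeholder rather than a value confirmed in this thread.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # Placeholder address: substitute the MeshCentral backend IP and port.
    print(port_is_open("127.0.0.1", 443))
```

Running it a few times while the agents are retrying would show whether the backend port is refusing connections only intermittently, for example during a MeshCentral restart or self-upgrade.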

mdalacu commented Jan 10, 2025

Hi si458! Thank you for answering.
The site is working perfectly...so 443 it is.
Here is the output:

Jan 09 23:13:32 mdWireguardCT node[949]: MeshCentral HTTP redirection server running on port 80.
Jan 09 23:13:32 mdWireguardCT node[949]: MeshCentral v1.1.37, Hybrid (LAN + WAN) mode, Production mode.
Jan 09 23:13:37 mdWireguardCT node[949]: MeshCentral Intel(R) AMT server running on mc.mdonline.ro:4433.
Jan 09 23:13:37 mdWireguardCT node[949]: Server amtmanager has no users, next new account will be site administrator.
Jan 09 23:13:37 mdWireguardCT node[949]: MeshCentral HTTPS server running on mc.mdonline.ro:443.
Jan 09 23:13:37 mdWireguardCT node[949]: Loaded web certificate from "https://mc.mdonline.ro", host: "mc.mdonline.ro"
Jan 09 23:13:37 mdWireguardCT node[949]:   SHA384 cert hash: xxxxx>
Jan 09 23:13:37 mdWireguardCT node[949]:   SHA384 key hash: xxxxxx>
Jan 10 09:38:53 mdWireguardCT node[949]: Starting self upgrade to: 1.1.38
Jan 10 09:39:08 mdWireguardCT node[949]: Update completed...
Jan 10 09:39:11 mdWireguardCT node[949]: MeshCentral HTTP redirection server running on port 80.
Jan 10 09:39:11 mdWireguardCT node[949]: MeshCentral v1.1.38, Hybrid (LAN + WAN) mode, Production mode.
Jan 10 09:39:16 mdWireguardCT node[949]: MeshCentral Intel(R) AMT server running on mc.mdonline.ro:4433.
Jan 10 09:39:16 mdWireguardCT node[949]: Server amtmanager has no users, next new account will be site administrator.
Jan 10 09:39:16 mdWireguardCT node[949]: MeshCentral HTTPS server running on mc.mdonline.ro:443.
Jan 10 09:39:17 mdWireguardCT node[949]: Loaded web certificate from "https://mc.mdonline.ro", host: "mc.mdonline.ro"

I am also attaching the apache2 config. Maybe you can spot something wrong...

<IfModule mod_ssl.c>
<VirtualHost *:443>
    ServerAdmin admin@mdonline.ro
    ServerName mc.mdonline.ro
    ServerAlias
#    DocumentRoot /var/www/mc.mdonline.ro
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

RewriteEngine on

ProxyPreserveHost On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule /(.*) "wss://192.168.X.Y/$1" [P,L]

ProxyPass / https://192.168.X.Y:443/ 
ProxyPassReverse / https://192.168.X.Y:443/


SSLProxyEngine On
SSLProxyVerify none
SSLProxyCheckPeerCN off
SSLProxyCheckPeerName off
SSLProxyCheckPeerExpire off

ProxyRequests off

Include /etc/letsencrypt/options-ssl-apache.conf
SSLCertificateFile /etc/letsencrypt/live/mc.mdonline.ro/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/mc.mdonline.ro/privkey.pem
</VirtualHost>
</IfModule>

The reverse proxy and MeshCentral are not on the same machine, but they are on the same internal LAN.

Thank you

si458 (Collaborator) commented Jan 10, 2025

Your config.json isn't quite set up correctly.
You have created a domain called AmtManager by accident; that block should be inside your default domain ("").
See below for the fix, and restart MeshCentral afterwards.

{
  "$schema": "https://raw.githubusercontent.com/Ylianst/MeshCentral/master/meshcentral-config-schema.json",
  "__comment1__": "This is a simple configuration file, all values and sections that start with underscore (_) are ignored. Edit a section and remove the _ in front of the name. Refer to the user's guide for details.",
  "__comment2__": "See node_modules/meshcentral/sample-config-advanced.json for a more advanced example.",
  "settings": {
    "cert": "mc.mdonline.ro",
    "_WANonly": true,
    "_LANonly": true,
    "_sessionKey": "MyReallySecretPassword1",
    "_port": 443,
    "_aliasPort": 443,
    "_redirPort": 80,
    "_redirAliasPort": 80,
    "mpsPort": 4433,
    "TrustedProxy": "192.168.Y.Z",
    "agentping": 30
  },
  "domains": {
    "": {
      "_title": "MyServer",
      "_title2": "Servername",
      "_minify": true,
      "certUrl": "https://mc.mdonline.ro",
      "_newAccounts": true,
      "_userNameIsEmail": true,
      "AmtManager": {
        "AdminAccounts": [
          { "user": "admin", "pass": "XXXXX" },
          { "user": "admin", "pass": "XXXXX" }
         ]
       }
    }
  },
  "_letsencrypt": {
    "__comment__": "Requires NodeJS 8.x or better, Go to https://letsdebug.net/ first before trying Let's Encrypt.",
    "email": "myemail@mydomain.com",
    "names": "myserver.mydomain.com",
    "skipChallengeVerification": true,
    "production": false
  }
}

Also, what happens if you visit https://meshcentralip in your web browser?
Do you get a self-signed cert warning and then the login page?

mdalacu commented Jan 10, 2025

Thank you for correcting my config.json.
I have modified it but the result is the same: all agents with "local" in the .msh file connect instantly, the others only after many minutes.
Yes, if I enter the IP of the MeshCentral server directly, I get the certificate warning since it is self-signed, and then I am presented with the login page.

What is weird, and I forgot to mention it: all local agents show up with a unique IP address, but an IPv6 one, even though I have disabled IPv6 on the clients...

What else can I do?
Many thanks again!

si458 (Collaborator) commented Jan 10, 2025

The issue will be that the agents are struggling to connect to your server for some reason,
so the only way of testing is to use the browser on the remote agent's device and see how quickly it loads the web UI.

Also, you seem to be having problems between your Apache and your MeshCentral server:

86.126.10.128 - - [10/Jan/2025:09:43:18 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:09:43:30 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"

It's showing a 502 (Bad Gateway) error, meaning Apache could not get a response from MeshCentral, which isn't a good sign.

Also, 339435 milliseconds is 5(ish) minutes before it will try to connect again,

so you just have to be patient and wait for them to reconnect OR restart the meshagent manually on the remote device.
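For reference, the retry delays from the agent log convert to minutes as follows. This is a small Python sketch using the two AutoRetry values quoted in the original report; `ms_to_minutes` is an illustrative helper name.

```python
def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000

delays_ms = [339435, 299528]  # AutoRetry values from the agent log in the report
for d in delays_ms:
    print(f"{d} ms = {ms_to_minutes(d):.1f} min")

# Total time spent waiting if both retries happen back to back.
total = ms_to_minutes(sum(delays_ms))
print(f"both waits together: about {total:.1f} min")
```

Two failed attempts in a row already add up to more than ten minutes of waiting, before any connection timeouts are counted.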

mdalacu commented Jan 10, 2025

The site opens instantly in the browser on a remote device, on a local one, and on my phone over GPRS.
The site uses WebSockets and it works... I see this problem only when an agent, not a browser, is trying to connect.
Could it be a user agent problem, a timeout set too short in the agent, or something else?
Please explain this in more detail, because I don't know what you mean:
"so only way of testing is use the remote agents browser and see how quickly it loads up the web ui in their browser"
If you mean a browser running on the remote device accessing https://mc.mdonline.ro, it is instant, regardless of where the connection comes from.

EDIT: If I restart the agent it takes just as long to connect... more than 15 minutes, even an hour.

mdalacu commented Jan 10, 2025

...But if I use the internal/external IP of the reverse proxy (which is exposed on port 443 on the internet), then I get an error... I think that is normal behavior since I have multiple sites exposed.
Here are the Apache log entries, depending on whether it is a browser or an agent:

192.168.1.1 - - [10/Jan/2025:11:46:33 +0000] "GET /agent.ashx HTTP/1.1" 404 491 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
87.237.108.35 - - [10/Jan/2025:11:46:22 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"

Is the agent using the IP in the URL to connect to MeshCentral, rather than the server name as written in the .msh file? Does it cache the name resolution to save some CPU cycles?

si458 (Collaborator) commented Jan 10, 2025

Yes, you WON'T be able to see /agent.ashx in a web browser, as it ONLY accepts WebSocket connections.
If you load up the agent and click connection details, it will show you what URL it's using.
If it's local, then it will ONLY work when the device is on the same network as the MeshCentral server.
If it's a DNS name, just check again that the device can see the login page at the DNS name and port number it has set.
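Because /agent.ashx only accepts WebSocket connections, a normal page load in a browser does not exercise the same path the agent uses. One way to test that path is to send a WebSocket upgrade request by hand and look at the status line of the reply: 101 would mean the upgrade went through, while 502 would match the access log entries. This is a minimal Python sketch using only the standard library. `build_upgrade_request` and `probe_upgrade` are hypothetical helpers, certificate verification is disabled on the assumption that the target may present a self-signed certificate, and the real agent's handshake may send different headers.

```python
import base64
import os
import socket
import ssl

def build_upgrade_request(host, path="/agent.ashx"):
    """Build a minimal RFC 6455 client handshake request."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def probe_upgrade(host, port=443, path="/agent.ashx", timeout=10.0):
    """Send the handshake over TLS and return the HTTP status line."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # assumption: certificate may be self-signed
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path))
            reply = tls.recv(4096).decode("latin-1")
    return reply.split("\r\n", 1)[0]
```

Calling `probe_upgrade` once against the public DNS name and once against the MeshCentral server's own address would show whether the upgrade fails only when it passes through the reverse proxy.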

si458 (Collaborator) commented Jan 10, 2025

I think your Apache config is incorrect; try the config example from #317 (comment).
I think it's the RewriteRule that's wrong.

<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerAdmin webmaster@localhost
        ServerName mesh.domain.tld

        ProxyPreserveHost On
        ProxyPass "/" "https://mesh.domain.tld/"
        ProxyPassReverse "/" "https://mesh.domain.tld/"

        RewriteEngine on
        RewriteCond %{HTTP:Upgrade} websocket [NC]
        RewriteCond %{HTTP:Connection} upgrade [NC]
        RewriteRule . "wss://mesh.domain.tld%{REQUEST_URI}" [P]

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

        SSLEngine on
        SSLProxyEngine On
        SSLCertificateFile /etc/ssl/certs/cert.pem
        SSLCertificateKeyFile /etc/ssl/private/key.key
        SSLProtocol TLSv1.2
        SSLCipherSuite EECDH+AESGCM:EDH+AESGCM
        SSLHonorCipherOrder on

        <FilesMatch "\.(cgi|shtml|phtml|php)$">
            SSLOptions +StdEnvVars
        </FilesMatch>

        <Directory /usr/lib/cgi-bin>
            SSLOptions +StdEnvVars
            Options FollowSymLinks
            AllowOverride All
        </Directory>
    </VirtualHost>
</IfModule>

mdalacu commented Jan 10, 2025

I have modified the rewrite rule and the result is the same...

192.168.1.1 - - [10/Jan/2025:12:26:47 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
86.126.10.128 - - [10/Jan/2025:12:27:00 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:12:27:08 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
18.228.222.118 - - [10/Jan/2025:12:27:33 +0000] "GET / HTTP/1.1" 200 743 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Mobile/15E148 Safari/604.1"
86.126.10.128 - - [10/Jan/2025:12:27:21 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"
192.168.1.1 - - [10/Jan/2025:12:27:28 +0000] "GET /agent.ashx HTTP/1.1" 502 4026 "-" "-"

If you mean GUI - Agent - Console - info, then the Server URL there is OK...
I have no more ideas about what I could try...

mdalacu commented Jan 10, 2025

Does MeshCentral make use of port 80? I don't do anything with it on the reverse proxy side, only redirect to 443,
like this:

RewriteEngine On
RewriteCond %{SERVER_NAME} =mc.mdonline.ro
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent]

Thx.

mdalacu commented Jan 10, 2025

@si458
I have managed to get it working by using tlsOffload instead of TrustedProxy and changing the Apache reverse proxy accordingly (to use http and ws to the backend instead of https/wss).
Now all the agents connect instantly (<5s), BUT the "local" ones do not!
Should agents in "local" mode even work with tlsOffload?

Thank you very much for your support, and MeshCentral is the best!!!

si458 closed this as completed Jan 12, 2025