R2S rousette crash on boot if LAN not connected #751

Open
minexn opened this issue Oct 22, 2024 · 8 comments
Labels
bug Something isn't working
Comments

@minexn

minexn commented Oct 22, 2024

Current Behavior

The system booted with WAN connected and LAN disconnected.

lan             ethernet   DOWN        b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette keeps crashing

Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.245] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:47:02 r2s rousette[1883]: [2024-10-22 06:47:02.248] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 22 06:47:07 r2s finit[1]: Service rousette keeps crashing, not restarting.

The system booted with WAN and LAN connected.

lan             ethernet   UP          b2:b2:41:23:71:82                        
                ipv4                   192.168.2.1/24 (static)
wan             ethernet   UP          b2:b2:41:23:71:83                        
                ipv4                   10.10.3.101/24 (dhcp)

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Expected Behavior

Rousette starts and responds to queries.

Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.645] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 22 06:49:09 r2s rousette[1542]: [2024-10-22 06:49:09.662] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 

Steps To Reproduce

1. Load v24.10.1
2. Unplug LAN
3. Reboot
4. Check the log

Additional information

Factory configuration

minexn added the bug (Something isn't working) and triage (Pending investigation & classification, CCB) labels Oct 22, 2024
@troglobit
Contributor

troglobit commented Oct 23, 2024

Reproduced on my R2S:

Oct 23 05:35:26 r2s finit[1]: Service rousette[2080] died, restarting in 5000 msec (10/10)
Oct 23 05:35:27 r2s finit[1]: Starting rousette[2163]
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.538] [rousette] [info] NACM config validation: Anonymous user access disabled 
Oct 23 05:35:27 r2s rousette[2163]: [2024-10-23 05:35:27.541] [rousette] [warning] Telemetry disabled. No CzechLight YANG modules found. 
Oct 23 05:35:27 r2s rousette[2163]: terminate called after throwing an instance of 'std::runtime_error'
Oct 23 05:35:27 r2s rousette[2163]:   what():  Server error: Host not found (authoritative)
Oct 23 05:35:32 r2s finit[1]: Service rousette keeps crashing, not restarting.

troglobit removed the triage (Pending investigation & classification, CCB) label Oct 23, 2024
@troglobit
Contributor

The workaround suggested by @mattiaswal helps: add an empty ietf-ip:ipv6 container to the wan interface in startup-config.cfg:

admin@r2s:/cfg$ diff backup.cfg startup-config.cfg 
--- backup.cfg
+++ startup-config.cfg
@@ -39,7 +39,8 @@
       },
       {
         "name": "wan",
-        "type": "infix-if-type:ethernet"
+        "type": "infix-if-type:ethernet",
+        "ietf-ip:ipv6": {}
       }
     ]
   },

@troglobit
Contributor

If I try to mimic the same setup in QEMU, using the x86_64 build, by disabling IPv6 on all Ethernet interfaces, I cannot reproduce the problem. Very odd; I need to discuss this further with @mattiaswal.

@troglobit
Contributor

After discussions with @mattiaswal and the rest of the core team, we decided yesterday to check whether this is also an issue with the standard aarch64 builds on tier-one customer hardware (Marvell CRB derivatives).

These tests were concluded this morning, without any problems.

So, it seems this issue is localized to the R2S build.

@sgsx3
Contributor

sgsx3 commented Oct 26, 2024

I also hit this issue, with rousette bailing out with:

rousette[1957]: terminate called after throwing an instance of 'std::runtime_error'
rousette[1957]: what(): Server error: Host not found (authoritative)

It turns out that Boost.Asio will not resolve a numeric IPv6 host (::1) here because the resolver query flags default to address_configured, which only returns IPv6 addresses when a non-loopback IPv6 address is configured on the system.
See https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/reference/ip__resolver_base.html for more info.

The following patch resolved it for me:

--- nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
+++ nghttp2-asio-e877868abe06a83ed0a6ac6e245c07f6f20866b5/lib/asio_server.cc
@@ -82,8 +82,13 @@ boost::system::error_code server::bind_and_listen(boost::system::error_code &ec,
   // Open the acceptor with the option to reuse the address (i.e.
   // SO_REUSEADDR).
   tcp::resolver resolver(io_service_pool_.get_io_service());
+
   tcp::resolver::query query(address, port);
   auto it = resolver.resolve(query, ec);
+  if (ec) {
+    tcp::resolver::query numeric_query(address, port, boost::asio::ip::resolver_query_base::numeric_host);
+    it = resolver.resolve(numeric_query, ec);
+  }
   if (ec) {
     return ec;
   }
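
The retry-with-different-flags idea in the patch above can be sketched without Boost, directly against POSIX getaddrinfo(). This is only an illustration, assuming that Boost.Asio's address_configured and numeric_host flags correspond to AI_ADDRCONFIG and AI_NUMERICHOST; the helper name resolve_with_fallback is hypothetical and is not part of rousette or nghttp2-asio.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <string>

// Hypothetical helper for illustration only.
// Returns 0 if host:port could be resolved, otherwise the EAI_* error code
// from the last getaddrinfo() attempt.
int resolve_with_fallback(const std::string &host, const std::string &port) {
  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  // First attempt: only return address families that have a non-loopback
  // address configured on the system.
  hints.ai_flags = AI_ADDRCONFIG;

  addrinfo *result = nullptr;
  int rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
  if (rc != 0) {
    // Second attempt: treat the host strictly as a numeric address literal,
    // dropping the "configured addresses only" restriction.
    hints.ai_flags = AI_NUMERICHOST;
    result = nullptr;
    rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &result);
  }

  if (rc == 0) {
    freeaddrinfo(result);
  }
  return rc;
}
```

Calling resolve_with_fallback("::1", "8080") should return 0 even when the first attempt fails, because the second attempt only parses the literal. A real server would keep the addrinfo list and bind to it instead of freeing it immediately.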

@troglobit
Contributor

Nice catch! Do you think you could try and get this patch in upstream so we can use a backport of that in Infix? A bit unsure of the state of that upstream though, do you know more @mattiaswal?

sgsx3 added a commit to sgsx3/infix that referenced this issue Oct 28, 2024
The boost library refuses to resolve a numeric IPv6 host (::1) because its resolver flags are set to 'address_configured' by default.
This patch simply runs an additional query in such a case with flags set to 'numeric_host'.

See https://www.boost.org/doc/libs/1_83_0/doc/html/boost_asio/reference/ip__resolver_base.html for more info.

Fixes kernelkit#751

Signed-off-by: Stefan Schlosser <sgs@grmmbl.org>
sgsx3 added a commit to sgsx3/infix that referenced this issue Oct 28, 2024
troglobit added the triage (Pending investigation & classification, CCB) label Feb 15, 2025
@troglobit
Contributor

Bumping back to CCB

@jovatn
Contributor

jovatn commented Feb 17, 2025

CCB: Retest with buildroot 25.02.

jovatn removed the triage (Pending investigation & classification, CCB) label Feb 17, 2025
jovatn added this to the Infix v25.03 milestone Feb 17, 2025
jovatn modified the milestones: Infix v25.03, FUTURE Mar 3, 2025

4 participants