
tunnel broker fails intermittently preventing nodes from connecting to the internet #8

Closed
paidforby opened this issue Dec 8, 2017 · 61 comments


@paidforby

Mesh home nodes (mynet N600 or mynet N750) do not seem to be meshing with one another over their ad-hoc interface, regardless of sudowrt firmware build (April build, new build) or version of makenode used (old commit, newest commit). The last time I recall being able to mesh was in mid-September, when working on battery-powered sneaker nodes.

To reproduce:

  1. Flash two nodes with any build of sudowrt-firmware and run makenode
  2. Connect one node to the internet via your home router.
  3. Connect your computer to the other node via its "peoplesopen.net" SSID
  4. Attempt to ping a location on the internet, for example:
ping archlinux.org

or

ping 8.8.8.8

Expected result:

an internet connection

64 bytes from apollo.archlinux.org (138.201.81.199): icmp_seq=3 ttl=52 time=204 ms

Actual result:

no internet connection

ping: archlinux.org: Temporary failure in name resolution

or

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 100.65.7.129 icmp_seq=1 Destination Net Unreachable

where 100.65.7.129 is the IP of the home node

Other observations:

The pplsopen.net-node2node SSID is visible in the computer's WiFi list.

@jhpoelen
Contributor

jhpoelen commented Dec 8, 2017

@paidforby thanks for reporting this. This is consistent with what I have seen during experimental mesh setups using the n600/n750 over the last few weeks.

@paidforby
Author

The issue was actually caused by the exit node being in a 'weird' state? Can we document the solution and add it to the operator's manual?

@jhpoelen
Contributor

jhpoelen commented Dec 13, 2017

Yes!

It appears that the issue was caused by some sort of VPN issue on the exit node.

On Monday night, as I was debugging a home node mesh setup, I was able to mesh with local sudoroom nodes, but wasn't able to find a route to the internet. However, after @Juul applied some magic (a VPN service restart on the exit node?), my test nodes were able to ping public IP addresses. Also, an apparently related issue was resolved after this assumed reset (see https://sudoroom.org/pipermail/mesh/2017-December/002719.html).

Ideally we'd reproduce and then fix the issue, and document it in the operator's manual like you suggested.

@jhpoelen
Contributor

I may have reproduced the issue as I was building https://github.com/sudomesh/tunneldigger-lab . Steps to reproduce:

  1. dig a tunnel to the exit node using the tunneldigger client
  2. stop / start it a bunch of times

expected -
tunnel sessions go up and down, interface l2tp0 appears and disappears

actual -
initially, behavior is as expected, but after a couple of cycles, the syslog on the client machine has entries like -

Dec 18 22:24:54 lightgrey td-client: Performing broker selection...
Dec 18 22:25:16 lightgrey td-client: No suitable brokers found. Retrying in 5 seconds
Dec 18 22:25:21 lightgrey td-client: Performing broker selection...
Dec 18 22:25:41 lightgrey td-client: No suitable brokers found. Retrying in 5 seconds
Dec 18 22:25:46 lightgrey td-client: Performing broker selection...
Dec 18 22:25:55 lightgrey td-client: Got termination signal, shutting down tunnel

@Juul - can you confirm that the broker on the exit node is no longer allowing tunnels to be dug?
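
For anyone else poking at this, a quick health check on the exit node could look roughly like the sketch below (it assumes the broker runs as a systemd unit named tunneldigger and listens on UDP 8942 as seen in the client logs; both may differ on the actual exit node):

# is the broker process running?
systemctl status tunneldigger
# is anything listening on the broker's UDP port?
ss -ulnp | grep 8942
# recent broker log entries
journalctl -u tunneldigger -n 50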

@paidforby
Author

I am seeing this same behavior from tunneldigger-lab. Also, I have some newly flashed home nodes that are not getting to the internet. Has the exit node been restarted since @jhpoelen reproduced the original issue?

@jhpoelen
Contributor

jhpoelen commented Dec 26, 2017

@paidforby thanks for reproducing ... it seems like we may be on to something. Perhaps the next step would be to set up a test exit node to reproduce. Not sure about the tunneldigger broker restart on the exit node. Perhaps @Juul knows, he's operating the broker. This does strike me as something we should address sooner rather than later.

@jhpoelen
Contributor

jhpoelen commented Jan 3, 2018

After @Juul restarted the node, I was able to dig tunnels again using https://github.com/sudomesh/tunneldigger-lab#digging-a-tunnel . This suggests that the tunnel broker hangs at some point after opening/closing tunnel cycles.

@jhpoelen jhpoelen changed the title mesh isn't meshing tunnel broker fails intermittently Jan 3, 2018
@jhpoelen jhpoelen changed the title tunnel broker fails intermittently tunnel broker fails intermittently preventing nodes from connecting to the internet Jan 3, 2018
@paidforby
Author

Another outage of the exit node today. Discovered while following the service guide and attempting to ping a locally hosted service, as in this use case.
Some observations come to mind:

  1. Why is there only one exit node? What can we do to get another, backup exit node running?
  2. Why do we need an exit node to route to services? Is there a better (read: more distributed) way of getting nodes to know about each other over a WAN? How can we prevent the exit node from hindering development of local services?

To me, tunneldigger appears broken (at least the way we're using it) and ill-suited to "virtual meshing" to begin with (it's a star topology, right? not very meshy). I am interested in exploring alternatives. Any suggestions?

@jnny
Member

jnny commented Feb 4, 2018

Re: point 1: I can't answer with as much authority as @Juul (who'll probably weigh in later today), but it was my understanding that we were switching to Hurricane Electric (HE) for the exit node and setting up another one at sudo room for backup purposes. The last update I remember on the server at HE was in December, when @jtremback and @Juul were coordinating setting it up. I believe there were 2 gigabit switches in sudo that we were going to set up on our rack up in sudo and in the HE cabinet.

There's also this guide to setting up an exit node that we should probably update: https://sudoroom.org/wiki/Mesh/Exit_setup

Re: point 2: I believe @papazoga was working on a tunneldigger alternative here: https://github.com/sudomesh/foutun but I'm not aware of its status.

Side note: Juul has been paying for the exit node and the dev droplet at ~$150/month out of pocket for about a year and a half now. It might be time to switch that cost over to sudomesh.

@bennlich
Collaborator

bennlich commented Feb 4, 2018

Hey there. I think I am seeing the same bug on a home node that I recently flashed. I've connected the home node to my home router and am able to access the internet via the private SSID. I see two public SSIDs, peoplesopen.net 65.28.1 and peoplesopen.net 65.28.1 fast, neither of which provides internet access.

When I restart tunneldigger with /etc/init.d/tunneldigger restart, I see this in /var/log/messages:

root@pattyspuddles:~# tail -f /var/log/messages | grep td-client
Sun Feb  4 12:18:56 2018 daemon.warn td-client: Got termination signal, shutting down tunnel...
Sun Feb  4 12:19:06 2018 daemon.info td-client: Performing broker selection...
Sun Feb  4 12:19:09 2018 daemon.info td-client: Selected 45.34.140.42:8942 as the best broker.
Sun Feb  4 12:19:12 2018 daemon.info td-client: Tunnel successfully established.
Sun Feb  4 12:19:12 2018 daemon.info td-client: Requesting the broker to configure downstream bandwidth limit of 1000 kbps.

ping -I mesh5 8.8.8.8 times out (though I am not clear on whether that's the correct interface to use to test my connectivity to the mesh via the tunnel)
ping -I l2tp0 8.8.8.8 also times out (again, not positive if this ping should work even when tunneldigger is working as expected)
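
For what it's worth, one way to see which route (if any) the node would actually use for those pings is roughly the following (a sketch; the "public" routing table name is the one used elsewhere in this thread and may differ on your firmware build):

# which route and interface would be used for 8.8.8.8?
ip route get 8.8.8.8
# inspect the firmware's mesh routing table (table name assumed)
ip route show table public
# and check the tunnel interface itself
ip addr show dev l2tp0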

@jhpoelen
Contributor

jhpoelen commented Feb 6, 2018

Some days ago, the tunneldigger issues became more pronounced, with all (?) nodes getting disconnected.

@Juul shared the following server config -


Something changed that broke things and I have no idea what. Possibly the kernel.

I tried switching the server to the new version of the tunneldigger server (tunneldigger broker) and after fixing a small bug it worked but only with the new version of the tunneldigger client.

With the new server and old client the server can send to the client over the tunnel but the client cannot send to the server.

With the old server even getting a tunnel fails most, but not all, of the time. Even when it succeeds no traffic can pass through it in either direction.

This could have been simple to troubleshoot if the server (written in python) was just creating these tunnels with calls to the ip l2tp command, but of course they implemented it using a netlink socket...
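
(For reference, the iproute2 equivalent would look roughly like the sketch below; the tunnel/session IDs, ports, and addresses are made up purely for illustration and are not the broker's actual values.)

# create an unmanaged L2TPv3 tunnel over UDP plus one Ethernet session by hand
ip l2tp add tunnel tunnel_id 100 peer_tunnel_id 100 encap udp local 192.0.2.1 remote 192.0.2.2 udp_sport 8942 udp_dport 40000
ip l2tp add session name l2tp100-100 tunnel_id 100 session_id 1 peer_session_id 1
ip link set dev l2tp100-100 up
# inspect what the kernel thinks exists
ip l2tp show tunnel
ip l2tp show session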


@yardenac

yardenac commented Feb 7, 2018

I reverted the kernel back to 3.16.0-4-amd64 and that didn't solve the problem. So probably not a kernel issue. If Meltdown fixes cause your setup to break, you're probably doing something wrong anyway... :)

@jhpoelen
Contributor

jhpoelen commented Feb 8, 2018

Am making slow progress on reproducing the broker outage (with help from Benny!). I've gotten to a point where I can provision a 1GB/1vCPU Digital Ocean droplet, resulting in running babeld and tunneldigger daemons on Ubuntu 16.04 (see https://github.com/jhpoelen/exitnode ). When I attempted this on the latest Debian (9.3 x64) offered through Digital Ocean, tunneldigger crashed with a segmentation fault. I am now blocked on configuring a home node to connect through my hotspot to test a newly minted exitnode on Digital Ocean, despite the somewhat detailed instructions at https://gist.github.com/957855bb5841100109eaeb90e8c6b01b . Hoping to work with others with functioning home nodes unless I can figure out a way to set up my node.

@jhpoelen
Contributor

jhpoelen commented Feb 8, 2018

Using the "new" tunneldigger setup described in https://github.com/sudomesh/tunneldigger-lab , I was able to start a udp tunnel (at least as far as I understand).

On client, I used https://github.com/sudomesh/tunneldigger-lab#digging-a-tunnel to start a tunnel using the droplet ip.

On server, after starting the client, an l2tp* interface is created

#ip addr | grep l2tp
4: l2tp1011: <BROADCAST,MULTICAST> mtu 1446 qdisc noop state DOWN group default qlen 1000

and UDP packets are traveling back and forth, as monitored on the client:

$sudo tcpdump | grep UDP
11:48:20.696672 IP [DROPLET IP].8942 > [CLIENT HOST].44310: UDP, length 12
11:48:23.699293 IP [CLIENT HOST].44310 > 138.197.181.87.8942: UDP, length 12
11:48:25.816708 IP [DROPLET IP].8942 > [CLIENT HOST].44310: UDP, length 12
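
(A slightly more targeted capture, in case it helps anyone reproducing this; it assumes the broker port 8942 seen in the logs above:)

# only the tunneldigger/l2tp traffic, no name resolution
sudo tcpdump -n -i any udp port 8942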

@bennlich
Collaborator

bennlich commented Feb 8, 2018

@jhpoelen In what you pasted, it looks like that l2tp interface is down though, no?

@jhpoelen
Contributor

jhpoelen commented Feb 8, 2018

Note, however, that when connecting the same client to the (misbehaving) exit node at exit.sudomesh.org , bidirectional UDP traffic is also seen using tcpdump on the client.

@jhpoelen
Contributor

jhpoelen commented Feb 8, 2018

@bennlich yep! perhaps related to babeld issues; trying to figure out whether tunneldigger actually writes its logs as configured.

@jhpoelen
Contributor

jhpoelen commented Feb 8, 2018

With some patches to tunneldigger and the exitnode babeld config, I was able to get the interface to become active with a straight run of create_exitnode in https://github.com/jhpoelen/exitnode :

#ip addr 
...
3: l2tp1001: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 26:57:0b:92:20:e1 brd ff:ff:ff:ff:ff:ff
    inet 100.64.0.42/32 scope global l2tp1001
       valid_lft forever preferred_lft forever
    inet6 fe80::2457:bff:fe92:20e1/64 scope link 
       valid_lft forever preferred_lft forever
...

I ran ping -I l2tp0 8.8.8.8 on a client machine running the newer version of the tunneldigger client, but ... no cigar.

I guess the next step is to figure out how the routing is supposed to work on the exit node.

Also, babeld seems to be doing something. Note that there's no babeld running on the client.

# babeld -i
Listening on interfaces: l2tp1001 

My id 90:9c:cf:78:0c:3f:19:c1 seqno 64785
0.0.0.0/0 metric 0 (exported)
100.64.0.42/32 metric 0 (exported)
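
If it helps anyone following along: the "(exported)" lines above come from babeld's redistribute filters. A minimal exit-node babeld.conf that exports just the default route and the node's own tunnel address might look roughly like this (a sketch based on stock babeld filter syntax, not the actual sudomesh config):

# announce the exit node's default route to mesh peers
redistribute ip 0.0.0.0/0 le 0 metric 128
# announce the exit node's own tunnel address
redistribute local ip 100.64.0.42/32 allow
# do not leak anything else
redistribute local deny
redistribute deny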

@bennlich
Collaborator

bennlich commented Feb 9, 2018

@jtremback @Juul I'm also trying to debug the broker (slowly but surely?)

I'm trying to wrap my head around which broker services are responsible for routing packets to the internet. I think my naive assumption is that packets arriving on a tunnel interface with a non-local destination IP should get sent through the default gateway because of the default system routing rule:

default via 128.199.32.1 dev eth0 onlink 

This doesn't seem to be happening though. When I ping -I l2tp0 8.8.8.8 from a tunnel client to a broker, I see ARP packets coming in on the broker's tunnel interface:

root@ubuntown:~# tcpdump -i l2tp100-100
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on l2tp100-100, link-type EN10MB (Ethernet), capture size 262144 bytes
20:58:54.630216 ARP, Request who-has google-public-dns-a.google.com tell 100.65.26.1, length 28

and I do not see them end up on the eth0 interface.

Off the top of your head, do you know what services are responsible for this part of the routing? From tunnel interface -> eth0 via default gateway (and back)?
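
(For my own notes, the three things I plan to check on the broker for that path are roughly the following; a sketch, with the mesh prefix assumed from the 100.6x addresses seen in this thread:)

# 1. is the kernel allowed to forward IPv4 between interfaces?
sysctl net.ipv4.ip_forward
# 2. is there a NAT/masquerade rule for traffic leaving via eth0?
iptables -t nat -L POSTROUTING -n -v
# 3. is the default route still in the main table?
ip route show default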

@jhpoelen
Contributor

jhpoelen commented Feb 9, 2018

I second @bennlich's question. Am stuck on reverse engineering how the exitnode routes requests from the tunnels to the WAN.

@jhpoelen
Contributor

jhpoelen commented Feb 10, 2018

Was able to ping 8.8.8.8 through a tunneled and babeld-routed client using a static route via the exitnode running on a droplet. See https://github.com/jhpoelen/exitnode/blob/master/README.md#testing-routing-with-babeld-through-tunnel-digger .

@yardenac @Juul I noticed that babeld is sensitive to the default route having the static protocol. I had to change default via 207.154.192.1 dev eth0 onlink to default via 207.154.192.1 dev eth0 proto static to make babeld happy (see https://github.com/jhpoelen/exitnode/blob/d6c180c45069d4c0c26986c5a58a32a1f2dde8fe/create_exitnode.sh#L105 ). Can you check and share the default route for the current exit node?
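
For the record, the change amounts to something like this one-liner on the exit node (using the droplet's gateway from above; the production exit node's gateway will differ):

ip route replace default via 207.154.192.1 dev eth0 proto static
ip route show proto static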

@Juul
Member

Juul commented Feb 10, 2018

I second @bennlich's question. Am stuck on reverse engineering how the exitnode routes requests from the tunnels to the WAN.

ARP packets are layer 2. They are not routed. The fact that @bennlich is seeing ARP packets on the tunnelbroker's end of the tunnel asking for the MAC address for 8.8.8.8 tells me that the tunneldigger client does not have a sane default route set. The tunneldigger client should have assigned an IP to its end of the tunnel and set that IP as its default route. The tunneldigger broker should have set that same IP on its end of the tunnel and should of course also have a default route leading to the internet and IPv4 forwarding enabled.
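
In other words, a healthy setup should end up with something roughly like the following (addresses are made up for illustration, and the tunneldigger hooks normally do this for you):

# on the client: address the tunnel end and point the default route through it
ip addr add 100.64.0.43/32 dev l2tp0
ip route add default via 100.64.0.42 dev l2tp0 onlink
# on the broker: make sure IPv4 forwarding is enabled
sysctl -w net.ipv4.ip_forward=1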

@jhpoelen
Contributor

jhpoelen commented Feb 10, 2018

@Juul this is consistent with my earlier message #8 (comment) , where I manually installed a route to 8.8.8.8 via the tunnel on my laptop. The exitnode setup created with https://github.com/jhpoelen/exitnode looks promising, and I am hoping that someone can help test this with an actual home node.

@Juul @yardenac can you please share the default route configuration for the current exit node?

@yardenac

$ ip route
default via 45.34.140.41 dev eth1
[...]

@jhpoelen
Contributor

jhpoelen commented Feb 12, 2018

@yardenac thanks for sharing - I am hoping to do some more testing with various default routes on the exit node.

Meanwhile, at the BYOI office hours today, we did get a node running and connecting through the "big" internet using a Digital Ocean droplet configured with the automated "create_exitnode.sh" script in https://github.com/jhpoelen/exitnode .

A home node (aka "goat") was configured using the instructions at https://peoplesopen.net/walkthrough and https://github.com/jhpoelen/exitnode/#configure-home-node-to-use-exit-node and plugged into sudoroom ethernet. For some reason, I had to manually configure the default route to the exitnode on the home node in the "public" routing table using ip route add default via 100.64.0.42 dev l2tp0 proto babel onlink table public. After adding this route to the public routing table, connections through the peoplesopen.net SSIDs were routed to the "big" internet.

Further investigation is needed into why babeld doesn't install the default route in the home node's "public" routing table by itself.
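
For anyone retracing this, the quickest way to see whether babeld did its job on the home node is roughly (a sketch; it assumes the firmware's "public" routing table and the l2tp0 tunnel interface used above):

# which rules send traffic to the 'public' table?
ip rule show
# is the babel default route there?
ip route show table public
# the manual workaround from above, if the route is missing:
ip route add default via 100.64.0.42 dev l2tp0 proto babel onlink table public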

@jhpoelen
Contributor

I just rebooted the main exit node (again) after the number of gateways dropped to 13. I've attached screenshots of the pre- and post-reboot status of https://peoplesopen.net/monitor . I noticed no unusual error messages in the recently re-enabled logs, so I am still unclear about the root cause of this transient behavior.

screenshot from 2018-03-16 16-23-30

screenshot from 2018-03-16 16-46-09

@bennlich
Collaborator

@jhpoelen did you happen to notice what id # the exit node tunnel interfaces were up to? or whether your home node was able to reconnect?

If you notice this again, feel free to ping me and maybe we'll notice something about the state of the exit node by poking around together. Or we'll notice some info that is missing from the logs that would be good to add.

@jhpoelen
Contributor

Tunnel ids were ~481, so interface ids were something like l2tp4811.

@mitar
Member

mitar commented Mar 17, 2018

Have you people upgraded to the latest Tunneldigger or are you debugging here still an old version?

@bennlich
Collaborator

bennlich commented Mar 17, 2018

@mitar We people are debugging still an old version :-)

@jhpoelen
Contributor

I was thinking to upgrade with a reverse patch to set session ids to 1. This would keep the old client working ok, while still getting any stability improvements otherwise.

@bennlich
Collaborator

@mitar I have not yet looked at bugs reported in the parent tunneldigger repo. Does "clients dropping off over time and being unable to reconnect until the broker is rebooted" ring any bells?

@jhpoelen
Contributor

@mitar what commit would be good to work off of?

@mitar
Member

mitar commented Mar 17, 2018

I think the latest master is probably the best.

I would suggest you run two Tunneldigger instances in parallel, an old version and a new version. I would advise against patching with a reverse patch.

@mitar
Member

mitar commented Mar 17, 2018

Does "clients dropping off over time and being unable to reconnect until the broker is rebooted" ring any bells?

Not really. We had both new and old versions running for months.

@jhpoelen
Contributor

Upgrading would be nice, especially if we people knew that it would solve the root cause. Right now, we don't know what the root cause is.

@mitar
Member

mitar commented Mar 17, 2018

So this is why you can run it in parallel and see if it happens with the new version.

But I agree, understanding and learning is important as well. But so is running a working network. There is always a trade-off. :-(

@jhpoelen
Contributor

Right now, if we run the new version, it would only be able to accept a single connection due to the session id issue, and that is after we patch all the clients to include a new list of exit nodes. And yes, running a network is nice: that is why upgrading with a network full of old clients is not really feasible. This is why I suggested the reverse patch for the time being.

@jhpoelen
Contributor

Just trying to figure out how to resolve this with the limited resources and access we have.

@mitar
Member

mitar commented Mar 17, 2018

and this is after we patch all the clients to include a new list of exit nodes.

Ehm. Not sure if I agree with this. Maybe I am misunderstanding something. You create a new VM, with a new kernel, and you install a new Tunneldigger there. Then you start adding this IP to whoever installs a new node or upgrades an old one. There is no need for a "patch all the clients" moment. You just go slowly as things happen organically. But having two (or more) Tunneldiggers is nice anyway. Because if one hangs like this one now, routes would go through the others.

So, exactly because the old clients do not know about the new VPN server, they will not try to connect to it and hit the one-session-ID issue. And the new clients can then connect to the new VPN server and use it.

Or are you trying to say that the updated clients will not be able to connect both to the new and the old server at the same time, because they are incompatible? I am not sure about this, but maybe you can run two versions of the client code on the node at the same time.

This is why I suggested the reverse patch for the time being.

I mean, if people are OK running with the old kernel version, then I would guess we could even make this a command-line switch in the main codebase: do you want unique session IDs or only one session ID?

@jhpoelen
Contributor

@mitar having a gradual transition makes sense. I guess I am just hung up on rescuing the existing nodes from the transient connection issues we have now. I agree that running two (or more) brokers is good practice anyway.

As far as running the old kernel version goes - I don't mind giving this one-session parameter as a command-line switch a go, because it would allow us to test whether the "old" tunneldigger is in fact responsible for the root cause.

Thanks for being patient.

@jhpoelen
Contributor

I've rebuilt the firmware with sudomesh/nodewatcher-firmware-packages@d4e3b9f , flashed a home node with it, set up a recent version of tunneldigger on a Digital Ocean droplet and . . . am now using that connection to write this comment.

@bennlich
Collaborator

I think the bug that started this issue is resolved as of (#8 (comment)), and we can safely close this issue and open a new one for the new bug that we've been monitoring for the last few weeks.

The new bug is: routes from connected nodes eventually disappear from the exit node routing table (see all comments from #8 (comment) onward in this issue).

I did some poking around the exit node tonight, and I think the new bug we've been tracking is somewhere in babeld. My home node was successfully able to dig tunnels to the exit node, and a tcpdump on its tunnel interface showed healthy babel-ing from my home node to the exit node:

bennlich@exit:~$ sudo tcpdump -i l2tp9111
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on l2tp9111, link-type EN10MB (Ethernet), capture size 262144 bytes
03:15:08.600344 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (14) hello
03:15:12.293335 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (50) hello nh router-id update
03:15:16.589159 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (14) hello
03:15:19.849259 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (14) hello
03:15:23.404500 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (14) hello
03:15:27.854322 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (50) hello nh router-id update
03:15:29.385882 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (26) update/id hello
03:15:29.389076 IP6 fe80::3008:41ff:feba:6408.6696 > ff02::1:6.6696: babel 2 (26) update/id hello

Unfortunately, the exit node never babelled back. babeld -i showed that babeld was indeed listening on the tunnel interface connected to my home node, but no routes were installed, and there was no babel traffic directed at my home node.

I.e. my node was babeling into the existential abyss :-( :-( :-(

I was able to fix the bug by restarting first babeld (sudo service babeld restart), and then the tunneldigger broker (sudo systemctl restart tunneldigger). (The latter restart simply triggered reconnects from all home nodes.) I'm guessing this is functionally the same fix that @jhpoelen saw when restarting the machine.

I noticed that there is exactly one node that seems to tear down and recreate its tunnel every 5 minutes. My current guess is: after enough up_hook.shs (i.e. babeld -as), babeld eventually gets into a state where it is incapable of babeling on new interfaces. My next steps are to dig into babeld to try to find where/how this might be happening.
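
As a crude way to catch it in the act, I'm planning to periodically compare the l2tp interfaces that exist with the ones babeld reports (a sketch; babeld -i is the same status invocation shown earlier in this thread and may be specific to our fork):

# tunnel interfaces the kernel knows about
ip -o link show | grep -o 'l2tp[0-9-]*' | sort -u
# interfaces babeld says it is listening on
babeld -i | grep 'Listening on interfaces'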

@jhpoelen
Contributor

@bennlich thanks for your careful debugging and for sharing your observations. This is consistent with what we've been seeing. I feel you are homing in on the original issue (before the system upgrade that prevented more than one tunnel from being created, due to the fixing of an l2tp bug). However, opening a new issue might be nice to start with a fresh and focused thread.

@bennlich
Collaborator

the system upgrade that prevented more than one tunnel from being created, due to the fixing of an l2tp bug

@jhpoelen ah okay, I thought the system upgrade was the original issue. There was an issue before that?

In any case, I think starting a new issue could be good, as this one has a bunch of info / observations that are not related to the babelling problem (e.g. the title is about tunneldigger).

I'll open a new issue. I am no longer certain this one is resolved; I'll let someone else determine that.

@jhpoelen
Contributor

I agree that the title is misleading. The system upgrade issue appeared after this issue was originally created. Please take action as you see fit.

@jhpoelen
Contributor

To summarize:

As far as I know, two issues related to exit node connectivity were discussed in this issue:

  1. A kernel module bug was fixed in a recent Debian l2tp package, which led to the "old" tunneldigger no longer accepting more than one tunnel (tunnel broker fails intermittently preventing nodes from connecting to the internet #8 (comment)). Resolution: upgrade the tunneldigger broker/client. A new exit node is now active running the new tunneldigger, while the original exit node was downgraded to continue running an older version of tunneldigger. Firmware versions 0.2.3+ contain the new tunneldigger client and are configured to connect to the new exit node first. Outstanding issue: upgrade the original exit node after moving existing home nodes over to the newer tunneldigger client. A new, specific issue was created to cover this: upgrade/ retire psychz exit node #28 .

  2. An apparent memory leak in our fork of babeld (see setsockopt out of memory causes babeld failure #24). Resolution: a workaround was introduced that monitors the exit node logs for errors in order to restart babeld when needed. Outstanding issue: determine the root cause of the out-of-memory condition and fix the underlying issue.

With this, I am closing this issue. Thanks to all who chimed in! If I missed anything, please feel free to comment / re-open.

@paidforby
Author

@jhpoelen Thanks for all the work on this bug and great job summarizing our solution (I had gotten a little lost back around the 16th comment).
