Improve resilience of wireless service #143

Closed
earlchew opened this issue Feb 5, 2017 · 30 comments

Comments

@earlchew
Contributor

earlchew commented Feb 5, 2017

As a user, I want to have access to the volumio wifi hotspot when I do not have use of any wifi network, so that I will always have some means to connect to the volumio service.

  • If the volumio service has not been configured, the volumio wireless service shall start the wifi hotspot first.
  • When the volumio service has been configured, the volumio wireless service shall start the wifi client first.
  • If the wifi hotspot has been running for 15s and is not associated with any wifi clients, the volumio wireless service shall stop the wifi hotspot and start the wifi client.
  • If the wifi client has been running for 30s and is not servicing any active TCP sessions, and if the wifi hotspot is not disabled in the volumio configuration, the volumio wireless service shall stop the wifi client and start the wifi hotspot.
  • The volumio wireless service shall operate correctly alongside any other available network connection (eg Ethernet).
  • The wifi hotspot shall use the IPv4 address 198.18.0.1/15 so that it will never collide with any production address assigned to any other network connection (eg Ethernet).
  • The wifi hotspot shall support up to 8 DHCP clients.
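The rules above amount to a small state machine evaluated on a polling cycle. A minimal sketch in Python (illustrative only, not the wireless.js implementation), assuming the service is told how long the current mode has been running and how many hotspot associations or client TCP sessions it currently sees; the `Status` and `next_mode` names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Timeouts taken from the requirements above (seconds).
HOTSPOT_IDLE_TIMEOUT = 15
CLIENT_IDLE_TIMEOUT = 30


@dataclass
class Status:
    mode: Optional[str]        # "hotspot", "client", or None before the first start
    configured: bool           # has the volumio service been configured?
    hotspot_disabled: bool     # is the hotspot disabled in the configuration?
    elapsed: float             # seconds the current mode has been running
    hotspot_associations: int  # wifi clients associated with the hotspot
    client_tcp_sessions: int   # active TCP sessions over the wifi client


def next_mode(s: Status) -> str:
    """Return the mode the wireless service should be in after this check."""
    if s.mode is None:
        # Initial start: hotspot first if unconfigured, otherwise client first.
        return "client" if s.configured else "hotspot"
    if s.mode == "hotspot":
        # An idle hotspot gives way to the client after 15s.
        if s.elapsed >= HOTSPOT_IDLE_TIMEOUT and s.hotspot_associations == 0:
            return "client"
        return "hotspot"
    # Client mode: fall back to the hotspot only if it is not disabled.
    if (s.elapsed >= CLIENT_IDLE_TIMEOUT and s.client_tcp_sessions == 0
            and not s.hotspot_disabled):
        return "hotspot"
    return "client"
```

Keeping the decision a pure function of the observed status makes each rule easy to test in isolation, separately from the code that actually starts and stops services.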

Related Issues:

Use Cases

  • New volumio installation
  • New wireless site
  • No matching wireless sites
  • Access point drops out (eg wifi router reboots, portable volumio moves out of range)
  • Ethernet cable is connected
@earlchew
Contributor Author

An initial version implementing the above can be reviewed here: master...earlchew:issue-143

  • Remove wlan0 from netplug, etc, and delegate all wireless operations to wireless.js
  • Use dhcpcd exclusively as a DHCP client (do not use dhclient) for wlan0 (client) and eth0
  • Use dnsmasq for DNS and DHCP server (do not use dhcpd) for wlan0 (hotspot)
  • Use systemd to maintain, stop, start and restart wireless services
    • Use wireless.service to coordinate wireless networking
    • Introduce wireless-hotspot and wireless-hotspot-dnsmasq for hotspot services
    • Introduce wireless-client and wireless-client-dhcpcd for client services
  • Rework hotspot.sh to focus on starting either hostapd or hostapd-edimax
  • Focus wireless.js on using systemd to start or stop the hotspot or client service, and on monitoring the active network for activity
  • Support the current true and false setting for enable_hotspot, as well as the proposed Off, On, and Auto settings
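One way to picture wireless.js delegating to systemd is as a pair of `systemctl` calls per handover: stop the unit for the other mode, then start the unit for the requested one. A minimal sketch in Python, assuming the `wireless-hotspot.service` and `wireless-client.service` names from the list above; `transition_commands` and `switch_mode` are hypothetical names, not part of the proposed branch:

```python
import subprocess

# Mode -> systemd unit, using the unit names proposed above.
UNITS = {
    "hotspot": "wireless-hotspot.service",
    "client": "wireless-client.service",
}


def transition_commands(target: str) -> list[list[str]]:
    """Return the systemctl calls that hand wlan0 over to `target`."""
    if target not in UNITS:
        raise ValueError(f"unknown mode: {target!r}")
    other = "client" if target == "hotspot" else "hotspot"
    return [
        # Stop the other mode first; systemctl waits for the stop job to
        # finish unless --no-block is given.
        ["systemctl", "stop", UNITS[other]],
        ["systemctl", "start", UNITS[target]],
    ]


def switch_mode(target: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, in dry-run mode, just return) the handover commands."""
    cmds = transition_commands(target)
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if a step fails, so the
            # start is never attempted after a failed stop.
            subprocess.run(cmd, check=True)
    return cmds
```

With `dry_run=True` (the default in this sketch) the commands are only returned, so the stop-before-start ordering can be inspected without touching the system.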

@macmpi
Contributor

macmpi commented Feb 15, 2017

Thanks @earlchew this is really helpful.
I also did some research a few months ago on this troublesome matter, and came across a very well regarded (and supported) solution to perform the hotspot function, mentioned in the Arch Linux wiki: create_ap.
This script (currently in bash, rewrite in progress in ruby) does many of the necessary HW capability checks and fail-safe measures. It also operates without modifying system settings (it does its setup in temporary directories), and manages special cases like Edimax.

I think it could bring a very mature, tested, and supported way to implement the hotspot function within Volumio, without much custom rework or bug hunting through many tricky issues.
Interested in your views as you are looking deeply into that.

@volumio
Owner

volumio commented Feb 15, 2017

Interesting and well done, really. I would like to give it a go.
One thing I see is that we need a setting for dnsmasq to not forward to the UI; sometimes it is needed but sometimes not. Do you think we can handle that?

@earlchew
Contributor Author

A couple of other notes and observations:

  • I made an attempt to switch eth0 to manual operation, and not use netplugd at all, thinking that it would lead to a simpler configuration: ifconfig eth0 up && dhcpcd eth0. The dhcpcd implementation is capable of watching link availability itself. Unfortunately /etc/network/interfaces is manipulated separately, and the GUI supports configuring a static address. Taken together, these suggest that it is easier just to keep /etc/network/interfaces, netplugd, ifup and ifdown.
  • I switched the hotspot IPv4 address to 198.18.0.1/15 to avoid any chance of colliding with a valid IPv4 address on other network interfaces.
  • I considered using IPv4LL (ZeroConf) on the hotspot. This would make the deployment simpler. I discarded the idea in an earlier implementation because at that time wireless.js was watching for TCP connections via the hotspot, and I thought it would be a good idea to allow clients to make a TCP/IP connection as fast as possible. Since then, I've switched wireless.js to watch hotspot wifi associations, so maybe it's worth considering this again. I'm pretty sure any modern wifi client will switch to IPv4LL if it cannot acquire an address via DHCP.
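198.18.0.0/15 is the block reserved for network benchmarking (RFC 2544), so it should not appear on a typical LAN. A quick check with Python's standard `ipaddress` module, assuming the RFC 1918 private ranges and the IPv4 link-local range are the ones worth comparing against. It also derives a pool of 8 addresses (matching the 8 DHCP clients in the original requirements) and formats it as a dnsmasq `dhcp-range` line; the helper names and the 12h lease time are illustrative choices, not Volumio's configuration:

```python
import ipaddress
import itertools

HOTSPOT_IF = ipaddress.ip_interface("198.18.0.1/15")

# Ranges a LAN commonly uses: RFC 1918 private blocks plus IPv4 link-local.
COMMON_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4LL / ZeroConf
]


def collisions(net):
    """Return the common ranges that overlap `net`."""
    return [r for r in COMMON_RANGES if net.overlaps(r)]


def dhcp_pool(iface, size=8):
    """First `size` host addresses in the hotspot subnet, skipping its own."""
    hosts = (h for h in iface.network.hosts() if h != iface.ip)
    return list(itertools.islice(hosts, size))


def dnsmasq_range(pool, lease="12h"):
    """Format the pool as a dnsmasq dhcp-range line."""
    return f"dhcp-range={pool[0]},{pool[-1]},{lease}"


print(HOTSPOT_IF.network, collisions(HOTSPOT_IF.network))  # 198.18.0.0/15 []
print(dnsmasq_range(dhcp_pool(HOTSPOT_IF)))  # dhcp-range=198.18.0.2,198.18.0.9,12h
```

Limiting the range to 8 addresses caps the number of simultaneous leases without any extra option.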

@earlchew
Contributor Author

@volumio wrote:

we need to have a settings for dnsmasq to not forward to the UI, sometimes is needed but sometimes not.

I'm not sure I understand the issue you are referring to. I think you are describing the use of dnsmasq in the context of the hotspot. Would you provide a more detailed description?

@earlchew
Contributor Author

@macmpi wrote:

I think it can probably bring a very mature, tested, and supported way to implement hotspot function within Volumio

I suppose it could, but I think it's worth also asking what is expected from the wifi hotspot. I see that create_ap brings a lot of functionality, but to do so it makes the implementation more complicated.
The current hotspot implementation depends on hostapd and dnsmasq; create_ap requires these plus additional dependencies, includes a reasonably sized daemon, and may also add ruby.

My understanding is that the intent of the hotspot is to function as a backup means to connect to the volumio application in the absence of any other viable network (Ethernet, or wifi network). If that remains the case, maybe it is worth keeping the configuration of the hotspot as straightforward as possible. Providing a lot of functionality and complexity here might make it less reliable or harder to use as a means of connection of last resort.

Should there be a desire to use create_ap (or something similar) to provide additional functionality, I notice that it has systemd support. This can work quite well with the reworked implementation of wireless.js, because wireless.js focuses on starting and stopping wireless-hotspot.service, and the bulk of the integration with create_ap can probably be focused there:

# wireless-hotspot.service
[Unit]
PartOf=wireless.service
Requires=create_ap.service
Before=create_ap.service
...

@macmpi
Contributor

macmpi commented Feb 15, 2017

@earlchew wrote:

This can work quite well [...]

Yes, I was thinking along this line indeed: activation logic is most likely Volumio specific, but actual operation may be performed by such specialized "unit" under systemd.
The benefit I saw was all the capability checks and the experience with some drivers' limitations (e.g. Raspi3), which may help debug nasty issues: we often need to understand why it does not work, given the number of possible dongles and drivers out there... create_ap can provide good hints at what's going wrong.
I did not really experience dependency or footprint issues with it, and it seemed flexible enough to run with predetermined (limited, albeit flexible) options (--no-virt in particular; probably no need for full NAT, to avoid routing issues; bridged + Zeroconf is probably enough, etc.): just choose a reasonable mix of options and issue a single command within a systemd service.

Anyway, your call obviously: glad this overall feature can be revisited and improved.

PS: It's important to keep in mind that some devices (like the Pi Zero) have no built-in network interface by default, so we should avoid explicit dependencies on eth0 and issues linked to its existence, etc. (example here)

@earlchew
Contributor Author

earlchew commented Feb 15, 2017

@macmpi I appreciate the feedback. You wrote:

It's important to keep in mind some devices (like piZero) have no default built-in network interfaces to avoid explicit dependencies & issues linked to eth0 existence, etc

For the present, I don't believe I have made that situation any worse, but would appreciate any comments you might have regarding the reworked implementation in this regard.

The netplug modifications include supporting the probe call for eth0, and presumably if that fails, netplug will abandon the interface.

The wireless modifications attempt to manage wlan0. The wireless.js implementation should loop endlessly on its normal operational cycle on a hopeless quest if indeed wlan0 is not present. I did not verify this yet, but I'll take the time to do so in my next round of testing.

If as you say create_ap does provide substantial benefit when it comes to support of disparate hardware, that might be reason enough to adopt it. For the present, I would like to avoid making this change set any larger, and see these changes reviewed, tested, and adopted to improve the wireless scenarios described above.

If it makes sense, I think the situation regarding create_ap can be revisited after that. Tracking this in #147.

@macmpi
Contributor

macmpi commented Feb 16, 2017

Appreciate your systematic approach.
Looking forward to testing it.

@earlchew
Contributor Author

earlchew commented Feb 21, 2017

@macmpi I saw your reference to volumio/Volumio2#791. In that issue the hotspot was started even though it was set to OFF. I believe these changes would address that first part:

Setting    Hotspot    Client
false      disabled   enabled
'off'      disabled   enabled
'on'       enabled    disabled
'auto'     enabled    enabled
true       enabled    enabled
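The table reduces to a lookup from the stored setting to the pair of modes the service may use. A sketch in Python, with hypothetical names `MODES` and `allowed`; note that `true` and `false` share the rows of `'auto'` and `'off'`:

```python
# (hotspot allowed, client allowed) per enable_hotspot value, from the table above.
MODES = {
    False: (False, True),
    "off": (False, True),
    "on": (True, False),
    "auto": (True, True),
    True: (True, True),
}


def allowed(setting):
    """Return which wireless modes the service may use for `setting`."""
    hotspot, client = MODES[setting]
    return {"hotspot": hotspot, "client": client}
```

For example, `allowed("on")` gives `{'hotspot': True, 'client': False}`, which is the case where an unwanted hotspot can no longer start when the setting is off, and the client is never started when it is on.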

The other question pertains to DNS. The only time a DNS server is run is when the hotspot is started. The only reason I can think of to run the DNS server is to make it easier for clients to connect to the Volumio device by name, rather than by IP address.

Apart from that, I can't think of any other reason to run the DNS server. Is there any expectation that the DNS server actually provide a fully-featured hotspot service to allow clients to connect to the internet via the volumio hotspot?

@macmpi
Contributor

macmpi commented Feb 21, 2017

@earlchew
Thanks for your note. Actually my reference was more general, through a discussion on another commit in which I also linked that older issue as an example. That older issue's investigation ended up as a mixed bag of unwanted hotspot startup, as you point out (and will probably address), and a DNS name resolution issue that popped up under specific ISP conditions (an ISP blocking Google DNS, which used to be forced as the default in Volumio).

Anyway, back to your question, I agree that one would not expect to run a DNS server on Volumio in general (or at all). To me (but others may think differently?) the hotspot is merely an access point serving primarily as a basic bridge: it should not interfere with any DHCP or DNS server, or router, that may exist on the local network. Just in case there is no DHCP & DNS service available (i.e. a router-less point-to-point network), it may just provide Zeroconf-type addressing & discovery to ease initial setup from the most common platforms.

Typical cases illustrating this:

  • Operating Volumio connected to a wired LAN with a router providing (say) a wired-only internet connection (DHCP & DNS). By adding a wifi dongle to Volumio, one would create a simple wifi bridge, with the original router doing all the heavy lifting as usual. Wireless (and wired) devices could set up and manage Volumio.
  • Operating Volumio in a non-wired environment: then Volumio will most likely be either:
    --- a wifi client to the home wifi AP (providing the internet connection)
    --- a wifi client to a phone's hotspot (sharing its 3G/4G internet connection)
    --- (and possibly) a point-to-point hotspot to a local PC/tablet with which it shares content via LAN protocols -> also used for initial setup when Volumio is not wired

PS:
On your proposed settings table, how about merging false and "OFF" on one hand, and then true and "Auto" on the other hand. We really only need 3 settings not 5 I guess.

@macmpi
Contributor

macmpi commented Feb 26, 2017

FYI, a seemingly RPi3-specific wifi issue about how hotspot management (combined with a likely driver issue) may adversely affect how an RPi3 can connect to an AP in client mode.
Possibly yet-another-special-case to handle.

@earlchew
Contributor Author

@macmpi wrote:

On your proposed settings table, how about merging false and "OFF" on one hand, and then true and "Auto" on the other hand. We really only need 3 settings not 5 I guess.

The only reason I have five possibilities, rather than three, is for the code to support existing configurations (True vs False) as well as new configurations (on, off, auto). If after merging there is no need to support legacy configurations, then as you point out we can go ahead and drop that part of the code.

@volumio
Owner

volumio commented Feb 27, 2017

I would say on/off/auto is the ideal configuration
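If legacy boolean values are migrated once when the configuration is loaded, the rest of the code only has to deal with on/off/auto. A minimal sketch in Python, assuming `true` should become `'auto'` and `false` should become `'off'` (as in the settings table earlier in this thread), and accepting any letter case because both `Off/On/Auto` and `'off'/'on'/'auto'` spellings appear in this discussion; the function name is hypothetical:

```python
def normalize_enable_hotspot(value):
    """Map a stored enable_hotspot value onto 'off', 'on' or 'auto'."""
    # Legacy boolean configurations: true behaves like 'auto', false like 'off'.
    if value is True:
        return "auto"
    if value is False:
        return "off"
    # New-style values, accepted in any case ('Off', 'on', 'AUTO', ...).
    if isinstance(value, str) and value.strip().lower() in ("off", "on", "auto"):
        return value.strip().lower()
    raise ValueError(f"unrecognised enable_hotspot value: {value!r}")
```

Once existing configurations have been rewritten this way, the legacy branches can be dropped, as discussed above.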

@earlchew
Contributor Author

I wrote:

The wireless modifications attempt to manage wlan0. The wireless.js implementation should loop endlessly on its normal operational cycle on a hopeless quest if indeed wlan0 is not present. I did not verify this yet, but I'll take the time to do so in my next round of testing.

Today I had the chance to remove my Wifi dongle and boot with Ethernet available. I confirmed that the revised wireless service loops at intervals trying to enable wlan0 for the hotspot, and then for the client. Neither, of course, succeeds, because wlan0 is not available. I used top(1) to confirm that the CPU is idle (i.e. none of the wireless services are causing the CPU to loop hard).
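On Linux, the interfaces the kernel currently knows about are listed under `/sys/class/net`, so a missing dongle can be detected cheaply and each retry paced with a sleep, which is what keeps the CPU idle in a loop like the one observed above. A sketch in Python with hypothetical names; the 5s interval is an arbitrary example, not the interval the revised service uses:

```python
import os
import time


def interface_present(name, sysfs="/sys/class/net"):
    """True if the kernel currently exposes a network interface called `name`."""
    return os.path.exists(os.path.join(sysfs, name))


def wait_for_interface(name, interval=5.0, attempts=None,
                       sysfs="/sys/class/net", sleep=time.sleep):
    """Poll for `name`, sleeping between checks so the loop never spins.

    Returns True once the interface appears, or False after `attempts`
    unsuccessful checks (attempts=None keeps trying forever).
    """
    tries = 0
    while True:
        if interface_present(name, sysfs):
            return True
        tries += 1
        if attempts is not None and tries >= attempts:
            return False
        sleep(interval)
```

The `sleep` parameter is only there so the pacing can be checked without actually waiting.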

@macmpi
Contributor

macmpi commented Mar 11, 2017

Hi, nice to hear we are probably getting closer to trying out your PR.

There's a use case you may be interested in testing/reviewing, particularly if you have a Pi3 or PiZeroW (please check this Forum thread).

In the current implementation, it seems the handover between hotspot mode and client mode causes issues with those devices. Such handover situations happen often in the current implementation, as the hotspot is mostly in AUTO mode (hotspot starts at boot, automatic hotspot connection if the home AP fails).

With some chipsets (like the one in the Pi3 and PiZeroW), AP and client modes can run simultaneously, but only on the same wifi channel (HW limitation). Therefore once Volumio sets hotspot (AP) mode, typically on channel #4 by default, if the client-mode handover to the home wifi is not properly handled (properly turning AP mode OFF first, or restarting the wifi chipset), then the client can only join the home wifi on channel #4!...
This is not obvious to users of course, particularly as many home wifi networks use automatic channel assignment, or are unlikely to be set by chance on the same channel as the Volumio hotspot...

I'm still trying to better characterize the issue and a possible workaround in the mentioned Forum thread, but as I do not own those devices, it's quite difficult: I rely on impacted users' availability for tests.
Hopefully you may be able to carefully test such cases if you own those devices, and particularly check that your new implementation does not run into such a troublesome issue.

@earlchew
Contributor Author

@macmpi It would be fairly straightforward to apply some kind of reset strategy prior to bringing up the wifi client, or even prior to bringing up the hotspot. Unfortunately, right now I do not have access to either of the newer RPi models mentioned, so I'm unable to either reproduce or test this failure scenario.

A fix will have to wait until better information is available.

@biva

biva commented Mar 16, 2017

Great, does that mean we'll have a fix soon?
If you need to, I'm able to perform tests on RPI3

@macmpi
Contributor

macmpi commented Mar 22, 2017

What's really critical, to avoid the former issue or similar ones, is to make sure Client & AP modes are mutually exclusive, and that one is never launched before the previous one is properly shut down. This is particularly true in AUTO mode (where each of the 2 modes should only ever follow the other).

Indeed, should Volumio really intend to run Client & AP modes simultaneously, then it should be done by setting up one virtual network interface for each, which is a bit more complex to handle, and not necessarily properly supported by many wifi chipsets (create_ap to the rescue!): hence very few users would benefit from such a rare use case anyway, and many could complain about it not working...

Therefore, I do not think such simultaneous use is at all important for Volumio (we are not making a full-blown AP), and therefore we just need one standard physical interface, BUT then we must also make sure simultaneous modes never happen by "mistake"... or we may end up with complex bugs to figure out, linked to particular chipsets' & drivers' limitations (like for instance the "same channel limitation" on the Pi3/PiZeroW chipset).

I guess your new code does keep those 2 modes mutually exclusive?
It seems the original code mixed the two modes at some point, probably around v2.041 (or a buggy driver did not clean up some context properly), but I can't tell exactly what has fixed it since...

@biva

biva commented Mar 26, 2017

@macmpi wrote:

Client & AP modes are mutually exclusive, and one is never launched before previous is properly shut-down. [...] Therefore, I do not think such simultaneous use is at-all important for Volumio

Totally agree, stability is way more important than this feature, which wouldn't be very useful.

But I think that LAN should work at any time, and should have priority over Wifi as soon as it is connected. If I can't connect to my Volumio (2.118 / RPI3) over WiFi, I try to connect over LAN; but sometimes it doesn't work (I don't see it on my LAN, so I have no choice but to restart my RPI). Unfortunately, I wasn't able to reproduce it reliably.

@earlchew
Contributor Author

@macmpi My proposed reimplementation of wireless.js will only run either the hotspot or the wireless client, but not both at the same time.

@macmpi
Contributor

macmpi commented Mar 30, 2017

Great thanks.
Hope you'll manage to factor the changes in a more incremental way, if at all possible.

@biva

biva commented May 8, 2017

Hello, I'm on 2.163 and I didn't see any change regarding wifi stability in the changelog. Do you still plan to integrate your improvements? I'm available to test: good luck!

@earlchew
Contributor Author

earlchew commented May 9, 2017

Thanks for the reminder and for your interest. With other matters to take care of, and lack of HW, I haven't put any more time into this recently. I'll find time to make some progress shortly.

@biva

biva commented May 9, 2017

Great, thanks a lot! (this is quite annoying on RPI 3...)

@biva

biva commented Jul 13, 2017

Hello @volumio
Is the issue solved? I'm still having issues with wifi on RPI3 with 2.201
And the "wifireconnect" plugin doesn't work on my config (see balbuze/volumio-plugins#64)
@earlchew : any news?
Thank you!

@malcolmjlear

This was a big issue with me due to my need for hotspot only (car media player). However the boot time on 2.201 has now significantly improved to 55 seconds on a 2B which is quite acceptable. This issue still shows itself on an older B+ which boots slower at 2 and a half minutes whilst it tries connecting every which way but hotspot.
Hopefully this will move forward as earlchew's solution is very neat.

@biva

biva commented Sep 6, 2017

Hello @volumio
I'm still having stability issues with wifi: version 2.246 on RPI3
Are you planning any improvement? Or include an improved version of wifireconnect plugin?
Thank you!

@biva

biva commented Sep 6, 2017

@volumio For the record, I dug into this, and I think the problem is that the RPI3 (maybe other systems?) needs to see the full path for cron jobs (for example /sbin/ip instead of ip).
See balbuze/volumio-plugins#64 (comment)
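A likely mechanism behind this: cron starts jobs with a minimal environment, and on Debian-based systems its default `PATH` is `/usr/bin:/bin`, so a command installed only under `/sbin` or `/usr/sbin` is not found by bare name. A small Python illustration using `shutil.which` with an explicit search path (the helper name is hypothetical); setting `PATH=` at the top of the crontab is the usual alternative to spelling out full paths:

```python
import shutil

# Debian's cron runs jobs with PATH=/usr/bin:/bin unless the crontab sets PATH.
CRON_DEFAULT_PATH = "/usr/bin:/bin"


def resolve_like_cron(cmd, path=CRON_DEFAULT_PATH):
    """Return the file cron would execute for a bare `cmd`, or None if not found."""
    # A bare name only works if the command sits in one of the PATH directories;
    # an absolute path such as /sbin/ip does not depend on PATH at all.
    return shutil.which(cmd, path=path)
```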

@volumio
Owner

volumio commented Sep 6, 2017

We have planned a rework: if wifi is dropped there will be a setting for it to reconnect automatically.

5 participants