EUS - HTTP stops responding while station is connecting #1180

Closed
jfollas opened this issue Mar 21, 2016 · 6 comments

Comments

@jfollas
Contributor

jfollas commented Mar 21, 2016

While @robertfoss is working on #1138, I thought I'd mention this issue that has been plaguing us for the past few weeks in case it leads to something that can be corrected in that PR:

I have a private fork of NodeMCU that also has a fork of the End User Setup module (I kept the original in the source so that it would keep in sync with the upstream repo, and just copied the code into a new module). We use the modified version of EUS on our device because there are a few additional endpoints that are implemented, and a couple of other changes unique to our needs.

One of the endpoints is similar to the /status endpoint in that a client sends a request on a regular basis (2 second interval) and gets a JSON response back with information. With debugging turned on, we see this 2-second request working regularly right until the time that credentials are provided.

The code executes wifi_station_set_config() just fine (inside do_station_cfg()), and the config gets saved to the flash. But as soon as the subsequent wifi_station_connect() executes, no more HTTP requests are accepted by the ESP, at least until the Wi-Fi status is no longer 1 (connecting). It doesn't matter which endpoint the browser tries: the ESP just quits answering TCP requests while the station is connecting, and the web client (XHR) times out.

This wouldn't be a big deal to just wait out the Connecting status until there's either a success or failure, since it's only 5-10 seconds. But more often than not, it doesn't resume processing HTTP requests even after the station connects (even using manual mode to control the AP and stop()). So the web page never gets the final status that was requested.

This symptom of requests not being served sounds familiar: does anyone remember similar behavior, and what may have led up to it?

@jfollas
Contributor Author

jfollas commented Mar 22, 2016

I took Robert's latest enduser_setup.c and reapplied all of my additions to it. Requests are still not serviced while the station is connecting (so that's probably normal), and so far, I'm getting requests served after the connection is established.

Edit: Spoke too soon. Still seeing the inconsistent behavior where requests aren't always being served after the station connects. So, I'm still troubleshooting this one to see if there's an actionable issue for the End User Setup module. Else I'll close this issue.

@jmattsson
Member

I can explain this behaviour. It's not that the http server stops responding as such, it's that the entire AP gets channel-shifted to whichever network's channel the station side is trying to join. That's largely a hardware limitation of trying to run both a station and an AP on the one radio. Things get even wackier if you try to join a non-existent network, whereby the ESP keeps switching channel searching for the network... Normally your client (phone, laptop etc) catches on to the change of channel after ~10sec or so (in my experience), but if the ESP is channel-hopping it has a much harder time of keeping up and staying in touch with the http server.

I don't have a solution to this problem. I'm not sure there is one. It'd be great if someone did find one though.

@jfollas
Contributor Author

jfollas commented Mar 22, 2016

Wow, you don't know how many sleepless nights it's been trying to come up with an explanation for why it was happening. And the culprit turns out to be so simple (and at a layer lower than where I was assuming). Thanks Johny!

Now that you say that, I see it's clearly documented in the [2C] Programming Guide 1.5.1 page 200.

My brainstorm of ways to work around it:

  • Do a scan ahead of starting the SoftAP and set the SoftAP's channel to the same as the strongest AP found by the scan. Assumption here is that the user will likely be picking that AP to join as part of EUS, so there will be no channel swapping.
  • In EUS, after a successful Station event, query the channel. If it's different than the one that was used to start the AP, then stop the Station and switch back to the original AP channel in order to continue serving requests for the remainder of the 10 seconds (for auto shutdown). Hold off on calling the success callback until the end of the 10 seconds, and then start the Station back up as the AP is shut down.
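The decision logic behind the two ideas above can be sketched in plain C. This is only an illustration under stated assumptions, not code from the End User Setup module: `scan_entry_t`, `pick_softap_channel()` and `channel_moved()` are hypothetical names, and on the device the scan results and the current channel would have to come from the SDK rather than from a hand-built array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scan result. On the device, the SDK's
 * scan callback would supply the equivalent data. */
typedef struct {
    int8_t  rssi;     /* signal strength in dBm; closer to 0 is stronger */
    uint8_t channel;  /* 2.4 GHz channel number */
} scan_entry_t;

/* Idea 1: start the SoftAP on the channel of the strongest AP seen in the
 * scan, assuming the user will most likely join that AP.
 * Returns `fallback` when the scan found nothing. */
static uint8_t pick_softap_channel(const scan_entry_t *aps, size_t count,
                                   uint8_t fallback)
{
    if (aps == NULL || count == 0) {
        return fallback;
    }
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (aps[i].rssi > aps[best].rssi) {
            best = i;
        }
    }
    return aps[best].channel;
}

/* Idea 2: after the station connects, compare the channel the SoftAP was
 * started on with the channel the radio is on now. A non-zero result means
 * the AP was shifted and clients may have lost touch with the HTTP server. */
static int channel_moved(uint8_t softap_channel, uint8_t current_channel)
{
    return softap_channel != current_channel;
}
```

With scan results of (-70 dBm, channel 1), (-42 dBm, channel 6) and (-85 dBm, channel 11), `pick_softap_channel()` returns 6. The second helper only detects the shift; stopping the Station, restoring the AP channel and delaying the success callback would still need to be wired into the module's event handling.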

@jmattsson
Member

Interesting ideas. The one potential gotcha I can see is that in our experience the SDK sometimes doesn't honour the channel setting for the AP, even if it's not running a station. I'd be most interested in your results if you try it out though!

@robertfoss
Contributor

@jfollas I like your suggestions.

  1. Would not always work, and cost us X startup latency.
  2. Would always work, and cost us Y connection latency.

@jfollas
Contributor Author

jfollas commented Mar 22, 2016

I'm actually working on implementing both in my private fork. I'm finding that phones (Android in particular) seem to drop the Wi-Fi connection during the short 5-second window while the Station changes the channel in the second option. Still testing, though.
