espota.py stalls on some OTA updates leaving Python running in a tight infinite loop #4746

Closed
Paraphraser opened this issue May 20, 2018 · 9 comments
Labels
component: OTA waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.

Comments

@Paraphraser
Contributor

Basic Infos

  • [X] This issue complies with the issue POLICY doc.
  • [X] I have read the documentation at readthedocs and the issue is not addressed there.
  • [?] I have tested that the issue is present in current master branch (aka latest git).
  • [X] I have searched the issue tracker for a similar issue.
  • [n/a] If there is a stack dump, I have decoded it.
  • [X] I have filled out all fields below.

Platform

  • Hardware: [WeMos D1 R2]
  • Core Version: [arduino.esp8266.com/stable/package_esp8266com_index.json]
  • Development Env: [Arduino IDE 1.8.5]
  • Operating System: [MacOS Sierra 10.12.6]

Settings in IDE

  • Module: [WeMos D1 R2 & Mini]
  • Flash Mode: [no such entry in IDE that I can see]
  • Flash Size: [4M (1M SPIFFS)]
  • lwip Variant: [v2 Lower Memory]
  • Reset Method: [no such entry in IDE that I can see]
  • Flash Frequency: [no such entry in IDE that I can see]
  • CPU Frequency: [80 MHz]
  • Upload Using: [OTA]
  • Upload Speed: [n/a]

Problem Description

I am using OTA updating. On most attempts (>95%) the sketch compiles and uploads without incident. On the remaining 5% of occasions:

  1. I click the "Upload" button in the IDE.
  2. The "Upload" button changes to a yellow surround.
  3. The sketch compiles.
  4. The status row changes to "Uploading...".
  5. The row of dots marches much further to the right than expected before finally stopping.
  6. The status row remains stuck at "Uploading..." indefinitely.
  7. The "Upload" button remains yellow indefinitely.
  8. I realise that the upload has stalled.
  9. I re-click the "Upload" button (which responds, even though it is still yellow).
  10. The sketch compiles.
  11. The status row changes to "Uploading...".
  12. The row of dots marches the expected distance.
  13. The status row changes to "Done uploading".
  14. The "Upload" button changes to a light green surround.

Thus far, I have never had a second attempt (i.e. starting at step 9) stall and require a third attempt.
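Because a second attempt has so far always succeeded, anyone calling espota.py from their own script (rather than from the IDE) could enforce a time limit and retry once. The sketch below illustrates that idea under stated assumptions: `run_with_retry` is a hypothetical helper, not part of espota.py or the Arduino IDE, and the timeout value is a placeholder. It relies on `subprocess.run(..., timeout=...)`, which kills the child process when the timeout expires, so a stalled upload cannot leave Python spinning in the background.

```python
import subprocess
import sys


def run_with_retry(cmd, timeout_s, attempts=2):
    """Run `cmd`, retrying on timeout or non-zero exit (hypothetical helper).

    Returns the 1-based number of the attempt that exited with status 0,
    or None if every attempt failed or timed out.
    """
    for attempt in range(1, attempts + 1):
        try:
            # On timeout, subprocess.run() kills the child before raising,
            # so a hung upload cannot keep consuming CPU afterwards.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: killed after {timeout_s}s", file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit status {result.returncode}", file=sys.stderr)
    return None


# Example call (hypothetical paths; the espota.py options are the ones
# visible in the ps output in this report):
# run_with_retry([sys.executable, "espota.py", "-i", "192.168.1.152",
#                 "-p", "8266", "--auth=", "-f", "sketch.ino.bin"], timeout_s=120)
```

The retry count defaults to two simply because that matches the experience reported here; it is not a recommendation from the esp8266 maintainers.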

I could live with the occasional stall, but one side effect is that Python never quits. It shows up in the Mac OS "Activity Monitor" as being responsible for 99.9% of energy consumption, which suggests it is stuck in a tight infinite loop. I noticed the problem because of that high energy consumption: the laptop I was working on became unusually hot. The only way to recover is to kill Python (or restart the machine). Quitting the IDE does not cause Python to quit.

This side effect of high energy consumption by Python is my main motivation for reporting the problem, so that anyone following in my footsteps will know that finding and clobbering Python is an effective workaround.
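The report does not pin down where espota.py is spinning. One common way a socket client ends up in this kind of 100%-CPU loop is by ignoring end-of-file: once the peer closes a TCP connection, `recv()` returns an empty result immediately, every time, so a loop that waits for a reply without checking for that case never blocks and never exits. The following is a minimal sketch of a wait loop that avoids this. `wait_for_result` is hypothetical and is not taken from espota.py; the "OK" reply, read limit and timeout are assumptions for illustration.

```python
import socket


def wait_for_result(conn, max_reads=5, timeout_s=10.0):
    """Wait for an "OK" reply on a connected stream socket (hypothetical helper).

    Returns True if "OK" arrives; False if the peer closes the connection,
    a read times out, or `max_reads` reads pass without "OK".
    """
    conn.settimeout(timeout_s)  # never block forever on a silent peer
    received = b""
    for _ in range(max_reads):
        try:
            chunk = conn.recv(32)
        except socket.timeout:
            return False
        if not chunk:
            # b"" means the peer closed the connection. Calling recv() again
            # would return b"" immediately, forever: that is the tight loop.
            return False
        received += chunk  # "OK" may arrive split across reads
        if b"OK" in received:
            return True
    return False
```

Bounding both the number of reads and the time per read means the worst case is a slow failure with an error, rather than a process that never exits.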

I am not including a test sketch because OTA updating works 95% of the time, which suggests that OTA updating is set up correctly in the code running on the WeMos board. I currently have one WeMos D1 R2 and one WeMos D1 R2 Mini Pro, running totally different sketches, both updated OTA. The problem is evident on both targets but does not "follow the board" and does not "follow the sketch". The common element is the IDE, which suggests that the problem more likely lies with either the IDE or espota.py.

The IDE does not produce any error messages associated with this problem. The closest I can get is some "ps" output for the stalled invocation of Python:

$ ps -ax|grep -i python
2040 ?? 21:59.05 python /Users/home/Library/Arduino15/packages/esp8266/hardware/esp8266/2.4.1/tools/espota.py -i 192.168.1.152 -p 8266 --auth= -f /var/folders/py/j18yts_n3rzb3hmxyz41bghm0000gn/T/arduino_build_945086/sketch_19_hikingMeterLogger_WeMosD1.ino.bin
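The TIME column in that output is accumulated CPU time: 21:59.05 is almost 22 minutes of CPU for an upload that normally finishes in seconds, which is consistent with the tight loop described above. A small script could spot such leftovers automatically. This is a hedged sketch: `find_espota`, `cpu_seconds` and `current_ps` are hypothetical names, and the parser assumes the four columns (PID, TTY, TIME, command) visible in the output above. It also accepts an optional days prefix, as some ps implementations print.

```python
import re
import subprocess

# PID, TTY, TIME, then the rest of the line as the command.
PS_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.*)$")


def cpu_seconds(field):
    """Convert a ps TIME value such as '21:59.05' or '1-02:03:04' to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    total = 0.0
    for part in field.split(":"):
        total = total * 60 + float(part)  # fold [hh:]mm:ss(.ff) left to right
    return days * 86400 + total


def find_espota(ps_output):
    """Return (pid, cpu_seconds, command) for every espota.py line in ps_output."""
    hits = []
    for line in ps_output.splitlines():
        m = PS_LINE.match(line)  # the header line has no numeric PID, so it is skipped
        if m and "espota.py" in m.group(4):
            hits.append((int(m.group(1)), cpu_seconds(m.group(3)), m.group(4)))
    return hits


def current_ps():
    """Capture `ps -ax` output as text (POSIX systems only)."""
    return subprocess.run(["ps", "-ax"], capture_output=True, text=True).stdout
```

For example, `for pid, secs, cmd in find_espota(current_ps()): print(pid, secs)` lists candidate processes; an espota.py entry with minutes of CPU time is a likely stalled upload.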

@SWR-DMaster

Hi Paraphraser,

I, too, am having this issue, though at 15% CPU per instance.
I found this post with a user seeing 25%, same issue: #4966

I'm using an ESP-201 off eBay here, and it hangs most of the time after an OTA upload.
I am loading to the ESP via OTA while the ESP is an AP.
I had been concerned that the ESP rebooting and losing the AP was causing the issue, but now know better.

Hopefully there will be a solution soon.

Thanks,
SWR_DMaster

@glyndon

glyndon commented Aug 10, 2018

Same issue here. Happens only about 30-50% of the time.
It is easy to recover from each time; just issue the following at a shell prompt:
pkill -f ota
This presumes, of course, that you don't have another process running with 'ota' in its name. If you do, look at your running process list and come up with a more selective pattern.
When you kill the looping OTA process, the IDE will suddenly report an error having occurred, since the process didn't exit cleanly, but you can ignore the error and proceed as if nothing's wrong.
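`pkill -f` matches its pattern against each process's full command line, so `ota` also hits any unrelated command that merely contains those three letters. The sketch below illustrates the difference between a broad and a selective pattern. The sample command lines are made up, `matching_commands` is a hypothetical helper, and Python's `re.search` stands in for pkill's own regular-expression matching (for simple literal patterns like these the two agree). On that reasoning, `pkill -f 'espota\.py'` would be a narrower version of the command above.

```python
import re


def matching_commands(cmdlines, pattern):
    """Return the command lines that a `pkill -f`-style regex search would select."""
    rx = re.compile(pattern)
    return [c for c in cmdlines if rx.search(c)]


# Hypothetical process list: only the first entry is the stalled uploader.
procs = [
    "python /Users/home/tools/espota.py -i 192.168.1.152 -p 8266 --auth= -f sketch.bin",
    "/usr/bin/rotatelogs /var/log/app.log 86400",
    "vim quota-notes.txt",
]

broad = matching_commands(procs, r"ota")          # matches all three entries
narrow = matching_commands(procs, r"espota\.py")  # matches only the uploader
```

The backslash before the dot makes it a literal `.`, so the pattern cannot match, say, `espotaXpy`.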

@dragondaud
Contributor

dragondaud commented Aug 28, 2018

Also seeing this issue, at least 50% of the time when updating over OTA. I'm on Win10, with Python 2.7.14 and Arduino 1.8.6 using github version of esp8266 arduino. Seen on both NodeMCU and D1 mini lite.

@d-a-v
Collaborator

d-a-v commented Sep 13, 2018

Have you tried again with version 2.4.2?
Could you also try with #5135 ?

@d-a-v d-a-v self-assigned this Sep 13, 2018
@d-a-v d-a-v added waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. component: OTA labels Sep 13, 2018
@Paraphraser
Contributor Author

I am using version 2.4.2 at the moment. Over the last week I have pushed OTA updates to ESP8266 devices several times and have not encountered the stall condition. Whether that means 2.4.2 has "cured" the problem, OR the problem is still there and I was just lucky to avoid it, is unknown.

On the second part of your reply, I must confess ignorance. I do not know what that means. I simply have the requisite "Additional Board Manager" URL for the ESP8266 in my preferences pane and I just accept updates as and when the IDE tells me there is something new. At the moment, the IDE is telling me everything is up-to-date.

I'm willing to give "#5135" a whirl but please point me at some step-by-step instructions that cover (a) how to bring that change into my system and (b) how to back it out again.

@d-a-v
Collaborator

d-a-v commented Sep 13, 2018

@Paraphraser right, thanks for the feedback.

If the issue happens again, then instead of trying #5135, find the file WiFiClient.cpp on your computer, make this modification near line 278, then rebuild, reflash and report back whether it has improved.
You can back it out by just removing the line.

@d-a-v
Collaborator

d-a-v commented Oct 2, 2018

@SWR-DMaster @glyndon @dragondaud Can you please try with latest git version ?

@washcroft

washcroft commented Dec 8, 2018

See #4283 (comment)

@earlephilhower
Collaborator

The original filer seems to have reported it fixed, and later issues and PRs optimized this further. Closing for now. If there's a problem with 2.5.0, please file a new issue using the template.


7 participants