What conditions will trigger a software restart? #2678

Closed
AndrewHoover opened this issue May 9, 2018 · 19 comments

Comments

@AndrewHoover

This is not so much an issue but a question.
What are the failures/conditions under which the software will perform a restart? I ask because I am trying to troubleshoot an issue and I need to narrow down the possibilities.

I have a sonoff basic and I am using version 5.13.1.

@Frogmore42
Contributor

There are quite a few configuration changes that will cause a restart. But those are generally user initiated.

What is the issue you are trying to troubleshoot?

Depending on how you have it configured, not being able to connect to WiFi or MQTT can cause it too.

@AndrewHoover
Author

I'm using a sonoff basic in a 3D printed wall switch case and while it is fully seated in the case on my desk, it restarts about once every 30 seconds to 2 minutes. If I pull the board up slightly off of the bottom of the box so that the contacts aren't near the plastic, the restart interval will go up to once every hour. If I take it out of the case entirely, it will run all day without issue.

All the while, I regularly see MQTT errors like the following, regardless of how often it actually restarts. The reported WiFi strength ranges from 75 to 100, seemingly independent of whether it is in the box. The fluctuation looks more closely tied to me standing between the device and the AP than to whether it is in the box.
1525911773: Client DVES_2DEC07 has exceeded timeout, disconnecting.
1525911773: Socket error on client DVES_2DEC07, disconnecting.
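
To line broker messages like these up with device restarts, it can help to turn the leading epoch seconds into readable UTC times. The following is a minimal Python sketch, assuming the log lines have the `epoch: message` shape shown above; the helper name `parse_broker_line` is made up for illustration and is not part of any tool mentioned in this thread.

```python
from datetime import datetime, timezone

def parse_broker_line(line):
    """Split an 'epoch: message' log line into (UTC datetime, message)."""
    stamp, _, message = line.partition(": ")
    when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return when, message

# The two lines quoted above, copied verbatim.
lines = [
    "1525911773: Client DVES_2DEC07 has exceeded timeout, disconnecting.",
    "1525911773: Socket error on client DVES_2DEC07, disconnecting.",
]

for line in lines:
    when, message = parse_broker_line(line)
    print(when.isoformat(), message)
```

Comparing these times with the Uptime reported by the device would show whether a disconnect coincides with a restart or happens on its own.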

@Frogmore42
Contributor

I would suggest including the information requested for all issues, in particular the output of status 0.

I will bet your device/environment is one that doesn't work well with core 2.4.x and that is at least part of the problem.

@AndrewHoover
Author

I have several sonoff basic boards in the standard cases working for 20+ days without an issue plus I have 3 APs.
One of the APs is in my lab 15 ft from this device. I'm pretty sure it isn't environmental.

The status 0 details:
{"Status":{"Module":1,"FriendlyName":["Sonoff"],"Topic":"wallswitch2","ButtonTopic":"0","Power":0,"PowerOnState":3,"LedState":1,"SaveData":1,"SaveState":1,"ButtonRetain":0,"PowerRetain":1},"StatusPRM":{"Baudrate":115200,"GroupTopic":"sonoffs","OtaUrl":"http://sonoff.maddox.co.uk/tasmota/sonoff.ino.bin","RestartReason":"Software/System restart","Uptime":"0T00:00:09","StartupUTC":"","Sleep":0,"BootCount":53,"SaveCount":303,"SaveAddress":"F5000"},"StatusFWR":{"Version":"5.13.1","BuildDateTime":"2018-05-07T16:57:10","Boot":31,"Core":"2_4_1","SDK":"2.2.1(cfd48f3)"},"StatusLOG":{"SerialLog":0,"WebLog":2,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Sanctuary_io","Sanctuary"],"TelePeriod":300,"SetOption":["00008029","55818000"]},"StatusMEM":{"ProgramSize":534,"Free":468,"Heap":15,"ProgramFlashSize":1024,"FlashSize":1024,"FlashMode":3},"StatusNET":{"Hostname":"wallswitch2-5204","IPAddress":"192.168.1.202","Gateway":"192.168.1.1","Subnetmask":"255.255.255.0","DNSServer":"192.168.1.1","Mac":"DC:4F:22:29:74:54","Webserver":2,"WifiConfig":3},"StatusMQT":{"MqttHost":"192.168.1.16","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_297454","MqttType":1,"MAX_PACKET_SIZE":1000,"KEEPALIVE":15},"StatusTIM":{"UTC":"Thu Jan 01 00:00:16 1970","Local":"Thu Jan 01 00:00:16 1970","StartDST":"Thu Jan 01 00:00:00 1970","EndDST":"Thu Jan 01 00:00:00 1970","Timezone":-6,"Sunrise":"07:43","Sunset":"16:03"},"StatusSNS":{"Time":"1970-01-01T00:00:16"},"StatusSTS":{"Time":"1970-01-01T00:00:16","Uptime":"0T00:00:09","Vcc":3.522,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Sanctuary_io","RSSI":72,"APMac":"56:D9:E7:49:31:30"}}}
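
When troubleshooting restarts, only a handful of fields in this output matter. As a rough sketch, the Python below pulls them out of a trimmed copy of the JSON above. It assumes the status 0 payload has already been captured as a single JSON string; the function name `summarize` is hypothetical.

```python
import json

# Trimmed excerpt of the status 0 output above; values are copied from it.
status_json = """
{"StatusPRM": {"RestartReason": "Software/System restart",
               "Uptime": "0T00:00:09", "BootCount": 53},
 "StatusFWR": {"Version": "5.13.1", "Core": "2_4_1"},
 "StatusMEM": {"Heap": 15},
 "StatusSTS": {"Vcc": 3.522, "Wifi": {"RSSI": 72}}}
"""

def summarize(status):
    """Collect the fields most relevant to diagnosing unexpected restarts."""
    return {
        "restart_reason": status["StatusPRM"]["RestartReason"],
        "uptime": status["StatusPRM"]["Uptime"],
        "boot_count": status["StatusPRM"]["BootCount"],
        "version": status["StatusFWR"]["Version"],
        "core": status["StatusFWR"]["Core"],
        "heap": status["StatusMEM"]["Heap"],
        "rssi": status["StatusSTS"]["Wifi"]["RSSI"],
    }

summary = summarize(json.loads(status_json))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running this against successive status dumps makes it easy to see whether BootCount keeps climbing and which core the firmware was built with.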

@Frogmore42
Contributor

You might be right, but as predicted, you are using core 2.4.x. Some people/devices have had good luck with that, others not so much.

You might try compiling with core 2.3.0 and flashing it to the device that is most problematic. It will either make it better or it won't. I don't believe it will make it worse. Several people have said the same things as you have, then tried 2.3.0 and discovered that their devices were much more reliable on it.

@Frogmore42
Contributor

Note that your time is not set, which means your NTP servers are not set correctly or the device does not have access to the internet.
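
A quick way to spot this in a status dump is to check whether the reported time is still in 1970. The sketch below assumes the timestamp format seen in the StatusSNS "Time" field above, and that a 1970 date means the clock is still counting from the Unix epoch; `clock_is_set` is a hypothetical helper name and the second timestamp is an invented example.

```python
from datetime import datetime

def clock_is_set(status_time):
    """Return False while the device still reports a 1970 timestamp."""
    parsed = datetime.strptime(status_time, "%Y-%m-%dT%H:%M:%S")
    return parsed.year > 1970

# Value taken from the status 0 output earlier in the thread.
print(clock_is_set("1970-01-01T00:00:16"))
# Invented example of a timestamp after a successful sync.
print(clock_is_set("2018-05-10T00:22:53"))
```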

@Jason2866
Collaborator

@Frogmore42 Maybe the device fails under some circumstances when no correct time is set...

@AndrewHoover
Author

I'll give core 2.3.0 a shot. I didn't get what you had meant earlier, thanks for the persistence in the suggestion.

As for the time, in the status details above, the device had just reset and it has been consistently taking between 15 and 30 seconds for the time to update. After I pulled the status details, the time was set successfully.

@Frogmore42
Contributor

@Jason2866 I have a device that has been running for a month or so with the wrong time, so that doesn't seem to cause a reset to occur.

@AndrewHoover I have a NodeMCU that is running core 2.4.0. It doesn't stay up for more than a day. I had assumed it was because it was somewhat defective, but maybe it is because of the 2.4.0 core. I will have to try it with 2.3.0 and see if that improves things or not.

@Frogmore42
Contributor

@AndrewHoover 15 to 30 s to get the time is a little slow, but not unreasonable. Mine usually get it within 10 s.

@Jason2866
Collaborator

15 to 30 sec is long. Do you have a really slow internet connection?
Or is your WiFi laggy? RSSI is good at 72.
I just tried: after a restart I get the correct time within 6 seconds.

@Jason2866
Collaborator

Jason2866 commented May 10, 2018

@Frogmore42 Yes, that was an old build. In later releases Theo added the timer function with sunrise and sunset support. Maybe these functions produce errors when the time is wrong...
I would rule out every possibility that could produce errors.
But overall I think the problem will be gone when using 2.3.0!

@AndrewHoover
Author

Ok, I am simply amazed at the difference between 2.3 and 2.4.1!
I reflashed the device, and the only changes I made were to downgrade to 2.3 and add the appropriate .ld and boards.txt files. Now the device is lightning fast.

  • Before, it would take a second or two for individual webpages to load from the device; now they are almost instant.
  • Before, MQTT command responses would vary from 1 to 6 seconds; now they are nearly real time.
  • You know how long the NTP time used to take to update; now it is set within the first 10 seconds of uptime.
  • I am seeing no socket or timeout errors in the logs.

I fully buttoned up the board in the wall switch box as before and, while it hasn't been online for long, it is definitely performing far better than it was on 2.4.1. Fully enclosed, I have RSSI values of 90+.

Thank you very much for the recommendation to change board versions. This is incredible!
Do you know why there is so much of a difference?

@Frogmore42
Contributor

I believe the newer versions use lwip2, which is a networking library. It is supposed to be better, and I am sure it is in some ways. But, it seems not so much in others.

@AndrewHoover
Author

I see.
Ok, so I have two of these and they've both been running 8 hours straight without an issue. I consider MY issue closed. It sounds like there may be an underlying issue that may not be resolved, so while I'm fine with closing this thread, I wanted to see if you all had a compelling reason to leave it open?

@Frogmore42
Contributor

I'll let @arendst answer that. I am pretty sure he knows about the issue, which is with the underlying esp8266 Arduino code and not Tasmota.

@Frogmore42
Contributor

Might be related to one or more of the many issues like these: esp8266/Arduino#4689
esp8266/Arduino#4497

@ascillato
Contributor

Hi

If your issue is solved, please close it. Thanks 👍

@AndrewHoover
Author

Yes!
My issue was resolved by downgrading to 2.3.
