Values in Home Assistant become unavailable for a split second #13

Closed
Belaial opened this issue Jul 9, 2023 · 35 comments

Comments

@Belaial
Contributor

Belaial commented Jul 9, 2023

Hi,

I have only spent about 1.5 days running this project on my Solis stick so far, but one of my findings is that roughly every 15 minutes the readings in Home Assistant become Unavailable for a split second and then come back just as fast. I have tried to get a recording of this for demonstration purposes, but no luck yet.

This seems to happen on most if not all of the values I read out, but the one I know it happens for, and that causes issues for me, is Active Power. I know this because I spent hours trying to figure out why one of my automations stopped working when I switched from my Pi running https://github.com/incub77/solis2mqtt to reading the values with the Solis stick. The automation is simple: if Active Power is above 750W for 20 minutes, it should turn on a power outlet. I noticed this had not happened despite many hours of Active Power being greater than 750W. While looking into it, I saw Active Power suddenly become Unavailable and then return to the actual power reading. Since this happens about every 15 minutes, my 20-minute automation never triggers: with these short Unavailable breaks in between, the value is never above 750W for 20 minutes straight. I could just lower the timer on my automation and be done with it, but I figured it's best to report it in case it can be solved.
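To illustrate why short Unavailable blips keep a "for 20 minutes" condition from ever being met, here is a minimal Python sketch. It is not Home Assistant code: the function name `longest_run_above`, the one-sample-per-minute data, and the assumption that an Unavailable reading restarts the timer are made up for illustration and simply mirror the behaviour described above.

```python
from typing import Optional, Sequence


def longest_run_above(samples: Sequence[Optional[float]],
                      threshold: float, step_s: int) -> int:
    """Return the longest continuous time (in seconds) spent above threshold.

    None models an "Unavailable" reading. Assumption: like a value below the
    threshold, it restarts the run from zero.
    """
    longest = 0
    current = 0
    for value in samples:
        if value is not None and value > threshold:
            current += step_s
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hypothetical data: one sample per minute for an hour at a steady 1000 W,
# with a single "Unavailable" reading every 15th minute.
samples = [None if minute % 15 == 14 else 1000.0 for minute in range(60)]

longest = longest_run_above(samples, threshold=750.0, step_s=60)
print(longest // 60)  # 14 (minutes), so a 20-minute condition is never met
```

With this synthetic data the longest uninterrupted run is 14 minutes, even though the power never drops below 750W.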

I am 99% sure this is not related to my report regarding WiFi in #12, since I can ping the stick at the same time as the values become Unavailable.

The only "proof" I can show of this issue now is based on the "Logbook" in Home Assistant. For some reason Active Power does not get logged there, so I will show the history of the value for Country Code instead.

image

Also here it can be seen as small gaps in the timeline

image

This is Country Code as an example; even more values do this at the same time

image

And then they return within the same second or 1-2 seconds after.

Any ideas what could be causing this?
I can just lower the timers on my automation but hopefully we can figure out the issue and resolve it.

@hn
Owner

hn commented Jul 9, 2023

Try monitoring the stick's verbose logs, e.g. python3 -m esphome logs solis-esphome-emw3080.yaml --device <ip address>

@Belaial
Contributor Author

Belaial commented Jul 9, 2023

Ok, will do!

I have the command running and logging now, will report back if I find anything.

@Belaial
Contributor Author

Belaial commented Jul 9, 2023

Ok, that was a quick and lucky troubleshooting!

Unfortunately it seems my "99% sure it's not WiFi" above turned out to be wrong I guess.... I based this on Home Assistant not detecting the stick as offline via ping but that does not ping often enough to detect this short connection timeout.

image

Those 5 lost pings are enough for Home Assistant to declare the device (or values) Unavailable.... I see nothing odd in the logs from the stick during the same time frame: normal reporting with normal values, and then a warning that the connection was dropped, obviously since the stick loses connection for a short period of time.

So I guess WiFi needs to be more stable, but that was not within the scope of this project, as you already mentioned in #12. Any idea why this might happen roughly every 15 minutes?

The only thing that comes to my mind is this comment libretiny-eu/libretiny#113 (comment), where it's mentioned that there is a 'reboot_timeout=15min' ("The amount of time to wait before rebooting when no WiFi connection exists"). Could that timer somehow be related to a quick 5 second drop in connection? I guess the stick does not reboot and reconnect to WiFi in 5 seconds?

Does your stick lose a few pings every ~15 minutes @hn ?
I will leave Pingplotter running towards the stick while I'm away and will report back in a few hours with a more detailed report on how often it loses its connection.

@Belaial
Contributor Author

Belaial commented Jul 9, 2023

Here are roughly 6 hours of Pingplotter towards the Solis stick. I would say it loses connection for a short period of time every 15 minutes, almost like clockwork.

(click for larger image)

Red lines = lost pings

Pingplotter_Solis_Stick
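For anyone who wants to check their own stick without Pingplotter, the "every 15 minutes, almost like clockwork" pattern can be confirmed from any ping log. The following is only a sketch under stated assumptions: the function names `outage_starts` and `intervals` are hypothetical, and the data is synthetic (one ping per second, 5 lost pings every 900 seconds), not the measurements from the screenshot.

```python
from typing import List, Tuple


def outage_starts(pings: List[Tuple[float, bool]]) -> List[float]:
    """Return the timestamp of the first lost ping in each run of losses.

    Each entry in pings is (timestamp_in_seconds, reply_received).
    """
    starts = []
    previous_ok = True
    for timestamp, ok in pings:
        if not ok and previous_ok:
            starts.append(timestamp)
        previous_ok = ok
    return starts


def intervals(starts: List[float]) -> List[float]:
    """Return the time between consecutive outage starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Synthetic log: one ping per second for an hour, 5 lost pings every 900 s.
pings = [(float(t), not (t >= 900 and t % 900 < 5)) for t in range(3600)]

starts = outage_starts(pings)
gaps = intervals(starts)
print(starts)  # [900.0, 1800.0, 2700.0]
print(gaps)    # [900.0, 900.0]
```

If the gaps in a real log cluster tightly around 900 seconds, a fixed timer is a more likely cause than random WiFi interference.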

I will migrate back to my solution with the Pi later tonight since I want more stability in monitoring my inverter. I will still keep a close eye on https://github.com/hn/ginlong-solis and https://github.com/kuba2k2/libretiny (I really want to use this project with my Solis stick, otherwise I have no other use for it) in case there are any updates to the WiFi "stability" and these connection drops every 15 minutes. It would be interesting to know if anyone else sees them or if it's some hardware issue with my stick.

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

Today I am dumping more logs, this time over serial, so I can see what happens when the stick goes missing. So far I only have output of what happens when the stick loses WiFi and then stays disconnected for X amount of time, so this is not one of those short 5 second disconnects. I will report back once I have that captured as well.

This happened 15 minutes after the last disconnect, so the "timer" from my previous post is the same. The stick is working normally, reporting "WiFi Signal"; it gets no other data, obviously, since it's not connected to the inverter. This goes on for 15 minutes, and then all of a sudden it just reboots, I guess, and instantly tries to reconnect to WiFi but fails, over and over again. I included just 2 attempts, but the last part repeats until it manages to connect. Anyway, here is the output from serial:

[D][sensor:093]: 'WiFi Signal': Sending state -50.00000 dBm with 0 decimals of accuracy
[D][modbus_controller:029]: Modbus command to device=1 register=0xBBC countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBCD countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBD9 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBE1 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBFF countdown=0 no response received - removed from send queue
[D][sensor:093]: 'WiFi Signal': Sending state -48.00000 dBm with 0 decimals of accuracy
[D][modbus_controller:029]: Modbus command to device=1 register=0xBBC countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBCD countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBD9 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBE1 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBFF countdown=0 no response received - removed from send queue
[D][sensor:093]: 'WiFi Signal': Sending state -48.00000 dBm with 0 decimals of accuracy
[D][modbus_controller:029]: Modbus command to device=1 register=0xBBC countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBCD countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBD9 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBE1 countdown=0 no response received - removed from send queue
[D][modbus_controller:029]: Modbus command to device=1 register=0xBFF countdown=0 no response received - removed from send queue
[D][sensor:093]: 'WiFi Signal': Sending state -47.00000 dBm with 0 decimals of accuracy
ROM:[V0.1]
FLASHRATE:4
BOOT TYPE:0 XTAL:40000000
IMG1 DATA[1128:10002000]
IMG1 ENTRY[800053d:100021ef]
IMG1 ENTER
CHIPID[000000ff]
read_mode idx:2, flash_speed idx:2
calibration_result:[1:19:13][3:15]
calibration_result:[2:21:11][1:15]
calibration_result:[3:0:0][ff:ff]
calibration_ok:[2:21:11]
FLASH CALIB[NEW OK]
OTA2 ADDR[8100000]
OTAx SELE[ffffffff]
OTA1 USE
IMG2 DATA[0x80b1700:4496:0x10005000]
IMG2 SIGN[RTKWin(10005008)]
IMG2 ENTRY[0x10005000:0x80403a9]
BOOT_FLASH_RDP RDP enable
RDP bin decryption Failed!
checksum_ipsec = 0x4ddfb232, checksum_rdp_flash = 0x44a444f0
System_Init1
System_Init2
I [ 0.000] LibreTiny v1.1.0+sha.f8876bb on generic-rtl8710bx-4mb-980k, compiled at Jul 7 2023 22:15:20, GCC 10.3.1 (-Os)
[I][logger:302]: Log initialized
[C][ota:477]: There have been 0 suspected unsuccessful boot attempts.
[D][lt.preferences:104]: Saving 1 preferences to flash...
[D][lt.preferences:132]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[I][app:029]: Running through setup()...
[C][uart.lt:049]: Setting up UART...
[C][switch.gpio:011]: Setting up GPIO Switch 'led_orange_com'...
[D][switch:016]: 'led_orange_com' Turning OFF.
[D][switch:055]: 'led_orange_com': Sending state OFF
[D][switch:016]: 'led_orange_com' Turning OFF.
[C][switch.gpio:011]: Setting up GPIO Switch 'led_green_net'...
[D][switch:016]: 'led_green_net' Turning OFF.
[D][switch:055]: 'led_green_net': Sending state OFF
[D][switch:016]: 'led_green_net' Turning OFF.
[D][binary_sensor:034]: 'button_reset': Sending initial state OFF
[C][wifi:038]: Setting up WiFi...
[C][wifi:048]: Starting WiFi...
[C][wifi:049]: Local MAC: 80:A0:36:A7:F7:1F
interface 0 is initialized
interface 1 is initialized

Initializing WIFI ...
WIFI initialized
[D][wifi:425]: Starting scan...
[D][wifi:440]: Found networks:
[I][wifi:483]: - 'SSID1_2.4G' (3A:D4:37:81:C2:0F) ▂▄▆█
[D][wifi:485]: Channel: 1
[D][wifi:486]: RSSI: -47 dB
[I][wifi:483]: - 'SSID1_2.4G' (BA:03:A6:BF:C2:EF) ▂▄▆█
[D][wifi:485]: Channel: 11
[D][wifi:486]: RSSI: -62 dB
[I][wifi:483]: - 'SSID1_2.4G' (7A:D4:37:71:B0:DF) ▂▄▆█
[D][wifi:485]: Channel: 6
[D][wifi:486]: RSSI: -67 dB
[D][wifi:488]: - 'SSID2' (3A:D4:37:81:C2:01) ▂▄▆█
[D][wifi:488]: - 'SSID2' (BA:03:A6:BF:C2:E1) ▂▄▆█
[D][wifi:488]: - 'SSID2' (7A:D4:37:71:B0:D1) ▂▄▆█
[I][wifi:274]: WiFi Connecting to 'SSID1_2.4G'...
E [ 3.015] WIFI: Connection failed; ret=-1
[W][wifi_lt:119]: esp_wifi_connect failed! 4
[E][wifi:320]: wifi_sta_connect_ failed!
[W][wifi_lt:288]: Event: Disconnected ssid='' bssid=00:00:00:00:00:00 reason='Unspecified'
[D][wifi:425]: Starting scan...
[D][wifi:440]: Found networks:
[D][wifi:442]: No network found!
[D][wifi:425]: Starting scan...
[D][wifi:440]: Found networks:
[I][wifi:483]: - 'SSID1_2.4G' (BA:03:A6:BF:C2:EF) ▂▄▆█
[D][wifi:485]: Channel: 11
[D][wifi:486]: RSSI: -64 dB
[I][wifi:483]: - 'SSID1_2.4G' (7A:D4:37:71:B0:DF) ▂▄▆█
[D][wifi:485]: Channel: 6
[D][wifi:486]: RSSI: -67 dB
[I][wifi:483]: - 'SSID1_2.4G' (3A:D4:37:81:C2:0F) ▂▄▆█
[D][wifi:485]: Channel: 1
[D][wifi:486]: RSSI: -50 dB
[D][wifi:488]: - 'SSID2' (3A:D4:37:81:C2:01) ▂▄▆█
[D][wifi:488]: - 'SSID2' (BA:03:A6:BF:C2:E1) ▂▄▆█
[D][wifi:488]: - 'SSID2' (7A:D4:37:71:B0:D1) ▂▄▆█
[I][wifi:274]: WiFi Connecting to 'SSID1_2.4G'...
E [ 14.372] WIFI: Connection failed; ret=-1
[W][wifi_lt:119]: esp_wifi_connect failed! 4
[E][wifi:320]: wifi_sta_connect_ failed!

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

And here is the serial output when the stick just drops 5 pings and returns. I included some more logs, so I will use Pastebin this time.

https://pastebin.com/jkTkFjfS

As I see it the same thing happens as when the stick is gone for longer periods of time, the only difference is that it actually manages to connect to WiFi on first attempt.

So there are 2 problems as I see it

  1. Why does it reboot every 15 minutes? This must be some timer causing this since it's exactly 15 minutes every time, does not matter if it has connected to WiFi or not, it just reboots either way.

  2. Why does it connect to WiFi on first attempt some of the time and not all of the time, seems to be a "lottery" if it succeeds or not. (mentioned / discussed here Stick won't connect back to WiFi #12 (comment) )

Really curious if this is only my stick or if someone else has this issue, quick way to find out is to just ping the stick and see if it drops out every 15 minutes, hopefully someone else can try this to see if they have the same issue.

@hn
Owner

hn commented Jul 10, 2023

My S3 stick has stable WiFi (once initially connected) from morning to evening (power-on ... off).

Random thoughts:

  • Try playing with WiFi reboot_timeout and API reboot_timeout, maybe additional timer settings exist somewhere.
  • Some other device in your network or your neighbour might interfere with your WiFi every 15 min
  • The RealTek WiFi SDK or LibreTiny implementation may be broken some way

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

I will look into that

  • Some other device in your network or your neighbour might interfere with your WiFi every 15 min

I very much doubt it. I have stable WiFi on all other devices in my house that use the same SSID and the same APs. Also, this 15 minute reboot timer occurs even in the "xmodem transfer (UART boot mode)" on my stick, and in that mode there is no WiFi. The timer is also just too perfect: if this was something external I think I would see some variation, but it's spot on. I can count 900 seconds and it will reboot precisely 15 minutes after the first ping.

  • The RealTek WiFi SDK or LibreTiny implementation may be broken some way

Might be, I'm far too casual to be able to know or determine that.

I will look into your first point regarding WiFi reboot_timeout and API reboot_timeout, hopefully I will find something out from that.

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

Sorry for what probably is a super basic question....

python3 -m esphome upload solis-esphome-emw3080.yaml --device <ipaddress>

Does not work for me

[D][ota:147]: Starting OTA Update from 192.168.0.197...
[W][ota:240]: Auth failed! Passwords do not match!
[W][ota:362]: Backend error code: 0000

I have an OTA password in the secrets file as instructed when replacing the main application, but I can't figure out with Google how to use that password when running the update via OTA 😞

I recompiled the firmware on the same computer and only edited some timeouts in solis-esphome-emw3080.yaml, so I figured the OTA password would be included somehow. But I guess it might need to be passed along in the upload command somehow?

@hn
Owner

hn commented Jul 10, 2023

For the OTA password problem, please see #7 (comment) and libretiny-eu/libretiny#142 (I never had these problems)

Maybe you can help @kuba2k2 to fix OTA password-, ltchiptool flashing-, and WiFi problems by contributing to the corresponding LibreTiny issues.

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

Thanks, will check those links out.

Happy to help in any way I can!

@Belaial
Contributor Author

Belaial commented Jul 10, 2023

(disabled OTA for now for easier re-flashing)

I have now tried

api:
  reboot_timeout: 5min

api:
  reboot_timeout: 0s

wifi:
  reboot_timeout: 5min

wifi:
  reboot_timeout: 0s

Neither makes any difference, stick reboots after 15 minutes, just like before.

@hn
Owner

hn commented Jul 11, 2023

Maybe you want to try to revert to an older version of LibreTiny-ESPhome, e.g. this is the exact version I'm currently using:

git clone https://github.com/kuba2k2/libretiny-esphome
cd libretiny-esphome
git reset --hard e483e397933bdbb73df457d0b3fe06cd99533f6e

@Belaial
Contributor Author

Belaial commented Jul 11, 2023

Just as a sanity check I restored the default Solis firmware to the stick to check how it behaves.

As long as it's missing WiFi it reboots every 15 minutes just like running it with LibreTiny firmware. Tested this by powering down APs so it could not connect to any SSID.

Once it has WiFi it becomes stable: not a single reboot in close to 3 hours now, verified with ping. I can also see its online timer on the Solis cloud

image

So.... I can only assume there is some underlying WiFi issue in the firmware compiled with LibreTiny for my stick. Whether there are any hardware revisions that differentiate these sticks I have no clue; I have a date printed on the PCB of the stick that says "2021-05-15", right next to the WiFi antenna on the same side as the LEDs.

I will also try and build new FW with the version you suggested above!

@Belaial
Contributor Author

Belaial commented Jul 11, 2023

Just wanna make sure I did it correctly.

I created a new directory called "Lower_Version"
cd Lower_Version
git clone https://github.com/kuba2k2/libretiny-esphome
cd libretiny-esphome
git reset --hard e483e397933bdbb73df457d0b3fe06cd99533f6e

Got this message
HEAD is now at e483e397 Merge remote-tracking branch 'upstream/dev' into platform/libretuya

I then moved in the solis-esphome-emw3080.yaml / solis-modbus-inv.yaml / secrets.yaml and then ran the compile in the new "Lower_Version" folder, figured all the other steps are already completed

python3 -m esphome compile solis-esphome-emw3080.yaml

RAM: [== ] 22.5% (used 59048 bytes from 262144 bytes)
Flash: [======= ] 68.4% (used 686780 bytes from 1003520 bytes)
Building UF2 OTA image
|-- esphome_2023.5.0-dev_generic-rtl8710bx-4mb-980k_rtl8710bn_lt1.1.0.uf2
|-- firmware.uf2
|-- firmware.bin
[SUCCESS] Took 81.65 seconds
INFO Successfully compiled program.

And then like before flashed the stick with python2, booting the stick the version changed in Home Assistant to

Firmware: 2023.5.0-dev (Jul 11 2023, 12:59:43)

Booting the stick shows the following lines regarding version

I [ 0.000] LibreTiny v1.1.0+sha.f8876bb on generic-rtl8710bx-4mb-980k, compiled at Jul 11 2023 13:00:06, GCC 10.3.1 (-Os)
[I][app:102]: ESPHome version 2023.5.0-dev compiled on Jul 11 2023, 12:59:43

So I assume I did everything correctly?

@Belaial
Contributor Author

Belaial commented Jul 12, 2023

Assuming my post above shows I did everything correctly, the results are unfortunately the same: no difference at all, it reboots every 15 minutes just like before.

The only stable state this stick has is with the original Solis firmware and WiFi connected; otherwise it reboots every 15 minutes on Solis firmware as well. Maybe there is some clue in that finding. I guess a ticket over at LibreTiny is the next step?

@hn
Owner

hn commented Jul 12, 2023

Random thoughts II:

  • Try (temporarily) disabling mDNS and/or setting a fixed IP for the stick
  • The problem might not be related to LibreTiny or the S3 stick but to your local HA/WiFi/network/something setup. Do you have any other ESPhome devices within your network? Consider buying a 3€ Wemos D1 mini and setup some kind of dummy sensor, watch for reboots or WiFi drops there

@Belaial
Contributor Author

Belaial commented Jul 12, 2023

Random thoughts II:

* Try (temporarily) disabling mDNS and/or setting a fixed IP for the stick

* The problem _might_ not be related to LibreTiny or the S3 stick but to your local HA/WiFi/network/something setup. Do you have any other ESPhome devices within your network? Consider buying a 3€ Wemos D1 mini and setup some kind of dummy sensor, watch for reboots or WiFi drops there

Sure thing, can try disabling mDNS and putting a static IP in the configuration of the stick and OTA update it.

Regarding your other thoughts,

HA: not sure how since the stick reboots after 15 minutes with or without WiFi (except when using Solis firmware) = HA can't interfere when without WiFi and it still reboots, I can shut down HA at a more convenient point to test this also.

WiFi: Reboots with or without WiFi so not sure how that could affect it, also, all other devices in my network be it WiFi or cable are stable and without issues.

Network: Nothing is ever certain but I have no other issues in my network with other devices. I work as a network technician so I am quite certain my network is ok, I really hope I don't end up having to explain myself if this turns out to be network related 😄 but I honestly don't see how it would be the network.

I don't have other ESPhome devices yet but I have bought some Wemos D1 mini and a Pro actually for future projects, I will try and take the time today / tonight to set one up with some Dallas DS18B20 sensors or something.

@Belaial
Contributor Author

Belaial commented Jul 12, 2023

mdns:
  disabled: true

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.0.67
    gateway: 192.168.0.1
    subnet: 255.255.255.0

Did not make any difference, up to 4 reboots now unfortunately.
(in case I never said it, I also power cycle the stick after an OTA update; probably not needed but I do it anyway)

I hope to try out a Wemos D1 mini later tonight if I get the time, got some IRL stuff I need to handle now.

@Belaial
Contributor Author

Belaial commented Jul 13, 2023

I have not yet tried the other Wemos D1, did not have time yesterday, this morning I just did a quick experiment with
https://esphome.io/components/logger.html

logger:
  level: VERBOSE

Showed nothing new, it just reboots after 15 minutes without any odd message in the log

logger:
  level: VERY_VERBOSE

Slowed down the stick sooooo much that after 15 minutes it had not even arrived at the
[I][ota:113]: Boot seems successful, resetting boot loop counter.
which usually is around 5 minutes I think, I will let it run for a few hours on VERY_VERBOSE now to see if it ever comes to the reboot and if something of interest would show up.

@Belaial
Contributor Author

Belaial commented Jul 13, 2023

So, running

logger:
  level: VERY_VERBOSE

Delayed the reboot to ~30 minutes instead of 15, I see nothing out of the ordinary when the reboot occurs, here is a pastebin of it

https://pastebin.com/KUv91PVc

Looks exactly like all the other reboots to me.

What I do find interesting is that it was delayed significantly when the stick was occupied with VERY_VERBOSE debugging, does that not prove this is some internal timer that reboots the stick? At least that is the conclusion I see with the knowledge I have regarding stuff like this.

I now also have the Wemos D1 programmed and pinging to see if that stays online, same SSID and same AP as the Solis stick, will report back later.

@Belaial
Contributor Author

Belaial commented Jul 13, 2023

Wemos D1 Mini
Basic setup, only WiFi. I disabled the API reboot timeout ("api: reboot_timeout: 0s") since I did not integrate it into HA; otherwise it would of course reboot.
Compiled / flashed via HA ESPhome integration

Stable and working for 2+ hours
7312 seconds run-time, 1 ping missed (WiFi is WiFi so it is what it is 😄) 0 reboots.

Wemos D1 Mini
Setup according to this project with ESP8266 ( https://github.com/hn/ginlong-solis#software-esphome ). I disabled the API reboot timeout ("api: reboot_timeout: 0s") since I did not integrate it into HA; otherwise it would of course reboot.
Compiled / flashed via HA ESPhome integration

Stable and working for 2+ hours
7684 seconds run-time, 1 ping missed (WiFi is WiFi so it is what it is 😄) 0 reboots.

So we now also know that a Wemos D1 Mini with ESPhome is stable and working in my network, both with a really simple configuration and with this project. (I did not wire it up with an RS485-to-serial adapter, but I don't see that making any difference for this test since it would not be connected to the inverter anyway.) It's only the Solis stick that is rebooting every 15 minutes. I'm personally out of ideas to test right now...

@hn
Owner

hn commented Jul 14, 2023

Puhhh, that's all I know about it too. Probably now is the time to ask at LibreTiny or ESPhome.

@Belaial
Contributor Author

Belaial commented Jul 14, 2023

Yeah same, I need to take the time to write a good post and summarize all tests that have been done, it's somewhat spread out in a few tickets here already.

I saw LibreTiny just released a new version so might try that also, I tried to find some change log to see if there are any changes that could help but could not find one.

I guess we can close this ticket, otherwise it will be auto-closed sooner or later, I will update here if / when a solution is found, many thanks for all the help so far @hn !

EDIT

Found a "change log" 😄

https://github.com/kuba2k2/libretiny/commits/master

@kuba2k2

kuba2k2 commented Jul 14, 2023

Just FYI, the new version doesn't have any important changes, it mostly updates some OTA code (not related to encryption issues) and reworks parts of the GPIO functions.

@Belaial
Contributor Author

Belaial commented Jul 14, 2023

Ah ok @kuba2k2
Thanks for the information, any ideas what could be the cause for the "15 minute reboot issue"?
(I can also write a proper ticket at LibreTiny but figured I might as well ask when I saw you here 😄)

@kuba2k2

kuba2k2 commented Jul 14, 2023

I assume that you've disabled all timeouts (api, wifi, etc) in esphome config.

original Solis firmware and WiFi connected, otherwise it reboots every 15 minutes on Solis firmware also

Why is that? Maybe there's a hardware device inside (another chip, kind of a "watchdog") that reboots the device?

@hn
Owner

hn commented Jul 14, 2023

Thanks to @kuba2k2's helpful comment, I took a look at a second (newer) S3 stick (which is supposedly more similar or identical to the one @Belaial is using): Breaking news, it has a slightly different board layout :)

image

In particular, the 6-pin IC in the middle of the picture

  • is, at first glance, the main difference between the old and new PCB layout
  • seems to be connected to PIN 11 of the EMW3080, which is CHIP_EN ("The CHIP_EN pin is an enable reset pin"). So this IC might indeed be a hardware watchdog IC.

They covered all of the PCB with some protective coating which makes it hard to identify this IC (it might have "RCENG" printed on it?)

If it really is a hardware watchdog, the question is how to reset the watchdog timer. PIN 12 (PA_0) has not been used in the old PCB layout and might be connected to this IC. Maybe it is sufficient to toggle PA_0 every X seconds to reset the WDT? @Belaial You could configure an ESPhome gpio-switch for this pin and toggle it periodically:

switch:
  - platform: gpio
    pin:
      number: PA00
      mode: output
    id: mcu_wdt

interval:
  - interval: 1s
    then:
      - switch.toggle: mcu_wdt

I'm busy during the next days, don't expect any news soon.

@Belaial
Contributor Author

Belaial commented Jul 15, 2023

Hi, sorry for late reply, ended up getting a few beers after work and time just went by 😄

Anyway, I have this very same IC as pictured above by @hn (also great thinking @kuba2k2)

@kuba2k2

I will now test the suggestions made by @hn regarding PA00, will update you once I know more!

@Belaial
Contributor Author

Belaial commented Jul 15, 2023

I can now HAPPILY report that the stick is stable and no reboots for 3+ hours, not even a single missed ping! 😄

switch:
  - platform: gpio
    pin:
      number: PA00
      mode: output
    id: mcu_wdt

interval:
  - interval: 1s
    then:
      - switch.toggle: mcu_wdt

That solved the reboot issue, which I guess turned out to be a hardware watchdog timer on newer versions of the stick. Again, great thinking @kuba2k2, and many many thanks for all the troubleshooting along the way @hn (I was thinking along these lines here #13 (comment) but never got around to comparing my PCB to the image @hn has uploaded here https://github.com/hn/ginlong-solis/blob/master/solis-wifi-stick-s3-pcb-front.jpg)

I think the "- interval: 1s" can probably be something bigger. Since the reboot timer is "exactly" 15 minutes, I guess the interval could be 5 or 10 minutes instead, but I have not tested and verified that yet. I will do so later this weekend or at the beginning of next week; got some IRL stuff I need to attend to now.
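The reasoning about a longer toggle interval can be sketched with a small Python simulation. This is a toy model, not a description of the actual IC: it assumes the watchdog resets the stick once 900 seconds pass without a toggle, that any toggle restarts the countdown, and that a reset restarts the toggle schedule. The function name `watchdog_resets` is hypothetical.

```python
def watchdog_resets(feed_interval_s: int, timeout_s: int = 900,
                    duration_s: int = 6 * 3600) -> int:
    """Count resets of an assumed watchdog over duration_s seconds.

    Assumed model: a feed every feed_interval_s seconds after boot restarts
    the countdown; timeout_s seconds without a feed trigger a reset.
    """
    resets = 0
    since_feed = 0  # seconds since the last feed (or since boot)
    since_boot = 0  # seconds since the last (re)boot
    for _ in range(duration_s):
        since_feed += 1
        since_boot += 1
        if since_boot % feed_interval_s == 0:
            since_feed = 0  # toggle the pin, countdown starts over
        if since_feed >= timeout_s:
            resets += 1     # watchdog fires, stick reboots
            since_feed = 0
            since_boot = 0
    return resets


for interval in (1, 300, 600, 1200):
    print(interval, watchdog_resets(interval))
# 1 0
# 300 0
# 600 0
# 1200 24
```

Under these assumptions any interval comfortably below 900 seconds avoids resets, while a 20-minute interval reboots the stick every 15 minutes (24 times in 6 hours). Whether the real watchdog behaves this way still needs to be verified on hardware.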

@hn
Owner

hn commented Jul 16, 2023

@Belaial Can you please check b25f978 . It sends a 100ms 'high' pulse at intervals of just under 5 minutes to reset the WDT, but only if WiFi is connected (so it should catch WiFi reconnect problems as well).
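The idea of gating the pulse on WiFi can be illustrated with a short Python sketch. It does not reproduce the commit: the 290-second pulse interval, the 900-second watchdog timeout, and the function name `resets_with_wifi_gate` are assumptions chosen for illustration, and the pulse schedule is simplified to keep running across reboots.

```python
from typing import List, Tuple


def resets_with_wifi_gate(wifi_down: List[Tuple[int, int]],
                          feed_interval_s: int = 290,
                          timeout_s: int = 900,
                          duration_s: int = 4 * 3600) -> List[int]:
    """Return the times (seconds) at which an assumed watchdog resets the stick.

    wifi_down lists (start_s, end_s) windows during which WiFi is assumed
    to be disconnected; pulses are only sent while WiFi is connected.
    """
    resets = []
    last_feed = 0
    for t in range(1, duration_s + 1):
        connected = not any(start <= t < end for start, end in wifi_down)
        if connected and t % feed_interval_s == 0:
            last_feed = t      # pulse sent, watchdog countdown restarts
        if t - last_feed >= timeout_s:
            resets.append(t)   # no pulse for timeout_s: watchdog reboots the stick
            last_feed = t
    return resets


print(resets_with_wifi_gate([]))              # []
print(resets_with_wifi_gate([(850, 900)]))    # []
print(resets_with_wifi_gate([(1000, 3000)]))  # [1770, 2670]
```

In this model a short WiFi drop that skips a single pulse is harmless, while a long outage lets the watchdog reboot the stick every 15 minutes until WiFi returns, which is the reconnect workaround the comment describes.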

@Belaial
Contributor Author

Belaial commented Jul 17, 2023

@Belaial Can you please check b25f978 . It sends a 100ms 'high' pulse at intervals of just under 5 minutes to reset the WDT, but only if WiFi is connected (so it should catch WiFi reconnect problems as well).

Manually added b25f978
Recompiled and uploaded via OTA

1+ hour of stable connection, 0 missed pings so b25f978 looks good to me 👍

@hn
Owner

hn commented Jul 17, 2023

Perfect! Thank you for pursuing this issue so vigorously @Belaial @kuba2k2

I think I'll also exchange my 'old' S3 stick with the 'new' one for the benefit of having a workaround for the reconnect issue :)

@hn hn closed this as completed Jul 17, 2023
@Belaial
Contributor Author

Belaial commented Jul 17, 2023

Happy to help! Like I said here #9 (comment)

"Either we get it to accept custom firmware or the stick dies, whatever comes first 👍 "

So happy we finally found all the issues and resolved them! 😄

@kuba2k2

kuba2k2 commented Jul 17, 2023

I feel they were like "damn it, these realtek chips have such a crappy wifi, let's add a hardware watchdog to put them in order".
