Hardware Watchdog Reboots

Moderators: grovkillen, Stuntteam, TD-er

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Hardware Watchdog Reboots

#1 Post by rira2005 » 21 May 2019, 10:32

Hello

I am using about 20 ESPEasy devices: relays, temperature measurement, motor control and so on.
I am happy with your project, many thanks for your hard work.
BUT!
For about a year I have had a problem with the firmware on more and more Wemos D1 shields:
I get more and more hardware watchdog reboots and exception reboots.

I tested the old version
Build 20102 - Mega (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3))
GIT version mega-20180501

with the same config (counter, LCD and MQTT), no rules, same AP and so on, and I get no reboots. It has been running fine for about 6 days.

Same board, same config with
Build: 20103 - Mega
System Libraries: ESP82xx Core 2_4_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3 PUYA support
Git Build: mega-20190419

I get reboots with hardware watchdog and exception errors every 3-12 hours.
I replaced the power supply and changed the board: same problem. It works fine with the old firmware; with the new one it reboots.

How can I help the developers?
I can attach a serial monitor and capture the debug messages sent over the serial port. Would this help?

TIA
raphi

-------------
Just to let you know: I also have 2 Sonoff 4CH units running the same firmware (only built for the ESP8285 chip) and they are running fine!

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#2 Post by grovkillen » 21 May 2019, 10:39

Have you tried another core version?
ESP Easy Flasher [flash tool and wifi setup at flash time]
ESP Easy Webdumper [easy screendumping of your units]
ESP Easy Netscan [find units]
Official shop: https://firstbyte.shop/
Sponsor ESP Easy, we need you :idea: :idea: :idea:

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#3 Post by rira2005 » 21 May 2019, 10:53

Yes, 2.4.1 and 2.4.2 (same behaviour).
But not 2.4.6. Should I try it? It is an alpha...

tia

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#4 Post by grovkillen » 21 May 2019, 11:01

2.6.0 is part of the nightly releases

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#5 Post by rira2005 » 21 May 2019, 14:41

Hi, I tried this:
ESP_Easy_mega-20190511_normal_ESP8266_4M.bin
This release has a 2.4.1 core.
Reboots...

Then I tried
ESP_Easy_mega-20190511_normal_core_260_sdk222_alpha_ESP8266_4M.bin
This one has a 2.6 core.
1.) Lost the config after flashing... OK, it is an alpha.
2.) New WiFi config done, but after the first connect I get:

INIT : Booting version: mega-20190511 (ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support)
91 : INIT : Free RAM:27392
92 : INIT : Warm boot #11 - Restart Reason: Software Watchdog
94 : FS : Mounting...
118 : FS : Mount successful, used 75802 bytes of 957314
494 : CRC : program checksum ...OK
506 : CRC : SecuritySettings CRC ...OK
507 : CRC : binary has changed since last save of Settings
613 : INIT : Free RAM:24208
615 : INIT : I2C
615 : INIT : SPI not enabled
629 : INFO : Plugins: 47 [Normal] (ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support)
632 : WIFI : No valid wifi settings
633 : WIFI : Could not connect to AP!
633 : WIFI : Set WiFi to AP
1557 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
2893 : WD : Uptime 0 ConnectFailures 0 FreeMem 19560 WiFiStatus 0
5000 : AP Mode: Client connected: 08:C5:E1:BF:79:70 Connected devices: 1
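
For anyone capturing serial output like this, the restart reason can be pulled out of the boot lines automatically. What follows is only a minimal sketch in Python, assuming the log keeps the `Warm boot #N - Restart Reason: ...` wording shown above. The function name `restart_reasons` is made up for illustration, and other boot-line formats are not handled.

```python
import re

# Matches the boot line format seen in this thread, e.g.
# "92 : INIT : Warm boot #11 - Restart Reason: Software Watchdog"
BOOT_RE = re.compile(r"boot #(\d+) - Restart Reason: (.+?)\s*$")


def restart_reasons(log_text):
    """Return a list of (boot_count, reason) tuples found in a serial log."""
    found = []
    for line in log_text.splitlines():
        match = BOOT_RE.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


# Three lines copied from the log above
sample = """\
91 : INIT : Free RAM:27392
92 : INIT : Warm boot #11 - Restart Reason: Software Watchdog
94 : FS : Mounting...
"""

print(restart_reasons(sample))
```

Run over a log captured for several days, this gives a quick list of how often each restart reason occurs.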

Please tell me which (non-alpha) build has a 2.6 core?
many thanks.
raphi

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#6 Post by grovkillen » 21 May 2019, 14:55

The core 2.6.0 is alpha. We're not creating the core ourselves; even though we have put code back into the core, it's the responsibility of Espressif. So I guess it's something inside the core that is not working correctly with our implementation of the core.

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#7 Post by rira2005 » 21 May 2019, 15:11

Okay, understood.
But how can I find the reason for the reboots with the "non-alpha" ESP_Easy_mega-20190511_normal_ESP8266_4M.bin with the 2.4.2 core?
I can't believe that the new builds are not as stable as the old one...
And the one that runs stable has a 2.4.1 core, but that build is from May 2018.

I also tested ESP_Easy_mega-20190511_normal_core_241_ESP8266_4M.bin, but same problem: reboots. So I think it's a core problem!

Any ideas?

Would a bug report on GitHub help to find the problem?

tia
raphi

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#8 Post by ThomasB » 22 May 2019, 04:46

will a bug report at github help to find the problem....
No need to do that for this well known issue. There are already a few github tickets for ESPEasy watchdog reboots and random crashes. These problems began mid-2018 and have been ongoing ever since.

@TD-er is the main developer that has been trying to solve the reboot issues. I would imagine that all his hair has turned gray or has fallen out from the debugging. There have been many discussions about what may be causing the reboots but no holy grail solutions yet.

BTW, not all of the reboot issues are due to the software. They can also occur if the ESP8266's 3.3V voltage is noisy or sags. The chip's average current consumption is low, but there are short duration high-current spikes (during some operations) that may cause voltage problems.

Regarding voltage, some users have found that their ESP8266 clone boards had inadequate supply filtering and needed additional caps close to the chip's power pins. Plus there are reports that some Wemos D1 have under-rated voltage regulators. For example: https://www.reddit.com/r/esp8266/commen ... r_on_your/

Another influence on reboots seems to be related to WiFi signal performance. Despite strong/reliable RF signals to my ESPEasy devices they experience many reboots. So follow the recommendations but don't expect miracles to occur.

- Thomas

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#9 Post by rira2005 » 22 May 2019, 09:08

Hi Thomas,
many thanks for your answer.
Yep, I already saw on GitHub that the problem began in mid-2018.

But it is really a problem for me:
I set an ESP switch ON via MQTT, then the ESP reboots and starts up with OFF.
So I have to send the MQTT info that the ESP rebooted, and then I have to send an extra MQTT message to the ESP with the current state. That is not nice.

I hope the devs will find a solution for the reboots, because in many situations the new firmware isn't usable.

I don't understand why my Sonoff 4CH (ESP8285) runs fine with the new firmware: same compiler, same core and so on.

many thanks
raphi

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#10 Post by ThomasB » 22 May 2019, 18:03

I set an ESP switch ON via MQTT, then the ESP reboots and starts up with OFF.
So I have to send the MQTT info that the ESP rebooted, and then I have to send an extra MQTT message to the ESP with the current state. That is not nice.
That is how others have solved it. But there is another solution. After a watchdog reboot the system variable values are automatically restored from the RTC memory. So this feature provides a workaround to restore the GPIO pins on a warm reboot.

Use the Generic - Dummy Device plugin to store the GPIO pin states that are important. That is to say, create your rules so that whenever a GPIO pin is changed its new state is saved to a dummy variable. Use -1 for the OFF state and 1 for the ON state. Then create an On System#Boot rule that uses the saved dummy values to restore the GPIO pins. Also, your rules can test for a 0 value, which means the RTC memory was erased due to a hard reset.

I haven't tried this conceptual idea but it is a possible solution for your situation.
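
The decision logic of this idea can be checked off-device before writing any rules. Below is a minimal simulation in Python (not ESPEasy rule syntax). The `Node` class and its method names are invented for illustration, and it assumes the pin comes up OFF after every reboot and that the dummy value reads 0 once RTC memory has been lost.

```python
ON, OFF, UNKNOWN = 1, -1, 0  # value encoding suggested in the post above


class Node:
    """Toy model of one unit with a single output pin (hypothetical)."""

    def __init__(self):
        self.dummy = UNKNOWN  # stands in for the RTC-backed dummy variable
        self.pin = False      # stands in for the GPIO output (False = off)

    def set_pin(self, on):
        # Rule idea: save the new state first, then drive the pin
        self.dummy = ON if on else OFF
        self.pin = on

    def reboot(self, cold=False):
        self.pin = False          # assumed: output is off after any reboot
        if cold:
            self.dummy = UNKNOWN  # assumed: RTC memory is lost on a hard reset
        self.on_boot()

    def on_boot(self):
        # Equivalent of the proposed On System#Boot rule
        if self.dummy == ON:
            self.pin = True
        elif self.dummy == OFF:
            self.pin = False
        # UNKNOWN: nothing saved, keep the default (off)


node = Node()
node.set_pin(True)
node.reboot()            # watchdog (warm) reboot: state is restored
warm_result = node.pin
node.reboot(cold=True)   # power cycle: saved state is gone
cold_result = node.pin
print(warm_result, cold_result)
```

The 0 value matters because it lets the boot rule tell "saved OFF" apart from "nothing saved", which a plain 0/1 encoding could not do.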

- Thomas

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#11 Post by grovkillen » 22 May 2019, 18:41

@Thomas: that's pretty clever. I'll use that one, thanks!

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#12 Post by georgep » 28 May 2019, 11:43

Hi guys,

I'm pretty new to the whole ESP8266 thing and have now set up a few units with ESPEasy and my long-established Domoticz server.

I've now come across the random reboots issue having noticed less-than-expected uptime figures and also a couple of instances of weird behaviour with my bathroom towel rail controller (using a genuine Sonoff Touch T1). This unit in particular seems to randomly reboot more than my others, although a cheap (clone?) Wemos D1 Mini in a less critical role also reboots for no reason.

I've tried various firmware versions which seem to make no difference and I've now set a static IP on the Sonoff Touch unit to see if there could be any interaction between dhcp lease renewals and the reboots.

I'd be happy to get involved in testing to try to resolve this issue.

George
Somerset, UK

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#13 Post by ThomasB » 28 May 2019, 17:51

I'd be happy to get involved in testing to try to resolve this issue.
One of the developers (TD-er) suggests using the recent releases with core 2.5.2. He says this core may be a bit more stable (fewer reboots), but it is too soon to tell.

- Thomas

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#14 Post by georgep » 29 May 2019, 11:34

ThomasB wrote:
28 May 2019, 17:51
One of the developers (TD-er) suggests using the recent releases with core 2.5.2.
He says this core may be a bit more stable (fewer reboots), but it is too soon to tell.
Thanks.
I'll give this a try when I get some time and report back here :-)
-George

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#15 Post by georgep » 30 May 2019, 22:03

OK so I flashed the core-252 version onto a couple of my units and just as I was about to start to think it was better I had another random HW Watchdog reboot...

It is a 'clone' Wemos D1 mini running 'ESP_Easy_mega-20190523_normal_core_252_ESP8266_4M.bin' and it rebooted after 1975 minutes (I'm continuously reporting my units' uptime to my Domoticz server).

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#16 Post by ThomasB » 30 May 2019, 23:51

I was about to start to think it was better I had another random HW Watchdog reboot..
I wouldn't expect a recent release with core-252 to fully solve the reboot problem. But they say it should offer some improvement.

At least that is what I am hoping for. Two days ago I installed ESP_Easy_mega-20190523_test_core_252_ESP8266_4M.bin on two of my NodeMCUs. They have not rebooted yet.
It is a 'clone' Wemos D1 mini ...
FWIW, I posted a link (see May 21 post) concerning the undersized VReg found on some cloned Wemos. Also, some ESP8266 installations may have voltage noise problems too. That is to say, there are many contributors to the reboot problem and some can't be fixed by firmware. I'm not implying that firmware isn't causing reboots (it is) but improving stability may involve the hardware too.

- Thomas

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#17 Post by rira2005 » 31 May 2019, 07:59

Hi,

me again. Yes, 2.5.2 is more stable; it runs 2 days without a reboot.
@georgep, do you also use MQTT to communicate with your Domoticz?

Because after I disabled the MQTT controller (the special Domoticz MQTT one), the ESP hasn't rebooted for 4 days!
@TD-er, is it possible that a controller (plugin) can cause a hardware watchdog reboot?

Only an idea...

rg
raphi

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#18 Post by georgep » 31 May 2019, 09:35

rira2005 wrote:
31 May 2019, 07:59

@georgep, do you also use MQTT to communicate with your Domoticz?
I have been doing some experimenting with mqtt but now it is all disabled.
I was trying to simplify my setup to make it easier to find problems so now I am using only Domoticz/HTTP.

I agree that 2.5.2 seems much better; I think that some of my issues are also now hardware related :-/

George

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#19 Post by georgep » 31 May 2019, 12:56

ThomasB wrote:
30 May 2019, 23:51

FWIW, I posted a link (see May 21 post) concerning the undersized VReg found on some cloned Wemos. Also, some ESP8266 installations may have voltage noise problems too. That is to say, there are many contributors to the reboot problem and some can't be fixed by firmware. I'm not implying that firmware isn't causing reboots (it is) but improving stability may involve the hardware too.
I just checked my Wemos D1 Mini units and they do have the under-rated voltage regulator! :cry:
I will buy my next units from the Official Lolin Store https://lolin.aliexpress.com/store/1331105

-George

rira2005
Normal user
Posts: 15
Joined: 25 Jun 2017, 21:09

Re: Hardware Watchdog Reboots

#20 Post by rira2005 » 03 Jun 2019, 11:58

Hi,
I checked my Wemos D1 shields too, but on these an MP2359 SMD part is used.
So I don't know if this one has a problem; your link only refers to the Wemos D1 mini.
br
raphi

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#21 Post by georgep » 03 Jun 2019, 22:39

ThomasB wrote:
30 May 2019, 23:51

I wouldn't expect a recent release with core-252 to fully solve the reboot problem. But they say it should offer some improvement.

At least that is what I am hoping for. Two days ago I installed ESP_Easy_mega-20190523_test_core_252_ESP8266_4M.bin on two of my NodeMCUs. They have not rebooted yet.
Now this is looking better :-)

Uptime: 3 days 9 hours 52 minutes
Binary Filename: ESP_Easy_mega-20190523_normal_core_252_ESP8266_4M.bin

This is on one of my Wemos D1 Mini clone boards with the 'wrong' voltage regulator.

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#22 Post by georgep » 08 Jun 2019, 17:30

I'm thinking that I may have some idea what might be behind all these random Watchdog reboots.

Some days ago I configured the 'System Info' plugin to report the uptime of all of my units to Domoticz (http) to let me see how often my units were rebooting.

Since I did that the random reboots have definitely been more frequent - this morning none of my units had uptimes longer than just a few hours!!
It also seems to me that after a number of random reboots, the uptimes get shorter - as though a watchdog reboot is not really clearing everything in the same way as a power-down reboot. This morning my bathroom towel rail controller rebooted twice during what should have been a 2-hour period of warming the towels :-( I power-off rebooted the towel rail controller and it then ran successfully for a 2-hour cycle.

If reporting uptimes to Domoticz every 5 minutes *IS* making the reboots more frequent, then maybe it is the code relating to the 'Data Acquisition / Reporting' process that is somehow causing more watchdog reboots.

Does this make any sense to anyone?

I have now disabled the uptime reporting and slowed down the data reporting (every minute to every 10 minutes) on some of my units to see if this makes any noticeable difference.

Another interesting fact is that I have re-purposed one of my Wemos D1 Mini boards (previously running ESP-Easy and suffering frequent reboots) as a WiFi repeater (https://github.com/martin-ger/esp_wifi_repeater) and while it still has the same no-frills USB power supply, the "wrong" 3.3v regulator chip and is ostensibly working much harder, it has so far been up for over 3 days - longer than any of my ESP-Easy units.

I think that ESP-Easy is an amazing project but with all of my units rebooting randomly every few hours I really need to find an answer - I'll happily devote some time to looking for the solution :-)

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#23 Post by grovkillen » 08 Jun 2019, 19:27

How did you measure the uptime before using the system info plugin?

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#24 Post by georgep » 08 Jun 2019, 20:38

grovkillen wrote:
08 Jun 2019, 19:27
How did you measure the uptime before using the system info plugin?
I would quickly step around the 'homepage' of all my units and make a note on paper of the uptimes, then repeat this when I get a chance.

Code: Select all

Unit Number:	3
Git Build:	mega-20190523
Local Time:	2019-06-08 19:34:57
Uptime:	0 days 6 hours 36 minutes
Not accurate I know, but.....

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#25 Post by grovkillen » 08 Jun 2019, 20:54

You should then try to use a syslog server and do the same test, if you can please ;)

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#26 Post by ThomasB » 08 Jun 2019, 21:01

Some days ago I configured the 'System Info' plugin to report the uptime of all of my units to Domoticz (http) to let me see how often my units were rebooting.
I have my devices send me an email each time they reboot. This provides a convenient notification of the event.

I was hoping that collecting the emails would help me identify a pattern. There have been some time periods where multiple units rebooted within a two hour window, but overall the issue appears to be truly random.
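
Spotting such windows by eye gets tedious, so the check can be scripted. Here is a rough sketch in Python, assuming the reboot notifications have already been reduced to (unit name, timestamp) pairs. The function name `reboot_clusters`, the unit names and the timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def reboot_clusters(events, window=timedelta(hours=2), min_units=2):
    """Group reboot events that fall within `window` of the first event in a group.

    Only groups involving at least `min_units` different units are returned.
    """
    clusters, current = [], []
    for unit, stamp in sorted(events, key=lambda event: event[1]):
        # Close the current group once an event falls outside its window
        if current and stamp - current[0][1] > window:
            if len({name for name, _ in current}) >= min_units:
                clusters.append(current)
            current = []
        current.append((unit, stamp))
    if len({name for name, _ in current}) >= min_units:
        clusters.append(current)
    return clusters


# Invented example data, not taken from any real unit
events = [
    ("node1", datetime(2019, 6, 1, 10, 0)),
    ("node2", datetime(2019, 6, 1, 11, 30)),
    ("node3", datetime(2019, 6, 2, 5, 0)),
]

for cluster in reboot_clusters(events):
    print([unit for unit, _ in cluster])
```

Clusters that line up with a router restart or a DHCP renewal would point at the network rather than at a single unit.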

I suspect that there are several contributors to the reboot issue. And not being able to reproduce it on demand, plus its random nature, makes this a debugging nightmare. I think that it will require a JTAG hardware debugger manned by a coding genius with endless free time.

But the little fellow on my shoulder says that maybe the problem is screaming at us with important clues. So perhaps all that is needed is for an observant user to connect the dots.

- Thomas

waspie
Normal user
Posts: 110
Joined: 09 Feb 2017, 19:35

Re: Hardware Watchdog Reboots

#27 Post by waspie » 10 Jun 2019, 13:00

georgep wrote:
08 Jun 2019, 17:30
I'm thinking that I may have some idea what might be behind all these random Watchdog reboots.

Some days ago I configured the 'System Info' plugin to report the uptime of all of my units to Domoticz (http) to let me see how often my units were rebooting.

you don't *need* the system info plugin to get uptime.

you can send the system variable %uptime% and other variables available at http://yourESPip/sysvars
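
Once the uptime values are logged on the server side, reboots show up as a drop from one sample to the next. A minimal sketch in Python, assuming the logged value is the uptime in minutes and that samples are stored in the order they arrived; the function name `count_reboots` and the sample numbers are made up for illustration.

```python
def count_reboots(uptime_samples):
    """Count how often the uptime dropped between two consecutive samples."""
    reboots = 0
    for previous, current in zip(uptime_samples, uptime_samples[1:]):
        if current < previous:
            reboots += 1
    return reboots


# Invented series: steady 5-minute reports, one reboot after minute 1975
samples = [1965, 1970, 1975, 3, 8, 13]
print(count_reboots(samples))
```

A reboot followed by a long gap in reporting can hide behind a single drop, so shorter report intervals give a more reliable count.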

Domosapiens
Normal user
Posts: 283
Joined: 06 Nov 2016, 13:45

Re: Hardware Watchdog Reboots

#28 Post by Domosapiens » 12 Jun 2019, 00:05

To minimize reboot measurement disturbance, I use this rule,

Code: Select all

On System#Boot do
  timerset,3,30 // Reboot detection: wait for WiFi
Endon

On Rules#Timer=3 do // Reboot detected
  SendToHTTP 192.168.1.7,8080,/json.htm?type=command&param=switchlight&idx=219&switchcmd=On
  SendToHTTP 192.168.1.7,8080,/json.htm?type=command&param=switchlight&idx=219&switchcmd=Off
Endon

..... to switch ON/OFF a switch in Domoticz (IDX=219 in this case)

The switch graph gives a clear view of the reboot rhythm :shock: :shock:
But no clue on all those reboots.
Effect - Cause - Effect .... still difficult
30+ ESP units for production and test. Ranging from control of heating equipment, flow sensing, floor temp sensing, energy calculation, floor thermostat, water usage, to an interactive "fun box" for my grandson. Mainly Wemos D1.

waspie
Normal user
Posts: 110
Joined: 09 Feb 2017, 19:35

Re: Hardware Watchdog Reboots

#29 Post by waspie » 12 Jun 2019, 14:01

On reboot/reconnect I send the dummy values that represent the switch/item states to openHAB, and openHAB compares them to what they should be.
For any item not in sync, openHAB then sends the command to change that item/switch back to ESPEasy. Works pretty well.

Say I have a switch that I control via MQTT from openHAB -> ESPEasy.

When the MQTT command is received, the first thing I do is set the value on a dummy item, and then I actuate the actual switch.
If the unit reboots or resets then 30 seconds or so after it connects it sends that dummy value. If that value is not equal to what openhab thinks it should be then openhab sends that mqtt command back.
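
The server-side comparison in this scheme boils down to a dictionary diff. A minimal sketch in Python of that step only, assuming item states are plain strings; the function name `commands_to_resend` and the item names are invented, and this is not openHAB rule code.

```python
def commands_to_resend(expected, reported):
    """Return {item: wanted_state} for every item the unit reports wrongly.

    `expected` is what the home automation server believes the state should be,
    `reported` is what the unit sent after it reconnected. Items missing from
    the report are treated as out of sync.
    """
    return {
        item: wanted
        for item, wanted in expected.items()
        if reported.get(item) != wanted
    }


# Invented example: the towel rail came back OFF after a reboot
expected = {"towel_rail": "ON", "pump": "OFF"}
reported = {"towel_rail": "OFF", "pump": "OFF"}
print(commands_to_resend(expected, reported))
```

Only the items in the returned dictionary need a new MQTT command, so a unit that came back in the right state causes no extra traffic.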

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#30 Post by georgep » 14 Jun 2019, 18:35

This morning I had a bit of a 'Eureka' moment and I wondered if perhaps the ESP-linking between my units could have anything to do with all the reboots.
None of my units had an uptime of more than a few hours.

I went in to "Tools / Advanced" and set the "Inter-ESPEasy Network" UDP port number to zero on all my units.
At the same time I unticked the "Enable serial port" box - again on all of my [5] units.

Since then (approx 8hrs ago now) I have had not one single reboot.

Fingers crossed!!

Maybe someone else who is suffering random reboots could try doing this?

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#31 Post by ThomasB » 14 Jun 2019, 19:08

I went in to "Tools / Advanced" and set the "Inter-ESPEasy Network" UDP port number to zero on all my units.
At the same time I unticked the "Enable serial port" box - again on all of my [5] units.
Good ideas, may help someone reduce their reboots. I have observed that serial activity can invite more reboots.

For the record, mine have been configured per your suggestion for a long time. Reboots still occur. So don't expect it to be the holy grail that permanently solves all the reboot problems.

- Thomas

waspie
Normal user
Posts: 110
Joined: 09 Feb 2017, 19:35

Re: Hardware Watchdog Reboots

#32 Post by waspie » 17 Jun 2019, 13:02

anecdotal evidence to the contrary: :lol:
[Attached image: up.PNG]

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#33 Post by georgep » 17 Jun 2019, 13:20

waspie wrote:
17 Jun 2019, 13:02
anecdotal evidence to the contrary: :lol:
WOW!
What have you done to achieve that?
I can't get any of my units to stay up for more than a few hours, yet the same hardware with an alternative firmware that I won't mention here will stay up with no issues.

TD-er
Core team member
Posts: 1487
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#34 Post by TD-er » 18 Jun 2019, 11:04

Sorry I have not visited the forum lately.
Usually I only get here when GrovKillen "summons" me via Slack. :)

About the WDT reboots.
There seem to be various reasons that cause these WDT reboots, as ThomasB has already explained well.

Some of the fixes in core 2.5.2 are related to handling stalled network connections, which were also high on my suspect list for causes of these WDT reboots.
The thing is, a WiFi connection is not as stable as you may think. Also on a mobile phone or a laptop, the WiFi connection may experience lost packets or malformed ones and even disconnects every now and then.
That's all inherent to the way WiFi is operating, so it is part of normal operations.
But sadly error state handling is not very stable on the esp8266/Arduino platform.
At least 3 of those issues were tackled in the last core 2.5.2 version, but there are more of them.
One of the things I am quite certain about is that the internal state of the WiFi connection status is wrong every now and then.
This state is being used to start initiating a new network connection, but if it is incorrect then we may try to start a new connection while not being connected.
This may lead to buffer overflows or waiting forever for a reply.

One 'funny' thing that may improve stability is simply sending a ping message to the node.
I still have no idea why sending an ICMP packet is actually changing behavior of the access point and ESP so dramatically, but it does help. (still no holy grail, just making it harder to reproduce the reboots :) )

So one thing you may conclude is that increasing network traffic does increase the chance to run into some WDT reboot.
Thus disabling ESPeasy p2p traffic (setting port to 0) will statistically lower the chance for a WDT reboot.
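
The ping idea mentioned above is easy to try from any always-on Linux machine. This is just a minimal sketch in Python, assuming the iputils `ping` command is installed (`-c` for count, `-W` for the reply timeout in seconds). The function names, the 10 second interval and the example address are made up for illustration, and nothing here is part of ESPEasy.

```python
import subprocess
import time


def ping_command(host, timeout_s=1):
    """Build a one-shot ping command (Linux iputils flags)."""
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def ping_once(host, timeout_s=1):
    """Return True if the host answered a single ICMP echo request."""
    result = subprocess.run(
        ping_command(host, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def keep_pinging(hosts, interval_s=10, rounds=None):
    """Ping every host once per interval; rounds=None means loop forever."""
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            if not ping_once(host):
                print(f"{time.strftime('%H:%M:%S')} no reply from {host}")
        done += 1
        time.sleep(interval_s)
```

Calling `keep_pinging(["192.168.1.50"])` with the address of a unit would ping it every 10 seconds and log missed replies, which also gives a rough record of when the unit dropped off the network.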

@ThomasB
I still have my hair, but am running out of nails....

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#35 Post by georgep » 18 Jun 2019, 11:19

TD-er wrote:
18 Jun 2019, 11:04
One of the things I am quite certain about is that the internal state of the WiFi connection status is wrong every now and then.
I've definitely noticed unexpected WiFi disconnects and re-connects, some of which seem to coincide with a reboot, some not.
I've tried setting a static IP address and also extending the lease time on my DHCP server, neither of which has made a significant difference.

TD-er wrote:
18 Jun 2019, 11:04
One 'funny' thing that may improve stability is simply sending a ping message to the node.
I still have no idea why sending an ICMP packet is actually changing behavior of the access point and ESP so dramatically, but it does help. (still no holy grail, just making it harder to reproduce the reboots :) )
I'll set that up and see if it changes anything for me.

Thanks for the input!
George

waspie
Normal user
Posts: 110
Joined: 09 Feb 2017, 19:35

Re: Hardware Watchdog Reboots

#36 Post by waspie » 18 Jun 2019, 13:10

georgep wrote:
17 Jun 2019, 13:20
waspie wrote:
17 Jun 2019, 13:02
anecdotal evidence to the contrary: :lol:
WOW!
What have you done to achieve that?
I can't get any of my units to stay up for more than a few hours, yet the same hardware with an alternative firmware that I won't mention here will stay up with no issues.
Probably sheer luck! I have other nodes of that vintage that seem to do well, but it's ESPEasy from a year or so ago running core 2.4.2 IIRC.

TD-er
Core team member
Posts: 1487
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#37 Post by TD-er » 19 Jun 2019, 23:20

Very likely core 2.3.0

I still regret we made the change to core 2.4.x with so many changes that there's no way back.

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#38 Post by ThomasB » 20 Jun 2019, 00:29

I still regret we made the change to core 2.4.x with so many changes that there's no way back.
In rare cases core 2.4.x is golden too.

All my ESPEasy's (a mix of NodeMCU and Sonoff modules) have reboot issues. Except one gifted Sonoff that has not rebooted since it was deployed. Now reporting a system uptime of 51 days.
[Attached image: espeasy.jpg]
It was flashed with ESP_Easy_mega-20181207_normal_core_241_ESP8266_1024.bin. It is a minimal installation with two plugins (Switch input and Generic Dummy) and one controller (OpenHAB MQTT). I use it to control a clothes dryer so that Alexa can turn it on.

Success stories like this are interesting. But sadly they don't help solve the reboot problem.

- Thomas

TD-er
Core team member
Posts: 1487
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#39 Post by TD-er » 20 Jun 2019, 11:30

I have had a few "golden" builds myself too.
Problem is that some of the reboot issues seem to be related to some kind of randomness.
Either at boot, or at build time.

I tend to believe there is a bug somewhere (can be in our code, or in the core libraries) related to some buffer size not including a 0 termination character, or not setting it.
This makes it depend on the order of allocation of data and maybe also on whether the memory was set to 0's before using that variable.
Some ordering of data is determined at compile time and some at run time.

So either a buffer is created too small (not including room for a 0 termination character) or the array is not initialized well.
Either way will lead to undetermined results when trying to parse a string in it. (the end will not be found)

waspie
Normal user
Posts: 110
Joined: 09 Feb 2017, 19:35

Re: Hardware Watchdog Reboots

#40 Post by waspie » 20 Jun 2019, 13:01

TD-er wrote:
19 Jun 2019, 23:20
Very likely core 2.3.0

I still regret we made the change to core 2.4.x with so many changes that there's no way back.
[Attachment: esp.PNG]

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#41 Post by georgep » 20 Jun 2019, 15:07

TD-er wrote:
19 Jun 2019, 23:20
I still regret we made the change to core 2.4.x with so many changes that there's no way back.
Are there any archived earlier versions available anywhere?
In many applications (such as feeding sensor data to Domoticz/OpenHAB/whatever) reboots really don't matter. But I have one application (controlling a bathroom towel rail with on/off switching to maintain heat yet save power) that doesn't need any sensors and uses only a few simple rules, and there a reboot turns the unit off (meaning cold towels and an unhappy family :o ).
OK, I could just write more code to attempt to survive reboots, but perhaps a 'no-frills' early version might just be stable [enough]?

George

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#42 Post by ThomasB » 20 Jun 2019, 17:17

I tend to believe there is a bug somewhere (it can be in our code, or in the core libraries) related to a buffer whose size does not include room for a 0 termination character, or to code that does not set it.
I suspect that kind of problem too. Or possibly a pointer variable whose address isn't initialized.
Are there any archived earlier versions available anywhere?
Try the R120 release. When I was using it, it never rebooted. But in my installation it required a power cycle if WiFi was lost, which would occur whenever I rebooted my router.
http://www.letscontrolit.com/downloads/ESPEasy_R120.zip

- Thomas

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#43 Post by georgep » 20 Jun 2019, 17:42

ThomasB wrote:
20 Jun 2019, 17:17
Try the R120 release. When I was using it, it never rebooted.
But in my installation it required a power cycle if WiFi was lost, which would occur whenever I rebooted my router.
Thanks @ThomasB - I'll give that a try :)

TD-er
Core team member
Posts: 1487
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: Hardware Watchdog Reboots

#44 Post by TD-er » 21 Jun 2019, 18:01

One of the builds I had here that ran very stably was built on the last day of 2017.
Also, builds up to the end of March 2018 were running core versions before 2.4.0.

TD-er
Core team member
Posts: 1487
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: Hardware Watchdog Reboots

#45 Post by TD-er » 22 Jun 2019, 13:10

I was just looking through my own nodes and this one currently has the longest uptime, 14 days and 9 hours:
ESP_Easy_mega-20190523_normal_core_252_ESP8266_4M.bin

It also has quite an unstable WiFi (260 reconnects), so it should have had enough chances to crash when sending data.
It has ESPeasy p2p active and is sending data to Domoticz via MQTT.

Wiki
Normal user
Posts: 138
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#46 Post by Wiki » 22 Jun 2019, 14:34

I updated almost all of my nodes early in January and the watchdog timeouts began instantly. The version they were running before was dated 22 May 2018, and it ran without any watchdog timeouts and with uptimes of >100 days. The only drawback of that version was that the resolution of the Dallas sensors was not changeable. I have to look up the core of that version, but I think I remember 2.4.1. All of them are Wemos D1 clones from China, equipped with the weak voltage regulator.

[edit]
confirmed: ESP_Easy_mega-20180522_normal_ESP8266_4096.bin is core 2.4.1
[/edit]

georgep
Normal user
Posts: 33
Joined: 05 May 2019, 16:32
Location: Somerset, UK

Re: Hardware Watchdog Reboots

#47 Post by georgep » 22 Jun 2019, 15:37

I put one of my nodes back to the 120 version and this is [believe it or not!] one of my best uptimes since I started with ESP Easy some weeks ago :shock:

System Time:	14:22
Load:	9% (LC=27869)
Uptime:	1033 minutes
Wifi RSSI:	-76 dB
IP:	192.168...
GW:	192.168...
Build:	120
Core Version:	2_3_0
Unit:	0

I was initially curious about the 'bad voltage regulator' story but this is from a clone Wemos-D1-mini with the 'wrong' regulator running different firmware (and look at the supply voltage!) :lol: :

CMD>show stats
System uptime: 357:04:55
7841 KiB in (64595 packets)
71568 KiB out (78100 packets)
Power supply: 2.944 V
Phy mode: n
Free mem: 48024

Last edited by georgep on 22 Jun 2019, 19:05, edited 1 time in total.

ThomasB
Normal user
Posts: 400
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#48 Post by ThomasB » 22 Jun 2019, 18:20

I don't know where I read it, but someone stated that the current spikes during WiFi negotiations have increased in later core releases. That would explain why the voltage "noise" related reboots do not affect older firmware versions.

- Thomas

grovkillen
Core team member
Posts: 3225
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden
Contact:

Re: Hardware Watchdog Reboots

#49 Post by grovkillen » 22 Jun 2019, 20:06

Here's my longest run. Prior to that it ran for about 180 days, until I manually rebooted it by mistake. It's an 8×DS18B20 equipped unit powered by a UPS.

Image

dynamicdave
Normal user
Posts: 168
Joined: 30 Jan 2017, 20:25
Location: Hampshire, UK

Re: Hardware Watchdog Reboots

#50 Post by dynamicdave » 23 Jun 2019, 07:06

Hi guys,
I've been doing experiments on 'deepsleep' with a set of Wemos D1 Mini devices and encountered the problem with random reboots.
When I checked some of my other units, I found they were acting the same (rebooting at odd times).
Some would reboot in say 23 mins, others might go for 70 or 80 minutes.
Although there is a +3V3 regulator on the Wemos board, there aren't any reservoir capacitors.
So I took a Wemos D1 Mini (with all the correct logos and curved PCB), fitted it to a breadboard with a BME sensor (so I had something to measure) and added a 220uF and a 0.1uF capacitor from +5V to GND and another pair from +3V3 to GND,
i.e. four capacitors in total. (I can post a photo if anyone needs it.)
Each pair of capacitors is wired in parallel and straddles its power rail and ground.

The idea is the 220uF acts as a reservoir and helps maintain the voltage when sudden loads are applied, while the 0.1uF offers a low impedance path to ground for any nasty little spikes.

Note: I used 220uF capacitors because they were the only ones I had in my 'bits-n-bobs' box.
I'm sure you could probably use a smaller one (I'll do some experiments this week and/or try to find my textbook about how to calculate the size).
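
For a rough starting point on the size, the usual textbook relation is C = I × Δt / ΔV: the capacitance needed to supply a load current I for a time Δt while the rail sags by no more than ΔV. The sketch below is only that formula in code. It assumes the capacitor alone carries the burst (ignoring whatever the regulator and supply contribute), and the numbers in the usage note are illustrative assumptions, not measurements from a Wemos.

```cpp
// Capacitance in farads needed to supply loadAmps for burstSeconds
// while the rail drops by at most maxDroopVolts: C = I * dt / dV.
double reservoirFarads(double loadAmps, double burstSeconds, double maxDroopVolts) {
  return loadAmps * burstSeconds / maxDroopVolts;
}

// The same relation rearranged: how far the rail sags (in volts)
// when a capacitor of the given size supplies the burst: dV = I * dt / C.
double railDroopVolts(double loadAmps, double burstSeconds, double farads) {
  return loadAmps * burstSeconds / farads;
}
```

As an example with assumed values, a 0.3 A burst lasting 100 µs with an allowed sag of 0.2 V gives `reservoirFarads(0.3, 100e-6, 0.2)`, which is about 150 µF, so 220uF would be in the right range under those assumptions. A longer or heavier burst scales the result proportionally.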

So far the Wemos has been running for 620 minutes without rebooting (which is a dramatic improvement).
I just thought I'd share this with the forum as other people could try it and see if it helps their situation.

It's a long time since I designed a power supply, but I suspect when the Wemos switches on the WiFi the current consumption rises dramatically.
If the power supply that is feeding the Wemos is poor, then the supplied voltage will fall for a short period of time (and if it falls too low the Wemos goes into reboot).
This might explain why @grovkillen's device has run for such a long time as his is driven from a UPS which by its very nature has excellent energy storage capability.

Cheers from David.
