Hardware Watchdog Reboots


rayE
Normal user
Posts: 122
Joined: 12 Oct 2017, 12:53
Location: Philippines

Re: Hardware Watchdog Reboots

#101 Post by rayE » 13 Jul 2019, 14:03

And tada... both nodes have been up and running for 3 days now. No more web hangs etc.
I totally agree. There is a lot of evidence pointing to the underlying problem given the router configuration. This should (maybe) be documented in some kind of online spreadsheet that the "testers" update; I'm sure the results could HELP point the developers to a fix.

Ray

Shardan
Normal user
Posts: 1105
Joined: 03 Sep 2016, 23:27
Location: Bielefeld / Germany

Re: Hardware Watchdog Reboots

#102 Post by Shardan » 20 Jul 2019, 14:58

After some days of testing, I think the cause lies in the WiFi reconnects.

As posted above, I'm running three nodes with the 0630 version.
These nodes were reset by the WD after a few hours.
The following steps made them run far more stably:

Usually I run a Ubiquiti UniFi Long Range AP for several WiFi networks and about 35 devices, from TV to ESPEasy.

I disabled the home automation WiFi on that AP and grabbed an old TP-Link AP from the basement shelf,
so all nodes are running on a separate AP now.
Second, I set "Force WiFi No Sleep" = on and "Periodically send Gratuitous ARP" = on in the configuration (the latter should be set by default).

Sadly we had a complete power outage here, so the uptimes are shorter, but at the moment I have uptimes of around 7 days on all three devices.
I hadn't seen such uptimes for a while... ;)

I won't say this is a solution.
But it shows quite clearly that the cause of the WD resets lies in the WiFi part, I assume inside the core lib.

Have a nice weekend, everyone.
Regards
Shardan

TD-er
Core team member
Posts: 1360
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#103 Post by TD-er » 29 Jul 2019, 10:31

Last night (it was past 2 am) I made a test build: https://www.dropbox.com/s/uknq183mlb8yx ... 4.zip?dl=0
Can some of you please test these and check for:
- Connecting to WiFi (was an issue in the last few builds)
- Crashing when disconnected from WiFi (power down the AP, or on some APs you can force-disconnect clients)

In my tests here, the units did not crash and reconnected like they should.
I tested several (4M) builds here and all of them connect to WiFi and reconnect like they should.
But it is still best not to install it on a node that requires removing ceilings or parts of a wall to reach it for a USB update :)

One last note.
All but one of the nodes I updated last night are still running, even after 10+ WiFi disconnects.
The one that did reboot had a WDT reboot, but I think it may have other issues (it uses the pulse counter).

If this build does seem to work, you will make me very happy, since it would end about a year of debugging.
I'm still a bit skeptical since it took so long, but I was very pleased to see it reconnect to WiFi like it should.
That accounts for a significant part of the WDT reboots.

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#104 Post by Wiki » 29 Jul 2019, 13:54

I'd like to make you happy.

At my place, a Chinese Wemos D1 clone is waiting to be tested. Any suggestions on which of the 8266-4M files to use, or on special WiFi configurations (e.g. fixed IP, reconnect, WiFi b/g, ...)?

TD-er
Core team member
Posts: 1360
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#105 Post by TD-er » 29 Jul 2019, 14:05

Wiki wrote:
29 Jul 2019, 13:54
I'd like to make you happy.

At my place, a Chinese Wemos D1 clone is waiting to be tested. Any suggestions on which of the 8266-4M files to use, or on special WiFi configurations (e.g. fixed IP, reconnect, WiFi b/g, ...)?
Just start with the "normal" 4M build.
There is also a "custom 4M" build included, which has only the most basic plugins (the ones I use in my own nodes ;) ). See: https://github.com/letscontrolit/ESPEas ... _script.py
P.S. This Python script will be extended in the future, and you can also use it to quickly make a custom build for yourself.

Just start with DHCP and the default settings.
I also changed some things related to starting and stopping AP mode on new nodes.
Please also report issues with that part.
I reverted most of the code, but re-applied that part from the changes I made over the last few weeks. I just hope I did not forget any part of it last night.

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#106 Post by Wiki » 29 Jul 2019, 16:51

I set up the device from scratch (i.e. erased with blank_4MB.bin) and flashed ESP_Easy_mega-20190716-15-PR_2514_normal_ESP8266_4M.bin

Changes in setup: enabled UDP port 62888, added INA219 and DS18B20 as devices, publishing every 10 s, switched on the onboard LED, set the serial log level to info, added an NTP server, added a syslog server (syslog level debug), and set up MQTT to Domoticz.

I've put the device in a location with very poor WiFi. As a first result, here are this device's syslog entries from my server (see the attached file). As you can see, the device doesn't stay up for more than a few minutes.

Please let me know if you need more or other logs and info, or different configurations. I am prepared.
Attachments
ESPTest.zip
(6.06 KiB)

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#107 Post by Wiki » 29 Jul 2019, 16:57

Additional info:

After flashing, I went through the standard procedure: connecting to the device through its AP and adding it to my WiFi via the Web GUI (in an environment with normal WiFi conditions, not the hard conditions where it is running now). Connection and configuration worked flawlessly.

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#108 Post by Wiki » 29 Jul 2019, 17:50

I double-checked the functionality of the device by moving it to a location with reasonable WiFi. I reduced the syslog level to info because debug level would produce a huge amount of data. Syslog attached.
Attachments
ESPTest.1.zip
(7.55 KiB)

Shardan
Normal user
Posts: 1105
Joined: 03 Sep 2016, 23:27
Location: Bielefeld / Germany

Re: Hardware Watchdog Reboots

#109 Post by Shardan » 29 Jul 2019, 19:12

I did some testing as described earlier in this thread.

I moved all ESP devices to a separate access point, set WiFi to stay awake permanently, etc. (see above).

My test devices ran for about 14 days without a problem, then I got WD restarts on all three devices.

As far as I can tell, there is at least one problem with the core libs newer than 2.3.0.
When WiFi reconnects, they tend to WD reset. So a first workaround is to enable "Force WiFi No Sleep" in the advanced settings.

But even a restart after 14 days isn't ideal. It seems something is piling up and eventually causes a restart.
Regards
Shardan
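One way to check the "something is piling up" hypothesis is to log the free heap periodically and look at its trend. The sketch below is a minimal illustration, assuming you already have (uptime in seconds, free heap in bytes) pairs, for example copied by hand from syslog; the function names are hypothetical and not part of ESP Easy. It fits a least-squares line and converts the slope into an approximate loss per day.

```python
def heap_slope(samples):
    """Least-squares slope of free heap (bytes) versus uptime (seconds).

    samples: list of (uptime_s, free_heap_bytes) tuples (hypothetical input,
    collected however you like). Returns bytes per second; a negative value
    means the free heap is shrinking over time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


def bytes_lost_per_day(samples):
    """Approximate free-heap loss per day (positive = heap is shrinking)."""
    return -heap_slope(samples) * 86400
```

A consistently positive value over several days would support the leak theory; a value near zero would not, and would point to looking for other causes.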

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#110 Post by Wiki » 29 Jul 2019, 19:35

And I will give you a prophecy:

From now on, these three devices will be more unstable than before if you let them run without a reset or power cycle...

grovkillen
Core team member
Posts: 3148
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

Re: Hardware Watchdog Reboots

#111 Post by grovkillen » 29 Jul 2019, 21:11

Are you running the release that TD-er linked?

Wiki
Normal user
Posts: 127
Joined: 23 Apr 2018, 17:55

Re: Hardware Watchdog Reboots

#112 Post by Wiki » 29 Jul 2019, 22:08

Who are you asking?

TD-er
Core team member
Posts: 1360
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: Hardware Watchdog Reboots

#113 Post by TD-er » 30 Jul 2019, 12:24

As far as I can see from his posts, Wiki is using the test build I made.
Shardan is talking about an uptime of over 14 days, so that's definitely not the test build :)

@Shardan, could you also try the test build (or last night's build)?
Just make sure not to use a static IP, since that has already been reported in an issue on GitHub.

The latest build should be much more stable when reconnecting to WiFi.
But there are still other issues at play, since some of my nodes running the test build have already rebooted.

frank
Normal user
Posts: 85
Joined: 15 Oct 2016, 20:17
Location: Nederland

Re: Hardware Watchdog Reboots

#114 Post by frank » 06 Aug 2019, 20:23

mega-20190805 normal 4M
I don't know whether the recent update includes the fix, but since installing it: no watchdog reboots and 1 day of uptime. :) :) :)

dynamicdave
Normal user
Posts: 158
Joined: 30 Jan 2017, 20:25
Location: Hampshire, UK

Re: Hardware Watchdog Reboots

#115 Post by dynamicdave » 06 Aug 2019, 20:28

Same here... 1+ day of uptime with mega-20190805 normal 4M running on a Wemos D1 Mini.

ThomasB
Normal user
Posts: 385
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#116 Post by ThomasB » 07 Aug 2019, 00:18

The user feedback sounds promising. So today I loaded ESP_Easy_mega-20190805_normal_ESP8266_1M.bin on a Sonoff Basic to try it out. Hopefully its goodness spreads my way too.

This device was running a late June build (mega-20190630_normal). I'm looking forward to seeing fewer reboots with this fresh August release.

- Thomas

ThomasB
Normal user
Posts: 385
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#117 Post by ThomasB » 07 Aug 2019, 22:11

As mentioned in the previous post, mega-20190805 was installed on my Sonoff Basic yesterday. Today it rebooted (after running 24 hrs, 15 mins). System info says:

Boot: Manual reboot (1)
Reset Reason: Exception

Rebooting approximately once a day is what this device experienced with the June build it was previously using. I'll continue to monitor the new firmware's performance so I can make a better judgement on the uptime reliability of my test installation.

- Thomas

ManS-H
Normal user
Posts: 232
Joined: 27 Dec 2015, 11:26
Location: the Netherlands

Re: Hardware Watchdog Reboots

#118 Post by ManS-H » 07 Aug 2019, 22:23

Hello,
I don't really know how the Watchdog works.
This is my firmware:
Watchdog-1.jpg
And this is what I see:
Watchdog-2.jpg
My question: is this a normal situation?

ThomasB
Normal user
Posts: 385
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#119 Post by ThomasB » 08 Aug 2019, 04:21

My question, is this a normal situation?
Nice to see the 97 day uptime on your old June-2018 core_2_4_1 build. With the 2.4.1 core, and up to the mid-2018 time frame, reliable run times were achieved by most users. I fondly remember those good old days.

- Thomas

frank
Normal user
Posts: 85
Joined: 15 Oct 2016, 20:17
Location: Nederland

Re: Hardware Watchdog Reboots

#120 Post by frank » 08 Aug 2019, 10:04

The WD reboots are back :( :(

It looks like the weaker the WiFi signal, the more reboots there are.

ManS-H
Normal user
Posts: 232
Joined: 27 Dec 2015, 11:26
Location: the Netherlands

Re: Hardware Watchdog Reboots

#121 Post by ManS-H » 08 Aug 2019, 10:15

ThomasB wrote:
08 Aug 2019, 04:21
My question, is this a normal situation?
Nice to see the 97 day uptime on your old June-2018 core_2_4_1 build. With the 2.4.1 core, and up to the mid-2018 time frame, reliable run times were achieved by most users. I fondly remember those good old days.

- Thomas
Thanks for the reply. I use this version on a Sonoff to switch a table lamp, and I kept it because it does the job.

dynamicdave
Normal user
Posts: 158
Joined: 30 Jan 2017, 20:25
Location: Hampshire, UK

Re: Hardware Watchdog Reboots

#122 Post by dynamicdave » 08 Aug 2019, 17:51

Quick update...

3+ days of uptime with mega-20190805 normal 4M running on a Wemos D1 Mini.

Fingers crossed this will be a huge step forward for mankind.

TD-er
Core team member
Posts: 1360
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#123 Post by TD-er » 08 Aug 2019, 21:08

4 of my boards haven't rebooted since I installed the firmware with the supposed fix for the WDT reboots.

Uptime: 10 days 18 hours 23 minutes
Build Time: Jul 29 2019 01:34:03
Binary Filename: ESP_Easy_mega-20190716-15-PR_2514_normal_core_252_ESP8266_4M.bin

The test builds carry the timestamp of the last official nightly in the filename, which is a bit confusing...

Well, the bugfix was specific to the WDT reboots that occur when reconnecting to WiFi.
There are still other causes; for example, a reboot with Exception as the reboot reason is a totally different one.
It can be a stupid programming error (a divide by zero, for example) or trying to dereference an object that has already been deleted.
Running out of memory can also be an issue, and there are numerous other reasons.

One of my own nodes still showed a WDT reboot, but I am sure it must have had another cause, since I could not find a network reconnect in the logs.
Another one has also had some reboots, but it uses the pulse counter, which has a known bug I still have to fix.
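When collecting reports, it may help to tally reboots by reset reason instead of lumping them together, since a WDT reboot and an Exception reboot point to different problems. The sketch below assumes the numeric codes of the ESP8266 SDK's rst_reason enum and the labels the Arduino core's ESP.getResetReason() reports; the helper name reboot_summary is hypothetical, and how you gather the codes (syslog, manual notes) is left open.

```python
from collections import Counter

# Reset reason codes as numbered in the ESP8266 SDK's rst_reason enum,
# paired with the labels returned by ESP.getResetReason() in the Arduino core.
RESET_REASONS = {
    0: "Power on",
    1: "Hardware Watchdog",
    2: "Exception",
    3: "Software Watchdog",
    4: "Software/System restart",
    5: "Deep-Sleep Wake",
    6: "External System",
}


def reboot_summary(reason_codes):
    """Count reboots per reset reason; unrecognized codes are labelled as unknown."""
    return Counter(RESET_REASONS.get(code, f"Unknown ({code})") for code in reason_codes)
```

For example, reboot_summary([1, 1, 2]) counts two hardware watchdog resets and one exception, which keeps the WiFi-reconnect WDT issue separate from crashes such as a divide by zero.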

ThomasB
Normal user
Posts: 385
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#124 Post by ThomasB » 08 Aug 2019, 22:38

There are still other causes; for example, a reboot with Exception as the reboot reason is a totally different one.
It can be a stupid programming error (a divide by zero, for example) or trying to dereference an object that has already been deleted.
Running out of memory can also be an issue, and there are numerous other reasons.
Thanks, that made me check the installation again and review my configuration. And I found something unusual: the System Info plugin is corrupted. The screenshots below show that SYSINFO is not fully initialized (missing Values). BTW, I wasn't using the missing Values in rules and wasn't sending them to a controller. In case you ask, I cleared the memory before flashing ESP_Easy_mega-20190805_normal_ESP8266_1M.bin.

Cold rebooting did not fix it. But deleting the plugin and re-installing it did the trick. Now I have my four System Info Values being reported on the device page.

Maybe this crippled plugin was causing the exception reboots I've been getting on the mega-20190805. Fingers are crossed.

- Thomas
Attachments
esp_sysinfo.jpg
esp_devices.jpg

ThomasB
Normal user
Posts: 385
Joined: 17 Jun 2018, 20:41
Location: USA

Re: Hardware Watchdog Reboots

#125 Post by ThomasB » 11 Aug 2019, 18:15

Update: Fixing the System Info plugin seems to have eliminated the Exception reboots. But they have been replaced with Watchdog reboots. So I didn't win the reboot lottery, but at least know where the exceptions came from.

- Thomas

DebugBug
Normal user
Posts: 2
Joined: 11 Feb 2019, 21:47

Re: Hardware Watchdog Reboots

#126 Post by DebugBug » 21 Aug 2019, 21:35

Any new updates on this topic? My devices keep rebooting with a maximum uptime of 1-2 days, running the latest releases.

dynamicdave
Normal user
Posts: 158
Joined: 30 Jan 2017, 20:25
Location: Hampshire, UK

Re: Hardware Watchdog Reboots

#127 Post by dynamicdave » 22 Aug 2019, 09:15

Hi,
I've been running a Wemos D1 Mini for just over 5 days with the latest firmware and haven't had any reboots (yippee).

ScreenShot073.png

TD-er
Core team member
Posts: 1360
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: Hardware Watchdog Reboots

#128 Post by TD-er » 22 Aug 2019, 10:36

Just a question for those whose modules have been up for a few days already.
I've noticed some of my modules still have AP mode enabled (not turned off). Do you also see your nodes as SSIDs on your WiFi network?

It seems there is still an issue with the "got IP" WiFi event not being received; that event is the trigger to turn off the AP (and also to turn it on).
So I definitely have to look into that, but I was wondering whether it is something specific to the WiFi here, or whether others also experience this.
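The failure mode described here can be illustrated with a toy state model. This is a hypothetical Python sketch, not ESP Easy's actual code: the AP is switched off only when a "got IP" event arrives, and (as an assumption) switched back on when the station disconnects. If the "got IP" event is lost, the node ends up connected while still advertising its AP SSID.

```python
class ApController:
    """Hypothetical model of AP switching driven by WiFi events."""

    def __init__(self):
        self.connected = False
        self.has_ip = False
        self.ap_enabled = True  # assumed: a node without an IP starts with its AP on

    def on_connected(self):
        # Association alone does not touch the AP; only "got IP" turns it off.
        self.connected = True

    def on_got_ip(self):
        self.has_ip = True
        self.ap_enabled = False

    def on_disconnected(self):
        # Assumption for this sketch: losing the connection re-enables the AP.
        self.connected = False
        self.has_ip = False
        self.ap_enabled = True


node = ApController()
node.on_connected()      # "got IP" event never arrives...
print(node.ap_enabled)   # ...so the AP is still on while connected
```

In this model the stuck-AP symptom follows directly from a single missed event, which is why tying the AP state to one event is fragile; periodically re-checking the actual connection state would be one way to make it self-correcting.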

DebugBug
Normal user
Posts: 2
Joined: 11 Feb 2019, 21:47

Re: Hardware Watchdog Reboots

#129 Post by DebugBug » 22 Aug 2019, 18:02

I'm not seeing an SSID for my modules. They have all connected successfully to my WiFi and switched off their AP SSID.
I normally get 0-2 days of uptime on my modules between reboots, but the 20190817 release is currently at 4+ days. However, the changelog doesn't really show any changes that would explain why. I'll keep monitoring the uptime.
