mega-20180914: Manual reboot (75), for what reason? Log settings?

Domosapiens
Normal user
Posts: 307
Joined: 06 Nov 2016, 13:45

#1 Post by Domosapiens » 20 Sep 2018, 21:37

On the breadboard:
Running GIT version mega-20180914 Normal on a Wemos D1 mini.
Load is ~30%, but 100% is reported to Domoticz several times.
Tasks: Generic pulse counter (~20 Hz), SSD1306 OLED, 2x DS18B20 (read every 30 s), Dummy Device, System Info Uptime.
GPIO 13 as PWM output.

Objective:
1: Measure the temperature difference, measure the water flow with the pulse counter, and calculate the energy (kW).
2: Use the temperature difference for PWM pump control within the pump's specific PWM boundaries (0% is MAX, 100% is OFF).

I regularly observe NaN values from the DS18B20s.
How are NaN values handled by the Rules in a subtraction?

To block them, I use these Rules:

On TempH#TemperatureH Do                                          // Prevent NaN values in the calculation
  If [TempL#TemperatureL]>1 And [TempH#TemperatureH]>1
    TaskValueSet 7,1, [TempH#TemperatureH]-[TempL#TemperatureL]   // Task DeltaT, var1 Delta_T holds the difference
  EndIf
EndOn
Is this safe? Or can TempL#TemperatureL be updated (e.g. by an interrupt) between the If statement and the TaskValueSet, so that the subtraction still uses a NaN value?
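For the NaN question, it may help to look at how plain C++ floats behave, since ESPEasy itself is written in C++. This is only a sketch of IEEE 754 semantics and assumes nothing about how the Rules engine evaluates `[TempH#TemperatureH]`: any ordered comparison with NaN is false, and any subtraction involving NaN yields NaN. The helper `computeDeltaT` is hypothetical; it takes both readings by value, so the check and the subtraction always see the same snapshot.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not ESPEasy code): computes tempH - tempL only when
// both readings are valid. Because the readings are passed by value, the
// validity check and the subtraction use the same snapshot, even if the
// sensor task updates its own variables afterwards.
// Returns true and writes the difference to *delta when both values are usable;
// returns false and leaves *delta unchanged otherwise.
bool computeDeltaT(float tempH, float tempL, float *delta) {
  // Ordered comparisons with NaN (>, <, >=, <=) are always false in IEEE 754,
  // so a guard like "tempH > 1" rejects NaN as a side effect. std::isnan states
  // the intent directly and does not also reject valid readings of 1 or below.
  if (std::isnan(tempH) || std::isnan(tempL)) {
    return false;  // keep the previous DeltaT value
  }
  *delta = tempH - tempL;
  return true;
}
```

Copying the values before checking them is one generic way to close the check-then-use gap asked about above; whether the Rules engine can actually interleave a sensor update between `If` and `TaskValueSet` is not answered in this thread.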



I observed 1-2 reboots per day, Reset Reason: Hardware Watchdog.
So I experimented with the extensive Syslog Level and Syslog Facility options.
With level Debug Dev, I ended up with a Manual reboot (75) within a few hours.
In the captured log file I cannot find any reason for those reboots.
So does a more verbose log level cause more reboots?

The Wiki has no explanation (yet) of Syslog Level and Syslog Facility.

In the log I find:
Sep 20 14:27:55 2018;192.168.1.83; <13>MUC13_Flow2 EspEasy: WD : Uptime 13 ConnectFailures 0 FreeMem 17616
...
(400 lines of logging)
...
Sep 20 14:29:11 2018;192.168.1.83; <13>MUC13_Flow2 EspEasy: WD : Uptime 1 ConnectFailures 0 FreeMem 17936

So the reboot happened somewhere in between!
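The reasoning above (uptime drops from 13 to 1 between two WD lines, so the unit rebooted in between) can be automated when scanning a long syslog capture. A minimal sketch, assuming every watchdog line contains `Uptime <number>` as in the two lines quoted; `parseUptime` and `findReboots` are hypothetical helpers, not part of ESPEasy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical log-analysis helper: extracts the number after "Uptime "
// from a watchdog ("WD :") line. Returns -1 if the line has no uptime.
long parseUptime(const std::string &line) {
  const std::string key = "Uptime ";
  std::size_t pos = line.find(key);
  if (pos == std::string::npos) return -1;
  pos += key.size();
  std::size_t end = pos;
  while (end < line.size() && line[end] >= '0' && line[end] <= '9') ++end;
  if (end == pos) return -1;  // "Uptime " not followed by digits
  return std::stol(line.substr(pos, end - pos));
}

// Returns the indexes of WD lines whose uptime is lower than that of the
// previous WD line, i.e. the first WD line logged after each reboot.
std::vector<std::size_t> findReboots(const std::vector<std::string> &lines) {
  std::vector<std::size_t> reboots;
  long previous = -1;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    long uptime = parseUptime(lines[i]);
    if (uptime < 0) continue;  // not a watchdog line
    if (previous >= 0 && uptime < previous) reboots.push_back(i);
    previous = uptime;
  }
  return reboots;
}
```

Looking for an uptime drop works even when no "error", "reboot" or "exception" text was logged: a watchdog reset may give the firmware no chance to write anything, but the first WD line after the restart still reveals it.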
What is the logic behind the Hardware Watchdog?
What log level is advised for investigating this problem?
A search of the 400 lines for "error", "reboot" and "exception" gives no results.
Any pointers on what to search for?


How can I share this 400-line log file? None of these spoiler tags work:
[Spoiler]text [/Spoiler]
[spoiler]text [/spoiler]
[spoil] text [/spoil]
[Spoil] text [/Spoil]
30+ ESP units for production and test. Ranging from control of heating equipment, flow sensing, floor temp sensing, energy calculation, floor thermostat, water usage, to an interactive "fun box" for my grandson. Mainly Wemos D1.

ThomasB
Normal user
Posts: 1064
Joined: 17 Jun 2018, 20:41
Location: USA

#2 Post by ThomasB » 22 Sep 2018, 18:18

Domosapiens wrote:
I observed some 1-2 daily reboots, Reset Reason: Hardware Watchdog.
So I played around with the extensive Syslog Level & Syslog Facility possibility.
With Level Debug Dev, I ended-up with Manual reboot (75) in a few hours.
In the captured log file I can not find any reason for those reboots.
So, the more advanced Log level causes more reboots??

Many users have experienced the reboot issue. The developers are trying to fix it.

It is not widely known that using the serial log can increase the chance of reboots, because serial output blocks the code, and code that blocks for too long causes a watchdog reset. Reboots become more frequent with the more verbose log levels, since too many messages flood the buffer and cause even more blocking. The most recent release introduced a log message "trim" function to help reduce these flood-related reboots.
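The flooding effect described above can be illustrated with a bounded, non-blocking log queue. This is a hypothetical sketch of the general idea (drop the oldest messages instead of waiting for slow output), NOT ESPEasy's actual trim function, whose details are not given here. The class `BoundedLog` and its methods are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical sketch: a log queue with a fixed capacity. add() never waits
// for the (slow) output; when the queue is full the oldest message is dropped,
// so a burst of debug messages costs bounded memory instead of blocking the
// main loop long enough for a watchdog to fire.
class BoundedLog {
 public:
  explicit BoundedLog(std::size_t capacity) : capacity_(capacity) {}

  void add(const std::string &msg) {
    if (capacity_ == 0) { ++dropped_; return; }
    if (queue_.size() == capacity_) {
      queue_.pop_front();  // trim: discard the oldest entry
      ++dropped_;
    }
    queue_.push_back(msg);
  }

  // Meant to be called from the main loop: emits at most maxLines per call,
  // so output work is spread out and the loop regains control in between.
  std::size_t flush(std::size_t maxLines, std::string &out) {
    std::size_t n = 0;
    while (n < maxLines && !queue_.empty()) {
      out += queue_.front();
      out += '\n';
      queue_.pop_front();
      ++n;
    }
    return n;
  }

  std::size_t size() const { return queue_.size(); }
  std::size_t dropped() const { return dropped_; }

 private:
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::deque<std::string> queue_;
};
```

The trade-off is that messages are lost during a burst; the `dropped()` counter at least makes that loss visible.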

I do not know whether this issue also affects Syslog. It does not seem likely, but anything is possible. By the way, I doubt that Syslog can send the details of a reboot to your syslog server, since the WiFi connection is lost during a reboot. So perhaps you mean you are using the serial log, which, as I mentioned, can increase the chance of reboots.

In short, many (but not all) ESPEasy users have experienced random reboots. The developers have been working on it for weeks; it has been a difficult problem to solve.

- Thomas

grovkillen
Core team member
Posts: 3621
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

#3 Post by grovkillen » 22 Sep 2018, 18:35

The latest release should work pretty well. We'd like as much feedback as possible on the reboot issues.
ESP Easy Flasher [flash tool and wifi setup at flash time]
ESP Easy Webdumper [easy screendumping of your units]
ESP Easy Netscan [find units]
Official shop: https://firstbyte.shop/
Sponsor ESP Easy, we need you :idea: :idea: :idea:

#4 Post by ThomasB » 22 Sep 2018, 20:11

grovkillen wrote:
Latest release should work pretty well. We'd like as much feedback as possible regarding reboot issues.

Thanks for mentioning the new release. I'm currently running it to see whether it solves the reboot problem. Where should we post feedback on it?

- Thomas

#5 Post by grovkillen » 22 Sep 2018, 21:00

Please post (informative) feedback on GitHub.

#6 Post by Domosapiens » 23 Sep 2018, 00:56

The mega-20180914 build mentioned above has now been running for 1 day 4 hours with a less verbose log level.
So indeed, the log level influences the number of reboots.

Several times I see this DS18B20 error from one of the two sensors:
DS : Temperature: Error! (28-ff-f8-31-b1-16-5-79)
or
DS : Temperature: Error! (28-ff-e5-33-b1-16-5-ee)
But I cannot relate it to any reboot, so it is just a read error.
While searching for the cause of this error, I increased the log level,
which resulted in more reboots and obscured the real cause of the error.
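The sensor IDs in these errors are 1-Wire ROM codes: family code 0x28 (DS18B20), six serial bytes, and a Dallas/Maxim CRC-8 in the last byte (the log appears to print bytes without leading zeros, so `5` is 0x05). The sketch below computes that CRC; for both addresses above it matches the last byte, so the IDs themselves are read correctly. The DS18B20 scratchpad carries the same kind of CRC in its ninth byte, and a failed scratchpad CRC is one common reason a temperature read is rejected; that this is what triggers ESPEasy's "Error!" message is an assumption, not something confirmed in this thread. `dallasCrc8` and `crcValid` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed
// LSB-first, which gives the reflected constant 0x8C). The DS18B20 uses it
// for the 8-byte ROM code (byte 7 = CRC of bytes 0..6) and for the 9-byte
// scratchpad (byte 8 = CRC of bytes 0..7).
uint8_t dallasCrc8(const uint8_t *data, std::size_t len) {
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    uint8_t inbyte = data[i];
    for (int bit = 0; bit < 8; ++bit) {
      uint8_t mix = (crc ^ inbyte) & 0x01;  // next input bit XOR CRC LSB
      crc >>= 1;
      if (mix) crc ^= 0x8C;
      inbyte >>= 1;
    }
  }
  return crc;
}

// True if the last byte of the block equals the CRC of all preceding bytes.
bool crcValid(const uint8_t *data, std::size_t len) {
  return len >= 2 && dallasCrc8(data, len - 1) == data[len - 1];
}
```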

Currently running with Syslog Level: Error and Syslog Facility: Kernel,
I only get 18 occurrences of "Timeout while reading input data!".
Based on older posts, this seems to be an HTTP timeout:
Martinus wrote: This time out message comes from a function that waits for a webserver reply, so it's not related to a local gpio or the DS sensor.
(here: viewtopic.php?t=3265#p17156)
Does this also obscure the cause of the error?
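On Syslog Level and Syslog Facility: the `<13>` at the start of each captured line is the standard syslog PRI field (RFC 5424, same layout as the older BSD format in RFC 3164), which encodes facility × 8 + severity. So `<13>` means facility 1 (user) and severity 5 (notice). In standard syslog the facility is only a label the receiving server can sort or filter on, while the level acts as a threshold on which messages are sent. The sketch assumes ESPEasy follows these standard semantics; how its own levels (such as Error or Debug Dev) map onto syslog severities is not covered in this thread, and the enum and function names are hypothetical.

```cpp
#include <cassert>

// Syslog severities as defined in RFC 5424 (lower number = more severe).
enum Severity { kEmergency = 0, kAlert, kCritical, kError, kWarning,
                kNotice, kInformational, kDebug };

// A few RFC 5424 facility codes: kern = 0, user = 1, local0 = 16.
enum Facility { kKern = 0, kUser = 1, kLocal0 = 16 };

// PRI value that prefixes every syslog message, e.g. "<13>".
int encodePri(int facility, int severity) { return facility * 8 + severity; }

// Splits a PRI value back into its facility and severity parts.
void decodePri(int pri, int *facility, int *severity) {
  *facility = pri / 8;
  *severity = pri % 8;
}

// Hypothetical threshold filter: with a configured level of kError, only
// messages of severity kError or more severe (numerically lower) are sent.
bool shouldSend(int messageSeverity, int configuredLevel) {
  return messageSeverity <= configuredLevel;
}
```

Under these assumptions, switching the facility to Kernel would only change the PRI prefix (for example `<3>` for an error message), while the level setting is what reduces the amount of traffic.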


Back to the cause-and-effect questions:

- I see NaN on the display for the two DS18B20s more often than ever before (compared to ESPEasy_v2.0.0-dev11).
- Is this related to the improved timing? (ESPEasy_v2.0.0-dev11 versus mega-20180522 and mega-20180914)
- Do NaN values trigger the Rules?
- What is the advised (documented?) usage of Syslog Level and Syslog Facility?
- Yes, I follow the GitHub discussions daily.
