mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 20 Sep 2018, 21:37
by Domosapiens
On the breadboard:
Running GIT version mega-20180914 Normal on a Wemos D1 mini.
Load is ~30%, but several times 100% is reported to Domoticz.
Tasks: Generic pulse counter (~20 Hz), SSD1306 OLED, 2x DS18B20 (every 30 s), Dummy Device, System Info Uptime.
GPIO 13 is used as PWM output.

Objective:
1: Measure the temperature difference, measure the water flow with the pulse counter, and calculate the heat output in kW.
2: Use the temperature difference for PWM pump control within the pump's specific PWM boundaries (0% is MAX, 100% is OFF).
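
To make the two objectives concrete, here is a rough sketch of the arithmetic in Python. It is only an illustration: the pulses-per-litre calibration, the delta-T range, the 0..1023 PWM range, and the choice to run the pump faster at a larger delta T are all assumptions and do not come from this setup. Only the inverted duty (0% is MAX, 100% is OFF) is taken from the post.

```python
# Illustrative sketch only; the constants are assumptions, not values from this setup.
WATER_HEAT_CAPACITY = 4186.0   # J/(kg*K), approximate specific heat of water
PULSES_PER_LITRE = 450.0       # hypothetical flow-sensor calibration
PWM_MAX = 1023                 # assumed PWM range of 0..1023


def thermal_power_kw(pulse_hz, delta_t):
    """Thermal power in kW from pulse frequency (Hz) and temperature difference (K)."""
    litres_per_second = pulse_hz / PULSES_PER_LITRE
    kg_per_second = litres_per_second * 1.0  # roughly 1 kg per litre of water
    return kg_per_second * WATER_HEAT_CAPACITY * delta_t / 1000.0


def pump_duty(delta_t, dt_min=2.0, dt_max=10.0):
    """Map delta T to an inverted PWM duty: 0 = full speed, PWM_MAX = off."""
    # Requested fraction of full speed, clamped to the range 0..1.
    speed = (delta_t - dt_min) / (dt_max - dt_min)
    speed = max(0.0, min(1.0, speed))
    # Invert, because for this pump a low duty means a high speed.
    return round((1.0 - speed) * PWM_MAX)
```

With these assumed constants, a 20 Hz pulse rate and a delta T of 5 K would give a little under 1 kW.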

I sometimes, and increasingly often, observe NAN values from the DS18B20 sensors.
How are NAN values handled in the Rules, for example in a subtraction?

To block them I use this in the Rules:

Code:

On TempH#TemperatureH Do                                          // Prevent NAN values in the calculation
  If [TempL#TemperatureL]>1 And [TempH#TemperatureH]>1
    TaskValueSet 7,1, [TempH#TemperatureH]-[TempL#TemperatureL]   // Task 7 (DeltaT), value 1 (Delta_T) holds the difference
  EndIf
EndOn

Is this safe? Or can TempL#TemperatureL be updated (by an interrupt) between the If statement and the TaskValueSet, so that the subtraction still uses NAN?
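
For reference, this is how NaN behaves in ordinary IEEE 754 floating point, shown in Python. Whether the ESPEasy rules parser evaluates NAN the same way is an assumption and is not confirmed here. The helper `safe_delta` is hypothetical; it only illustrates the general pattern of taking one snapshot of both readings, checking the snapshot, and subtracting the same snapshot, so nothing can change between the check and the calculation.

```python
import math


def safe_delta(temp_high, temp_low):
    """Return temp_high - temp_low, or None if either reading is NaN."""
    # Both readings arrive as arguments (a snapshot), so the values that are
    # checked are guaranteed to be the values that are subtracted.
    if math.isnan(temp_high) or math.isnan(temp_low):
        return None
    return temp_high - temp_low


nan = float("nan")
print(nan - 20.0)              # prints "nan": NaN propagates through subtraction
print(nan > 1)                 # prints "False": every comparison with NaN is false
print(safe_delta(45.0, 30.0))  # prints "15.0"
```

If the rules engine follows the same floating-point behaviour, a `>1` test would reject NAN, but it would also reject valid readings of 1 degree or lower.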



I observed 1-2 reboots per day, Reset Reason: Hardware Watchdog.
So I experimented with the extensive Syslog Level and Syslog Facility options.
With level Debug Dev, I ended up with Manual reboot (75) within a few hours.
In the captured log file I cannot find any reason for those reboots.
So does a more verbose log level cause more reboots?

The Wiki has no explanation (yet) of Syslog Level and Syslog Facility.

In the Log I can find:
Sep 20 14:27:55 2018;192.168.1.83; <13>MUC13_Flow2 EspEasy: WD : Uptime 13 ConnectFailures 0 FreeMem 17616
.
400 lines of logging and then
.
Sep 20 14:29:11 2018;192.168.1.83; <13>MUC13_Flow2 EspEasy: WD : Uptime 1 ConnectFailures 0 FreeMem 17936

So the reboot happens in between: Uptime drops from 13 to 1 somewhere within those 400 lines.
What is the logic behind the Hardware Watchdog?
What is the advised Log level for this problem?
A search of the 400 lines for "error", "reboot", and "exception" gives no results.
Any pointers on what to search for?
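
One way to narrow down where a reboot occurred in a captured log is to look for the point where the Uptime value in the "WD :" lines goes down. The sketch below is Python and assumes only the line format shown in the excerpt above; the function name `find_reboots` is made up for illustration.

```python
import re

# Matches the watchdog lines shown above, e.g. "WD : Uptime 13 ConnectFailures 0 ..."
WD_LINE = re.compile(r"WD\s*:\s*Uptime\s+(\d+)")


def find_reboots(lines):
    """Return (previous_line, current_line) pairs where Uptime decreased."""
    reboots = []
    prev_uptime = None
    prev_line = None
    for line in lines:
        match = WD_LINE.search(line)
        if not match:
            continue  # not a watchdog line
        uptime = int(match.group(1))
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((prev_line, line))
        prev_uptime, prev_line = uptime, line
    return reboots
```

The log entries between each returned pair are the ones worth reading closely, especially the last few before the Uptime reset.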


How can I share this 400-line log file? None of these tags work:
[Spoiler]text [/Spoiler]
[spoiler]text [/spoiler]
[spoil] text [/spoil]
[Spoil] text [/Spoil]

Re: mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 22 Sep 2018, 18:18
by ThomasB
Domosapiens wrote:
I observed some 1-2 daily reboots, Reset Reason: Hardware Watchdog.
So I played around with the extensive Syslog Level & Syslog Facility possibility.
With Level Debug Dev, I ended-up with Manual reboot (75) in a few hours.
In the captured log file I can not find any reason for those reboots.
So, the more advanced Log level causes more reboots??

The reboot issue has been experienced by many users. The developers are trying to fix it.

It is not widely known that using the serial log can increase the chance of reboots, because serial output blocks code execution and blocked code triggers a watchdog reset. The reboots become more frequent at the more verbose log levels, since too many messages flood the buffer and cause more blocking. The most recent release introduced a log message "trim" function to help reduce the flood-related reboots.
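
To illustrate the general idea of keeping log output from blocking, here is a small Python sketch of a bounded log queue. This is not ESPEasy's implementation and says nothing about how its "trim" function actually works; the class `BoundedLog` and its parameters are hypothetical. It shows three common measures: cap the number of queued messages, trim each message to a maximum length, and write only as much as the output can accept right now instead of waiting.

```python
from collections import deque


class BoundedLog:
    """Keep at most max_messages queued lines, each trimmed to max_len characters."""

    def __init__(self, max_messages=10, max_len=80):
        # A deque with maxlen silently drops the oldest entry when it is full.
        self.queue = deque(maxlen=max_messages)
        self.max_len = max_len

    def add(self, message):
        """Queue a message, trimmed so a single long line cannot flood the buffer."""
        self.queue.append(message[:self.max_len])

    def drain(self, writable_chars):
        """Return only the whole messages that fit in the space available now."""
        out = []
        while self.queue and len(self.queue[0]) <= writable_chars:
            message = self.queue.popleft()
            writable_chars -= len(message)
            out.append(message)
        return out
```

The trade-off is that messages are lost under load, which is usually preferable to a watchdog reset.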

I do not know if this issue also affects Syslog. It does not seem likely, but anything is possible. BTW, I doubt that Syslog can send the details of a reboot to your syslog server, since the WiFi connection is lost during a reboot. So perhaps you mean you are using the serial log, which as I mentioned can increase the chance of reboots.

Long story short, random reboots have been experienced by many (but not all) ESPEasy users. The developers have been working on it for weeks. It has been a difficult problem to solve.

- Thomas

Re: mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 22 Sep 2018, 18:35
by grovkillen
Latest release should work pretty well. We'd like as much feedback as possible regarding reboot issues.

Re: mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 22 Sep 2018, 20:11
by ThomasB
grovkillen wrote:
Latest release should work pretty well. We'd like as much feedback as possible regarding reboot issues.

Thanks for mentioning the new release. I'm currently running it to see if it solved the reboot problem. Where do we post feedback on it?

- Thomas

Re: mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 22 Sep 2018, 21:00
by grovkillen
Please post (informative) feedback on GitHub.

Re: mega-20180914: Manual reboot (75), for what reason? Log settings ?

Posted: 23 Sep 2018, 00:56
by Domosapiens
The mentioned mega-20180914 has now been running for 1 day 4 hours with a less verbose log level.
So indeed, the log level influences the number of reboots.

Several times I see the DS18B20 error from one of the two sensors:
DS : Temperature: Error! (28-ff-f8-31-b1-16-5-79)
or
DS : Temperature: Error! (28-ff-e5-33-b1-16-5-ee)
But I cannot relate this to any reboot, so it is just a read error.
While searching for the cause of this error I increased the log level,
which resulted in more reboots and obscured the real cause of the error.
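
To see whether one sensor fails more often than the other without raising the log level further, the existing log can be tallied per sensor address. A minimal Python sketch, assuming only the "DS : Temperature: Error! (address)" format shown above; the function name `count_sensor_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "DS : Temperature: Error! (28-ff-f8-31-b1-16-5-79)"
DS_ERROR = re.compile(r"DS\s*:\s*Temperature:\s*Error!\s*\(([0-9a-fA-F-]+)\)")


def count_sensor_errors(lines):
    """Count DS18B20 read errors per sensor address."""
    counts = Counter()
    for line in lines:
        match = DS_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A strongly lopsided count would point at one sensor or its wiring rather than at the firmware.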

Currently running with Syslog Level: Error and Syslog Facility: Kernel,
I only get 18 occurrences of "Timeout while reading input data!"
Based on older posts this seems to be an HTTP timeout:
Martinus: This time out message comes from a function that waits for a webserver reply, so it's not related to a local gpio or the DS sensor.
(from viewtopic.php?t=3265#p17156)
Does this also obscure the cause of the error?


Back to the Effect-Cause-Effect questions:

- I see NAN on the display for the 2x DS18B20 more often than ever before (compared to ESPEasy_v2.0.0-dev11)
- Is this related to the improved timing? (ESPEasy_v2.0.0-dev11 versus mega-20180522 and mega-20180914)
- Do NAN values trigger the Rules system?
- What is the advised (documented?) usage of Syslog Level and Syslog Facility?
- Yes, I follow the GitHub discussions daily