150 days online

Moderators: grovkillen, Stuntteam, TD-er

stefbo
Normal user
Posts: 18
Joined: 24 Apr 2016, 15:35
Location: Germany


#1 Post by stefbo » 05 Feb 2020, 22:28

Today one of my devices (ESP12E) reached 150 days online!
[Attachment: 2020-02-05 22_16_42-esp12-03.png]
The SW is based on the September version with some small additional changes (additional plugins, which are not used in this device).
I have 2 DS18B20 sensors connected and send data every 30s via UDP to InfluxDB...
Looking forward to the next 150 days...
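
For anyone wanting to try a similar setup from a PC first, here is a rough Python sketch of what one such UDP datagram could look like, assuming the InfluxDB UDP listener is enabled and accepts line protocol. The measurement name, tag, field names, host and port below are made up for illustration and are not taken from Stefan's device.

```python
import socket

def line_protocol(measurement, tags, fields):
    """Build one InfluxDB line-protocol record (no timestamp, so the server assigns one)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part}"

def send_udp(line, host="127.0.0.1", port=8089):
    """Fire-and-forget: UDP gives no delivery confirmation. Host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))

# Hypothetical names for two temperature sensors on one node
record = line_protocol("temperature", {"node": "esp12-03"},
                       {"sensor1": 21.5, "sensor2": 19.8})

if __name__ == "__main__":
    print(record)
    # send_udp(record)  # uncomment once a listener is actually running
```

Since UDP needs no connection setup or acknowledgement, a sender like this never has to wait on the server, which may be part of why such a node has little to trip over.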

Stefan

dynamicdave
Normal user
Posts: 200
Joined: 30 Jan 2017, 20:25
Location: Hampshire, UK

Re: 150 days online

#2 Post by dynamicdave » 06 Feb 2020, 07:50

Wooo - that must be a world record.

grovkillen
Core team member
Posts: 3439
Joined: 19 Jan 2017, 12:56
Location: Hudiksvall, Sweden

#3 Post by grovkillen » 06 Feb 2020, 09:38

dynamicdave wrote:
06 Feb 2020, 07:50
Wooo - that must be a world record.
We have some reports well over 200 days ;)

TD-er
Core team member
Posts: 2117
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

#4 Post by TD-er » 06 Feb 2020, 14:35

grovkillen wrote:
06 Feb 2020, 09:38
dynamicdave wrote:
06 Feb 2020, 07:50
Wooo - that must be a world record.
We have some reports well over 200 days ;)
Not yet for those "September builds" ;)

ThomasB
Normal user
Posts: 539
Joined: 17 Jun 2018, 20:41
Location: USA


#5 Post by ThomasB » 06 Feb 2020, 20:22

Nice Uptime. Respect.

It seems to me that the long Uptimes are on ESPEasy Mega devices that don't use the OpenHAB MQTT plugin. I suspect that the OH plugin (or perhaps MQTT in general) is inviting some reboots.

A couple days ago I installed ESP_Easy_mega-20200204_test_ESP8266_4M1M_VCC.bin on a NodeMCU device. This release has another MQTT patch from TD-er. Hopefully this fix is the one that prevents the persistent reboots on my OpenHAB connected devices.

- Thomas


#6 Post by TD-er » 06 Feb 2020, 21:22

ThomasB wrote:
06 Feb 2020, 20:22
[...]
It seems to me that the long Uptimes are on ESPEasy Mega devices that don't use the OpenHAB MQTT plugin. I suspect that the OH plugin (or perhaps MQTT in general) is inviting some reboots.

A couple days ago I installed ESP_Easy_mega-20200204_test_ESP8266_4M1M_VCC.bin on a NodeMCU device. This release has another MQTT patch from TD-er. Hopefully this fix is the one that prevents the persistent reboots on my OpenHAB connected devices.
When I had to shut it off, the unit I had as a weather station outside had an uptime of over 100 days.
It was running Domoticz MQTT, so it should not be just MQTT.

In this house, which has a somewhat strange network configuration, my units were performing horribly.
Modem => router (running DHCP) => WiFi bridge => WiFi => WiFi receiver => network.

The ESP nodes also connect to either of the two parts of the WiFi bridge.
So the access points I use also have other duties.
This may cause some delays, which is apparently good for letting ESPEasy get its daily reboot cycles.

So I'm not sure what the issue is with OpenHAB MQTT, compared to Domoticz MQTT.
Are the strings much longer? (running out of memory) or is it something else?
One thing that is clear to me, is that handling unexpected delays on the network is not the strongest feature of ESPEasy.


#7 Post by ThomasB » 06 Feb 2020, 22:08

TD-er wrote:
Are the strings much longer? (running out of memory) ...?
Maybe? My rules send MQTT strings like this example:

Code:

Publish %sysname%/STATEVAR/rlyval,[STATEVAR#rlyval]
Plus some plugins send data using the device controller function.
TD-er wrote:
... or is it something else?
Perhaps it has something to do with the number of MQTT-enabled devices and/or the quantity of messages? I have several devices that send/receive MQTT data, usually a couple of times a minute for each device plugin. So the MQTT broker is rarely idle.
TD-er wrote:
One thing that is clear to me, is that handling unexpected delays on the network is not the strongest feature of ESPEasy.
Agreed. When I reboot my OpenHAB/broker system after maintenance activity, I see many ESPEasy devices soon do a reboot on their own (caused by the temporarily absent broker). I've experimented with ESPEasy's Connection Failure Threshold which hasn't improved stability during broker reboots (currently using 25).
Also, I've witnessed random reboots from "stable" devices within minutes of manually rebooting some other ESPEasy device that's on the network. But it's random, so could just be coincidence.

BTW, all of my ESPEasy devices send me an email whenever they reboot. The email includes some basic info, but nothing that helps debug the problem. It would be nice if there was a way to include all the data shown on /sysinfo and anything else that might provide breadcrumbs to the problem.

- Thomas


#8 Post by stefbo » 06 Feb 2020, 23:07

I think it is not only the SW. I had several units with the same or a similar SW version that behaved completely differently (rebooting every 2 or 3 days). I continuously updated the SW without things getting better. Must be a HW issue.
This one (150d) has a good power supply (buffered with a LiPo), which definitely helps ;-)

Stefan


#9 Post by ThomasB » 07 Feb 2020, 04:26

stefbo wrote:
I think it is not only the SW. I had several units with the same or a similar SW version that behaved completely differently (rebooting every 2 or 3 days). I continuously updated the SW without things getting better.
I agree. Hardware problems can haunt some ESPEasy installations.

I tend to see different reboot behavior when I install the various Mega software releases. For that reason I believe the firmware is involved. TD-er has made progress and the reboots are getting less frequent. But the 150+ day Uptime is still outside my reach.

- Thomas

Domosapiens
Normal user
Posts: 301
Joined: 06 Nov 2016, 13:45


#10 Post by Domosapiens » 08 Feb 2020, 00:39

Great experience here with mega-20191116 Core260 running on 13 units.
Finally able to optimize/debug my application aspects.
Thumbs up for TD-er and the team.
30+ ESP units for production and test. Ranging from control of heating equipment, flow sensing, floor temp sensing, energy calculation, floor thermostat, water usage, to an interactive "fun box" for my grandson. Mainly Wemos D1.

Wiki
Normal user
Posts: 149
Joined: 23 Apr 2018, 17:55


#11 Post by Wiki » 08 Feb 2020, 01:23

Using
[Attachment: 08021.JPG]
the recording graph of uptime in minutes changed dramatically:
[Attachment: 08022.JPG]
This is only one example of several Wemos D1 with similar graphs containing the uptime in minutes.
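
If you log uptime in minutes like this, counting reboots from the logged series is straightforward: every time the value goes down instead of up, the node has restarted. A small Python sketch, with made-up sample values rather than data from the graphs above:

```python
def count_reboots(uptime_minutes):
    """Count how often the uptime counter dropped between consecutive samples."""
    return sum(1 for prev, cur in zip(uptime_minutes, uptime_minutes[1:])
               if cur < prev)

# Invented series: uptime climbs, resets twice, climbs again
samples = [10, 40, 70, 5, 35, 65, 95, 2, 32]
reboots = count_reboots(samples)

if __name__ == "__main__":
    print(f"{reboots} reboot(s) detected")
```

This only catches reboots that happen between two samples, so a node that restarts twice within one logging interval would still be counted once.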

So for me the conclusion is: Hardware Watchdog Timeout reboots are history.....

Thanks to the team, and special thanks @TD-er. Thumbs up.


#12 Post by TD-er » 08 Feb 2020, 11:16

And thanks also for continuing to help fix these issues.

HomeJCL
Normal user
Posts: 69
Joined: 03 Feb 2018, 10:42


#13 Post by HomeJCL » 08 Feb 2020, 20:40

Winter time is not the best to beat records

Lots of time on hand, so it's testing time for me

Last time I had 97 days, could have been longer but the 20200204 update itched too much to let it go untested 😀👍
Belgium and land of ESP ... counting :D


#14 Post by TD-er » 08 Feb 2020, 22:28

HomeJCL wrote:
08 Feb 2020, 20:40
Winter time is not the best to beat records

Lots of time on hand, so it's testing time for me

Last time I had 97 days, could have been longer but the 20200204 update itched too much to let it go untested 😀👍
So I should have waited for 3 more days to merge it? :)


#15 Post by HomeJCL » 09 Feb 2020, 14:23

Arrrggghhh I’ll want to .... :lol:

To be honest, never really checked it before.

But greenhouse time is starting the coming weeks, if weather stays as is.

Usually (well, ever since one too-reckless update) I do not touch my most important setup from March to September.

And I think I must have beaten the 150 mark at least once.

PS : Once I tried mid season updates and flooded the greenhouse. Lessons learned : winter time = test time, spring-summer is “production time” :D


#16 Post by ThomasB » 13 Feb 2020, 20:39

In my quest to fix the random reboots I have been validating releases of ESPEasy firmware since last fall. The latest testing on a NodeMCU device with the new ESP_Easy_mega-20200204_test_ESP8266_4M1M_VCC.bin shows good stability when it is in close proximity to the WiFi router.

But reboots continue to occur when it is at the preferred location. The router is on the first floor and the device on second floor (wood construction), total 10 meters distance. ESPEasy reports RSSI is about -74dBm, which is reasonable. But I suspect random RF interference is occasionally blocking access from the ESPEasy device, causing the reboots. Just my theory.

So I need some advice on how to configure ESPEasy's settings so that it better tolerates lost or weak WiFi signals. In case it matters, I use static IP. Below are the current settings on the Tools=>Advanced page.
[Attachment: advanced.png]
- Thomas


#17 Post by HomeJCL » 13 Feb 2020, 22:29

Well, on checking my exterior nodes, I see 1 reconnect since 4/02, but then I rebooted my router :oops:

RSSI is also at -74.

I use a fixed IP and use my NAS for NTP.

For exterior nodes I only use genuine Mini Pro, all old ones, Rev 1, with external aerial (more psychological than real advantage, although minimal ...)

I replaced the power supply in the cabinet with a big Meanwell with good buffering. Since then, except for flaky releases (well documented), everything seems fine.


#18 Post by TD-er » 14 Feb 2020, 11:41

The orientation of the node may also have an impact.
For example, most antennas for WiFi have a vertical polarization.
Meaning that it will cover an area in the horizontal plane around the antenna.

If the node's radiation pattern is not in the same plane, you will lose a lot of signal-to-noise ratio, although the RSSI may appear similar.
RSSI is an indication of the power of the signal, but has little to do with the "quality" or "stability" of the signal.
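
To put rough numbers on this, here is a small Python sketch. It uses the textbook polarization loss factor for two linearly polarized antennas (cos² of the misalignment angle), which only holds for ideal antennas in free space; indoors, reflections blur the picture. The noise floor values are assumptions picked for illustration, not measurements from anyone's setup.

```python
import math

def polarization_loss_db(angle_deg):
    """Ideal loss in dB between two linear antennas misaligned by angle_deg."""
    c = abs(math.cos(math.radians(angle_deg)))
    if c < 1e-12:
        return math.inf  # fully cross-polarized: no signal in the ideal case
    return -20.0 * math.log10(c)

def snr_db(signal_dbm, noise_dbm):
    """Signal-to-noise ratio is simply the difference of the two levels in dB."""
    return signal_dbm - noise_dbm

if __name__ == "__main__":
    for angle in (0, 30, 45, 60, 90):
        print(f"{angle:>2} deg misalignment: {polarization_loss_db(angle):.1f} dB loss")
    # Same RSSI, two assumed noise floors: very different link quality
    print(snr_db(-74, -95), "dB SNR in a quiet spot (assumed -95 dBm noise)")
    print(snr_db(-74, -80), "dB SNR in a noisy spot (assumed -80 dBm noise)")
```

The last two lines are the point: the same reported RSSI can go with a comfortable or a marginal link, depending on what else is on the air.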

The radiation pattern of an ESP is also not uniform for all modules. For example a PCB trace antenna (most common) has a different radiation pattern compared to those ceramic antennas you also see on some nodes (often white rectangular shaped) and some also have metal "3D" antennas.

The RSSI can show some positive effect when you orient the node to have its radiation plane parallel to the radiation plane of the access point.
But you have to monitor it for a number of cycles, without being near the node yourself, to see its real effect.

In the core library, I can select several "SDK versions" and the one I now have active is that of July 3rd. (PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190703)
See here in the PlatformIO definitions:
https://github.com/letscontrolit/ESPEas ... i#L98-L139

You can try the difference between "SDK3" and "SDK2.x" and I can also make you a build based on later builds of the SDK.
It does seem to make a difference in stability for nodes in a certain range of reported RSSI values.
I chose this one as it is most stable in the RSSI values you normally see. (either very low, or very bad)
Yours is probably a bit in between. :(

Apart from that there is also a difference between nodes themselves.
Maybe you could exchange the node with just any other node and use the "unstable" one in a place with different RSSI.
Another thing you may want to try, and which is maybe a bit counterintuitive, is to shield the node from WiFi.
The whole idea behind it is that if you have a lower RSSI, it will connect to the access point at a lower bit rate.
I can also add some selector in the settings to force a module to only accept lower bit rates.
A lower bit rate does make it less susceptible to changes in noise.
I have the feeling that's also what makes the difference between the "July" and "November" builds of the SDK: its preset WiFi parameters select different bit rates (and time-out values) based on the observed RSSI at connection time.
These time-out values are not made available for me to set, but the bit rates we can advertise to the access point are.
Some more elaborate access points (not necessarily more expensive, as the cheapest MikroTik ones also support this) can show the bit rate in use.
Not sure if I can show it in the status page, but maybe that's also nice to have for debugging, or at least to make it look like we do know what we are talking about :)


What you can do is select "Periodical send Gratuitous ARP" and "Force WiFi No Sleep".
I think the first one makes the most sense to have and even if it doesn't help, it also does no harm.
The problem is that the ESP may sometimes miss a beacon interval of the access point, so ARP requests may also be missed, and then the switches in the network no longer know how to reach your node.
This may make the ESP wait forever on a reply it was expecting. (N.B. an access point is also a switch)
By sending out Gratuitous ARP, you will be the one in the room continuously giving answers to questions nobody asked, so it may feel a bit socially awkward, but we're not running a "social network" here, so don't think too much about it :)
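
For the curious, this is roughly what such an unsolicited announcement looks like on the wire. The Python sketch below only builds the 42 bytes of a broadcast ARP request in which sender and target IP are the same, which is the usual form of a gratuitous ARP. It is not how ESPEasy does it internally (on the ESP the network stack handles this), and the MAC and IP address are invented examples.

```python
import socket
import struct

def gratuitous_arp_frame(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request with sender IP == target IP."""
    broadcast = b"\xff" * 6
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP fixed part: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip_bytes            # sender hardware and protocol address
    arp += b"\x00" * 6 + ip_bytes    # target hardware unset, target IP = own IP
    return eth_header + arp

# Made-up node address
frame = gratuitous_arp_frame(bytes.fromhex("5ccf7f000001"), "192.168.1.50")

if __name__ == "__main__":
    print(len(frame), "bytes:", frame.hex())
```

Because the frame goes to the broadcast address, every switch and access point on the segment sees the node's MAC again and can refresh its tables without having asked.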


#19 Post by Domosapiens » 14 Feb 2020, 16:04

ThomasB wrote:
13 Feb 2020, 20:39
ESP_Easy_mega-20200204_test_ESP8266_4M1M_VCC.bin shows good stability when it is in close proximity to the WiFi router.
But reboots continue to occur when it is at the preferred location.
ESPEasy reports RSSI is about -74dBm, which is reasonable.
So I need some advice on how to configure ESPEasy's settings so that it better tolerates lost or weak WiFi signals. In case it matters, I use static IP. Below are the current settings on the Tools=>Advanced page.
Thomas,
I use DD-WRT on a TPLINK TL-WDR4300 router and Fixed IP
I'm very happy with mega-20191116 on 15 nodes and stopped updating there.
(to finally focus on application debugging)

The fixed IP bug introduced after mega-20191116 should be solved with ESP_Easy_mega-20200204.
I see a few remarks for mega-20200204 that could (??) induce new problems like the ones you have:
-[WiFi] Improve reconnect management
-Revert "[WiFi] Improve reconnect management"
-[Controller] Optimization by reducing number of checks.



I use these 15 nodes with mega-20191116 with the following settings:
Controllers: minimum send 500ms, Max Queue 10, Max retries 10, delete oldest, ignore ack, Client time out 200msec.
Advanced: connection failure threshold 0, Force WiFi No Sleep, send ARP.
But all of them have an RSSI that is generally better than -60dB
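
As an illustration of what "Max Queue 10" combined with "delete oldest" means in practice, here is a toy Python model of a bounded send queue. It is only a sketch of the policy as the setting names suggest it, not ESPEasy's actual controller code, and the class and attribute names are made up.

```python
from collections import deque

class ControllerQueue:
    """Toy model of a bounded send queue with a 'delete oldest' policy."""

    def __init__(self, max_queue=10):
        self.items = deque(maxlen=max_queue)
        self.dropped = 0

    def push(self, message):
        if len(self.items) == self.items.maxlen:
            self.dropped += 1  # the deque discards its oldest entry on append
        self.items.append(message)

    def pop(self):
        """Return the oldest queued message, or None when the queue is empty."""
        return self.items.popleft() if self.items else None

# Simulate a controller that is unreachable while 13 messages arrive
q = ControllerQueue(max_queue=10)
for n in range(13):
    q.push(f"msg{n}")

if __name__ == "__main__":
    print("dropped:", q.dropped, "oldest kept:", q.items[0])
```

With this policy memory use stays bounded during an outage, at the price of losing the oldest readings first.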

If your node is a producer with a high update rate, it keeps (?) the Wifi awake.
In that case Force WiFi No Sleep and ARP would not be important.
But if your node is a consumer with a low update frequency, these 2 settings could help ... I hope.


#20 Post by ThomasB » 14 Feb 2020, 17:38

Thanks everyone for the advice. Some comments:

1. The test board sends MQTT messages at ~10 second intervals. But I'll enable "Force WiFi no sleep" and "Periodical send Gratuitous ARP" anyways.

2. I set connection failure threshold to 25 thinking this would allow the node to be more tolerant. I had no idea that setting it to zero would be better for WiFi. So I'll try it.

3. The NodeMCU board has been oriented for maximum RSSI. So cross-polarization related RF attenuation is minimal. BTW, I was tempted to hack the board and add an external antenna (I have the RF design experience to do it correctly). But I want to leave it as-is so I can continue to validate the firmware releases using the same test conditions.

4. I changed the hardware for another NodeMCU a couple months ago (to determine if hardware was causing the reboots). Although I believe this is a firmware issue, I'll consider swapping the board again. FWIW, I've added additional decoupling and bulk caps to the ESP8266 module to ensure VDC is stable.

5. Power is via the USB connector. I have tried many 5VDC USB supplies; currently testing with a 2A wall wart taken from a new Amazon Echo.

6. I'm using mega-20200204 because it has patched the static IP and Nextion Plugin bugs of the previous recent releases. But as noted by Domosapiens, maybe the latest wifi related updates have introduced another reboot bug.

- Thomas


#21 Post by TD-er » 14 Feb 2020, 23:43

ThomasB wrote:
2. I set connection failure threshold to 25 thinking this would allow the node to be more tolerant. I had no idea that setting it to zero would be better for WiFi. So I'll try it.
You know what this does????
If the ESP -for whatever reason- has exceeded this set threshold of failed connect attempts, it will issue a reboot.
So if a broker, or whatever you try to connect to, sometimes refuses a connection, then the ESP may sooner or later reboot.


#22 Post by ThomasB » 15 Feb 2020, 00:36

TD-er wrote:
You know what this does????
If the ESP -for whatever reason- has exceeded this set threshold of failed connect attempts, it will issue a reboot.
That is what I assumed it did. A few weeks ago I set it to 25 because that is a much larger threshold than zero and I was trying to stop the reboots. A bad mistake in retrospect since I just found that zero disables the connection failure reboot feature. D'oh!
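
To make the behaviour concrete, here is a small Python model of the setting as it is described in this thread: failed connect attempts are counted, a reboot follows once the count exceeds the threshold, and a threshold of zero switches the check off. This is not the ESPEasy source. In particular, resetting the counter after a successful connect is my own assumption, and the class name is made up.

```python
class ConnectionFailureWatch:
    """Toy model of a 'Connection Failure Threshold' style reboot trigger."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, connected):
        """Register one connect attempt; return True if a reboot would be issued."""
        if connected:
            self.failures = 0  # assumption: a success clears the counter
            return False
        self.failures += 1
        # A threshold of 0 disables the feature entirely
        return self.threshold > 0 and self.failures > self.threshold

if __name__ == "__main__":
    watch = ConnectionFailureWatch(threshold=25)
    attempts = [watch.record(False) for _ in range(26)]
    print("reboot on attempt:", attempts.index(True) + 1)

    disabled = ConnectionFailureWatch(threshold=0)
    print("reboot with threshold 0:",
          any(disabled.record(False) for _ in range(1000)))
```

In this model any non-zero value, however large, still ends in a reboot if the broker stays away long enough, which is why raising it to 25 could not stop the reboots.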

Today I set the threshold to zero. It will be fantastic if this is the final tweak that stops my random reboots. I will report back in a few days. Spirits are high, expecting to pop the cork on expensive bubbly.

Thomas


#23 Post by ThomasB » 20 Feb 2020, 21:30

My NodeMCU test device (with mega-20200204 firmware) has reached six days without a reboot. That's a record for it. So I'm popping the cork on the good bubbly; have a glass on me.

The final piece of the puzzle was to set Tools=>Advanced=>Connection Failure Threshold to zero, which disables the feature. That simple change has made a world of difference. I highly recommend anyone using this feature do the same.

I have since learned that the Connection Failure Threshold feature has a bug that will eventually cause a reboot in a typical installation. Plus, despite its good intentions of forcing a reboot if a controller hangs, as-is it does not work to our benefit. Until the code is refactored I think it is best to avoid using it.

[Attachment: connectionFailure.jpg, caption: Set it to zero!]

A big Thank You to @TD-er for working on the stability improvements. Your many months of effort means I'll be able to join the 150 day club too.

- Thomas
Last edited by ThomasB on 20 Feb 2020, 21:39, edited 1 time in total.


#24 Post by TD-er » 20 Feb 2020, 21:38

ThomasB wrote:
20 Feb 2020, 21:30
[...]
A big Thank You to @TD-er for working on the stability improvements. Your many months of effort means I'll be able to join the 150 day club too.
You're welcome, and many thanks for testing it so extensively and for keeping faith that you would one day open the good bubbly :)

GravityRZ
Normal user
Posts: 22
Joined: 23 Dec 2019, 21:24


#25 Post by GravityRZ » 22 Feb 2020, 11:17

I do not use NTP since I do not need the time.
This might also help stability.

Probing NTP servers and failing to update might disrupt the data flow and affect things (e.g. when the ESP is busy updating, it cannot do anything else).

just my thoughts

rayE
Normal user
Posts: 138
Joined: 12 Oct 2017, 12:53
Location: Philippines


#26 Post by rayE » 24 Feb 2020, 10:41

Hi All,
I have 3 units running a custom build from release 20200204. All 3 units are running VERY WELL with NO reboots in the last 4 days, that is a FIRST for me. Just to note, I'm running NTP and a modified Domoticz MQTT controller to work with ThingsBoard IoT.

I've got a gut feeling you at "Let's Control It" have cracked the reboot problems... well done guys and hats off to you :-)

Firmware
Build: 20104 - Mega
System Libraries: ESP82xx Core 2_6_3, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2
Git Build: My Build: Feb 20 2020 10:34:06
Plugins: 5 [Normal]
Build Time: Feb 20 2020 05:12:59
Binary Filename: Self built!

WiFi Settings
Force WiFi B/G: true
Restart WiFi Lost Conn: false
Force WiFi No Sleep: true
Periodical send Gratuitous ARP: true
Connection Failure Threshold: 0
