strange behaviour

Moderators: grovkillen, Stuntteam, TD-er

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

strange behaviour

#1 Post by manjh » 24 Sep 2018, 23:06

not sure if this is an ESP Easy problem, but perhaps someone recognizes it.
At this moment, I have about 15 active ESP units in my network. To keep things under control, I have assigned static IP addresses in the range from 201 upwards.
Beside the ESP units, I have about 14 or 15 other devices attached mostly via WiFi.

Everything works OK, except that at some point in time, all ESP units magically disappear from the network. When this happens, I restart things here and there and usually get things working again. Not a big deal, except for a few Sonoff switches that are built in, and it is not easy to get to them to reboot.

It happened again today, and I thought I'd try restarting my router (Netgear R7000). After this, everything was back online again...

I don't know if this means it is really a Netgear problem, or that the problem is in ESP Easy and restarting the router just happened to clear it.

My ESP Easy units run various builds, mostly R147 and ESP Easy Mega 20102. Also two run R148, and a "special" for my P1 meter.

Does this make sense to anyone?

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: strange behaviour

#2 Post by TD-er » 29 Sep 2018, 17:57

After the reboot of the Netgear, can you see the uptime of the ESP nodes?
I suspect they rebooted also.

I had something similar last week. All nodes stopped sending data to Domoticz at the same time. I rebooted the Raspberry Pi running Domoticz (and Mosquitto) and all reconnected again.
To me it looks like some connections may not get closed and thus prevent other connections from being used. Closing the connection from the other side (either disconnecting the MQTT server or rebooting the WiFi access point) may finally free those connections.

Not sure where this blocks, it is just an idea I am having about this.

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#3 Post by manjh » 02 Oct 2018, 20:03

I don't know about the uptime, I did not check.
But this is not the first time this has happened. On a previous occasion someone told me it might be the bug in ESP Easy where it freezes after a certain time. I then upgraded a number of units, but had to leave a few that were actually built in and not easily accessible. I found that upgrading from build 147 or 148 to ESP Easy Mega takes a little manual work; OTA alone is not enough. I needed to connect each unit to a computer and reconfigure it via serial.

So what's next... I am thinking of putting in place a periodic check (simple ping will do) and keep track of ESP units that disappear. As soon as it happens again, I will at least have some info about timing...
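
A rough sketch of what such a periodic check could look like, in Python. This is only an illustration, not something from an existing tool: the function names, the logfile name and the example addresses are made up, and the `ping -c 1` call assumes a Linux or macOS host. Only changes in reachability are logged, so the file stays small and shows when a unit dropped off.

```python
import subprocess
from datetime import datetime


def ping(host):
    """Send one ICMP echo request; True if the host answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # "-c 1" is the Linux/macOS form
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def record_changes(previous, current, now):
    """Return one log line per host whose reachability changed.

    On the very first poll `previous` is empty, so every host is logged once.
    """
    lines = []
    for host, up in sorted(current.items()):
        if previous.get(host) != up:
            state = "online" if up else "OFFLINE"
            lines.append(f"{now.isoformat(timespec='seconds')} {host} {state}")
    return lines


def poll_once(hosts, previous, logfile="esp_ping.log", probe=ping):
    """Probe every host once, append state changes to logfile, return new state."""
    current = {host: probe(host) for host in hosts}
    lines = record_changes(previous, current, datetime.now())
    if lines:
        with open(logfile, "a") as log:
            log.write("\n".join(lines) + "\n")
    return current
```

Calling `poll_once` from a loop or a cron job once a minute, with a host list such as `["192.168.1.201", "192.168.1.202"]` (the subnet is a guess, only the 201-and-up range is given above), would produce a timeline of drop-outs.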

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: strange behaviour

#4 Post by TD-er » 03 Oct 2018, 15:30

You can also add the "Generic - System info" plugin and set it to "uptime".
As it is a plugin, you can also send it to a controller and let it log in something like Domoticz, or whatever you use.
Then you have a record of how often it reboots.

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#5 Post by manjh » 10 Oct 2018, 11:36

Good idea, I put this in place for all my ESP units. They now send a "heartbeat" to Domoticz every minute, and a special text device shows the uptime.
I also put a dzVents script in place that periodically checks all defined uptime text devices, looks at the "last update", and alerts me via e-mail if a unit has not sent its uptime for more than 2 minutes.
This works quite well. I will let it run for a while. Perhaps I will tweak the timing somewhat, as once a minute may be a bit much. Once every five minutes sounds OK too...
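
For anyone who wants the idea without Domoticz: the staleness check boils down to comparing each unit's last update time with the current time. Below is a minimal Python sketch of that logic, not the actual dzVents script. The function name and the device names in the usage note are hypothetical, and sending the e-mail is left out.

```python
from datetime import datetime, timedelta


def stale_units(last_update, now, max_age=timedelta(minutes=2)):
    """Return the names of units whose last heartbeat is older than max_age.

    last_update maps a unit name to the datetime of its most recent heartbeat.
    """
    return sorted(
        name for name, seen in last_update.items() if now - seen > max_age
    )
```

With heartbeats at 12:00:00 for "sonoff-hall" and 11:57:00 for "p1-meter", a check at 12:00:30 would flag only "p1-meter".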

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: strange behaviour

#6 Post by TD-er » 10 Oct 2018, 12:14

Maybe you could also post an update on the statistics of the uptime in like a week?

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#7 Post by manjh » 12 Oct 2018, 09:31

I could, but first I need to develop a way to reduce the raw number of minutes into a neat dd:hh:mm:ss presentation.
Not sure if it could be done in a rule. If not, it will be a small script in Domoticz.
Something to do as soon as I have some time. 8-)
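
The conversion itself is just two integer divisions. A small sketch in Python, assuming the uptime arrives as a whole number of minutes (so the seconds field is always 00); the function name is made up, and the same arithmetic would carry over to a Domoticz script.

```python
def format_uptime(minutes):
    """Format an uptime given in whole minutes as dd:hh:mm:ss."""
    days, remainder = divmod(minutes, 24 * 60)  # 1440 minutes per day
    hours, mins = divmod(remainder, 60)
    # The input has minute resolution, so seconds are always zero.
    return f"{days:02d}:{hours:02d}:{mins:02d}:00"
```

For example, 1505 minutes is one day (1440) plus 65 minutes, which comes out as 01:01:05:00.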

Domosapiens
Normal user
Posts: 307
Joined: 06 Nov 2016, 13:45

Re: strange behaviour

#8 Post by Domosapiens » 13 Oct 2018, 10:23

manjh wrote: a way to reduce the raw number of minutes
I just divide %value%/60 and send it to Domoticz, to a virtual sensor with Type: Custom Sensor, Axis label: hr.
This already gives a good indication for (in-)stability.
Like this: [image]

Does not take a lot of time and helps the Dev. Team.
30+ ESP units for production and test. Ranging from control of heating equipment, flow sensing, floor temp sensing, energy calculation, floor thermostat, water usage, to an interactive "fun box" for my grandson. Mainly Wemos D1.

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#9 Post by manjh » 13 Oct 2018, 22:46

ok, will do ..

Strider336
Normal user
Posts: 14
Joined: 13 Jan 2018, 11:51

Re: strange behaviour

#10 Post by Strider336 » 14 Oct 2018, 12:29

I seem to recall seeing a lot of complaints about the R7000 dropping connections to WiFi, some time ago when a friend asked me to recommend a new router.

Of course the most likely problem is local WiFi interference or simply too many networks locally.
Years ago I had neighbours who used to turn their router off when they didn't need it, the result was their router randomly picking WiFi channels at boot, which sometimes clashed with mine and wiped out my WiFi instantly. I ended up having to install repeaters at all corners of my property and using those to literally dominate a single WiFi channel 24/7.

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#11 Post by manjh » 25 Oct 2018, 20:54

I added an ESP unit to the switched power supply of my floor heating pump, so I can see how often and how long the pump is being switched on.
Used an "uptime" signal for this. It works, but the last value that is sent before the power is switched off remains on the graph until the next time the power is switched on, where it restarts at zero.
Is there a trick for this, or should I write a Lua script to check? If the last signal from the ESP is older than 1 minute, reset the value to zero? I guess this would work, but perhaps there is a more elegant way to do this?

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands

Re: strange behaviour

#12 Post by TD-er » 26 Oct 2018, 00:58

If you're using MQTT, you could also abuse the LWT topic for it.
This is mainly an issue at the receiving end of the data, so that's probably the place to see how this should be corrected, since the ESP cannot send a new value when it is offline.
That's why I suggested the LWT, since that's something the broker will send as soon as the ESP node appears to be offline.
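
To make the idea a bit more concrete: with MQTT, the broker publishes a client's "last will" message on the client's behalf when that client drops off without a clean disconnect. On the receiving side you would subscribe to that topic and zero the stored value when the will message arrives. A small Python sketch of just that handling step follows. The topic layout `<node>/status`, the payload text and the function name are assumptions for illustration, so check the controller settings of your own nodes for the real values.

```python
# Assumed will payload; the actual text depends on the node's controller settings.
OFFLINE_PAYLOAD = "Connection Lost"


def apply_status(uptime_by_node, topic, payload):
    """Zero a node's stored uptime when its last-will message arrives.

    Assumes status messages are published on "<node>/status" (hypothetical layout).
    Any other topic or payload leaves the stored values untouched.
    """
    node, _, leaf = topic.partition("/")
    if leaf == "status" and payload == OFFLINE_PAYLOAD:
        uptime_by_node[node] = 0
    return uptime_by_node
```

This function would be called from the message callback of whatever MQTT client library is used on the receiving end.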

manjh
Normal user
Posts: 516
Joined: 08 Feb 2016, 11:22

Re: strange behaviour

#13 Post by manjh » 27 Oct 2018, 13:11

I decided to chicken-out and use the easy way... :oops:

Wrote a small Lua script that checks the age of the value in the uptime device; if it is older than 1 minute (the ESP sends a value every 30 secs) and it is > 0, then I reset it to zero.
Also at that point I write a single line to a logfile, with the last known value.
So now I can see the activity in the log of the device, but also have a detailed logfile.

Maybe not technically the most elegant way, but it is easy and pragmatic... :D
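
For reference, the logic described above fits in a few lines. This is a Python rendering of the idea, not the actual Lua script; the function name and the wording of the log line are made up.

```python
from datetime import datetime, timedelta


def check_uptime_device(value, last_update, now, max_age=timedelta(minutes=1)):
    """Return (new_value, log_line) for the uptime device.

    If the value is non-zero and older than max_age, the ESP is assumed to be
    powered off: the value is reset to 0 and a log line with the last known
    value is returned. Otherwise the value is kept and log_line is None.
    """
    if value > 0 and now - last_update > max_age:
        stamp = now.isoformat(timespec="seconds")
        return 0, f"{stamp} power off, last uptime value {value}"
    return value, None
```

The caller would write the returned value back to the device and append the log line, when there is one, to the logfile.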
