Monitoring health of ESP nodes within rules

jgrad · #1 Post by **jgrad** » 15 Apr 2024, 14:53

Hi,

on the main page of every ESPEasy node there is a list of the other ESP nodes in the same network. For every node there is also an AGE value, which indicates whether the node is alive. Is it possible to reference the AGE of a particular node within rules, e.g. to implement a rule on one node that checks the health of the other nodes in the network and sends notifications?

If there is another way to implement such monitoring, please share it.

BR.

TD-er · #2 Post by **TD-er** » 15 Apr 2024, 15:06

You could use the "sendto" command to send something to a specific node.
See: https://espeasy.readthedocs.io/en/lates ... nd-publish

For example a command like "event,fromnode=%unit%"

Code: Select all

sendto,N,"event,fromnode=%unit%"

With N being the unit nr of the other node

And then in the rules on that node:

Code: Select all

on fromnode do
  logentry,"Received from node: %eventvalue1%"
endon

jgrad · #3 Post by **jgrad** » 15 Apr 2024, 15:47

OK, this is one option, but I have to modify the rules on both sides. Is there a way to use the already available info which is distributed by broadcast?

TD-er · #4 Post by **TD-er** » 15 Apr 2024, 16:32

Not yet, but I will think about what could be done to make this available.
It feels more like a system variable, so maybe we can do something like %u_age%(N) with N being the unit nr.
A bit like the standard conversions.

jgrad · #5 Post by **jgrad** » 18 Apr 2024, 20:32

yes, system variables like:
%u_age%(N)
%u_name%(N)
%u_IP%(N)

would be perfect.

Values are available in the response to http://localhost/json but I don't know how to get this response and parse it within rules.

Ath · #6 Post by **Ath** » 18 Apr 2024, 20:47

jgrad wrote: ↑18 Apr 2024, 20:32 Values are available in the response to http://localhost/json but I don't know how to get this response and parse it within rules.

You can not get the JSON output from rules from the same ESP, and you can't process the resulting (large) json-text.

We'll try to add the functions.

Ath · #7 Post by **Ath** » 18 Apr 2024, 23:15

jgrad wrote: ↑18 Apr 2024, 20:32 %u_IP%(N)

That's already available, named %c_u2ip%(N,x) where N = unitnr, and x determines what's returned for an empty IP: 1 = "" (empty string), 2 = 0
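
For example, a rule along these lines could write the IP of a unit to the log. This is only a minimal sketch: unit number 5 and the event name showip are placeholders, and the rule would be triggered with the command event,showip.

Code: Select all

on showip do
  logentry,"IP of unit 5: %c_u2ip%(5,2)" // with x=2 this logs 0 when unit 5 has no known IP
endon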

TD-er · #8 Post by **TD-er** » 18 Apr 2024, 23:48

Maybe also add %u_build%(N)

Ath · #9 Post by **Ath** » 21 Apr 2024, 17:32

@jgrad I've created pull request #5039 to add the requested conversions. The names are somewhat different than requested, as conversions are required to start with "%c_".

Quick documentation available in the PR. (Documentation is also updated, and will be deployed on ESPEasy RTD when the PR is merged).
Download available in this GH Action Run (You'll need a free github account to be able to download from there)

jgrad · #10 Post by **jgrad** » 21 Apr 2024, 22:22

@Ath

thanks for your effort. I uploaded ESP_Easy_mega_20240421_normal_ESP32_4M316k_ETH and successfully ran a basic test.
I also implemented a node health check with email notification in rules. After a few days, once I am sure everything works as expected, I will post the rules I use for monitoring.

Can you also add the possibility to fetch the load of a selected node?

Ath · #11 Post by **Ath** » 21 Apr 2024, 23:26

jgrad wrote: ↑21 Apr 2024, 22:22 Can you also add the possibility to fetch the load of a selected node?

Well, I wasn't sure if that would be useful, but as you're asking for it I've added it. And I also added the Type column value, both numeric and string-converted, that shows the type of chip in the remote unit (not only ESPs...)

A new Actions run is working for you

jgrad · #12 Post by **jgrad** » 22 Apr 2024, 19:51

Hi @Ath

thanks for adding "load" as a possible parameter - it can be useful for monitoring load performance... (not tested yet)

Below I am sharing a simple rule which checks the presence of all ESP nodes with IDs between 1 and 100. If a node's age is above 300 seconds, a mail notification is sent. Nodes that are not present are skipped.

Code: Select all

on System#Boot do
  Let,10,1          // counter for the health check; also the ID of the node checked in the current cycle
  loopTimerSet,2,1  // timer 2 fires every second, one node is checked per second
endon

On Rules#Timer=2 do
  If %c_ubuild%([int#10])>0    // is this ID on the list of active nodes? IDs 1..100 are checked, current ID is in int#10
    If %c_uage%([int#10])>300  // node is on the list: check its age since the last status report
      // age is greater than 300 seconds: send an email notification
      Notify,1,"ESPEasy node %c_uname%([int#10]) has not reported its status to the node list for %c_uage%([int#10]) seconds! (configured threshold=300 seconds)%CR%%LF%"
    Endif
  Endif
  Let,10,[int#10]+1  // increment counter
  If [int#10]>100    // restart counter
    Let,10,1
  Endif
Endon

TD-er · #13 Post by **TD-er** » 22 Apr 2024, 20:06

Maybe you should also set some limit on the number of emails being sent, as the node will be "forgotten" after 10 minutes (or was it 5???) so it might hit the email call several times in a row until it is forgotten.

N.B. you can also nest variables, so you can keep track of sent emails per node.
I suggest to keep your "counter" in a higher variable index like "100" or "1000", so you can do stuff like this:

Code: Select all

let,[int#1000],1 // email sent

logentry,"Email sent state of node [int#1000]: [int#%v1000%]"
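
Putting both ideas together, a rate-limited version of the Rules#Timer=2 block from the previous post could look roughly like the sketch below. It is untested and rests on a few assumptions: the counter now lives in variable 1000 (so the System#Boot rule would need Let,1000,1 instead of Let,10,1), variables 1..100 are not used for anything else so they can hold a per-node "mail sent" flag, and a variable that was never set reads as 0.

Code: Select all

On Rules#Timer=2 do
  If %c_ubuild%([int#1000])>0 // node is on the node list
    If %c_uage%([int#1000])>300
      If [int#%v1000%]=0 // no mail sent yet for this outage
        Notify,1,"ESPEasy node %c_uname%([int#1000]) has not reported for %c_uage%([int#1000]) seconds"
        Let,[int#1000],1 // remember that a mail was sent for this node
      Endif
    Else
      Let,[int#1000],0 // node is reporting again, re-arm the notification
    Endif
  Else
    Let,[int#1000],0 // node is not on the list, clear its flag
  Endif
  Let,1000,[int#1000]+1 // next node ID
  If [int#1000]>100
    Let,1000,1
  Endif
Endon

The flag is cleared as soon as the node reports again or drops off the list, so each outage should produce at most one mail.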

jgrad · #14 Post by **jgrad** » 23 Apr 2024, 09:55

A node disappears from the "node list" after 600 seconds (10 min) of inactivity.

With the rules above, if a node is disconnected, an email is sent about 3 times while the node's age is between 300 and 600 seconds (each ID is checked once per 100-second pass). After that the node disappears from the list (%c_ubuild%(N)=-1) and no more emails are sent.

Ath · #15 Post by **Ath** » 01 May 2024, 23:23

@jgrad The PR has been merged, and will be in the next ESPEasy release.
It also now has the added feature of sending events for P2P Nodes when they 'join' or get 'out of sight', for you to respond on.

jgrad · #16 Post by **jgrad** » 03 May 2024, 13:27

@Ath thanks for the info - I will run a test once the official release is out.
