OH-MQTT connection lost / Alive Timeout

Moderators: grovkillen, Stuntteam, TD-er

Post Reply
Message
Author
Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

OH-MQTT connection lost / Alive Timeout

#1 Post by Micha_he » 21 Jun 2021, 15:29

Hi. At some of my ESP's the OH-MQTT controller is often disconnect and reconnect immediately.
It seems, that the ESP's doesn't respond to a ping at the same time for ~ 5sec (maybe some delays in my WLAN mesh).

Is the default alive-timeout, which should be given with the MQTT-connect, so low !? Can I set a longer timeout ?

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: OH-MQTT connection lost / Alive Timeout

#2 Post by TD-er » 21 Jun 2021, 15:55

There is a lot of communication between the ESP and the MQTT broker.
So the MQTT client (the ESP) does notice it is disconnected quite fast.
Often even before I see the WiFi disconnected event.
If the ESP does not react to a ping, then your ESP does experience network issues at that moment, which can be caused by quite a lot of things.

- Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
- WiFi got actually disconnected
- ESP does not receive WiFi packets or reply to requests in due time -> Try to find cause in timing stats page, can have lot of reasons
- AP may be set to "auto channel"
- etc.

Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

Re: OH-MQTT connection lost / Alive Timeout

#3 Post by Micha_he » 21 Jun 2021, 16:30

TD-er wrote: 21 Jun 2021, 15:55 - Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
I'll test it with the two problematic ESP's and report here...
TD-er wrote: 21 Jun 2021, 15:55 - AP may be set to "auto channel"
Is set.

The ESP's with the disconnect/reconnect are always different (mostly 1-2 of 30). Some other at the same repeater are without problems. Mostly with the same firmware from 15.06.2021 and similar configuration, but I've had the problem before.

But I see a 'LWT disconnect' message at the broker. Is it possible, that the alive-packet sometimes arrives a little too late at the MQTT-broker due to the WLAN interruption?
The mosquito default alive timeout is 60s. But it seems, that ESPEasy set it to a lower value (< 15s) !? And then a timeout of 5s, short before alive package, would be critical.

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: OH-MQTT connection lost / Alive Timeout

#4 Post by TD-er » 21 Jun 2021, 16:43

I think the timeout for MQTT is 10 sec in ESPEasy, which is based on the default setting of Mosquitto of 15 sec.
If the ESP does need to reconnect to WiFi, I think it may sometimes take over 10 sec to complete.
But the LWT is published as soon as the broker considers the ESP to be no longer connected.
So either there is a disconnect received from the ESP, or the time the ESP was not reacting to a heart beat packet was > what is set in the broker as timeout.

Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

Re: OH-MQTT connection lost / Alive Timeout

#5 Post by Micha_he » 22 Jun 2021, 08:04

Micha_he wrote: 21 Jun 2021, 16:30
TD-er wrote: 21 Jun 2021, 15:55 - Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
I'll test it with the two problematic ESP's and report here...
It didn't improve anything!

Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

Re: OH-MQTT connection lost / Alive Timeout

#6 Post by Micha_he » 24 Jun 2021, 18:38

I can't see any WiFi disconnections in the ESP log's...

Older Fw:

Code: Select all

3374216815: MQTT : Connection lost, state: Disconnected
3374216852: MQTT : Connected to broker with client ID: ESPClient_24:62:AB:0F:E8:69
3374216854: Subscribed to: DIM2GZ/#
3374216914: EVENT: MQTT#Disconnected
3374217015: EVENT: MQTT#Connected
Newer Fw:

Code: Select all

587680758: MQTT : Connection lost, state: Disconnected
587680760: Read settings: ControllerSettings index: 0
587680781: Read settings: ControllerSettings index: 0
587680821: MQTT : Connected to broker with client ID: Bewaesserung
587680824: Subscribed to: Bewaesserung/#
Can someone tell me the places in the sourcecode, where I can set a shorter KeepAlive-refresh-interval? Then I can check it with a own-build Fw.
Thanks.


Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

Re: OH-MQTT connection lost / Alive Timeout

#8 Post by Micha_he » 24 Jun 2021, 19:30

Thanks, I'll see what I can achieve...

Micha_he
Normal user
Posts: 369
Joined: 07 Feb 2018, 19:14
Location: Helmstedt, Germany

Re: OH-MQTT connection lost / Alive Timeout

#9 Post by Micha_he » 25 Jun 2021, 11:21

Additional info: These are the log-entrys on the broker:

Code: Select all

1624610164: Client Bewaesserung has exceeded timeout, disconnecting.
1624610164: Socket error on client Bewaesserung, disconnecting.
Based on the sourcecode it's not yet clear to me, whether the value from MQTT_KEEPALIVE is only used for the MQTT-connect or also for the MQTT-interval-loop !?
Is PubSubClient::loop() executed every second?

I'll try a bigger MQTT_SOCKET_TIMEOUT

TD-er
Core team member
Posts: 8643
Joined: 01 Sep 2017, 22:13
Location: the Netherlands
Contact:

Re: OH-MQTT connection lost / Alive Timeout

#10 Post by TD-er » 25 Jun 2021, 11:46

Micha_he wrote: 25 Jun 2021, 11:21 [...]
Based on the sourcecode it's not yet clear to me, whether the value from MQTT_KEEPALIVE is only used for the MQTT-connect or also for the MQTT-interval-loop !?
Is PubSubClient::loop() executed every second?
The pubSubClient loop is called a lot more frequent.
It is called every timermqtt_interval msec, which is 250 msec, unless you're not connected to the broker. Then the interval may increase in steps of 5 sec upto 30 sec to retry to connect. (see updateMQTTclient_connected() )
The interval can even be shorter, as the ESPEasy_Scheduler::IntervalTimer_e::TIMER_MQTT may be rescheduled to "now + 10 msec" if there are messages to be pushed to the broker.

Post Reply

Who is online

Users browsing this forum: No registered users and 25 guests