Page 1 of 1

OH-MQTT connection lost / Alive Timeout

Posted: 21 Jun 2021, 15:29
by Micha_he
Hi. At some of my ESP's the OH-MQTT controller is often disconnect and reconnect immediately.
It seems, that the ESP's doesn't respond to a ping at the same time for ~ 5sec (maybe some delays in my WLAN mesh).

Is the default alive-timeout, which should be given with the MQTT-connect, so low !? Can I set a longer timeout ?

Re: OH-MQTT connection lost / Alive Timeout

Posted: 21 Jun 2021, 15:55
by TD-er
There is a lot of communication between the ESP and the MQTT broker.
So the MQTT client (the ESP) does notice it is disconnected quite fast.
Often even before I see the WiFi disconnected event.
If the ESP does not react to a ping, then your ESP does experience network issues at that moment, which can be caused by quite a lot of things.

- Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
- WiFi got actually disconnected
- ESP does not receive WiFi packets or reply to requests in due time -> Try to find cause in timing stats page, can have lot of reasons
- AP may be set to "auto channel"
- etc.

Re: OH-MQTT connection lost / Alive Timeout

Posted: 21 Jun 2021, 16:30
by Micha_he
TD-er wrote: 21 Jun 2021, 15:55 - Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
I'll test it with the two problematic ESP's and report here...
TD-er wrote: 21 Jun 2021, 15:55 - AP may be set to "auto channel"
Is set.

The ESP's with the disconnect/reconnect are always different (mostly 1-2 of 30). Some other at the same repeater are without problems. Mostly with the same firmware from 15.06.2021 and similar configuration, but I've had the problem before.

But I see a 'LWT disconnect' message at the broker. Is it possible, that the alive-packet sometimes arrives a little too late at the MQTT-broker due to the WLAN interruption?
The mosquito default alive timeout is 60s. But it seems, that ESPEasy set it to a lower value (< 15s) !? And then a timeout of 5s, short before alive package, would be critical.

Re: OH-MQTT connection lost / Alive Timeout

Posted: 21 Jun 2021, 16:43
by TD-er
I think the timeout for MQTT is 10 sec in ESPEasy, which is based on the default setting of Mosquitto of 15 sec.
If the ESP does need to reconnect to WiFi, I think it may sometimes take over 10 sec to complete.
But the LWT is published as soon as the broker considers the ESP to be no longer connected.
So either there is a disconnect received from the ESP, or the time the ESP was not reacting to a heart beat packet was > what is set in the broker as timeout.

Re: OH-MQTT connection lost / Alive Timeout

Posted: 22 Jun 2021, 08:04
by Micha_he
Micha_he wrote: 21 Jun 2021, 16:30
TD-er wrote: 21 Jun 2021, 15:55 - Switch/AP may have forgotten how to reach the ESP's MAC address -> check Send Gratuituous ARP
I'll test it with the two problematic ESP's and report here...
It didn't improve anything!

Re: OH-MQTT connection lost / Alive Timeout

Posted: 24 Jun 2021, 18:38
by Micha_he
I can't see any WiFi disconnections in the ESP log's...

Older Fw:

Code: Select all

3374216815: MQTT : Connection lost, state: Disconnected
3374216852: MQTT : Connected to broker with client ID: ESPClient_24:62:AB:0F:E8:69
3374216854: Subscribed to: DIM2GZ/#
3374216914: EVENT: MQTT#Disconnected
3374217015: EVENT: MQTT#Connected
Newer Fw:

Code: Select all

587680758: MQTT : Connection lost, state: Disconnected
587680760: Read settings: ControllerSettings index: 0
587680781: Read settings: ControllerSettings index: 0
587680821: MQTT : Connected to broker with client ID: Bewaesserung
587680824: Subscribed to: Bewaesserung/#
Can someone tell me the places in the sourcecode, where I can set a shorter KeepAlive-refresh-interval? Then I can check it with a own-build Fw.
Thanks.

Re: OH-MQTT connection lost / Alive Timeout

Posted: 24 Jun 2021, 18:42
by TD-er

Re: OH-MQTT connection lost / Alive Timeout

Posted: 24 Jun 2021, 19:30
by Micha_he
Thanks, I'll see what I can achieve...

Re: OH-MQTT connection lost / Alive Timeout

Posted: 25 Jun 2021, 11:21
by Micha_he
Additional info: These are the log-entrys on the broker:

Code: Select all

1624610164: Client Bewaesserung has exceeded timeout, disconnecting.
1624610164: Socket error on client Bewaesserung, disconnecting.
Based on the sourcecode it's not yet clear to me, whether the value from MQTT_KEEPALIVE is only used for the MQTT-connect or also for the MQTT-interval-loop !?
Is PubSubClient::loop() executed every second?

I'll try a bigger MQTT_SOCKET_TIMEOUT

Re: OH-MQTT connection lost / Alive Timeout

Posted: 25 Jun 2021, 11:46
by TD-er
Micha_he wrote: 25 Jun 2021, 11:21 [...]
Based on the sourcecode it's not yet clear to me, whether the value from MQTT_KEEPALIVE is only used for the MQTT-connect or also for the MQTT-interval-loop !?
Is PubSubClient::loop() executed every second?
The pubSubClient loop is called a lot more frequent.
It is called every timermqtt_interval msec, which is 250 msec, unless you're not connected to the broker. Then the interval may increase in steps of 5 sec upto 30 sec to retry to connect. (see updateMQTTclient_connected() )
The interval can even be shorter, as the ESPEasy_Scheduler::IntervalTimer_e::TIMER_MQTT may be rescheduled to "now + 10 msec" if there are messages to be pushed to the broker.