Exploring RTOS plugin possibilities

martinus · #1 Post by **martinus** » 08 Apr 2020, 12:46

Decided to open a new topic on RTOS specifics that could be useful for specific plugins.

I have been reading quite a lot on FreeRTOS fundamentals the last couple of days.

I also built a first plugin to learn how RTOS could help with running timing-critical plugin tasks.
Timing-critical in this context means a very smooth and consistent response to events or schedules.
(For things like high-frequency pulse detection, ISRs would still be required.)

Running a fast blinking LED without any interference from other tasks is fun to see at work:

Code:

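// RTOS task: toggles the LED at a fixed period, independent of the main loop.
void P254_blinkLED(void *parameter) {
  const TickType_t xFrequency = 50;                // period in ticks
  TickType_t xLastWakeTime = xTaskGetTickCount();  // reference point for the period
  bool state = false;
  pinMode(P254_LED, OUTPUT);
  while (true) {
    state = !state;
    digitalWrite(P254_LED, state);
    // Blocks until xLastWakeTime + xFrequency and then advances xLastWakeTime,
    // so the period does not drift with the time spent in the loop body.
    vTaskDelayUntil(&xLastWakeTime, xFrequency);
  }
}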

But a more useful option that I'm currently exploring is a handy LCD menu for safeboot or other runtime options.
To keep other features from interfering with normal ESPEasy operation, the menu and its launch actions will run from another RTOS task.
As I'm not planning to reinvent the wheel, the work will be based on this library: https://github.com/tomsuch/M5StackSAM

The entire ESPEasy project was never built with being "ThreadSafe" in mind. And it does not look so "Easy" to make it so.
But RTOS could really add some value to specific plugins without too much work and risk, if you take special care when updating global data structures and calling global functions. The LCD menu currently only runs functions private to its plugin, so that should be safe...

TD-er · #2 Post by **TD-er** » 08 Apr 2020, 14:11

That's a nice topic to look into.
The ESP32 is now so widely available and the prices are dropping, so it would really make sense to split time-critical tasks off to another core.
Or maybe the other way around: place blocking code on the "other core" so the rest will run as smoothly as possible.

At least moving the web interface to another task is something that would really make sense, so running code in other tasks does not suffer as much from serving a web page as the current implementation does.

It would make sense to have these split into separate tasks:
- Handling CPlugin calls
- Handling NPlugin calls
- Handling Plugin calls
- Handling web interface
- Processing rules

These parts are already somewhat separated in the code, so these should be relatively easy to split.
Each task could maybe even have its own copy of the settings, and after each 'loop' the task can check whether the settings have changed and its copy needs to be updated.
This will make the transition rather smooth and postpones the need for proper locking a bit.
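The per-task settings copy described above could be sketched roughly as follows. This is a portable C++ illustration, not ESPEasy code: `Settings`, `SharedSettings` and `TaskSettingsView` are hypothetical names, and `std::mutex` stands in for whatever lock an RTOS build would use. The idea is that each task only compares a version counter once per loop and copies the settings under the lock only when that counter has moved.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical settings struct; the fields are illustrative only.
struct Settings {
    uint32_t sendIntervalMs = 1000;
    bool     rulesEnabled   = true;
};

// Shared master copy, guarded by one mutex plus a change counter.
class SharedSettings {
public:
    void update(const Settings& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        master_ = s;
        version_.fetch_add(1, std::memory_order_release);
    }
    // Cheap check that does not take the lock.
    uint32_t version() const { return version_.load(std::memory_order_acquire); }
    // Copies the master settings out under the lock and returns their version.
    uint32_t snapshot(Settings& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out = master_;
        return version_.load(std::memory_order_relaxed);
    }
private:
    mutable std::mutex    mutex_;
    Settings              master_;
    std::atomic<uint32_t> version_{0};
};

// Private view owned by one task; only that task reads or refreshes it.
class TaskSettingsView {
public:
    explicit TaskSettingsView(const SharedSettings& shared) : shared_(shared) {
        seen_ = shared_.snapshot(local_);
    }
    // Call once per 'loop'; returns true when the local copy was refreshed.
    bool refreshIfChanged() {
        if (shared_.version() == seen_) return false;
        seen_ = shared_.snapshot(local_);
        return true;
    }
    const Settings& get() const { return local_; }
private:
    const SharedSettings& shared_;
    Settings              local_;
    uint32_t              seen_ = 0;
};
```

Between refreshes a task works on its own copy without any locking, which is what keeps the amount of locking small in this scheme.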

Then it could be a user setting to decide which of these RTOS tasks should be assigned to which core.

This will also eliminate the need for the scheduler we now use, as RTOS has its own scheduler, although I don't know yet what impact that may have.
So to start with, each task can have its own implementation of our scheduler.

martinus · #3 Post by **martinus** » 08 Apr 2020, 15:09

The first experiments have left the crashing prototype stage and are actually working:

Attachment: M5.png (675.85 KiB)

The left picture shows the boot state after autodetecting the device type based on a MAC address list stored in flash.
When the safe button is pressed during boot, the system goes into the menu, which uses a separate RTOS task.
The menu can also be requested during normal operation and does not rely on ESPEasy because it's an RTOS scheduled task.

Well the menu is empty now, but it's just to illustrate what we can do in general. From here we could boot into safemode, change the LCD backlight level, show statistics and even start/stop/suspend/resume RTOS tasks, etc...

The webgui shows a list of RTOS tasks that are running on core 1:

Attachment: RTOSInfo.png (37.31 KiB)

Suspending task 1 puts ESPEasy on hold. Stopping it kills ESPEasy but the device still runs. I think it's even possible to restart ESPEasy without rebooting the ESP (?)

@TD: Looks like quite a challenge to start using the full potential of the RTOS SDK for ESPEasy. I don't even know if the "Arduino ESP32 Core" is threadsafe? Maybe everything now runs as a single task for a good reason. The RTOS features are exposed (not all of them, unfortunately), but likely to be used at your own risk.

But it's certainly fun to start trying anyway

martinus · #4 Post by **martinus** » 08 Apr 2020, 15:54

TD-er wrote: ↑08 Apr 2020, 14:11 The ESP32 is now so widely available and the prices are dropping, so it would really make sense to split time-critical tasks off to another core.
Or maybe the other way around: place blocking code on the "other core" so the rest will run as smoothly as possible.

I don't think it's even necessary to use core 0. It may also interfere with the core networking tasks.
My first experiments already show a very smooth and consistent experience because of the RTOS preemptive scheduler, just on a single core.

When looking at the fast blinking LED, I can't see any interference from accessing the webgui. Doing the same with the default 10-per-second task makes a huge difference.

So choosing the ESP32 is not because of its dual core in the first place. Theoretically, a comparable result could be achieved on the single-core ESP8266, but AFAIK there is no "Arduino Core" wrapped around the RTOS SDK for ESP8266.

TD-er · #5 Post by **TD-er** » 08 Apr 2020, 22:18

martinus wrote: ↑08 Apr 2020, 15:54 [...]
So choosing the ESP32 is not because of its dual core in the first place. Theoretically, a comparable result could be achieved on the single-core ESP8266, but AFAIK there is no "Arduino Core" wrapped around the RTOS SDK for ESP8266.

That's the reason I made the timing stats page + logging: to track down the pieces of code in ESPEasy that take a lot of time, so we get as much of a multitasking feel as possible.
The ESP8266 does have limited RAM, so task switching will take a lot of flash reads to move the code of the other thread into RAM.
So it will not be the same as on the ESP32.

thalesmaoa · #6 Post by **thalesmaoa** » 17 Aug 2024, 16:49

Hi there. Despite being an old topic, this one is the best reference I could find so far about RTOS.

While doing some development in the ESPEasy code, I noticed an option called:

Enable RTOS Multitasking.

I've been working with CAN bus (CAN), and among its main advantages are reliability and error handling. I would love to know where I can find more info about RTOS and ESPEasy. Perhaps I can implement CANopen.

TD-er · #7 Post by **TD-er** » 17 Aug 2024, 22:05

I disabled it as it requires locking on the structs that are exchanged between cores.
You can create RTOS threads, each with their own stack.
I do think you can create a separate task on a different core, however I do not think you should re-enable the ESPEasy RTOS flag as there is probably dead code which will cause all kinds of different issues.

thalesmaoa · #8 Post by **thalesmaoa** » 18 Aug 2024, 03:10

Noted. I will not work with it. Thx

TD-er · #9 Post by **TD-er** » 18 Aug 2024, 09:42

Just keep in mind that not all ESP32-variants have multiple cores and on 'the other core' you're not alone as there is also WiFi running.

martinus · #10 Post by **martinus** » 24 Dec 2024, 08:53

TD-er wrote: ↑18 Aug 2024, 09:42 Just keep in mind that not all ESP32-variants have multiple cores and on 'the other core' you're not alone as there is also WiFi running.

Of course it is not required to have multiple cores to start unleashing the power of the preemptive RTOS scheduler (FreeRTOS was not even designed for SMP originally).
I use RTOS to run a webserver and a telnet server as separate tasks. (Beware that I'm not running the latest ESPEasy, but my own core version of the rules engine based on a very old ESP Easy version.)

This brings a smoother response in cases where the main task is briefly blocking for some reason. Also, when the main task goes into an unexpected loop, I can use a telnet session to recover/reboot the device. The RTOS scheduler switches between tasks at the millisecond level and it feels like true multitasking. If a task becomes unresponsive, it does not keep the other tasks from running.

This is from my latest ESP32-C3 that is single core, but still using multiple RTOS tasks:

Attachment: RTOS.png (15.42 KiB)

I've only recently picked up some further development on RTOS and am actually still learning about critical sections, SMP challenges, mutex semaphores and such.
All quite interesting stuff for learning the deeper logic of how these things work, especially on the ESP32.

TD-er · #11 Post by **TD-er** » 24 Dec 2024, 09:06

Yep, RTOS scheduling is pretty good.
It is now being used in some parts of the code, mainly on SDK level. For example WiFi code is now working in a separate RTOS task.
The reason why it still isn't actively being used in ESPEasy is the lack of any mutex code protecting struct access, and also that some mutex implementations were not even working well until about a year ago (fixes in ESP-IDF4.x).

ESPEasy is however using more events internally, some of which are being fired from other RTOS tasks.
This does impose some limitations as you cannot always use all function calls when handling events. For example the option to allow floats in those callbacks has only recently been added to ESP32-code.
But this is still easier to deal with compared to setting up a separate RTOS task as you cannot split everything just in separate threads.
For example some hardware access is better handled by the OS/SDK layer or else you may end up with significant slowdowns due to locking.
One such example is direct GPIO access, which shows an extreme slowdown in the latest ESP-IDF/Arduino3.x (2 to 3 orders of magnitude slower).

martinus · #12 Post by **martinus** » 24 Dec 2024, 10:03

I must admit that I have not looked at the current ESP Easy code for quite a while. It has also developed to a level that no longer really matches my coding skills.

In order to keep things simple for me (scripting guy) I decided to build a new ESP core version based on R120 code. I've removed all task-related code and GUI stuff, and everything runs from the rule engine using multiple files on LittleFS. Plugins only add commands to the internal commands. The webgui is mainly just a file editor to edit the 'script' files.

I'm also using a mutex-protected event queue that is polled from the main loop. Both ESPNOW and MQTT run in separate tasks and add events to the queue, and it all works stably.
I also mutex-protect writing to the filesystem while the rules engine is processing the file. This is needed because the webserver and rule engine run from different RTOS tasks.
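A mutex-protected event queue of the kind described above might look like the sketch below. It is written in portable C++ with `std::mutex` standing in for a FreeRTOS mutex; `EventQueue` and `pollEvents` are hypothetical names and are not taken from the code discussed in this post. Producer tasks would call `push()`, and the main loop would call `pollEvents()` once per iteration.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Queue shared between producer tasks and the main loop.
class EventQueue {
public:
    // Safe to call from any task.
    void push(std::string event) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(event));
    }
    // Non-blocking: returns false when nothing is waiting.
    bool tryPop(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
private:
    std::mutex              mutex_;
    std::queue<std::string> queue_;
};

// Called from the main loop. Handles at most maxPerLoop events so that a burst
// of events cannot stall the rest of the loop. The handler runs with the lock
// released, so it may push new events itself.
std::size_t pollEvents(EventQueue& q, std::size_t maxPerLoop,
                       const std::function<void(const std::string&)>& handler) {
    std::size_t handled = 0;
    std::string ev;
    while (handled < maxPerLoop && q.tryPop(ev)) {
        handler(ev);
        ++handled;
    }
    return handled;
}
```

Because the lock is only held while an event is moved in or out of the queue, producers are never blocked for the duration of event handling.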

I'm not saying that ESPEasy should extend on using more RTOS tasks, although having a dedicated telnet server 'backdoor' task could be handy there as well I guess.
Mainly when I'm developing new code, the main loop can get stuck and I still have remote access. I can even clear a 'frozen' semaphore or test them if needed from a separate task. Or just reboot the remote device.

BTW: this forum started as a 'multi project' forum for all techies that want to work on cool automation DIY projects, experiment with software and hardware, etc.
But it looks like only ESPEasy and RFLink have survived as sustainable projects. And uPyEasy and RPIEasy look abandoned? I really would have liked to see a MicroPython version of ESPEasy.

TD-er · #13 Post by **TD-er** » 24 Dec 2024, 11:07

The same author as for rpieasy also has this: https://github.com/enesbcs/mpyeasy

The ESPEasy 'console' could in theory get stuck as it is dealt with from the main loop().
However it is not that often it gets stuck anymore.
The only really 'blocking' things right now are when trying to resolve an IP-address and performing a WiFi scan. But that's being worked on.
Also the sensor-interactions are now more and more being transformed into async calls, thus also non blocking.

I for sure do see some use cases of stuff to do in RTOS tasks, whenever true 'real-time' is required.
For example when dealing with Bluetooth, you really need to keep tight timings to make sure you're not affecting WiFi as they share the same radio circuits.

As said, the Mutex wasn't always working perfectly and in my tests I could reproduce locking issues (which are now fixed in ESP-IDF/Arduino3.x) which either led to incorrect locking or deadlocks.

I've also been using some ESP-IDF specific (and even RTOS specific) calls for features which weren't supported via Arduino.

martinus · #14 Post by **martinus** » 24 Dec 2024, 12:17

TD-er wrote: ↑24 Dec 2024, 11:07 The same author as for rpieasy also has this: https://github.com/enesbcs/mpyeasy

But the last commit is also 3 years ago...

martinus · #15 Post by **martinus** » 28 Dec 2024, 09:15

Small tip if anyone wants to experiment with RTOS tasks and has a function that can be called recursively: use the matching recursive semaphore calls, like this:

Code:

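// Create the mutex once, before any task uses it
xSemaphoreFileSystem = xSemaphoreCreateRecursiveMutex();

// Every successful take must be paired with a give from the same task
if (xSemaphoreTakeRecursive(xSemaphoreFileSystem, portMAX_DELAY) == pdTRUE) {
  ...
  xSemaphoreGiveRecursive(xSemaphoreFileSystem);
}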

The owner task can take the semaphore recursively without deadlocking itself, but only if the correct take/give API is used. Took me a while to learn that these exist...
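The same behaviour can be tried out on a PC with the standard library. The sketch below uses `std::recursive_mutex` as a stand-in for a FreeRTOS recursive mutex; `processNested` is a hypothetical helper that takes the lock again at every recursion level. With a plain `std::mutex` the second lock attempt from the same thread would be undefined behaviour (in practice usually a deadlock).

```cpp
#include <mutex>

// Stand-in for the xSemaphoreFileSystem recursive mutex from the post.
std::recursive_mutex fileSystemMutex;

// Hypothetical recursive helper: every level locks the mutex again.
// The owning thread may do this repeatedly; each lock is released
// when the corresponding lock_guard goes out of scope.
int processNested(int depth) {
    std::lock_guard<std::recursive_mutex> lock(fileSystemMutex);
    if (depth == 0) return 0;
    return 1 + processNested(depth - 1);
}
```

The mutex only becomes available to other threads once the outermost call has returned, so takes and gives have to stay balanced.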

TD-er · #16 Post by **TD-er** » 28 Dec 2024, 13:08

That's a nice function I didn't know yet

martinus · #17 Post by **martinus** » 28 Dec 2024, 16:25

Another tip for free:

When you use vTaskSuspendAll() in a custom task to temporarily suspend all other tasks (including the looptask) until xTaskResumeAll(), do not use delay() or vTaskDelay() in that code segment, as it will panic the MCU.
I did not find this in the Espressif docs, but Copilot showed me some solutions to this:

Microseconds delay

Code:

void safe_delay_us(uint64_t number_of_us)
{
    uint64_t microseconds = (uint64_t)esp_timer_get_time();
    if (number_of_us) {
        while ((uint64_t)esp_timer_get_time() - microseconds < number_of_us)
            ;
    }
}

Milliseconds delay

Code:

void safe_delay_ms(int delayMS){
  const TickType_t delayTicks = pdMS_TO_TICKS(delayMS);
  TickType_t startTick = xTaskGetTickCount();
  while ((xTaskGetTickCount() - startTick) < delayTicks){}
}

I guess this is needed because delay() actually calls vTaskDelay() on ESP32? The Espressif docs mention that while using vTaskSuspendAll(), you should not call other FreeRTOS APIs.
Although it seems OK to call xTaskGetTickCount().
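Both helpers compare `now - start` against the requested delay instead of computing an end time. A small portable sketch of why that form survives a counter rollover, assuming a free-running 32-bit counter (`elapsedTicks` and `delayExpired` are hypothetical names used only for illustration):

```cpp
#include <cstdint>

// Elapsed ticks between two readings of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so the result is still correct
// when the counter has rolled over once between the two readings.
constexpr uint32_t elapsedTicks(uint32_t start, uint32_t now) {
    return now - start;
}

// True once at least delayTicks have passed since start.
constexpr bool delayExpired(uint32_t start, uint32_t now, uint32_t delayTicks) {
    return elapsedTicks(start, now) >= delayTicks;
}
```

Comparing `now >= start + delay` instead would misfire around the rollover, because `start + delay` can wrap to a small value while `now` is still large.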

Still learning every day...

TD-er · #18 Post by **TD-er** » 28 Dec 2024, 20:58

Calling delay in any form should actually be a last resort in proper real-time programming, as a delay call is typically used by the scheduler of the OS to switch tasks.

Only for really short delays, like the microsecond one you show, could you use delay-like calls, as task switching typically takes longer.

martinus · #19 Post by **martinus** » 29 Dec 2024, 09:09

I had a routine to OTA flash a program on an Atmega328 that is connected to an ESP32 serial port, using AVRDude on TCP.
It is quite timing critical and it was originally developed for the ESP8266. It has some delay(1) commands in the code.

So the easy way out on RTOS with multiple tasks was just to try pausing the scheduler during the flash programming, and there I encountered the core panic behavior.
So using the 'safe' delays solved the issue without changing this flash function too much.

I agree that it could be developed more in async mode, but that is maybe something for when I have more spare time.
And I only use this feature maybe once or twice a year.

In the end, at least, it is good to understand how things internally work and why they potentially fail.

TD-er · #20 Post by **TD-er** » 29 Dec 2024, 09:21

Absolutely! It is always nice to get more insight into what is happening.
Easy to try out and you will learn quite a lot from it

martinus · #21 Post by **martinus** » 31 Dec 2024, 11:49

Forgot to mention another tip for users that want to experiment with RTOS Tasks in their (custom) ESP32 projects:
BEWARE that pubsubclient (MQTT) is definitely NOT thread safe. It shares a buffer for both sent and received packets.
So it is not even sufficient to have send and receive mutexes, but you need one mutex to protect send and receive.
So use a single mutex on 'client->loop' and 'client->publish' calls at least to avoid kernel panics.
(depending on your MQTT traffic load, the core panic might happen within 24 hours or so)
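The single-mutex workaround can be sketched as a thin wrapper. The code below is portable C++ and does not use the real PubSubClient API: `SharedBufferClient` is a hypothetical stand-in that, like the library described above, builds outgoing and parses incoming packets in one buffer, and `LockedClient` is a hypothetical wrapper that routes both directions through the same lock.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for a client that reuses ONE buffer for rx and tx.
class SharedBufferClient {
public:
    bool publish(const std::string& topic, const std::string& payload) {
        buffer_ = topic + ":" + payload;  // outgoing packet built in the shared buffer
        ++published_;
        return true;
    }
    void loop() {
        buffer_.clear();                  // incoming data would be parsed in the same buffer
        ++polled_;
    }
    int published() const { return published_; }
    int polled() const { return polled_; }
private:
    std::string buffer_;
    int published_ = 0;
    int polled_ = 0;
};

// All tasks talk to the client through this wrapper only.
class LockedClient {
public:
    explicit LockedClient(SharedBufferClient& client) : client_(client) {}
    bool publish(const std::string& topic, const std::string& payload) {
        std::lock_guard<std::mutex> lock(mutex_);
        return client_.publish(topic, payload);
    }
    void loop() {
        std::lock_guard<std::mutex> lock(mutex_);
        client_.loop();
    }
private:
    std::mutex          mutex_;   // one lock for BOTH directions
    SharedBufferClient& client_;
};
```

Two separate locks (one for send, one for receive) would still let `publish()` and `loop()` run at the same time on the shared buffer, which is exactly the case that has to be excluded.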

It would be better if the library was rewritten to be threadsafe.
But I've noticed that the maintainer has not pushed updates or merged pull requests for 4 years. There are 70 pending PRs:
https://github.com/knolleary/pubsubclient
What happened to Nick O’Leary?

In the mean time, I've also moved serial to a dedicated task and added stack usage info and some counters:

Attachment: RTOS.png (28.45 KiB)

As you can see, tasks are running with close to 1000 iterations/second.
With this approach the system really feels responsive at all times on all the interfaces.
Before, I experienced the occasional delay in getting feedback from telnet, but now, even when I put a delay in the rules engine, it no longer interferes with web, telnet or serial access.

Next step is to have a look at plugins that could also deserve their own dedicated task.
I have not yet tried running tasks on core 0. Not sure how busy that core actually is when there is no WiFi traffic?

TD-er · #22 Post by **TD-er** » 31 Dec 2024, 12:50

WiFi is handled in a RTOS task, so it should not really matter how busy it is.
I would not use mutexes to deal with pubsubclient.
That library is constantly modifying the content of the single buffer, so it is really hard to make it multi-threaded without an almost complete rewrite.
Better have a single RTOS task dealing with MQTT.
I did this in ESPEasy for all controllers. Each has its own queue system to deal with sending and receiving.
Any received command via MQTT is put into a command queue and sent messages are being processed at their own send interval.
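A send queue that is drained by the one task owning the MQTT client, at its own interval, might be sketched like this. It is a portable C++ illustration, not the ESPEasy controller queue: `OutMessage` and `MqttSendQueue` are hypothetical names, the actual send is injected as a callback, and the current time is passed in so the pacing logic can be exercised without hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

struct OutMessage {
    std::string topic;
    std::string payload;
};

class MqttSendQueue {
public:
    MqttSendQueue(uint32_t minIntervalMs, std::function<bool(const OutMessage&)> send)
        : minIntervalMs_(minIntervalMs), send_(std::move(send)) {}

    // May be called from any task; only touches the queue.
    void enqueue(OutMessage msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(msg));
    }

    // Call ONLY from the task that owns the client.
    // Sends at most one message per minIntervalMs; returns true if one was sent.
    bool service(uint32_t nowMs) {
        if (sentOnce_ && (nowMs - lastSendMs_) < minIntervalMs_) return false;
        OutMessage msg;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_.empty()) return false;
            msg = std::move(pending_.front());
            pending_.pop_front();
        }
        // The lock is released here, so producers are not blocked by network I/O.
        if (!send_(msg)) {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_front(std::move(msg));  // keep order, retry on a later call
            return false;
        }
        lastSendMs_ = nowMs;
        sentOnce_ = true;
        return true;
    }

    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    const uint32_t minIntervalMs_;
    std::function<bool(const OutMessage&)> send_;
    mutable std::mutex mutex_;
    std::deque<OutMessage> pending_;
    uint32_t lastSendMs_ = 0;
    bool sentOnce_ = false;
};
```

Since only one task ever calls `service()`, the client itself needs no lock; the mutex protects nothing but the queue.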

N.B. I have a modified version of PubSubClient in ESPEasy which includes quite a few fixes and also adds a streaming option to send out really large messages.

martinus · #23 Post by **martinus** » 31 Dec 2024, 13:40

TD-er wrote: ↑31 Dec 2024, 12:50 WiFi is handled in a RTOS task, so it should not really matter how busy it is.

I was more thinking about the total load of Core 0. If it has not much to do, I could also schedule some work on that core as well. But just an idea...
Also Core 0 already seems to run 10 tasks.

TD-er wrote: ↑31 Dec 2024, 12:50 I would not use mutexes to deal with pubsubclient.

Well at the time it was the easiest workaround, only wrapping a mutex around the two calls.

TD-er wrote: ↑31 Dec 2024, 12:50 That library is constantly modifying the content of the single buffer, so it is really hard to make it multi-threaded without an almost complete rewrite.

I was actually thinking of doing that. A lot of code needs to be changed, but essentially, it's down to just having two buffers instead of one, right?
It was quite a surprise to see that buffer being reused in the code, likely dating from the era when memory was very limited on devices like the Atmega328.

TD-er wrote: ↑31 Dec 2024, 12:50 Better have a single RTOS task dealing with MQTT.
I did this in ESPEasy for all controllers. Each has its own queue system to deal with sending and receiving.
Any received command via MQTT is put into a command queue and sent messages are being processed at their own send interval.

That is my future plan as well, to implement a send queue for MQTT. That way I can also prevent 'burst messages' that I would like to avoid on the MQTT broker.
I've not checked ESP Easy code recently, is there any use of an additional RTOS task already?

TD-er wrote: ↑31 Dec 2024, 12:50 N.B. I have a modified version of PubSubClient in ESPEasy which includes quite a few fixes and also adds a streaming option to send out really large messages.

Yes I've noticed when I was debugging the "Stack canary watchpoint triggered (MQTT)" a while ago, that you also were involved with the issue mentioned by Arjen Hiemstra:
https://github.com/knolleary/pubsubclient/issues/832
That's actually where I noticed the root cause: "Incoming and outgoing messages use the same buffer."

TD-er · #24 Post by **TD-er** » 31 Dec 2024, 15:27

Yep, incoming and outgoing use the same buffer, but on top of that there is also some shifting of data going on between topic and payload so you can send it in a single call.
This is not really needed, as MQTT can also send in chunks, as long as you send exactly the amount of data you announced in the headers.
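The amount of data announced in the header is the "Remaining Length" field of the MQTT fixed header, which uses a variable-length encoding of 7 data bits per byte. A minimal sketch of that encoding, and of the length to announce for a QoS 0 PUBLISH before streaming topic and payload in chunks (the function names are hypothetical; values above 268,435,455 are outside what MQTT allows and are not handled here):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes the MQTT "Remaining Length" field: 7 data bits per byte,
// with the top bit set on every byte except the last one.
std::vector<uint8_t> encodeRemainingLength(uint32_t length) {
    std::vector<uint8_t> out;
    do {
        uint8_t digit = static_cast<uint8_t>(length % 128);
        length /= 128;
        if (length > 0) digit |= 0x80;  // more bytes follow
        out.push_back(digit);
    } while (length > 0);
    return out;
}

// Remaining length of a QoS 0 PUBLISH packet:
// 2-byte topic length prefix + topic bytes + payload bytes.
uint32_t publishRemainingLengthQos0(std::size_t topicLen, std::size_t payloadLen) {
    return static_cast<uint32_t>(2 + topicLen + payloadLen);
}
```

Once the fixed header with this length has been written, the topic and payload can go out in as many writes as needed, provided the total number of bytes matches the announced value exactly.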

And no, ESPEasy is not using RTOS tasks in its own code right now.
The ESP-IDF5.x/Arduino3.x is using some RTOS specific stuff and like I said the WiFi stuff is handled in a different RTOS task.
Most is done using callback functions, which is quite a lot less complicated compared to separate tasks which may run on different cores as not all memory is directly accessible by both cores, which makes it harder to share data between tasks on different cores.
RTOS/ESP-IDF does have some nice tools for this, but it is simpler if done by Arduino as you may otherwise bypass some specific administrative code in Arduino/ESP-IDF to manage hardware resources.

For example ESP-IDF5.x is now using a way more structured design pattern using "drivers" to access some hardware.
Meaning a "driver" allocates exclusive access to some resource like a GPIO (or other resources like I2C, etc).
If you try to access this bypassing those resource managers, you may end up with undefined behavior.
