Caching things smartly can make systems efficient.
Embedded systems have limited resources. We need to carefully manage how messages are being sent and how we handle state changes.
In this writeup (and personal reflection), I highlight a real example But greatly simplified to show how caching state can lead to performance gains.
Sending signals
For some context, this (toy) example was from a Real-time Operating System (CG2271) course where we had to build a race car and complete an obstacle course in the fastest time possible.
Here’s the communication problem that my team is trying to solve in our RTOS project:
For the race car to be fast and responsive, the message passing between the communication links needs to be fast.
How the links work are that:
- The PS4 controller sends data (like what buttons are pressed) to our ESP32.
- The ESP32 then uses that data to tell our main microcontroller unit via UART that controls our motors what to do. This is done by sending a command packet.
The issue most teams faced was that there was significant latency between either/both of these links:
- PS4 <——> ESP32, or
- ESP32 <——> KL25
For my team, the first link (Bluetooth) is fine - the controller sends out data really fast, continuously. Other teams that used other race car controllers mediums (like a mobile app) had intrinsic speed issues.
The second link was slightly more problematic.
Too many messages
We initially implemented a naiive solution.
For each main loop iteration:
Check the PS4 controller state, and send the appropriate command packet. Simple!
When the command packet is sent, the KL25 microcontroller (running on an RTOS) uses a message queue to receive the command packet from the ESP32.
Unfortunately, our implementation was sending out command packets to the KL25 MCU at every loop iteration, which has an upper bound of 80MHz (the lower bound of clock speeds for the ESP).
TLDR; we are spamming our KL25 with command packets.
Here’s the first cut of our code (abridged and redacted HEAVILY for readability)…
|
|
Our message queue buffer on the KL25 side was choking and could not keep up with the rate of messages being sent.
Even on ‘idle’ when nothing is being sent, the KL25 had to deal with the torrent of DXN_STOP
messages.
In practice, this meant that there was some latency between button pushes and our race car responding. We have even seen command packets dropping (button push ignored) occasionally (queue buffer overflow?).
Caching Controller State
We are graded on speed. This cannot suffice. Every delay in communicating our command packets adds up.
Here’s where caching/memoizing comes in.
When trying to optimize the code, I realised that many redundant states were being sent.
For example, when the racecar is idling, the state will always be DXN_STOP
. Every loop iter will send DXN_STOP
packets. The duplicate packets/messages being sent was useless!
Partially inspired by dynamic programming, I introduced a simple optimization.
|
|
We track the PS4 controller’s state with prev
.
We do the usual control flow to assign the current state curr
of the controller.
Then, just compare the previous state prev
with the current state curr
and ONLY sending it if there is a change.
This is admittedly pretty trivial in the example but the impact was notable1.
The upper bound of messages being sent is now O(how fast your fingers go)2 😏 It would be impossible to flood our message queue now.
We got much better reliability and the responsiveness improved from some lag (that feels like 100s of milliseconds) to virtually real-time with almost no latency.
Why this happens3 is because:
-
No packet loss
The KL25’s message queue was not being clogged up as there are much fewer messages being sent. -
Efficient CPU use
In our RTOS application, the message handler thread is not active as often, giving more clock cycles to other mission critical threads, such as the motor control threads.
All this in practice leads to:
Not pictured is our dear gamer friend button-mashing the controller 😉
Caching works, I guess.
Notes for observant people
The code snippets were heavily redacted and abridged to make it more ‘toy’-like for ease of reading.
We used a global variable. Evil, but the program is single-threaded and state was well controlled enough to make it a non-issue for our specific usecase :).
If > 1 button was pressed on the PS4 controller, the ordering of the if statements matter. But this is not an issue for us as we decided that certain directions SHOULD take precedence if multiple actions were pressed (left/right more important as these are evasive movements).
-
It was striking enough for me to be inspired to write about it. Theory can only get you so far: the experimental/practical responsiveness from the improvement was pretty darn good. The change from moderately laggy to real-time level responsiveness just felt really really good. ↩︎
-
To make the optimization useless you’d need to spam buttons at 80MHz thereabouts :D ↩︎
-
My own postulation, other explanations may be plausible too! ↩︎