Caching makes things fast - an embedded example (CG2271)

| ⌛ 5 minutes read

📋 Tags: CG2271 RTOS


Caching things smartly can make systems efficient.

Embedded systems have limited resources. We need to carefully manage how messages are being sent and how we handle state changes.

In this writeup (and personal reflection), I highlight a real example But greatly simplified ​ to show how caching state can lead to performance gains.

Sending signals

For some context, this (toy) example was from a Real-time Operating System (CG2271) course where we had to build a race car and complete an obstacle course in the fastest time possible.

Here’s the communication problem that my team is trying to solve in our RTOS project:

Message flow

For the race car to be fast and responsive, the message passing between the communication links needs to be fast.

How the links work are that:

  • The PS4 controller sends data (like what buttons are pressed) to our ESP32.
  • The ESP32 then uses that data to tell our main microcontroller unit via UART that controls our motors what to do. This is done by sending a command packet.

The issue most teams faced was that there was significant latency between either/both of these links:

  • PS4 <——> ESP32, or
  • ESP32 <——> KL25

For my team, the first link (Bluetooth) is fine - the controller sends out data really fast, continuously. Other teams that used other race car controllers mediums (like a mobile app) had intrinsic speed issues.

The second link was slightly more problematic.

Too many messages

We initially implemented a naiive solution.

For each main loop iteration:
Check the PS4 controller state, and send the appropriate command packet. Simple!

When the command packet is sent, the KL25 microcontroller (running on an RTOS) uses a message queue to receive the command packet from the ESP32.

Unfortunately, our implementation was sending out command packets to the KL25 MCU at every loop iteration, which has an upper bound of 80MHz (the lower bound of clock speeds for the ESP).

TLDR; we are spamming our KL25 with command packets.

Here’s the first cut of our code (abridged and redacted HEAVILY for readability)…

 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
// We were using ESP32 Arduino, hence the arduino loop() 
void loop() {
    if (PS4.isConnected()) {
        // if PS4 current input is a certain direction, 
        // send that command to KL25 via Serial2
        if (PS4.Up()) {
            Serial2.write(DXN_FWD);
        }
        else if (PS4.Down()) {
            Serial2.write(DXN_BACK);
        }
        else if (PS4.Left()) {
            Serial2.write(DXN_LEFT);
        }
        else if (PS4.Right()) {
            Serial2.write(DXN_RIGHT);
        }
        else { // fallback
            Serial2.write(DXN_STOP);
        }
    }
}

Our message queue buffer on the KL25 side was choking and could not keep up with the rate of messages being sent.

Even on ‘idle’ when nothing is being sent, the KL25 had to deal with the torrent of DXN_STOP messages.

In practice, this meant that there was some latency between button pushes and our race car responding. We have even seen command packets dropping (button push ignored) occasionally (queue buffer overflow?).

ESP32 stuffing messages into the queue (visualized)

Caching Controller State

We are graded on speed. This cannot suffice. Every delay in communicating our command packets adds up.

Here’s where caching/memoizing comes in.

When trying to optimize the code, I realised that many redundant states were being sent.

For example, when the racecar is idling, the state will always be DXN_STOP. Every loop iter will send DXN_STOP packets. The duplicate packets/messages being sent was useless!

Partially inspired by dynamic programming, I introduced a simple optimization.

 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Our state tracker (cache)
uint8 prev = DXN_INIT;

void loop() {
    // default direction
    uint8 curr = DXN_STOP;

    if (PS4.isConnected()) {
        if (PS4.Up()) {
            curr = DXN_FWD;
        }
        else if (PS4.Down()) {
            curr = DXN_BACK;
        }
        else if (PS4.Left()) {
            curr = DXN_LEFT;
        }
        else if (PS4.Right()) {
            curr = DXN_RIGHT;
        }

        // Only send a packet if there is a change in state
        if (prev != curr) {
            Serial2.write(curr);
            prev = curr;
        }
    }
}

We track the PS4 controller’s state with prev.
We do the usual control flow to assign the current state curr of the controller.
Then, just compare the previous state prev with the current state curr and ONLY sending it if there is a change.

This is admittedly pretty trivial in the example but the impact was notable1.

The upper bound of messages being sent is now O(how fast your fingers go)2 😏 It would be impossible to flood our message queue now.

We got much better reliability and the responsiveness improved from some lag (that feels like 100s of milliseconds) to virtually real-time with almost no latency.

Why this happens3 is because:

  1. No packet loss
    The KL25’s message queue was not being clogged up as there are much fewer messages being sent.

  2. Efficient CPU use
    In our RTOS application, the message handler thread is not active as often, giving more clock cycles to other mission critical threads, such as the motor control threads.

All this in practice leads to:

Speedy loop the loop! (An early iteration with the caching)

Not pictured is our dear gamer friend button-mashing the controller 😉

Caching works, I guess.

Notes for observant people

The code snippets were heavily redacted and abridged to make it more ‘toy’-like for ease of reading.

We used a global variable. Evil, but the program is single-threaded and state was well controlled enough to make it a non-issue for our specific usecase :).

If > 1 button was pressed on the PS4 controller, the ordering of the if statements matter. But this is not an issue for us as we decided that certain directions SHOULD take precedence if multiple actions were pressed (left/right more important as these are evasive movements).


  1. It was striking enough for me to be inspired to write about it. Theory can only get you so far: the experimental/practical responsiveness from the improvement was pretty darn good. The change from moderately laggy to real-time level responsiveness just felt really really good. ↩︎

  2. To make the optimization useless you’d need to spam buttons at 80MHz thereabouts :D ↩︎

  3. My own postulation, other explanations may be plausible too! ↩︎