TinyML: Deployment
Running a trained model on the device comes down to about five lines of inference code, wrapped in a fair amount of setup. Get the setup right once and the rest follows.
The Firmware Structure
A TFLite Micro inference sketch has four responsibilities:
- Load the model from flash
- Allocate the tensor arena and run AllocateTensors()
- On each inference: fill the input tensor, call Invoke(), read the output tensor
- Act on the result (serial print, GPIO, BLE notification, etc.)
The gesture classifier sketch below covers all four.
Complete Gesture Classifier Sketch
// gesture_infer/gesture_infer.ino
#include <Arduino_LSM9DS1.h>  // Nano 33 BLE Sense IMU; Rev2 boards use Arduino_BMI270_BMM150.h
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model_data.h"
// --- Configuration ---
const char* LABELS[] = {"punch", "flex", "idle"};
const int NUM_CLASSES = 3;
const int SAMPLE_RATE = 100; // Hz
const int WINDOW_SIZE = 100; // samples
const int NUM_FEATURES = WINDOW_SIZE * 3; // aX, aY, aZ per sample
// Normalization parameters (must match training/mean.npy, training/std.npy)
// Replace these with your actual values from numpy output
const float FEATURE_MEAN[NUM_FEATURES] = { /* ... 300 floats ... */ };
const float FEATURE_STD[NUM_FEATURES] = { /* ... 300 floats ... */ };
// --- TFLite setup ---
constexpr int kTensorArenaSize = 8 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
const tflite::Model* tfl_model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input_tensor = nullptr;
TfLiteTensor* output_tensor = nullptr;
// Quantization parameters (read at setup)
float in_scale = 1.0f;
int in_zp = 0;
float out_scale = 1.0f;
int out_zp = 0;
// --- Sample buffer ---
float sample_buffer[NUM_FEATURES];
int buffer_pos = 0;
bool buffer_ready = false;
void setup() {
  Serial.begin(115200);
  while (!Serial);  // blocks until the serial monitor opens; remove for untethered use
  // Init IMU
  if (!IMU.begin()) {
    Serial.println("IMU init failed");
    while (true);
  }
  // Load model
  tfl_model = tflite::GetModel(g_gesture_model_data);
  if (tfl_model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model schema version mismatch");
    while (true);
  }
  // Create interpreter. AllOpsResolver links every kernel into flash; newer
  // TFLite Micro releases have removed it in favor of MicroMutableOpResolver.
  static tflite::AllOpsResolver resolver;
  static tflite::MicroInterpreter static_interpreter(
      tfl_model, resolver, tensor_arena, kTensorArenaSize);
  interpreter = &static_interpreter;
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors failed: increase kTensorArenaSize");
    while (true);
  }
  input_tensor = interpreter->input(0);
  output_tensor = interpreter->output(0);
  // loop() writes and reads int8 data, so require a fully int8-quantized model
  if (input_tensor->type != kTfLiteInt8 || output_tensor->type != kTfLiteInt8) {
    Serial.println("Expected int8 input and output tensors");
    while (true);
  }
  // Read quantization params
  in_scale = input_tensor->params.scale;
  in_zp = input_tensor->params.zero_point;
  out_scale = output_tensor->params.scale;
  out_zp = output_tensor->params.zero_point;
  Serial.println("Ready");
}
void loop() {
  if (!IMU.accelerationAvailable()) return;
  float ax, ay, az;
  IMU.readAcceleration(ax, ay, az);  // values in g
  // Append one sample; axes are interleaved per sample
  // (aX0, aY0, aZ0, aX1, aY1, aZ1, ...), which must match the training layout
  int base = buffer_pos * 3;
  sample_buffer[base + 0] = ax;
  sample_buffer[base + 1] = ay;
  sample_buffer[base + 2] = az;
  buffer_pos++;
  if (buffer_pos < WINDOW_SIZE) return;
  // Buffer is full; run inference on this window, then start a fresh one
  buffer_pos = 0;
  buffer_ready = true;
  // Normalize and quantize each feature, write to input tensor
  for (int i = 0; i < NUM_FEATURES; i++) {
    float norm = (sample_buffer[i] - FEATURE_MEAN[i]) / FEATURE_STD[i];
    int8_t q = (int8_t)constrain(
        round(norm / in_scale) + in_zp, -128, 127);
    input_tensor->data.int8[i] = q;
  }
  if (interpreter->Invoke() != kTfLiteOk) {
    Serial.println("Invoke failed");
    return;
  }
  // Find the predicted class (dequantize each int8 score first)
  float max_score = -1e9f;
  int max_class = -1;
  for (int i = 0; i < NUM_CLASSES; i++) {
    float score = (output_tensor->data.int8[i] - out_zp) * out_scale;
    if (score > max_score) {
      max_score = score;
      max_class = i;
    }
  }
  Serial.print("Prediction: ");
  Serial.print(LABELS[max_class]);
  Serial.print(" (");
  Serial.print(max_score, 3);
  Serial.println(")");
}
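The normalize-and-quantize loop follows TFLite's affine int8 scheme, real_value = scale * (q - zero_point), and the output decoding in the argmax loop is its inverse. The same arithmetic can be checked on a desktop before flashing; the sketch below uses hypothetical helper names that are not part of TFLite Micro.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a TFLite Micro API): quantize a real value to int8
// with q = round(x / scale) + zero_point, clamped to [-128, 127].
int8_t quantize_int8(float x, float scale, int zero_point) {
  assert(scale > 0.0f);  // a zero scale means the tensor is not quantized
  long q = std::lround(x / scale) + zero_point;
  q = std::max(-128L, std::min(127L, q));
  return static_cast<int8_t>(q);
}

// Hypothetical helper: inverse mapping, x = (q - zero_point) * scale,
// as used on the output tensor.
float dequantize_int8(int8_t q, float scale, int zero_point) {
  return (static_cast<int>(q) - zero_point) * scale;
}
```

For an int8 softmax output, TFLite fixes the quantization at scale 1/256 and zero point -128, so dequantize_int8(127, 1.0f / 256, -128) gives 0.99609375, the largest representable probability.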
Embedding the Normalization Parameters
The FEATURE_MEAN and FEATURE_STD arrays need the actual values from your training run. A small Python script generates the Arduino initializer syntax:
# gen_norm_header.py
import numpy as np
# Flatten in case the arrays were saved with shape (100, 3) or (1, 300)
mean = np.load("training/mean.npy").ravel()
std = np.load("training/std.npy").ravel()
def fmt_array(name, arr):
    vals = ", ".join(f"{v:.6f}f" for v in arr)
    return f"const float {name}[{len(arr)}] = {{{vals}}};"
with open("firmware/gesture_infer/norm_params.h", "w") as f:
    f.write("#pragma once\n\n")
    f.write(fmt_array("FEATURE_MEAN", mean) + "\n")
    f.write(fmt_array("FEATURE_STD", std) + "\n")
Replace the placeholder arrays in the sketch with #include "norm_params.h" and remove the manual declarations.
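A zero entry in FEATURE_STD (a constant feature in the training set, or placeholder arrays left empty, which zero-initialize) turns the division in loop() into inf or NaN and saturates the quantized input. A small host-side check can run on the generated values before flashing; first_bad_std is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: returns the index of the first standard deviation that
// cannot be used as a divisor (zero, negative, or non-finite), or -1 if all
// entries are usable.
long first_bad_std(const float* std_dev, std::size_t n) {
  assert(std_dev != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(std_dev[i]) || std_dev[i] <= 0.0f) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

Because norm_params.h is plain C++, a desktop test can include it and call first_bad_std(FEATURE_STD, 300).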
Sizing the Tensor Arena
kTensorArenaSize = 8 * 1024 is a guess. To find the actual minimum, TFLite Micro provides a helper (available in recent versions):
// After AllocateTensors():
size_t used = interpreter->arena_used_bytes();
Serial.print("Arena used: ");
Serial.print(used);
Serial.println(" bytes");
Set kTensorArenaSize to used + 512 as a safety margin. Oversizing wastes RAM; undersizing makes AllocateTensors() return kTfLiteError, which the sketch above catches, prints, and then halts on.
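On the Nano 33 BLE Sense, 8 KB is enough for the gesture model from chapter 5. An image model such as MobileNetV1 can need 200-300 KB depending on input resolution and width multiplier, which is at or beyond the Nano's 256 KB of SRAM.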
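The sizing rule is plain arithmetic on the measured arena_used_bytes() value. A sketch with a hypothetical helper name; rounding up to a 16-byte multiple is an extra tidiness choice made here to match the alignas(16) on tensor_arena, not a TFLite Micro requirement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: arena size = measured usage + safety margin,
// rounded up to a multiple of `align` (a tidiness choice, not required).
std::size_t arena_size_for(std::size_t used_bytes, std::size_t margin = 512,
                           std::size_t align = 16) {
  assert(align > 0);
  std::size_t raw = used_bytes + margin;
  return (raw + align - 1) / align * align;
}
```

For example, a measured 5000 bytes gives 5512 after the margin, rounded up to 5520.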
Compiling and Flashing
# From the repo root
arduino-cli compile \
--fqbn arduino:mbed_nano:nano33ble \
firmware/gesture_infer/
arduino-cli upload \
--fqbn arduino:mbed_nano:nano33ble \
-p /dev/ttyACM0 \
firmware/gesture_infer/
arduino-cli monitor -p /dev/ttyACM0 --config baudrate=115200
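Once the sketch is running, perform a punch gesture and watch the serial monitor. The sketch prints one prediction per 100-sample window (about once per second at 100 Hz), so the output should look like: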
Ready
Prediction: idle (0.874)
Prediction: idle (0.912)
Prediction: punch (0.967)
Prediction: idle (0.843)
ESP32-S3 Notes
On the ESP32-S3, the FQBN and sensor library differ:
arduino-cli compile \
--fqbn esp32:esp32:esp32s3 \
firmware/gesture_infer/
The tensor arena can be larger (the ESP32-S3 has 512 KB of internal SRAM), so you can afford a bigger model. The ESP32-S3's Xtensa LX7 cores also have SIMD instructions that accelerate dot products (the core operation in dense layers), but a generic TFLite Micro build does not use them on its own: Espressif's esp-tflite-micro port uses them through its ESP-NN optimized kernels. The -mfpu=fp-armv8 flag is for ARM targets and does not apply to the Xtensa-based ESP32-S3.
For sensors on ESP32-S3, you'll wire an external MPU6050 or LSM6DS3 over I2C:
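#include <Wire.h>
#include <Adafruit_MPU6050.h>
#include <Adafruit_Sensor.h>
const int SDA_PIN = 8;  // placeholder: set to your board's I2C pins
const int SCL_PIN = 9;  // placeholder
Adafruit_MPU6050 mpu;
void setup() {
  Wire.begin(SDA_PIN, SCL_PIN);
  if (!mpu.begin()) {  // default address 0x68 on the Wire bus
    while (true);      // sensor not found: check wiring
  }
  mpu.setAccelerometerRange(MPU6050_RANGE_8_G);
}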
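One difference matters for the model: Adafruit_MPU6050 reports acceleration through getEvent() in m/s², while the Arduino IMU libraries used on the Nano report g. Assuming the training data (and therefore FEATURE_MEAN and FEATURE_STD) came from the Nano, convert before normalizing. A minimal sketch; ms2_to_g is a hypothetical helper name.

```cpp
#include <cassert>

// Standard gravity in m/s^2 (the same value as Adafruit's SENSORS_GRAVITY_STANDARD).
constexpr float kStandardGravity = 9.80665f;

// Hypothetical helper: convert an (x, y, z) reading from m/s^2 to g, in place.
void ms2_to_g(float* xyz) {
  assert(xyz != nullptr);
  for (int i = 0; i < 3; ++i) xyz[i] /= kStandardGravity;
}
```

In loop(), after mpu.getEvent(&a, &g, &temp), copy a.acceleration.x, .y, and .z into a three-element array and pass it through the conversion before writing into sample_buffer.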
Debugging Inference
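Model outputs the same value every time. Check the input quantization first. If in_scale is 0.0, you are most likely running a float model: float tensors carry no quantization parameters, so norm / in_scale yields inf or NaN, and writing int8 values into a float input buffer produces garbage. Checking input_tensor->type == kTfLiteInt8 in setup() catches this. If the scale looks sane, confirm that FEATURE_MEAN and FEATURE_STD hold your real training values.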
AllocateTensors() returns kTfLiteError. The tensor arena is too small. Double kTensorArenaSize and try again. Print arena_used_bytes() after a successful allocation on a larger arena to find the exact minimum.
Predictions are consistently mislabeled (a punch reported as flex, and so on). Your LABELS array order doesn't match the output neuron order, which follows the class indices used in training. Fix the LABELS array, not the model.
Serial output is garbled. Baud rate mismatch. The monitor must match the Serial.begin() argument.
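For the "same value every time" symptom, it helps to count how many quantized inputs land on the int8 limits. If most of a window is pinned at -128 or 127, the normalization or the input scale is wrong. A sketch of such a diagnostic; saturation_fraction is a hypothetical helper, not part of TFLite Micro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fraction of quantized inputs clipped to the int8
// limits. Values near 1.0 suggest missing normalization, a wrong
// FEATURE_STD, or a bad input scale.
float saturation_fraction(const int8_t* q, std::size_t n) {
  assert(q != nullptr && n > 0);
  std::size_t clipped = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (q[i] == INT8_MIN || q[i] == INT8_MAX) ++clipped;
  }
  return static_cast<float>(clipped) / static_cast<float>(n);
}
```

On the device, call saturation_fraction(input_tensor->data.int8, NUM_FEATURES) after filling the input tensor and print the result alongside the prediction.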
Confidence Thresholding
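Rejecting low-confidence predictions avoids acting on noise. The check goes in loop() right after the argmax loop, before the prediction is printed:
const float CONFIDENCE_THRESHOLD = 0.70f;  // declare at file scope
if (max_score < CONFIDENCE_THRESHOLD) {
  Serial.println("Uncertain: ignoring");
  return;
}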
Set the threshold by testing with real sensor data and examining the score distribution. Idle periods near boundaries between gestures often produce scores around 0.5. A threshold of 0.7 usually filters them out without rejecting clear positives.
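The argmax and the threshold can be folded into one function that works directly on the int8 output tensor, which also makes the threshold easy to test off-device. A sketch under that assumption; classify_with_threshold is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: dequantize int8 class scores, pick the highest, and
// reject it if its dequantized score is below `threshold`.
// Returns the class index, or -1 for "uncertain".
int classify_with_threshold(const int8_t* scores, int num_classes,
                            float scale, int zero_point, float threshold,
                            float* best_score_out) {
  assert(scores != nullptr && num_classes > 0);
  int best = 0;
  float best_score = (scores[0] - zero_point) * scale;
  for (int i = 1; i < num_classes; ++i) {
    float s = (scores[i] - zero_point) * scale;
    if (s > best_score) {
      best_score = s;
      best = i;
    }
  }
  if (best_score_out != nullptr) *best_score_out = best_score;
  return best_score >= threshold ? best : -1;
}
```

On the device, the call would be classify_with_threshold(output_tensor->data.int8, NUM_CLASSES, out_scale, out_zp, CONFIDENCE_THRESHOLD, &max_score), printing "Uncertain" when it returns -1.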
Next Steps
Continue to 08-inference-engine.md for a deeper look at how TFLite Micro allocates memory, resolves operations, and handles the interpreter lifecycle.