TinyML: ML Foundations | TinyML Tutorial

The ML pipeline is the same whether you're training GPT or a gesture classifier. What changes is how aggressively you constrain every step.

The Core Problem

A neural network is a function that maps inputs to outputs. You define the shape of the function (the architecture) and then find the weights that make it accurate (training). Once trained, you freeze the weights and ship them.

On a server, you ship a 500 MB file and run it on a GPU. On a microcontroller, you ship a 50 KB file and run it on a CPU that often has no FPU. The math is the same; the budget is not.

Classification vs. Regression

Most TinyML tasks fall into two buckets:

Classification assigns an input to one of N categories.

Input:   1 second of accelerometer data (3 axes × 100 Hz = 300 floats)
Output:  [0.03, 0.91, 0.06]  → class index 1 → "swipe right"

Regression predicts a continuous value.

Input:   temperature + humidity reading (2 floats)
Output:  0.73  → estimated soil moisture level

TinyML handles both, but classification is more common because it maps cleanly to discrete actions (turn on the fan, play the sound, reject the part).
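Turning the output vector into an action is an argmax plus a lookup. A minimal Python sketch, reusing the output vector from the classification example; only "swipe right" at index 1 comes from that example, and the other two labels are placeholders invented for illustration:

```python
# Labels for the three output classes. Only index 1 ("swipe right") comes
# from the example above; the other two names are hypothetical placeholders.
LABELS = ["idle", "swipe right", "swipe left"]


def decode(probabilities):
    """Return (class index, label) for the highest-scoring class."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, LABELS[index]


index, label = decode([0.03, 0.91, 0.06])
print(index, label)  # 1 swipe right
```

In a real application you would probably also require the winning probability to clear a threshold before acting, so that an ambiguous input triggers nothing.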

Neural Network Basics

A neural network is a stack of layers. Each layer applies a linear transformation followed by a nonlinear activation.

input (300 floats)
  │
  ▼
Dense(64, relu)     64 neurons, each sees all 300 inputs
  │
  ▼
Dense(32, relu)     32 neurons, each sees all 64 from the layer above
  │
  ▼
Dense(3, softmax)   3 outputs, one per class, sum to 1.0
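
The stack above can be sketched in plain Python to make the shapes concrete. The weights are random placeholders rather than a trained model, and the helper names are made up for this illustration. The sketch also counts the parameters the three layers carry:

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases, activation):
    # weights[j] holds the input weights of neuron j (linear step),
    # then the activation is applied to the whole layer (nonlinear step).
    pre = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return activation(pre)


def make_layer(n_in, n_out, activation, rng):
    # Random placeholder weights; a trained model would supply real ones.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases, activation


def forward(inputs, layers):
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs


def count_parameters(layers):
    # One weight per (input, neuron) pair plus one bias per neuron.
    return sum(len(w) * len(w[0]) + len(b) for w, b, _ in layers)


rng = random.Random(0)
layers = [
    make_layer(300, 64, relu, rng),    # 300*64 + 64 = 19,264 parameters
    make_layer(64, 32, relu, rng),     # 64*32 + 32  =  2,080 parameters
    make_layer(32, 3, softmax, rng),   # 32*3 + 3    =     99 parameters
]
window = [rng.uniform(-1.0, 1.0) for _ in range(300)]  # stand-in sensor data
probs = forward(window, layers)
n_params = count_parameters(layers)
print(len(probs), n_params)  # 3 21443
```

Almost all of the 21,443 parameters sit in the first layer, because every one of its 64 neurons connects to all 300 inputs.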

Training adjusts the weights in each layer by computing loss (how wrong was the prediction?) and backpropagating gradients to reduce it. You don't do any of this on the microcontroller. Training happens on your laptop or in the cloud.
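
The loss-then-gradient loop is easier to see on a model with only two weights. The sketch below fits a single linear neuron using mean squared error and hand-derived gradients. A real network applies the same kind of update to every layer via backpropagation, normally through a framework rather than hand-written code. The data, learning rate, and step count are illustrative choices, not values from this tutorial:

```python
def train_linear(xs, ys, learning_rate=0.1, steps=500):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: how wrong is each prediction?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of mean(error^2) with respect to w and b.
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```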

Why Small Networks Work for Sensor Data

Sensor signals are low-dimensional and repetitive. A 3-axis accelerometer at 100 Hz produces 300 values per second. The patterns that distinguish "idle" from "walking" from "running" are simple enough that a small network captures them: the three-layer example above has about 21,000 parameters, and models fed hand-crafted features can be smaller still.

Contrast this with image classification, where a 224×224 RGB image has about 150,000 input values (224 × 224 × 3 = 150,528) and the patterns are complex. You can do image classification on a microcontroller, but you need a purpose-built architecture like MobileNet, and you need more RAM.

The Training-Deployment Split

This is the key mental model for TinyML:

Training time (laptop/cloud):
  - Collect and clean data
  - Design architecture
  - Train until accuracy is acceptable
  - Quantize and export to .tflite

Inference time (microcontroller):
  - Load the frozen .tflite file
  - Feed one input at a time
  - Read the output
  - Act on it

Nothing from the training graph runs on the device. TFLite Micro runs only the forward pass through a stripped-down inference engine. No backprop, no optimizer, no variable allocation at runtime.

Quantization

Full-precision training uses 32-bit floats (4 bytes per weight). A 10-layer network might have 50,000 weights, which is 200 KB of floats. If those weights had to live in the Nano's 256 KB of RAM, they would fit, but barely, with little room left for the input buffer or the activation tensors.
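
The arithmetic behind that budget, as a small sketch. It assumes "256 KB" means 256 × 1024 bytes; the function name is made up for this example:

```python
def weight_storage_bytes(n_weights, bytes_per_weight=4):
    """Bytes needed to store the weights at a given precision."""
    return n_weights * bytes_per_weight


RAM_BYTES = 256 * 1024  # assumption: 256 KB read as 256 * 1024 bytes

float32_bytes = weight_storage_bytes(50_000)  # 4 bytes per float32 weight
remaining = RAM_BYTES - float32_bytes

print(float32_bytes, remaining)  # 200000 62144
```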

Quantization maps each float to an 8-bit integer:

float32 weight:  0.347  (4 bytes)
int8 weight:     44     (1 byte, scaled by a per-tensor factor)

The result is a 4x reduction in model size, usually with less than 2% accuracy loss on sensor tasks.
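
The float-to-int8 mapping can be sketched as symmetric per-tensor quantization. The scale of 1/127 is an assumption: it corresponds to a tensor whose largest weight magnitude is 1.0, and it happens to reproduce the 0.347 → 44 example. Real converters derive the scale from the actual weight range and may also use a zero point, which this sketch leaves out:

```python
def quantize(value, scale):
    """Map a float to int8 by dividing by the scale, rounding, and clamping."""
    q = round(value / scale)
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Recover an approximate float from the int8 value."""
    return q * scale


SCALE = 1.0 / 127  # assumed per-tensor factor (largest |weight| = 1.0)

q = quantize(0.347, SCALE)
approx = dequantize(q, SCALE)
print(q)                 # 44
print(round(approx, 4))  # 0.3465
```

The round trip returns a value close to the original but not identical; that small difference is where the accuracy loss comes from.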

Two flavors matter for this tutorial:

Post-training quantization (PTQ): convert a trained float model to int8 after the fact. Fast and simple. Works for most sensor applications.

Quantization-aware training (QAT): simulate quantization noise during training so the model learns to be robust to it. Better accuracy at higher compression. More work.

Chapter 6 covers PTQ in detail. Chapter 11 covers QAT.
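
The "quantization noise" that QAT simulates is the error from a quantize-then-dequantize round trip applied during the forward pass. A minimal sketch of that round trip, assuming symmetric int8 quantization with a hypothetical scale of 1/127 and made-up weight values. QAT tooling inserts this kind of operation for you and also handles gradients, which this sketch does not:

```python
def fake_quantize(value, scale):
    """Quantize to int8 and immediately convert back to float."""
    q = max(-128, min(127, round(value / scale)))
    return q * scale


SCALE = 1.0 / 127  # hypothetical per-tensor scale

weights = [0.347, -0.012, 0.9, -0.58]  # made-up example weights
noisy = [fake_quantize(w, SCALE) for w in weights]
noise = [n - w for n, w in zip(noisy, weights)]

# For in-range values the error is at most half a quantization step.
print(all(abs(e) <= SCALE / 2 for e in noise))  # True
```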

Memory Layout on the Device

TFLite Micro manages four memory regions:

Region          What it holds
─────────────────────────────────────────────────────────────
Model data      The .tflite file, stored in flash (read-only)
Tensor arena    Input tensor, output tensor, intermediate activations
Op resolver     Table of operation implementations (code, in flash)
Interpreter     Metadata, bookkeeping (small, a few hundred bytes)

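The tensor arena is the critical one. It lives in RAM and must be large enough to hold every tensor that is live at the same moment during inference, which for a simple layer stack means at least a layer's input and output together, not just the single biggest activation tensor. Sizing it is covered in chapter 8.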
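
A rough way to reason about arena size for a plain stack of dense layers is to find the layer whose input and output together take the most bytes, since both must exist while that layer runs. The sketch below does this for the 300 → 64 → 32 → 3 example network. It is a back-of-the-envelope estimate under that assumption, not TFLite Micro's actual memory planner, and the real arena also needs space for runtime bookkeeping and alignment, so treat the result as a lower bound:

```python
def estimate_activation_bytes(layer_sizes, bytes_per_value=1):
    """Peak bytes for one layer's input plus output, over all layers."""
    peak = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        peak = max(peak, (n_in + n_out) * bytes_per_value)
    return peak


sizes = [300, 64, 32, 3]  # input, then the three dense layers

int8_peak = estimate_activation_bytes(sizes)                        # int8
float32_peak = estimate_activation_bytes(sizes, bytes_per_value=4)  # float32
print(int8_peak, float32_peak)  # 364 1456
```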

What "Good Enough" Means

A production keyword spotter might achieve 92% accuracy in a quiet room and 75% in a noisy office. For "turn on the lights" that's fine: a false negative means you say it again. For "unlock the door" the bar is higher.

The accuracy target shapes every subsequent decision: how much data to collect, how large the model can be, whether you need QAT. Fix your accuracy requirement before you start training. Otherwise you'll spend weeks optimizing toward an undefined goal.

Evaluation Metrics for Embedded Targets

Beyond classification accuracy, embedded models are evaluated on:

Metric                   Why it matters
─────────────────────────────────────────────────────
Model size (KB)          Must fit in flash
Peak RAM usage (KB)      Must fit in tensor arena
Inference latency (ms)   Determines max sample rate
Power consumption (mW)   Battery life

You'll measure all four in the optimization chapter. Start tracking them from the first training run so you have a baseline.
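
A baseline can be as simple as a few numbers and the derived quantities they imply. In the sketch below every measurement, the flash size, and the battery capacity are hypothetical placeholders; only the 256 KB RAM figure and the 50 KB model size echo numbers used earlier in this chapter. The sample-rate formula assumes inferences run back to back with no other work in between:

```python
def max_sample_rate_hz(latency_ms):
    """Upper bound on inferences per second if they run back to back."""
    return 1000.0 / latency_ms


def battery_life_hours(battery_mwh, average_power_mw):
    """Runtime on a battery at a constant average power draw."""
    return battery_mwh / average_power_mw


def fits(model_kb, flash_kb, arena_kb, ram_kb):
    """True if the model fits in flash and the arena fits in RAM."""
    return model_kb <= flash_kb and arena_kb <= ram_kb


# Hypothetical baseline measurements from a first training run.
baseline = {"model_kb": 50, "arena_kb": 8, "latency_ms": 20, "power_mw": 5}

rate = max_sample_rate_hz(baseline["latency_ms"])
hours = battery_life_hours(1000, baseline["power_mw"])  # 1000 mWh: assumed
ok = fits(baseline["model_kb"], 1024,                   # 1024 KB flash: assumed
          baseline["arena_kb"], 256)

print(rate, hours, ok)  # 50.0 200.0 True
```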

Next Steps

Continue to 03-development-environment.md to install the tools and verify the full pipeline works end to end before writing any model code.