TinyML: Introduction

Machine learning has been running in your pocket for years. Now it fits inside a chip the size of your thumbnail.

What TinyML Is

TinyML is the practice of running trained ML inference on microcontrollers and other severely resource-constrained hardware, typically devices with less than 1 MB of RAM and no operating system.

The term covers the whole pipeline: collect data on the device, train a model on a desktop or in the cloud, compress it, and deploy it to the hardware where it runs inference in real time.

A few examples from actual products (as of 2026):

  • Wake-word detection ("Hey Siri", "OK Google") running on a low-power always-on processor while the main CPU sleeps
  • Gesture recognition on a fitness tracker with a 3-axis accelerometer
  • Predictive maintenance on an industrial motor controller detecting bearing failure from vibration
  • Anomaly detection on a $3 air quality sensor that never phones home

The common thread: the computation happens on the sensor. No cloud. No connectivity required.

Why It Matters

Running inference locally removes three problems that cloud-based ML doesn't solve well:

Latency. A keyword spotter has to respond in under 200 ms to feel instant. A network round trip to a server can take 100 to 500 ms on its own, before any inference has run.

Privacy. Industrial or medical sensors often can't legally send raw data off-device. Local inference processes the signal and discards it.

Cost. Cloud inference is billed per request. A microcontroller drawing on the order of 10 mW runs every inference at no marginal cost once the hardware is paid for.

The tradeoff is accuracy. A model that fits in 100 KB generally won't match the accuracy of one that uses 500 MB. TinyML is about finding the point where "good enough" and "fits on the device" overlap.
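Where those sizes come from is simple arithmetic: a model's weight storage is roughly its parameter count times the bytes per parameter (4 for float32, 1 for int8 after quantization). The sketch below counts parameters for a hypothetical 1 → 16 → 16 → 1 fully connected network; `DenseParams` and the `k*` constants are illustrative names, and the byte counts ignore FlatBuffer overhead and the int32 biases that int8 models typically keep.

```cpp
#include <cstddef>

// Parameters in one fully connected (dense) layer:
// one weight per input/output pair plus one bias per output.
constexpr std::size_t DenseParams(std::size_t inputs, std::size_t outputs) {
  return inputs * outputs + outputs;
}

// Hypothetical 1 -> 16 -> 16 -> 1 network: 32 + 272 + 17 parameters.
constexpr std::size_t kParams =
    DenseParams(1, 16) + DenseParams(16, 16) + DenseParams(16, 1);

// Rough weight storage at two precisions (ignores FlatBuffer overhead
// and int32 biases in int8 models).
constexpr std::size_t kFloat32Bytes = kParams * 4;  // 4 bytes per float32
constexpr std::size_t kInt8Bytes = kParams * 1;     // 1 byte per int8

static_assert(kParams == 321, "32 + 272 + 17 parameters");
```

The same arithmetic explains the gap above: 100 KB of int8 weights is roughly 100,000 parameters, while 500 MB of float32 weights is roughly 125 million.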

The Hardware Landscape

Reference Board for This Tutorial

The Arduino Nano 33 BLE Sense Rev2 is the reference hardware throughout. It has:

CPU:        Nordic nRF52840 (Cortex-M4F @ 64 MHz)
Flash:      1 MB
RAM:        256 KB
Sensors:    9-axis IMU (BMI270 + BMM150), microphone (MP34DT06JTR),
            barometric pressure, humidity, temperature,
            color, proximity + gesture
Wireless:   BLE 5.0
Price:      ~$40

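The original (Rev1) Nano 33 BLE Sense is the board the TensorFlow Lite Micro Arduino examples were written for, so documentation coverage is good. Rev2 uses a different IMU, so motion examples written for Rev1 need the Rev2 sensor library.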

Other Supported Boards

Notes for the ESP32-S3 are included where the workflow differs. Its specs:

CPU:        Xtensa LX7 dual-core @ 240 MHz
Flash:      up to 16 MB
RAM:        512 KB + up to 8 MB PSRAM
Extras:     vector instructions for ML acceleration
Price:      ~$5 to $15 depending on module

The ESP32-S3 is faster and cheaper than the Nano but lacks the built-in sensor suite, so sensor chapters assume you've wired up external modules.

Other boards that work with TFLite Micro: Raspberry Pi Pico, STM32 Nucleo series, SparkFun Edge 2, Seeed Xiao nRF52840 Sense.

Your First Inference

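Install the Arduino tooling and the TensorFlow Lite library before continuing. The commands below use arduino-cli; the Arduino IDE's Boards Manager and Library Manager do the same job.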

# Install arduino-cli (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh

# Refresh the board index, then add the mbed core for Nano 33 BLE
arduino-cli core update-index
arduino-cli core install arduino:mbed_nano

# Install TFLite Micro library
arduino-cli lib install "Arduino_TensorFlowLite"

The simplest useful TFLite Micro program is the "hello world" sine example: a tiny model trained to approximate sin(x). It verifies the toolchain works before you commit hours to a real project.

// hello_ml.ino
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "sine_model_data.h"  // bundled with the library examples

constexpr int kTensorArenaSize = 2 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];

void setup() {
  Serial.begin(115200);
  while (!Serial);

  const tflite::Model* model = tflite::GetModel(g_sine_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model schema mismatch");
    return;
  }

  tflite::AllOpsResolver resolver;
  tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);

  if (interpreter.AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors failed");
    return;
  }

  TfLiteTensor* input = interpreter.input(0);
  input->data.f[0] = 0.0f;  // feed x = 0 radians

  if (interpreter.Invoke() != kTfLiteOk) {
    Serial.println("Invoke failed");
    return;
  }

  TfLiteTensor* output = interpreter.output(0);
  Serial.print("sin(0) predicted: ");
  Serial.println(output->data.f[0], 4);  // print 4 decimal places; should be near 0.0
}

void loop() {}

Flash it and open the Serial Monitor at 115200 baud. You should see something like:

sin(0) predicted: 0.0023

It isn't exactly zero because the network only approximates sin(x): it learned the curve from sampled training data rather than computing it. That small error is normal and worth understanding: neural networks are function approximators, not calculators.

The TinyML Pipeline

Every TinyML project follows the same five stages:

1. Data Collection    Capture sensor readings labeled by class or value
2. Model Training     Train a small neural network on a desktop or cloud
3. Conversion         Quantize and convert to a TFLite FlatBuffer (.tflite)
4. Deployment         Embed the model as a C array and flash it to the device
5. Inference          Run the interpreter loop, read outputs, act on them
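The quantization in stage 3 maps each real value to an 8-bit integer through a scale and a zero point, so that real ≈ scale × (q − zero_point). The sketch below shows that affine int8 mapping in isolation. The converter chooses scale and zero point per tensor (or per channel) from the data, so the values in any usage are made up, and `Quantize`/`Dequantize` are illustrative names rather than TFLite Micro API.

```cpp
#include <cstdint>

// Affine int8 quantization: q = round(real / scale) + zero_point,
// clamped to the int8 range [-128, 127].
constexpr int8_t Quantize(float real, float scale, int32_t zero_point) {
  float scaled = real / scale;
  // Keep the value well inside int32 range so the cast below is well defined.
  if (scaled > 1e6f) scaled = 1e6f;
  if (scaled < -1e6f) scaled = -1e6f;
  // Round to nearest, halves away from zero (the cast truncates toward zero).
  const int32_t rounded =
      static_cast<int32_t>(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
  int32_t q = rounded + zero_point;
  if (q < -128) q = -128;
  if (q > 127) q = 127;
  return static_cast<int8_t>(q);
}

// Inverse mapping: real = scale * (q - zero_point).
constexpr float Dequantize(int8_t q, float scale, int32_t zero_point) {
  return scale * static_cast<float>(q - zero_point);
}
```

With a scale of 0.01 and a zero point of 0, 0.5 quantizes to 50 and dequantizes back to about 0.5, while anything beyond ±1.28 saturates. That clipping, plus the rounding to steps of `scale`, is where int8 models lose a little accuracy in exchange for weights a quarter the size of float32.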

Chapters 4-8 map directly to these stages. Chapters 9-11 cover the variations that real projects encounter.

Common Pitfalls Early On

Forgetting to align the tensor arena. Declare the arena with alignas(16), as the example above does; some kernels and platforms need 16-byte-aligned buffers. Skip it and you can get intermittent crashes that are hard to trace.
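If you'd rather have the compiler enforce the alignment than rely on remembering `alignas` on a bare array, one option is to make the alignment part of a type. `TensorArena` and `IsAligned` are hypothetical names for this sketch, not part of TFLite Micro.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTensorArenaSize = 2 * 1024;

// Wrapping the buffer in an over-aligned struct makes the 16-byte
// alignment part of the type, so a static_assert can check it.
struct alignas(16) TensorArena {
  uint8_t bytes[kTensorArenaSize];
};
static_assert(alignof(TensorArena) == 16, "tensor arena must be 16-byte aligned");
static_assert(sizeof(TensorArena) == kTensorArenaSize, "no padding expected for 2 KB");

static TensorArena arena;  // pass arena.bytes and kTensorArenaSize to the interpreter

// Runtime check for buffers you didn't declare yourself
// (for example, memory placed by a linker script).
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

In the hello-world sketch, `arena.bytes` would replace `tensor_arena` in the `MicroInterpreter` constructor; nothing else changes.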

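Building a model that's too large. The model's weights are stored in flash as a C array, while its activations and working buffers live in the tensor arena in RAM. On the Nano 33 BLE Sense, a 1 MB model would fill all 1 MB of flash with nothing left for your program, and the arena has to share 256 KB of RAM with everything else. Check both numbers at training time, not after flashing.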
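A training-time check can be as small as comparing two sums against the board's limits. The sketch below assumes the reference board's 1 MB of flash and 256 KB of RAM; `MemoryBudget`, `FitsOnBoard`, and every byte count passed to it are hypothetical. Real firmware and RAM figures come from your build output.

```cpp
#include <cstddef>

// Assumed limits for the reference board (Nano 33 BLE Sense Rev2).
struct MemoryBudget {
  std::size_t flash_bytes;
  std::size_t ram_bytes;
};

constexpr MemoryBudget kNano33BleSense{1024 * 1024, 256 * 1024};

// The const model array typically lands in flash next to the program code.
// The tensor arena is a mutable buffer in RAM, shared with the stack,
// globals, and sensor buffers ("other_ram_bytes").
constexpr bool FitsOnBoard(const MemoryBudget& board, std::size_t model_bytes,
                           std::size_t program_bytes, std::size_t arena_bytes,
                           std::size_t other_ram_bytes) {
  return model_bytes + program_bytes <= board.flash_bytes &&
         arena_bytes + other_ram_bytes <= board.ram_bytes;
}

// Hypothetical: 300 KB model, 200 KB firmware, 60 KB arena, 40 KB other RAM.
static_assert(FitsOnBoard(kNano33BleSense, 300 * 1024, 200 * 1024,
                          60 * 1024, 40 * 1024),
              "expected to fit on the reference board");
```

A 1 MB model fails the flash half of this check no matter how small its arena is, which is exactly the mistake the pitfall describes.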

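Using ADC2 pins on an ESP32 while Wi-Fi is active. ADC2 is shared with the Wi-Fi driver, so analog reads on ADC2 pins can fail while Wi-Fi is running; put analog sensors on ADC1 pins. See the ESP32 tutorial for the full ADC gotcha list. (Nano 33 BLE Sense users can skip this one.)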
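On the ESP32-S3 specifically, ADC1 channels sit on GPIO1 through GPIO10 and ADC2 channels on GPIO11 through GPIO20; other ESP32 variants use a different mapping, so confirm against your module's datasheet. A compile-time guard along these lines can catch an analog sensor accidentally assigned to an ADC2 pin. The helper names and the pin choice are hypothetical.

```cpp
// Assumed ESP32-S3 mapping: ADC1_CH0..CH9 = GPIO1..GPIO10,
// ADC2_CH0..CH9 = GPIO11..GPIO20. Other ESP32 variants differ.
constexpr bool IsEsp32S3Adc1Pin(int gpio) { return gpio >= 1 && gpio <= 10; }
constexpr bool IsEsp32S3Adc2Pin(int gpio) { return gpio >= 11 && gpio <= 20; }

// Hypothetical pin assignment for an analog vibration sensor.
constexpr int kVibrationSensorPin = 4;
static_assert(IsEsp32S3Adc1Pin(kVibrationSensorPin),
              "analog sensor must be on an ADC1 pin when Wi-Fi is used");
```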

Not verifying the schema version. The model's FlatBuffer schema version must match the one the TFLite Micro runtime was built against. The check in the example above is not optional boilerplate.

Next Steps

Continue to 02-ml-foundations.md to review the ML concepts that TinyML depends on and see how the standard pipeline changes for constrained hardware.