TinyML: Sensor Integration

The gesture classifier in the previous chapters used a clean, pre-buffered array. Real sensors are messier: readings arrive over interrupts, carry timing jitter, and sometimes fail to appear on the cycle you expect.

The Three Sensor Categories

Most TinyML applications use one of three input modalities:

Modality	Typical sensor	Preprocessing	Memory cost
Motion/vibration	Accelerometer, gyroscope	Windowing, normalization	Low
Audio	PDM microphone	MFCC or mel spectrogram	Medium
Image	QVGA camera (OV2640)	Resize, normalize	High

Motion and audio are covered first. Image classification comes last, with notes on its memory constraints.

Motion: Sliding Window Inference

The chapter 7 gesture classifier used a tumbling (non-overlapping) window: collect 100 samples, run inference, collect the next 100. A gesture that starts in the second half of a window is split across two windows, and neither one may contain enough of it to classify.

A sliding window keeps the most recent 100 samples and reruns inference every N new samples (N is HOP_SIZE below), so consecutive windows overlap:

// sliding_window.ino (relevant section)
const int WINDOW_SIZE = 100;
const int HOP_SIZE    = 20;   // run inference every 20 new samples

float window[WINDOW_SIZE][3];
int   samples_since_last = 0;
int   samples_seen       = 0;   // counts up to WINDOW_SIZE during warm-up

void loop() {
  if (!IMU.accelerationAvailable()) return;

  float ax, ay, az;
  IMU.readAcceleration(ax, ay, az);

  // Shift the window left by 1 and insert new sample at the end
  memmove(window, window + 1, (WINDOW_SIZE - 1) * 3 * sizeof(float));
  window[WINDOW_SIZE - 1][0] = ax;
  window[WINDOW_SIZE - 1][1] = ay;
  window[WINDOW_SIZE - 1][2] = az;
  samples_since_last++;
  if (samples_seen < WINDOW_SIZE) samples_seen++;

  // Skip inference until the window holds WINDOW_SIZE real samples
  if (samples_seen < WINDOW_SIZE) return;
  if (samples_since_last < HOP_SIZE) return;
  samples_since_last = 0;

  // Flatten and normalize
  for (int i = 0; i < WINDOW_SIZE; i++) {
    for (int j = 0; j < 3; j++) {
      int idx  = i * 3 + j;
      float norm = (window[i][j] - FEATURE_MEAN[idx]) / FEATURE_STD[idx];
      int8_t q   = (int8_t)constrain(
          round(norm / in_scale) + in_zp, -128, 127);
      input_tensor->data.int8[idx] = q;
    }
  }

  if (interpreter->Invoke() != kTfLiteOk) return;  // skip this window on failure
  // read output as before
}
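To see why overlapping windows help, it is useful to count how many inference windows fully contain a given gesture. The sketch below is plain C++ that runs off-device; `windows_containing` is a hypothetical helper written for this illustration, and it assumes windows start at sample 0 and advance by a fixed hop.

```cpp
// Count how many inference windows fully contain an event.
// Windows start at 0, hop, 2*hop, ... and each covers `window` samples.
// Hypothetical helper for illustration only.
inline int windows_containing(int event_start, int event_len,
                              int window, int hop) {
  int count = 0;
  int event_end = event_start + event_len;  // one past the last event sample
  for (int s = 0; s <= event_start; s += hop) {
    if (s + window >= event_end) ++count;   // window [s, s + window) covers the event
  }
  return count;
}
```

For an 80-sample gesture starting at sample 60, a tumbling window (hop of 100) never contains the whole gesture, while a hop of 20 yields two windows that do (those starting at samples 40 and 60).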

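The memmove approach is simple, but it copies the entire window on every new sample, so its cost grows with the window size. For production code, use a circular buffer, which writes one sample per update and defers the reordering to inference time: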

// Circular buffer for the window
float ring[WINDOW_SIZE][3];
int   ring_head = 0;   // next write slot; once the buffer is full, also the oldest sample

void push_sample(float ax, float ay, float az) {
  ring[ring_head][0] = ax;
  ring[ring_head][1] = ay;
  ring[ring_head][2] = az;
  ring_head = (ring_head + 1) % WINDOW_SIZE;
}

// To fill the input tensor, iterate starting from ring_head (oldest)
void fill_input() {
  for (int i = 0; i < WINDOW_SIZE; i++) {
    int idx_in_ring = (ring_head + i) % WINDOW_SIZE;
    for (int j = 0; j < 3; j++) {
      int feat_idx = i * 3 + j;
      float norm   = (ring[idx_in_ring][j] - FEATURE_MEAN[feat_idx])
                     / FEATURE_STD[feat_idx];
      input_tensor->data.int8[feat_idx] =
          (int8_t)constrain(round(norm / in_scale) + in_zp, -128, 127);
    }
  }
}
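The index arithmetic is easy to get wrong, so it helps to check the ordering on a desktop compiler before flashing. The following is a minimal sketch in portable C++, not the tutorial's sketch code: `SampleRing` and `kWindowSize` are hypothetical names, and the window is shrunk to 4 samples so the behaviour can be traced by hand.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWindowSize = 4;  // tiny window, assumed for illustration

struct SampleRing {
  std::array<std::array<float, 3>, kWindowSize> data{};
  std::size_t head  = 0;  // next write slot; the oldest sample once full
  std::size_t count = 0;  // samples stored so far, saturates at kWindowSize

  void push(float ax, float ay, float az) {
    data[head] = {ax, ay, az};
    head = (head + 1) % kWindowSize;
    if (count < kWindowSize) ++count;
  }

  // True once the ring holds kWindowSize real samples
  bool full() const { return count == kWindowSize; }

  // i = 0 is the oldest sample, i = kWindowSize - 1 the newest (valid once full)
  const std::array<float, 3>& at(std::size_t i) const {
    return data[(head + i) % kWindowSize];
  }
};
```

After pushing samples 0 through 5 into this 4-slot ring, `at(0)` returns sample 2 and `at(3)` returns sample 5. The `full()` check also gives a natural place to hold off inference until the window has filled.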

Audio: Reading the PDM Microphone

The Nano 33 BLE Sense has an onboard PDM microphone, read through the PDM library (PDM.h) that ships with the board core. The library converts the PDM bitstream into 16-bit PCM samples at 16 kHz and invokes a callback each time a DMA-backed buffer fills.

// audio_input.ino
#include <PDM.h>

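const int SAMPLE_RATE    = 16000;
const int FRAME_SIZE     = 512;    // samples per DMA callback

short         audio_buf[FRAME_SIZE];
volatile int  samples_read = 0;    // set by the callback, cleared by loop()

void onPDMdata() {
  // PDM.available() and PDM.read() count bytes, not samples
  int bytes = PDM.available();
  if (bytes > (int)sizeof(audio_buf)) bytes = sizeof(audio_buf);
  PDM.read(audio_buf, bytes);
  samples_read = bytes / 2;
}

void setup() {
  Serial.begin(115200);
  PDM.onReceive(onPDMdata);
  PDM.setBufferSize(FRAME_SIZE * sizeof(short));  // buffer size is given in bytes
  PDM.setGain(20);          // raw gain setting, not dB; 20 is a reasonable default
  if (!PDM.begin(1, SAMPLE_RATE)) {               // 1 channel (mono)
    Serial.println("Failed to start PDM");
    while (1);
  }
}

void loop() {
  if (samples_read == 0) return;
  int n = samples_read;
  samples_read = 0;

  // audio_buf now contains n int16 PCM samples
  // Feed them into your feature extractor
  process_audio_frame(audio_buf, n);
}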
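With a single audio_buf, the callback can overwrite samples while the main loop is still processing them. One common remedy is a ping-pong (double) buffer: the callback fills one frame while the main loop reads the other. The sketch below is portable C++ that can be tested off-device; `PingPongAudio` and `kFrameSamples` are hypothetical names, and on the board the callback would read from the PDM library straight into the frame being written rather than copying from `src`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFrameSamples = 256;  // assumed samples per callback

// Two frames: the callback fills one while the main loop reads the other.
struct PingPongAudio {
  int16_t frames[2][kFrameSamples];
  volatile int ready_index = -1;  // index of a completed frame, or -1 if none
  int write_index = 0;            // frame the next callback will fill

  // Call from the receive callback with the freshly delivered samples.
  void on_samples(const int16_t* src, std::size_t n) {
    if (n > kFrameSamples) n = kFrameSamples;
    std::memcpy(frames[write_index], src, n * sizeof(int16_t));
    ready_index = write_index;
    write_index ^= 1;  // switch to the other frame
  }

  // Call from the main loop. Returns nullptr when no new frame is waiting.
  const int16_t* take() {
    int idx = ready_index;
    if (idx < 0) return nullptr;
    ready_index = -1;
    return frames[idx];
  }
};
```

This only protects the frame being read if the main loop finishes with it before two more callbacks have fired, so feature extraction still has to keep up with the frame rate. On real hardware the read-and-clear in `take()` should also be guarded against the interrupt.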

Audio Preprocessing: MFCC

Raw PCM is not a useful model input for keyword spotting. The standard preprocessing pipeline extracts Mel-Frequency Cepstral Coefficients (MFCCs):

PCM samples (16 kHz int16)
  │
  ▼ Frame into 25 ms windows with 10 ms hop
  │
  ▼ Apply Hann window to each frame
  │
  ▼ Compute FFT (512 point; the 400-sample frame is zero-padded)
  │
  ▼ Apply mel filterbank (40 filters)
  │
  ▼ Log of filterbank energies
  │
  ▼ DCT → keep first 13 coefficients (MFCCs)
  │
  ▼ Stack frames → (N_frames × 13) matrix
  │
  ▼ Normalize → model input
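The frame counts and FFT size in this pipeline follow from simple arithmetic, which is worth checking before allocating buffers. The helpers below are a minimal sketch in portable C++; `frame_count` and `next_pow2` are hypothetical names, and the frame count assumes no padding at the end of the signal.

```cpp
#include <cstddef>

// Number of complete frames that fit in `total` samples (no end padding).
inline std::size_t frame_count(std::size_t total, std::size_t frame_len,
                               std::size_t hop) {
  return total < frame_len ? 0 : 1 + (total - frame_len) / hop;
}

// Smallest power of two >= n, i.e. the FFT size after zero-padding a frame.
inline std::size_t next_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p < n) p <<= 1;
  return p;
}
```

At 16 kHz, a 25 ms frame is 400 samples and a 10 ms hop is 160 samples, so one second of audio gives `frame_count(16000, 400, 160)`, which is 98 frames, and `next_pow2(400)` gives a 512-point FFT with 257 usable frequency bins.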

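Implementing MFCC from scratch is about 100 lines of C++ on top of an FFT routine, and several Arduino libraries wrap it. The TFLite Micro keyword spotting example (micro_speech) includes a reference front end. Note that it stops at the log mel filterbank stage and skips the DCT, so its output is a mel spectrogram rather than MFCCs: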

// From the TFLite Micro micro_speech example (simplified interface; the exact
// signature differs between releases)
#include "micro_features_micro_features_generator.h"
#include "micro_features_micro_model_settings.h"

int8_t feature_data[kFeatureElementCount];  // kFeatureElementCount = kFeatureSliceCount × kFeatureSliceSize

// Call InitializeMicroFeatures() once before generating the first slice.
size_t num_samples_read = 0;
TfLiteStatus status = GenerateMicroFeatures(
    nullptr,           // error reporter
    audio_buf,         // int16 PCM
    FRAME_SIZE,        // number of samples
    kFeatureSliceSize, // 40 (mel bins)
    feature_data,      // output: one slice (40 values) per call
    &num_samples_read  // out: input samples consumed by this call
);

Once every slice has been generated, feature_data holds a quantized mel spectrogram ready to feed into the model.

Audio: Building a Simple Keyword Spotter

For a "yes" vs "no" vs "unknown" classifier, the input is a 1-second window (16000 samples) processed into a 49 × 40 mel spectrogram: 49 frames (30 ms windows with a 20 ms stride) × 40 mel bins = 1960 features.
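On the device, the 49 × 40 matrix is usually maintained as a rolling buffer: each new slice of 40 mel bins pushes out the oldest one, so the model always sees the most recent second of audio. The sketch below shows that bookkeeping in portable C++. `FeatureWindow`, `kSliceSize`, and `kSliceCount` are hypothetical names chosen for this illustration, not identifiers from the TFLite Micro example.

```cpp
#include <cstdint>
#include <cstring>

constexpr int kSliceSize  = 40;  // mel bins per slice (assumed)
constexpr int kSliceCount = 49;  // slices per 1-second window (assumed)

// Rolling (kSliceCount x kSliceSize) feature matrix, oldest slice first.
struct FeatureWindow {
  int8_t data[kSliceCount * kSliceSize] = {};

  // Drop the oldest slice and append the newest one at the end.
  void push_slice(const int8_t* slice) {
    std::memmove(data, data + kSliceSize, (kSliceCount - 1) * kSliceSize);
    std::memcpy(data + (kSliceCount - 1) * kSliceSize, slice, kSliceSize);
  }
};
```

The memmove shifts 1920 bytes once per slice, which is cheap at a 20 ms stride. A ring of slices would avoid the copy, at the cost of reordering when the input tensor is filled.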

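On the Python side, compute the training features for the Speech Commands dataset with tf.signal, using the same frame length and stride as the device so that training and on-device inputs have the same shape: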

# audio_features.py
import tensorflow as tf

def compute_mel_spectrogram(waveform, sample_rate=16000, n_mels=40,
                            frame_length=480, frame_step=320):
    # waveform: float32 array, shape (16000,)
    # 480-sample (30 ms) frames with a 320-sample (20 ms) step give 49 frames
    stfts      = tf.signal.stft(waveform, frame_length=frame_length,
                                 frame_step=frame_step)
    spectrogram = tf.abs(stfts)
    num_spectrogram_bins = stfts.shape[-1]

    lower_edge_hz, upper_edge_hz = 80.0, 7600.0
    linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
        n_mels, num_spectrogram_bins, sample_rate,
        lower_edge_hz, upper_edge_hz)

    mel = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
    log_mel = tf.math.log(mel + 1e-6)   # small offset avoids log(0)
    return log_mel   # shape: (49, 40)

Image: Camera Input on ESP32-S3

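Image classification strains the Nano 33 BLE: its 256 KB of SRAM has to hold the frame buffer, the tensor arena, and everything else in the sketch. Use the ESP32-S3 with an OV2640 camera module, preferably on a board with PSRAM, for more headroom.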
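The memory pressure is easiest to see by working out frame buffer sizes. The following is a small sketch in portable C++; `frame_bytes` and the two constants are hypothetical names used only for this calculation.

```cpp
#include <cstddef>

// Bytes needed to hold one uncompressed frame.
constexpr std::size_t frame_bytes(std::size_t width, std::size_t height,
                                  std::size_t bytes_per_pixel) {
  return width * height * bytes_per_pixel;
}

// 96x96 grayscale, the model input size used in this section
constexpr std::size_t kModelInputBytes = frame_bytes(96, 96, 1);    // 9216 bytes
// QVGA (320x240) in RGB565, two bytes per pixel
constexpr std::size_t kQvgaRgb565Bytes = frame_bytes(320, 240, 2);  // 153600 bytes
```

A single QVGA RGB565 frame takes 150 KB, while capturing directly at 96×96 grayscale needs only 9 KB. That is why the configuration in this section asks the sensor for the model's input size instead of resizing a larger frame in software.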

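// camera_input.ino (ESP32-S3 + OV2640)
#include "esp_camera.h"

// Pin numbers are board-specific. The values below match the classic ESP32
// ESP-WROVER-KIT wiring; the ESP32-S3 has no GPIO22-25, so substitute the
// pin map from your board's documentation.
camera_config_t config = {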
  .pin_pwdn  = -1,
  .pin_reset = -1,
  .pin_xclk  = 21,
  .pin_sscb_sda = 26,
  .pin_sscb_scl = 27,
  .pin_d7 = 35, .pin_d6 = 34, .pin_d5 = 39,
  .pin_d4 = 36, .pin_d3 = 19, .pin_d2 = 18,
  .pin_d1 =  5, .pin_d0 =  4,
  .pin_vsync = 25,
  .pin_href  = 23,
  .pin_pclk  = 22,
  .xclk_freq_hz = 20000000,
  .pixel_format = PIXFORMAT_GRAYSCALE,
  .frame_size   = FRAMESIZE_96X96,   // 96×96 for MobileNet
  .jpeg_quality = 0,
  .fb_count     = 1,
};

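esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
  // Camera failed to initialize; check the wiring and the pin map
}

camera_fb_t* fb = esp_camera_fb_get();
if (fb != NULL) {
  // fb->buf contains 96×96 grayscale pixels as uint8
  // Normalize to float or int8 for the model here, before returning the buffer
  esp_camera_fb_return(fb);
}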

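The MobileNetV1 0.25 model (96×96 input) takes int8 values in [-128, 127] after quantization. Assuming the model's input quantization maps pixel values 0 to 255 onto that range, subtract 128 from each pixel while you still hold the frame buffer (before calling esp_camera_fb_return):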

for (int i = 0; i < 96 * 96; i++) {
  input_tensor->data.int8[i] = (int8_t)(fb->buf[i] - 128);
}
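Subtracting 128 is a special case of the general affine quantization rule, q = round(x / scale) + zero_point. If a model's input tensor reports different parameters, the pixel conversion can be written out in full. The helper below is a hedged sketch in portable C++: `pixel_to_int8` is a hypothetical name, `scale` and `zero_point` stand for the values read from the input tensor's quantization parameters, and scaling the pixel to [0, 1] first is an assumption about how the model was trained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantize one 8-bit pixel for an int8 input tensor.
// Assumes the model was trained on pixel values scaled to [0, 1].
inline int8_t pixel_to_int8(uint8_t pixel, float scale, int zero_point) {
  float real = pixel / 255.0f;                      // map 0..255 to 0..1
  long q = std::lround(real / scale) + zero_point;  // affine quantization
  q = std::max(-128L, std::min(127L, q));           // saturate to int8
  return static_cast<int8_t>(q);
}
```

With a scale of 1/255 and a zero point of -128, this reduces to the subtract-128 loop above: pixel 0 maps to -128, pixel 128 to 0, and pixel 255 to 127.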

Synchronizing Sensor Timing

On a bare microcontroller, sensor timing is your responsibility. Common strategies:

Polling in the main loop: simple, but loop() must be fast enough not to miss samples. At 100 Hz each pass has to finish within 10 ms, so avoid delay() and other blocking calls.

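Timer interrupt: the most accurate approach for consistent sample rates. The ISR only sets a flag, and the main loop reads the sensor when it sees the flag. The example below uses the ESP32 Arduino core 2.x timer API; core 3.x changed these function signatures.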

hw_timer_t* timer = NULL;
volatile bool sample_flag = false;

void IRAM_ATTR onTimer() {
  sample_flag = true;
}

void setup() {
  timer = timerBegin(0, 80, true);              // 80 MHz / 80 = 1 MHz tick
  timerAttachInterrupt(timer, &onTimer, true);
  timerAlarmWrite(timer, 10000, true);          // 10 ms = 100 Hz
  timerAlarmEnable(timer);
}

void loop() {
  if (!sample_flag) return;
  sample_flag = false;
  // read sensor here
}

DMA-backed interrupt (audio, camera): the hardware fills a buffer and fires an interrupt. Your code processes the buffer when it's ready. This is the mechanism behind the PDM.onReceive() callback.
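A boolean flag cannot tell the main loop that it fell behind: two timer ticks before the next check look the same as one. Counting ticks instead makes missed samples visible. The sketch below is portable C++ for off-device testing; `TickCounter` is a hypothetical name, and on real hardware the read-and-clear in `take()` should sit inside a critical section so the ISR cannot fire between the two steps.

```cpp
#include <cstdint>

// Counts timer ticks instead of storing a single flag, so the main loop can
// tell how many sample periods elapsed since it last ran.
struct TickCounter {
  volatile uint32_t pending = 0;  // incremented by the ISR
  uint32_t missed = 0;            // ticks the main loop was too slow to serve

  // Call from the timer ISR.
  void on_tick() { pending = pending + 1; }

  // Call from the main loop. Returns true when a sample is due.
  bool take() {
    uint32_t n = pending;
    if (n == 0) return false;
    pending = 0;
    missed += n - 1;  // anything beyond one tick means samples were skipped
    return true;
  }
};
```

A non-zero `missed` count during testing is a sign that inference or preprocessing takes longer than the sample period, and that the work needs to shrink or the hop size needs to grow.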

Next Steps

Continue to 10-edge-impulse.md to see how Edge Impulse's cloud workflow handles data collection, preprocessing, and model training without writing all the glue code manually.