TinyML: Sensor Integration

The gesture classifier in the previous chapters used a clean, pre-buffered array. Real sensors are messier: their samples arrive over interrupts, carry timing jitter, and sometimes fail to appear on the cycle you expect.

The Three Sensor Categories

Most TinyML applications use one of three input modalities:

| Modality | Typical sensor | Preprocessing | Memory cost |
| --- | --- | --- | --- |
| Motion/vibration | Accelerometer, gyroscope | Windowing, normalization | Low |
| Audio | PDM microphone | MFCC or mel spectrogram | Medium |
| Image | QVGA camera (OV2640) | Resize, normalize | High |

This chapter covers motion and audio first. Image classification comes later, with notes on memory constraints.
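To put rough numbers on the memory column, the sketch below tallies the raw input buffers using the shapes that appear later in this chapter (a 100 × 3 float accelerometer window, a 49 × 40 int8 spectrogram, a 96 × 96 grayscale frame). The constant names are illustrative, and the figures cover input buffers only. Model weights and the tensor arena come on top and dominate in the image case.

```cpp
#include <cstddef>
#include <cstdint>

// Input-buffer sizes for the three modalities, using this chapter's shapes.
// Names are illustrative; model weights and the tensor arena are extra.
constexpr std::size_t kMotionBytes       = 100 * 3 * sizeof(float);    // 100 samples x 3 axes
constexpr std::size_t kAudioFeatureBytes = 49 * 40 * sizeof(int8_t);   // 49 frames x 40 mel bins
constexpr std::size_t kAudioPcmBytes     = 16000 * sizeof(int16_t);    // 1 s of 16 kHz PCM
constexpr std::size_t kImageBytes        = 96 * 96 * sizeof(uint8_t);  // 96x96 grayscale

// Assumes a 4-byte float, which holds on the boards used in this chapter.
static_assert(kMotionBytes == 1200, "motion window");
static_assert(kAudioFeatureBytes == 1960, "audio features");
static_assert(kAudioPcmBytes == 32000, "raw audio");
static_assert(kImageBytes == 9216, "image frame");
```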

Motion: Sliding Window Inference

The Chapter 7 gesture classifier used a tumbling (non-overlapping) window: collect 100 samples, run inference, collect the next 100. A gesture that straddles the boundary between two windows is split, and neither window may contain enough of it to classify.
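The boundary problem can be checked with a little arithmetic. The hypothetical helper below counts how many windows fully contain a gesture when a new window starts every `hop` samples; `hop` equal to the window length reproduces the tumbling case. It is a host-side sketch for reasoning about window placement, not code for the device.

```cpp
#include <cassert>

// Count windows of length `window` whose start is a multiple of `hop`
// and that fully contain the sample range [start, end).
int windows_containing(int start, int end, int window, int hop) {
  assert(hop > 0 && end - start <= window);
  int count = 0;
  for (int w = 0; w <= start; w += hop) {
    if (w + window >= end) count++;
  }
  return count;
}
```

For a gesture covering samples 80 to 139, `windows_containing(80, 140, 100, 100)` is 0, while starting a window every 20 samples, `windows_containing(80, 140, 100, 20)`, gives 3.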

A sliding window keeps the most recent 100 samples and runs inference every N new samples, so consecutive windows overlap:

// sliding_window.ino (relevant section)
const int WINDOW_SIZE = 100;
const int HOP_SIZE    = 20;   // run inference every 20 new samples

float window[WINDOW_SIZE][3];
int   samples_since_last = 0;

void loop() {
  if (!IMU.accelerationAvailable()) return;

  float ax, ay, az;
  IMU.readAcceleration(ax, ay, az);

  // Shift the window left by 1 and insert new sample at the end
  memmove(window, window + 1, (WINDOW_SIZE - 1) * 3 * sizeof(float));
  window[WINDOW_SIZE - 1][0] = ax;
  window[WINDOW_SIZE - 1][1] = ay;
  window[WINDOW_SIZE - 1][2] = az;
  samples_since_last++;

  if (samples_since_last < HOP_SIZE) return;
  samples_since_last = 0;

  // Flatten and normalize
  for (int i = 0; i < WINDOW_SIZE; i++) {
    for (int j = 0; j < 3; j++) {
      int idx  = i * 3 + j;
      float norm = (window[i][j] - FEATURE_MEAN[idx]) / FEATURE_STD[idx];
      int8_t q   = (int8_t)constrain(
          round(norm / in_scale) + in_zp, -128, 127);
      input_tensor->data.int8[idx] = q;
    }
  }

  interpreter->Invoke();
  // read output as before
}
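The normalize-and-quantize step inside the loop can be pulled out into a small function that is easy to test on a host machine. This is a sketch: `quantize_feature` is a hypothetical name, and `scale` and `zero_point` stand for the input tensor's quantization parameters (`in_scale` and `in_zp` above).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Normalize one feature, then map it to int8 with the tensor's
// quantization parameters: q = round(norm / scale) + zero_point.
int8_t quantize_feature(float value, float mean, float stddev,
                        float scale, int zero_point) {
  assert(stddev > 0.0f && scale > 0.0f);
  float norm = (value - mean) / stddev;
  long q = std::lround(norm / scale) + zero_point;
  if (q < -128) q = -128;   // saturate rather than wrap
  if (q > 127) q = 127;
  return static_cast<int8_t>(q);
}
```

For example, a reading of 1.5 with mean 0.5 and standard deviation 2.0 normalizes to 0.5. With scale 0.25 and zero point -3 that becomes round(2.0) - 3 = -1.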

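The memmove approach is simple, but it copies almost the entire window (99 × 3 floats, or 1,188 bytes here) on every sample, which gets expensive for large windows. For production code, use a circular buffer: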

// Circular buffer for the window
float ring[WINDOW_SIZE][3];
int   ring_head = 0;   // next write position; also the oldest sample once the ring is full

void push_sample(float ax, float ay, float az) {
  ring[ring_head][0] = ax;
  ring[ring_head][1] = ay;
  ring[ring_head][2] = az;
  ring_head = (ring_head + 1) % WINDOW_SIZE;
}

// To fill the input tensor, iterate starting from ring_head (oldest)
void fill_input() {
  for (int i = 0; i < WINDOW_SIZE; i++) {
    int idx_in_ring = (ring_head + i) % WINDOW_SIZE;
    for (int j = 0; j < 3; j++) {
      int feat_idx = i * 3 + j;
      float norm   = (ring[idx_in_ring][j] - FEATURE_MEAN[feat_idx])
                     / FEATURE_STD[feat_idx];
      input_tensor->data.int8[feat_idx] =
          (int8_t)constrain(round(norm / in_scale) + in_zp, -128, 127);
    }
  }
}
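The index arithmetic is easy to get wrong by one, so it is worth checking on a host machine. The sketch below is a scalar version of the same ring (the `Ring` type and its `count` field are illustrative additions). It also tracks whether the ring has filled, which the code above does not: until WINDOW_SIZE samples have arrived, part of the window is still zeros.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size ring of scalar samples. The chapter's buffer stores 3 axes
// per slot, but the index arithmetic is the same.
template <std::size_t N>
struct Ring {
  float data[N] = {};
  std::size_t head = 0;    // next write position == oldest sample once full
  std::size_t count = 0;   // samples stored so far, saturates at N

  void push(float v) {
    data[head] = v;
    head = (head + 1) % N;
    if (count < N) count++;
  }
  bool full() const { return count == N; }
  // i = 0 is the oldest sample, i = N - 1 the newest (valid once full).
  float oldest_first(std::size_t i) const {
    assert(i < N);
    return data[(head + i) % N];
  }
};
```

Pushing 1, 2, 3, 4, 5 into a `Ring<3>` and reading indices 0 to 2 yields 3, 4, 5: the two oldest samples have been overwritten and the rest come out in arrival order.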

Audio: Reading the PDM Microphone

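The Nano 33 BLE Sense has a PDM microphone, accessible through the PDM library that ships with the board package. The library delivers 16-bit PCM samples at 16 kHz into a DMA-backed buffer and calls your callback from interrupt context when data is ready. Note that PDM.available() and PDM.read() both count bytes, not samples.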

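// audio_input.ino
#include <PDM.h>

const int SAMPLE_RATE = 16000;
const int FRAME_SIZE  = 512;    // capacity of audio_buf, in samples

short        audio_buf[FRAME_SIZE];
volatile int samples_read = 0;  // set by the callback, cleared by loop()

// Runs in interrupt context whenever the library has new data
void onPDMdata() {
  int bytes = PDM.available();                 // byte count, not sample count
  if (bytes > (int)sizeof(audio_buf)) bytes = sizeof(audio_buf);
  PDM.read(audio_buf, bytes);
  samples_read = bytes / 2;                    // two bytes per 16-bit sample
}

void setup() {
  Serial.begin(115200);
  PDM.onReceive(onPDMdata);
  PDM.setGain(20);                             // 20 is the library default on this board
  if (!PDM.begin(1, SAMPLE_RATE)) {            // 1 channel (mono)
    Serial.println("Failed to start PDM");
    while (true) {}
  }
}

void loop() {
  if (samples_read == 0) return;
  int n = samples_read;
  samples_read = 0;

  // audio_buf now holds n int16 PCM samples (n may be less than FRAME_SIZE).
  // Finish with them before the next callback overwrites the buffer.
  process_audio_frame(audio_buf, n);
}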
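The callback hands over however many samples the driver has ready, which is not necessarily the frame length a feature extractor wants. One way to bridge the two, sketched here with a hypothetical `FrameAccumulator` type, is to append each chunk to a fixed-length frame and process the frame only when it is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Collects variable-sized chunks of PCM into fixed-length frames.
template <std::size_t FRAME>
struct FrameAccumulator {
  int16_t frame[FRAME];
  std::size_t filled = 0;

  // Copies as much of `chunk` as fits and returns the number of samples
  // consumed. When ready() becomes true, process `frame` and call reset()
  // before appending the rest of the chunk.
  std::size_t append(const int16_t* chunk, std::size_t n) {
    assert(chunk != nullptr);
    std::size_t room = FRAME - filled;
    std::size_t take = n < room ? n : room;
    std::memcpy(frame + filled, chunk, take * sizeof(int16_t));
    filled += take;
    return take;
  }
  bool ready() const { return filled == FRAME; }
  void reset() { filled = 0; }
};
```

The return value tells the caller how much of the chunk is left over, so no samples are dropped when a chunk crosses a frame boundary.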

Audio Preprocessing: MFCC

Raw PCM is not a useful model input for keyword spotting. The standard preprocessing pipeline extracts Mel-Frequency Cepstral Coefficients (MFCCs):

PCM samples (16 kHz int16)
  │
  ▼ Frame into 25 ms windows with 10 ms hop
  │
  ▼ Apply Hann window to each frame
  │
  ▼ Compute FFT (512 point; a 25 ms frame is 400 samples, zero-padded to 512)
  │
  ▼ Apply mel filterbank (40 filters)
  │
  ▼ Log of filterbank energies
  │
  ▼ DCT → keep first 13 coefficients (MFCCs)
  │
  ▼ Stack frames → (N_frames × 13) matrix
  │
  ▼ Normalize → model input
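As an illustration of one stage, the windowing step multiplies each frame by a set of precomputed coefficients. The sketch below computes a symmetric Hann window; `hann_window` is an illustrative name, and some front ends use the periodic variant (dividing by N instead of N - 1), so check which one your training pipeline uses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Symmetric Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1)).
std::vector<float> hann_window(int n) {
  assert(n > 1);
  const double kPi = 3.14159265358979323846;
  std::vector<float> w(n);
  for (int i = 0; i < n; i++) {
    w[i] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * kPi * i / (n - 1)));
  }
  return w;
}
```

The coefficients taper to zero at both ends of the frame and peak at 1 in the middle, which reduces spectral leakage in the FFT that follows. On a microcontroller the table would normally be computed once at startup or stored as a constant array.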

Implementing MFCC from scratch takes roughly 100 lines of C++, and several Arduino libraries wrap it. The TFLite Micro micro_speech (keyword spotting) example includes a reference feature generator:

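// From the TFLite Micro micro_speech example (simplified interface).
// The exact signature varies between library versions; check the header you have.
// The example calls InitializeMicroFeatures() once before the first use.
#include "micro_features_micro_features_generator.h"
#include "micro_features_micro_model_settings.h"

// kFeatureElementCount = kFeatureSliceCount × kFeatureSliceSize
int8_t feature_data[kFeatureElementCount];
size_t num_samples_read = 0;

// Each call turns one window of audio into one slice of kFeatureSliceSize values
TfLiteStatus status = GenerateMicroFeatures(
    nullptr,            // error reporter
    audio_buf,          // int16 PCM
    FRAME_SIZE,         // number of samples available
    kFeatureSliceSize,  // 40 (mel bins)
    feature_data,       // output: start of the slice to fill
    &num_samples_read   // out: number of samples consumed
);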

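Each call fills one slice of kFeatureSliceSize quantized mel-bin values. Once kFeatureSliceCount slices have been written, feature_data holds a quantized mel spectrogram ready to feed into the model. Note that this front end stops at the log-mel stage and does not apply the DCT step of a full MFCC pipeline.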

Audio: Building a Simple Keyword Spotter

For a "yes" vs "no" vs "unknown" classifier, the input is a 1-second window (16000 samples) processed into a (49 × 40) mel spectrogram (49 frames × 40 mel bins = 1960 features).
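The 49 follows from framing arithmetic if the front end uses 30 ms windows with a 20 ms stride at 16 kHz (480 and 320 samples). Those settings are an assumption about the front end rather than something stated above, and `num_frames` is an illustrative helper.

```cpp
#include <cassert>

// Number of complete frames in `num_samples` when frames of `frame_len`
// samples start every `hop` samples (no padding at the end).
int num_frames(int num_samples, int frame_len, int hop) {
  assert(frame_len > 0 && hop > 0);
  if (num_samples < frame_len) return 0;
  return 1 + (num_samples - frame_len) / hop;
}
```

`num_frames(16000, 480, 320)` is 49. The 25 ms / 10 ms framing from the MFCC pipeline, `num_frames(16000, 400, 160)`, gives 98 instead, so training and device code must agree on these two numbers or the model input shape will not match.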

On the Python side, preprocess the Speech Commands clips into matching log-mel features with tf.signal:

# audio_features.py
import tensorflow as tf
import numpy as np

def compute_mel_spectrogram(waveform, sample_rate=16000, n_mels=40,
                            frame_length=480, frame_step=320):
    # waveform: float32 array, shape (16000,)
    # 30 ms frames with a 20 ms step give 1 + (16000 - 480) // 320 = 49 frames
    stfts = tf.signal.stft(waveform, frame_length=frame_length,
                           frame_step=frame_step)
    spectrogram = tf.abs(stfts)
    num_spectrogram_bins = stfts.shape[-1]

    lower_edge_hz, upper_edge_hz = 80.0, 7600.0
    linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
        n_mels, num_spectrogram_bins, sample_rate,
        lower_edge_hz, upper_edge_hz)

    mel = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
    log_mel = tf.math.log(mel + 1e-6)
    return log_mel   # shape: (49, 40)

Image: Camera Input on ESP32-S3

Image classification strains the Nano 33 BLE's 256 KB of SRAM once a frame buffer and the tensor arena are both resident. This section uses the ESP32-S3 with an OV2640 camera module instead.

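// camera_input.ino (ESP32-S3 + OV2640)
#include "esp_camera.h"

// Pin numbers are board-specific. Replace them with the mapping from your
// board's schematic; not every GPIO listed here exists on every ESP32 variant.
camera_config_t config = {
  .pin_pwdn  = -1,
  .pin_reset = -1,
  .pin_xclk  = 21,
  .pin_sscb_sda = 26,
  .pin_sscb_scl = 27,
  .pin_d7 = 35, .pin_d6 = 34, .pin_d5 = 39,
  .pin_d4 = 36, .pin_d3 = 19, .pin_d2 = 18,
  .pin_d1 =  5, .pin_d0 =  4,
  .pin_vsync = 25,
  .pin_href  = 23,
  .pin_pclk  = 22,
  .xclk_freq_hz = 20000000,
  .pixel_format = PIXFORMAT_GRAYSCALE,
  .frame_size   = FRAMESIZE_96X96,   // 96×96 for MobileNet
  .jpeg_quality = 0,                 // only used for JPEG output
  .fb_count     = 1,
};

void setup() {
  Serial.begin(115200);
  if (esp_camera_init(&config) != ESP_OK) {
    Serial.println("Camera init failed");
    while (true) delay(1000);
  }
}

void loop() {
  camera_fb_t* fb = esp_camera_fb_get();
  if (fb == NULL) return;            // capture failed; try again on the next pass
  // fb->buf contains 96×96 grayscale pixels as uint8
  // Normalize to float or int8 for the model here, before returning the buffer
  esp_camera_fb_return(fb);
}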

The MobileNetV1 0.25 model (96×96 input) expects values in [-128, 127] after int8 quantization. Subtract 128 from each pixel while fb is still valid, that is, before calling esp_camera_fb_return(fb):

for (int i = 0; i < 96 * 96; i++) {
  input_tensor->data.int8[i] = (int8_t)(fb->buf[i] - 128);
}
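Subtracting 128 is a shortcut that holds for one particular set of quantization parameters. The sketch below shows the general form, assuming a model trained on pixel values scaled to [0, 1]; `quantize_pixel` is a hypothetical helper, and `scale` and `zero_point` stand for the input tensor's quantization parameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize one grayscale pixel for a model trained on inputs scaled to [0, 1].
// `scale` and `zero_point` are the input tensor's quantization parameters.
int8_t quantize_pixel(uint8_t pixel, float scale, int zero_point) {
  assert(scale > 0.0f);
  double real = pixel / 255.0;                       // value the model was trained on
  long q = std::lround(real / scale) + zero_point;   // q = round(real / scale) + zp
  if (q < -128) q = -128;
  if (q > 127) q = 127;
  return static_cast<int8_t>(q);
}
```

With scale = 1/255 and zero_point = -128, `quantize_pixel(p, scale, zero_point)` equals p - 128 for every pixel value, which is the shortcut used above. If your model reports different parameters, the shortcut no longer applies and the general form is the safer choice.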

Synchronizing Sensor Timing

On a bare microcontroller, sensor timing is your responsibility. Common strategies:

Polling in the main loop: simple, but loop() must be fast enough not to miss samples. Avoid any delay() calls when polling at 100 Hz.

Timer interrupt: the most accurate approach for consistent sample rates. The ISR only sets a flag; the main loop acts on it.

// ESP32 Arduino core 2.x timer API (core 3.x changed these function signatures)
hw_timer_t* timer = NULL;
volatile bool sample_flag = false;

void IRAM_ATTR onTimer() {
  sample_flag = true;
}

void setup() {
  timer = timerBegin(0, 80, true);              // 80 MHz / 80 = 1 MHz tick
  timerAttachInterrupt(timer, &onTimer, true);
  timerAlarmWrite(timer, 10000, true);          // 10 ms = 100 Hz
  timerAlarmEnable(timer);
}

void loop() {
  if (!sample_flag) return;
  sample_flag = false;
  // read sensor here
}

DMA-backed interrupt (audio, camera): the hardware fills a buffer and fires an interrupt. Your code processes the buffer when it's ready. This is what PDM.onReceive() does internally.
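For the polling strategy, a common source of drift is rescheduling relative to "now" after every sample, so each late poll pushes all later samples back. The sketch below schedules against a fixed grid instead. `SampleClock` is a hypothetical helper, and the timestamp is assumed to come from a free-running 32-bit microsecond counter such as the value micros() returns on Arduino.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-rate scheduler driven by a free-running microsecond counter.
struct SampleClock {
  uint32_t period_us;
  uint32_t next_due_us;

  // Returns true once per period. Advancing by period_us (rather than
  // resetting to `now_us`) keeps the average rate exact even if a poll is late.
  bool due(uint32_t now_us) {
    assert(period_us > 0);
    // The signed difference stays correct across the 32-bit wraparound.
    if (static_cast<int32_t>(now_us - next_due_us) < 0) return false;
    next_due_us += period_us;
    return true;
  }
};
```

In a sketch this would be used as `if (clk.due(micros())) { /* read sensor */ }` with `period_us` set to 10000 for 100 Hz. A poll that arrives 50 µs late still fires, and the next deadline stays on the original 10 ms grid.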

Next Steps

Continue to 10-edge-impulse.md to see how Edge Impulse's cloud workflow handles data collection, preprocessing, and model training without writing all the glue code manually.