TinyML: Sensor Integration
The gesture classifier in the previous chapters used a clean, pre-buffered array. Real sensors are messier: their samples arrive via interrupts, show timing jitter, and sometimes fail to produce a reading on the cycle you expect.
The Three Sensor Categories
Most TinyML applications use one of three input modalities:
| Modality | Typical sensor | Preprocessing | Memory cost |
|---|---|---|---|
| Motion/vibration | Accelerometer, gyroscope | Windowing, normalization | Low |
| Audio | PDM microphone | MFCC or mel spectrogram | Medium |
| Image | QVGA camera (OV2640) | Resize, normalize | High |
Motion and audio come first; image classification follows, with notes on memory constraints.
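To put rough numbers on the memory column, the sketch below computes raw input-buffer sizes from the shapes used later in this chapter. It is host-side Python for back-of-the-envelope checks, not firmware. The QVGA RGB565 row is an assumed capture format, and none of the figures include the tensor arena or model weights.

```python
# Rough input-buffer sizes for the three modalities.
# Shapes come from this chapter; QVGA RGB565 is an assumption.

def buffer_bytes(shape, bytes_per_element):
    """Return the size in bytes of a dense buffer with the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


BUFFERS = {
    # name: (shape, bytes per element)
    "motion window (100 x 3 float32)": ((100, 3), 4),
    "audio PCM, 1 s at 16 kHz (int16)": ((16000,), 2),
    "audio features (49 x 40 int8)": ((49, 40), 1),
    "image 96 x 96 grayscale (uint8)": ((96, 96), 1),
    "image QVGA 320 x 240 RGB565": ((240, 320), 2),
}

for name, (shape, width) in BUFFERS.items():
    print(f"{name:34s} {buffer_bytes(shape, width):7d} bytes")
```

Raw capture buffers (a second of PCM, a full QVGA frame) tend to dominate, which is why audio and image pipelines reduce the data before it reaches the model.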
Motion: Sliding Window Inference
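The Chapter 7 gesture classifier used a tumbling (non-overlapping) window: collect 100 samples, run inference, collect the next 100. This misses gestures that start in the second half of a window, because they are split across two inferences.
A sliding window keeps the most recent 100 samples and runs inference every HOP_SIZE new samples (20 here) instead of every 100, so consecutive windows overlap by 80 samples: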
// sliding_window.ino (relevant section)
const int WINDOW_SIZE = 100;
const int HOP_SIZE = 20;  // run inference every 20 new samples
float window[WINDOW_SIZE][3];
int samples_since_last = 0;

void loop() {
  if (!IMU.accelerationAvailable()) return;
  float ax, ay, az;
  IMU.readAcceleration(ax, ay, az);

  // Shift the window left by 1 and insert new sample at the end
  memmove(window, window + 1, (WINDOW_SIZE - 1) * 3 * sizeof(float));
  window[WINDOW_SIZE - 1][0] = ax;
  window[WINDOW_SIZE - 1][1] = ay;
  window[WINDOW_SIZE - 1][2] = az;

  samples_since_last++;
  if (samples_since_last < HOP_SIZE) return;
  samples_since_last = 0;

  // Flatten and normalize
  for (int i = 0; i < WINDOW_SIZE; i++) {
    for (int j = 0; j < 3; j++) {
      int idx = i * 3 + j;
      float norm = (window[i][j] - FEATURE_MEAN[idx]) / FEATURE_STD[idx];
      int8_t q = (int8_t)constrain(
          round(norm / in_scale) + in_zp, -128, 127);
      input_tensor->data.int8[idx] = q;
    }
  }

  interpreter->Invoke();
  // read output as before
}
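The difference between the two schemes is easy to check on the host. The Python sketch below is a simulation, not firmware: it lists which windows fully contain an event, and computes how much time each inference may take before the next hop is due. The event position (starts at sample 60, lasts 60 samples) and the 100 Hz sample rate are illustrative assumptions.

```python
WINDOW_SIZE = 100  # samples per inference window, as in the sketch above


def windows_containing(event_start, event_len, hop, n_samples=1000):
    """Return start indices of windows that fully cover the event."""
    hits = []
    for w in range(0, n_samples - WINDOW_SIZE + 1, hop):
        if w <= event_start and event_start + event_len <= w + WINDOW_SIZE:
            hits.append(w)
    return hits


def inference_budget_ms(sample_rate_hz, hop):
    """Time available per inference before the next hop's samples arrive."""
    return 1000.0 * hop / sample_rate_hz


tumbling = windows_containing(60, 60, hop=WINDOW_SIZE)  # non-overlapping
sliding = windows_containing(60, 60, hop=20)            # HOP_SIZE = 20

print("tumbling windows covering the event:", tumbling)
print("sliding windows covering the event:", sliding)
print("budget per inference at 100 Hz:", inference_budget_ms(100, 20), "ms")
```

With a hop of 20 the event lands entirely inside several windows, whereas the tumbling scheme splits it across two. The cost is that Invoke() runs five times as often, so preprocessing plus inference has to fit within the per-hop budget.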
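The memmove approach is simple, but it copies almost the entire window (99 × 3 floats, about 1.2 KB) on every new sample, and that cost grows with window size and sample rate. For production code, use a circular buffer: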
// Circular buffer for the window
float ring[WINDOW_SIZE][3];
int ring_head = 0;  // next write position; once the buffer is full, also the oldest sample

void push_sample(float ax, float ay, float az) {
  ring[ring_head][0] = ax;
  ring[ring_head][1] = ay;
  ring[ring_head][2] = az;
  ring_head = (ring_head + 1) % WINDOW_SIZE;
}

// To fill the input tensor, iterate starting from ring_head (oldest)
void fill_input() {
  for (int i = 0; i < WINDOW_SIZE; i++) {
    int idx_in_ring = (ring_head + i) % WINDOW_SIZE;
    for (int j = 0; j < 3; j++) {
      int feat_idx = i * 3 + j;
      float norm = (ring[idx_in_ring][j] - FEATURE_MEAN[feat_idx])
                   / FEATURE_STD[feat_idx];
      input_tensor->data.int8[feat_idx] =
          (int8_t)constrain(round(norm / in_scale) + in_zp, -128, 127);
    }
  }
}
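A quick way to convince yourself that the circular buffer yields the same sample order as the shifting version is to model both on the host. The Python sketch below is only a model of the indexing logic; the class names are hypothetical and the window is shrunk to 5 entries to keep the output readable.

```python
WINDOW_SIZE = 5  # deliberately small for illustration


class ShiftWindow:
    """Models the memmove version: drop the oldest, append the newest."""

    def __init__(self):
        self.data = [None] * WINDOW_SIZE

    def push(self, sample):
        self.data = self.data[1:] + [sample]

    def ordered(self):
        return list(self.data)


class RingWindow:
    """Models the circular buffer: overwrite in place, track the head."""

    def __init__(self):
        self.ring = [None] * WINDOW_SIZE
        self.head = 0  # next write position

    def push(self, sample):
        self.ring[self.head] = sample
        self.head = (self.head + 1) % WINDOW_SIZE

    def ordered(self):
        # Read WINDOW_SIZE entries starting at head, oldest first
        return [self.ring[(self.head + i) % WINDOW_SIZE]
                for i in range(WINDOW_SIZE)]


shift, ring = ShiftWindow(), RingWindow()
for sample in range(12):
    shift.push(sample)
    ring.push(sample)
    assert shift.ordered() == ring.ordered()

print("storage order:", ring.ring)
print("read order:   ", ring.ordered())
```

The storage order is scrambled, but reading from the head restores chronological order, so the model sees the same input either way.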
Audio: Reading the PDM Microphone
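The Nano 33 BLE Sense has a PDM microphone, accessible via the PDM library (PDM.h) that ships with the board core. It delivers 16-bit PCM samples at 16 kHz through a DMA-backed interrupt buffer. Note that PDM.available() and PDM.read() count bytes, not samples.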
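// audio_input.ino
#include <PDM.h>

const int SAMPLE_RATE = 16000;
const int FRAME_SIZE = 512;  // capacity of audio_buf, in samples
short audio_buf[FRAME_SIZE];
volatile int samples_read = 0;  // set by the callback, cleared by loop()

void onPDMdata() {
  // available() and read() work in bytes; each sample is 2 bytes
  int bytes_available = PDM.available();
  if (bytes_available > (int)sizeof(audio_buf)) {
    bytes_available = sizeof(audio_buf);  // never overrun the buffer
  }
  PDM.read(audio_buf, bytes_available);
  samples_read = bytes_available / 2;
}

void setup() {
  Serial.begin(115200);
  PDM.onReceive(onPDMdata);
  PDM.setGain(20);  // raw gain setting (0–80), not dB; 20 is a reasonable default
  if (!PDM.begin(1, SAMPLE_RATE)) {  // 1 channel (mono)
    Serial.println("Failed to start PDM");
    while (1);
  }
}

void loop() {
  if (samples_read == 0) return;
  int n = samples_read;
  samples_read = 0;
  // audio_buf now contains n int16 PCM samples (n may be less than FRAME_SIZE)
  // Feed them into your feature extractor
  process_audio_frame(audio_buf, n);
}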
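Callbacks deliver audio in small chunks, so a 1-second model input has to be accumulated across many of them. The host-side Python sketch below works through that arithmetic and models a simple accumulator. The 512-byte chunk size is an assumption about the library's buffer, and WindowAccumulator is a hypothetical helper, not part of any library.

```python
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_BYTES = 512      # assumed bytes delivered per callback


def chunk_samples(n_bytes):
    """Convert a byte count into a count of 16-bit samples."""
    return n_bytes // BYTES_PER_SAMPLE


def chunk_duration_ms(n_bytes):
    """Audio duration covered by one chunk of n_bytes."""
    return 1000.0 * chunk_samples(n_bytes) / SAMPLE_RATE


class WindowAccumulator:
    """Collects chunks until a full window of samples is available."""

    def __init__(self, window_samples):
        self.window_samples = window_samples
        self.samples = []

    def push(self, chunk):
        """Append a chunk; return a full window when ready, else None."""
        self.samples.extend(chunk)
        if len(self.samples) < self.window_samples:
            return None
        window = self.samples[:self.window_samples]
        self.samples = self.samples[self.window_samples:]  # keep the remainder
        return window


acc = WindowAccumulator(SAMPLE_RATE)  # 1 second of audio
chunk = [0] * chunk_samples(CHUNK_BYTES)
pushes = 0
window = None
while window is None:
    window = acc.push(chunk)
    pushes += 1

print("samples per chunk:", chunk_samples(CHUNK_BYTES))
print("chunk duration (ms):", chunk_duration_ms(CHUNK_BYTES))
print("callbacks needed for 1 s:", pushes)
```

On the device the same idea applies, with a fixed-size array in place of the Python list.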
Audio Preprocessing: MFCC
Raw PCM is not a useful model input for keyword spotting. The standard preprocessing pipeline extracts Mel-Frequency Cepstral Coefficients (MFCCs):
PCM samples (16 kHz int16)
│
▼ Frame into 25 ms windows with 10 ms hop
│
▼ Apply Hann window to each frame
│
▼ Compute FFT (512 point; a 25 ms frame is 400 samples, zero-padded to 512)
│
▼ Apply mel filterbank (40 filters)
│
▼ Log of filterbank energies
│
▼ DCT → keep first 13 coefficients (MFCCs)
│
▼ Stack frames → (N_frames × 13) matrix
│
▼ Normalize → model input
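The per-frame part of this pipeline can be sketched in plain Python to make each stage concrete. This is a readable reference under stated assumptions, not the firmware path: it uses a naive DFT instead of an FFT, the common 2595·log10(1 + f/700) mel formula, filter edges of 80 Hz and 7600 Hz, and an unnormalized DCT-II. Framing, stacking, and normalization are left out.

```python
import cmath
import math

SAMPLE_RATE = 16000
FRAME_LEN = 400   # 25 ms at 16 kHz
FFT_SIZE = 512    # next power of two >= FRAME_LEN
N_MELS = 40
N_MFCC = 13


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hann(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..FFT_SIZE/2 (zero padding adds nothing to the sums)."""
    out = []
    for k in range(FFT_SIZE // 2 + 1):
        acc = 0j
        for n, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * n / FFT_SIZE)
        out.append(abs(acc) ** 2)
    return out


def mel_filterbank(low_hz=80.0, high_hz=7600.0):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    lo_mel, hi_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi_mel - lo_mel) / (N_MELS + 1)
    edges = [mel_to_hz(lo_mel + i * step) for i in range(N_MELS + 2)]
    bank = []
    for m in range(N_MELS):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for k in range(FFT_SIZE // 2 + 1):
            f = k * SAMPLE_RATE / FFT_SIZE
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(values, n_out):
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n_out)]


def mfcc_frame(frame):
    """Return (log mel energies, MFCCs) for one frame of samples."""
    windowed = [x * w for x, w in zip(frame, hann(len(frame)))]
    spectrum = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(row, spectrum))
                for row in mel_filterbank()]
    log_energies = [math.log(e + 1e-6) for e in energies]
    return log_energies, dct_ii(log_energies, N_MFCC)


# Try it on a synthetic 1 kHz tone
frame = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE)
         for n in range(FRAME_LEN)]
log_energies, mfcc = mfcc_frame(frame)
peak = max(range(N_MELS), key=lambda m: log_energies[m])
print("mel bins:", len(log_energies), "MFCCs:", len(mfcc), "peak filter:", peak)
```

For a pure tone, the strongest mel filter should be one whose passband covers the tone's frequency, which makes a handy sanity check when porting the pipeline to C++.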
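Implementing MFCC from scratch takes about 100 lines of C++, and several Arduino libraries wrap it. The TFLite Micro micro_speech (keyword spotting) example includes a reference feature generator; note that it stops at the mel filterbank stage (40 values per frame) and does not apply the DCT: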
// From the TFLite Micro micro_speech example (simplified interface)
#include "micro_features_micro_features_generator.h"
#include "micro_features_micro_model_settings.h"

// kFeatureElementCount = kFeatureSliceCount × kFeatureSliceSize
int8_t feature_data[kFeatureElementCount];
size_t num_samples_read = 0;

// Each call turns one frame of audio into one slice of kFeatureSliceSize values
TfLiteStatus status = GenerateMicroFeatures(
    nullptr,            // error reporter
    audio_buf,          // int16 PCM
    FRAME_SIZE,         // number of samples
    kFeatureSliceSize,  // 40 (mel bins)
    feature_data,       // output: here, the first slice of the buffer
    &num_samples_read   // out: samples consumed; the reference code writes through this pointer
);
Once every slice has been generated, feature_data holds a quantized mel spectrogram ready to copy into the model's input tensor.
Audio: Building a Simple Keyword Spotter
For a "yes" vs "no" vs "unknown" classifier, the input is a 1-second window (16000 samples) processed into a 49 × 40 mel spectrogram: 49 frames (30 ms windows with a 20 ms stride, the settings used by the micro_speech example) × 40 mel bins = 1960 features.
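The frame count follows directly from the window and stride lengths. A minimal Python check, assuming no padding at the end of the clip (partial frames are dropped); the 30 ms / 20 ms values are the micro_speech settings, and 25 ms / 10 ms are the values from the MFCC diagram earlier:

```python
def ms_to_samples(ms, sample_rate=16000):
    """Convert a duration in milliseconds to a sample count."""
    return sample_rate * ms // 1000


def num_frames(n_samples, frame_length, frame_step):
    """Number of complete frames that fit in n_samples (no end padding)."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // frame_step


one_second = 16000

# 30 ms window, 20 ms stride
frames_30_20 = num_frames(one_second, ms_to_samples(30), ms_to_samples(20))

# 25 ms window, 10 ms hop
frames_25_10 = num_frames(one_second, ms_to_samples(25), ms_to_samples(10))

print("30 ms / 20 ms:", frames_30_20, "frames,", frames_30_20 * 40, "features")
print("25 ms / 10 ms:", frames_25_10, "frames,", frames_25_10 * 40, "features")
```

Whatever framing you choose, training and on-device preprocessing must agree on it, or the model input shape (and its contents) will not match what the model was trained on.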
On the Python side, use TensorFlow's speech commands dataset preprocessing:
# audio_features.py
import tensorflow as tf
import numpy as np

def compute_mel_spectrogram(waveform, sample_rate=16000, n_mels=40,
                            frame_length=480, frame_step=320):
    # waveform: float32 array, shape (16000,)
    # 480 / 320 samples = 30 ms window, 20 ms stride, which gives 49 frames
    # for 1 s of audio. These must match the on-device feature generator.
    stfts = tf.signal.stft(waveform, frame_length=frame_length,
                           frame_step=frame_step)
    spectrogram = tf.abs(stfts)  # magnitude spectrum
    num_spectrogram_bins = stfts.shape[-1]
    lower_edge_hz, upper_edge_hz = 80.0, 7600.0
    linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
        n_mels, num_spectrogram_bins, sample_rate,
        lower_edge_hz, upper_edge_hz)
    mel = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
    log_mel = tf.math.log(mel + 1e-6)  # small offset avoids log(0)
    return log_mel  # shape: (49, 40)
Image: Camera Input on ESP32-S3
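Image classification is a tight fit in the Nano 33 BLE's 256 KB of RAM once the frame buffer, the tensor arena, and the rest of the sketch have to share it. The examples here use the ESP32-S3 with an OV2640 camera module.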
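// camera_input.ino (ESP32-S3 + OV2640)
#include "esp_camera.h"

// Pin numbers are board-specific: replace them with the camera pin map
// from your board's documentation. The values shown follow a common
// classic-ESP32 layout; some of them (for example GPIO 22–25) are not
// available on the ESP32-S3.
camera_config_t config = {
  .pin_pwdn = -1,
  .pin_reset = -1,
  .pin_xclk = 21,
  .pin_sscb_sda = 26,
  .pin_sscb_scl = 27,
  .pin_d7 = 35, .pin_d6 = 34, .pin_d5 = 39,
  .pin_d4 = 36, .pin_d3 = 19, .pin_d2 = 18,
  .pin_d1 = 5, .pin_d0 = 4,
  .pin_vsync = 25,
  .pin_href = 23,
  .pin_pclk = 22,
  .xclk_freq_hz = 20000000,
  .pixel_format = PIXFORMAT_GRAYSCALE,
  .frame_size = FRAMESIZE_96X96,  // 96×96 for MobileNet
  .jpeg_quality = 0,
  .fb_count = 1,
};

// init_camera() and capture_frame() are illustrative helper names
bool init_camera() {
  return esp_camera_init(&config) == ESP_OK;
}

void capture_frame() {
  camera_fb_t* fb = esp_camera_fb_get();
  if (fb == NULL) return;  // capture failed; try again on the next pass
  // fb->buf contains 96×96 grayscale pixels as uint8
  // Normalize to float or int8 for the model here, while fb is still valid
  esp_camera_fb_return(fb);  // do not touch fb->buf after this call
}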
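The MobileNetV1 0.25 model (96×96 input) expects int8 values in [-128, 127]. Assuming the model's input quantization maps pixel value 0 to -128 and 255 to 127, subtract 128 from each pixel. Run this loop before calling esp_camera_fb_return(fb), while fb->buf is still valid: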
for (int i = 0; i < 96 * 96; i++) {
input_tensor->data.int8[i] = (int8_t)(fb->buf[i] - 128);
}
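Subtracting 128 is a shortcut for the general quantization formula q = round(x / scale) + zero_point. The Python check below assumes an input scale of 1/255 and a zero point of -128 applied to pixels normalized to [0, 1]; these are illustrative values, and the real ones should be read from the model's input tensor parameters.

```python
def quantize(real_value, scale, zero_point):
    """Affine int8 quantization with clamping to [-128, 127].

    Python's round() rounds halves to even, while C's round() rounds
    halves away from zero; the two differ only at exact .5 values.
    """
    q = round(real_value / scale) + zero_point
    return max(-128, min(127, q))


SCALE = 1.0 / 255.0   # assumed input scale
ZERO_POINT = -128     # assumed input zero point

# For every possible pixel value, the general formula matches "pixel - 128"
mismatches = [p for p in range(256)
              if quantize(p / 255.0, SCALE, ZERO_POINT) != p - 128]

print("mismatching pixel values:", mismatches)
print("pixel 0   ->", quantize(0 / 255.0, SCALE, ZERO_POINT))
print("pixel 255 ->", quantize(255 / 255.0, SCALE, ZERO_POINT))
```

If your model reports a different scale or zero point, use the general formula instead of the shortcut.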
Synchronizing Sensor Timing
On a bare microcontroller, sensor timing is your responsibility. Common strategies:
Polling in the main loop: simple, but loop() must be fast enough not to miss samples. Avoid any delay() calls when polling at 100 Hz.
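Timer interrupt: the most accurate approach for consistent sample rates. The interrupt handler only sets a flag; the main loop reads the sensor when it sees the flag. The example below uses the ESP32 Arduino core 2.x timer API (core 3.x changed these function signatures).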
hw_timer_t* timer = NULL;
volatile bool sample_flag = false;

void IRAM_ATTR onTimer() {
  sample_flag = true;
}

void setup() {
  timer = timerBegin(0, 80, true);  // 80 MHz / 80 = 1 MHz tick
  timerAttachInterrupt(timer, &onTimer, true);
  timerAlarmWrite(timer, 10000, true);  // 10 ms = 100 Hz
  timerAlarmEnable(timer);
}

void loop() {
  if (!sample_flag) return;
  sample_flag = false;
  // read sensor here
}
DMA-backed interrupt (audio, camera): the hardware fills a buffer and fires an interrupt. Your code processes the buffer when it's ready. This is the model behind PDM.onReceive().
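For the timer-interrupt strategy, the alarm value is just the tick rate divided by the target sample rate. The small Python helper below (a hypothetical name, for host-side planning only) computes it and rejects combinations that do not divide evenly, since a fractional tick count would make the real sample rate drift from the nominal one.

```python
def alarm_ticks(base_clock_hz, prescaler, sample_rate_hz):
    """Timer ticks between samples for a given clock, prescaler, and rate."""
    if base_clock_hz % prescaler != 0:
        raise ValueError("prescaler does not divide the base clock evenly")
    tick_hz = base_clock_hz // prescaler
    if tick_hz % sample_rate_hz != 0:
        raise ValueError("sample rate does not divide the tick rate evenly")
    return tick_hz // sample_rate_hz


# The configuration used above: 80 MHz clock, prescaler 80, 100 Hz sampling
print("ticks for 100 Hz:", alarm_ticks(80_000_000, 80, 100))

# 16 kHz does not fit a 1 MHz tick exactly, but fits a 16 MHz tick
try:
    alarm_ticks(80_000_000, 80, 16_000)
except ValueError as err:
    print("prescaler 80:", err)
print("ticks for 16 kHz with prescaler 5:", alarm_ticks(80_000_000, 5, 16_000))
```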
Next Steps
Continue to 10-edge-impulse.md to see how Edge Impulse's cloud workflow handles data collection, preprocessing, and model training without writing all the glue code manually.