TinyML: Model Training

The goal is a model small enough to fit in tens of kilobytes and fast enough to run in under 10 ms. Both constraints push you toward simpler architectures than you'd use on a server.

Design Constraints Before Writing Code

Before opening your editor, answer three questions:

What is the input shape? For the gesture data from chapter 4: (300,) (100 time steps × 3 axes, flattened).
How many output classes? Three: punch, flex, idle.
What is the RAM budget? The Nano has 256 KB of RAM in total. A reasonable target is under 50 KB for the tensor arena.

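These answers constrain the architecture. A 256-unit dense layer on this input has 256 × 300 = 76,800 weights, which is 300 KB in float32 (76,800 × 4 bytes). That single layer is already larger than the Nano's entire 256 KB of RAM, let alone the 50 KB target. Keep layers small.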

A Minimal Dense Model

Start with the smallest model that could plausibly work, then add capacity only if accuracy is insufficient.

# training/train.py
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Load preprocessed data
X = np.load("data/processed/X.npy")   # shape: (N, 300)
y = np.load("data/processed/y.npy")   # shape: (N,)

NUM_CLASSES = 3
y_cat = tf.keras.utils.to_categorical(y, NUM_CLASSES)

X_train, X_val, y_train, y_val = train_test_split(
    X, y_cat, test_size=0.2, random_state=42, stratify=y
)

# Normalize: zero mean, unit variance per feature
mean = X_train.mean(axis=0)
std  = X_train.std(axis=0) + 1e-8   # avoid div-by-zero on constant features
X_train = (X_train - mean) / std
X_val   = (X_val   - mean) / std

# Save normalization parameters for use on the device
np.save("training/mean.npy", mean)
np.save("training/std.npy",  std)

# Model definition
model = tf.keras.Sequential([
    tf.keras.Input(shape=(300,)),   # works in Keras 2 and 3; InputLayer(input_shape=...) is deprecated in Keras 3
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
], name="gesture_classifier")

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()

model.summary() prints the parameter count of each layer. The two hidden layers of 32 and 16 units, plus the 3-unit output layer, produce:

Layer                Output Shape    Params
─────────────────────────────────────────────
dense (Dense)        (None, 32)      9,632     (300×32 + 32 bias)
dense_1 (Dense)      (None, 16)      528       (32×16 + 16 bias)
dense_2 (Dense)      (None, 3)       51        (16×3 + 3 bias)
─────────────────────────────────────────────
Total params:        10,211
Trainable params:    10,211

10K parameters × 4 bytes = ~40 KB in float32. After int8 quantization that drops to ~10 KB. Well within budget.
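
The arithmetic in the table can be reproduced without TensorFlow. The sketch below is only a sanity check: the helper names `dense_params` and `stack_params` are made up for this example, and it assumes plain fully connected layers with one bias per unit.

```python
def dense_params(n_inputs, units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * units + units


def stack_params(n_inputs, layer_units):
    """Per-layer parameter counts for a stack of dense layers."""
    counts = []
    for units in layer_units:
        counts.append(dense_params(n_inputs, units))
        n_inputs = units  # this layer's output feeds the next layer
    return counts


counts = stack_params(300, [32, 16, 3])
total = sum(counts)
print(counts, total)                          # [9632, 528, 51] 10211
print(f"float32: {total * 4 / 1024:.1f} KB")  # 39.9 KB
print(f"int8:    {total * 1 / 1024:.1f} KB")  # 10.0 KB

# The 256-unit layer from the design section, for comparison
wide = dense_params(300, 256)
print(f"256-unit layer, float32: {wide * 4 / 1024:.1f} KB")  # 301.0 KB
```

Running the same check on a candidate architecture before training shows quickly whether it has any chance of fitting.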

Training Loop

# continuation of train.py
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=15,
        restore_best_weights=True,
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",
        factor=0.5,
        patience=8,
        min_lr=1e-5,
    ),
]

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    batch_size=16,
    callbacks=callbacks,
    verbose=2,
)

# Save the trained model
model.save("training/gesture_model.keras")
print(f"Best val accuracy: {max(history.history['val_accuracy']):.3f}")

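With 60 total samples (20 per class) and this architecture, expect 85-95% validation accuracy. Keep in mind that the 20% validation split holds only 12 samples, so a single misclassification moves validation accuracy by about 8 percentage points. Below 80% means the data is noisy, the classes are not discriminative, or you need more samples.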
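
The `history` object only exists while train.py is running, so it can help to write it to disk for later plotting. The following is a minimal sketch, not part of the original script: `history_to_json` and the file name are hypothetical, and the demo dictionary stands in for `history.history` so the example runs without TensorFlow.

```python
import json
import os
import tempfile

import numpy as np


def history_to_json(history_dict, path):
    """Write a Keras-style history dict ({metric: [value per epoch]}) to JSON."""
    # Values may be NumPy scalars (for example a logged learning rate), which
    # json cannot serialize, so convert everything to plain Python floats first.
    plain = {name: [float(v) for v in values] for name, values in history_dict.items()}
    with open(path, "w") as f:
        json.dump(plain, f, indent=2)
    return plain


# Stand-in for history.history
demo = {"accuracy": [np.float32(0.5), np.float32(0.75)], "val_accuracy": [0.5, 0.5]}
demo_path = os.path.join(tempfile.gettempdir(), "history_demo.json")
saved = history_to_json(demo, demo_path)

with open(demo_path) as f:
    reloaded = json.load(f)
print(reloaded)
```

In train.py the call would be something like `history_to_json(history.history, "training/history.json")` right after `model.fit`, with the path chosen to suit your project layout.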

Evaluating the Model

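# evaluate.py
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

LABELS = ["punch", "flex", "idle"]   # index order must match the label encoding in y

X     = np.load("data/processed/X.npy")
y     = np.load("data/processed/y.npy")
mean  = np.load("training/mean.npy")
std   = np.load("training/std.npy")

# Recreate the validation split from train.py (same test_size, seed, and
# stratification) so the report covers only samples the model never trained on.
# Evaluating on all of X would include the training data and inflate the scores.
_, X_val, _, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

X_norm = (X_val - mean) / std

model = tf.keras.models.load_model("training/gesture_model.keras")
probs = model.predict(X_norm, verbose=0)
preds = probs.argmax(axis=1)

print(classification_report(y_val, preds, target_names=LABELS))
print("\nConfusion matrix:")
print(confusion_matrix(y_val, preds))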

Read the confusion matrix carefully. If punch is frequently misclassified as flex, the two gestures look similar to the accelerometer. You can fix this by:

Modifying the gesture so the two classes are more distinct, or
Collecting more data to teach the model the subtle differences, or
Lengthening the capture window beyond 100 time steps to give the model more context.
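
To make the reading of a confusion matrix concrete, here is a small sketch that extracts per-class recall and the most frequent confusion. The matrix values are invented for illustration and are not results from this tutorial's data; rows are true classes and columns are predicted classes, which is the layout scikit-learn's `confusion_matrix` returns.

```python
import numpy as np

LABELS = ["punch", "flex", "idle"]

# Made-up confusion matrix: rows = true class, columns = predicted class
cm = np.array([
    [3, 1, 0],   # true punch: 3 correct, 1 predicted as flex
    [0, 4, 0],   # true flex
    [0, 0, 4],   # true idle
])

# Recall per class: correct predictions divided by the row total
recall = cm.diagonal() / cm.sum(axis=1)

# The largest off-diagonal entry is the most frequent confusion
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)

for name, r in zip(LABELS, recall):
    print(f"{name:>5} recall: {r:.2f}")
print(f"Most common error: {LABELS[true_idx]} predicted as {LABELS[pred_idx]}")
```

With a validation set this small, treat a single off-diagonal count as a hint rather than a firm conclusion.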

Visualizing Training Progress

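# plot_history.py
import json
import matplotlib.pyplot as plt

# Load the training history from JSON. This assumes train.py dumped
# history.history to training/history.json (a hypothetical path). In the same
# Python session as training, use the dict from model.fit directly instead:
#   history = history.history
with open("training/history.json") as f:
    history = json.load(f)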

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(history["accuracy"],     label="train")
ax1.plot(history["val_accuracy"], label="val")
ax1.set_title("Accuracy")
ax1.set_xlabel("Epoch")
ax1.legend()

ax2.plot(history["loss"],     label="train")
ax2.plot(history["val_loss"], label="val")
ax2.set_title("Loss")
ax2.set_xlabel("Epoch")
ax2.legend()

plt.tight_layout()
plt.savefig("training/history.png", dpi=150)

The two common failure modes visible on these plots:

Overfitting: training accuracy climbs to 99%, validation accuracy stops at 70%. The gap widens and stays wide. Fix: more data, dropout layer, or a smaller model.

Underfitting: both training and validation accuracy plateau below target. Fix: larger model, more epochs, or better features.
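
The two patterns can also be checked numerically from the history dictionary. This is a rough sketch under stated assumptions: the function name `diagnose` and the thresholds `target` and `max_gap` are illustrative choices, not values from this tutorial, and it looks only at the final epoch.

```python
def diagnose(history, target=0.90, max_gap=0.10):
    """Label a training run from its final-epoch accuracies.

    `target` and `max_gap` are illustrative thresholds; tune them per project.
    """
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    if train_acc - val_acc > max_gap:
        return "overfitting"    # wide gap between train and validation
    if train_acc < target and val_acc < target:
        return "underfitting"   # both curves plateau below the target
    return "ok"


print(diagnose({"accuracy": [0.80, 0.99], "val_accuracy": [0.65, 0.70]}))  # overfitting
print(diagnose({"accuracy": [0.60, 0.72], "val_accuracy": [0.58, 0.70]}))  # underfitting
print(diagnose({"accuracy": [0.85, 0.95], "val_accuracy": [0.80, 0.92]}))  # ok
```

A check like this is no substitute for looking at the curves, but it is convenient when comparing many runs.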

Adding a Conv1D Architecture

For the same input data kept in its original (100, 3) shape (not flattened), a 1D convolutional model often generalizes better because it explicitly exploits the temporal structure.

model_conv = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 3)),   # works in Keras 2 and 3
    tf.keras.layers.Conv1D(filters=8, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(filters=16, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
], name="gesture_conv")

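This model has about 12,400 parameters, slightly more than the dense model: the Flatten output is 46 × 16 = 736 values, so the Dense(16) layer that follows holds 11,792 of them on its own. It nevertheless often outperforms the dense model on short time series. Adjust the preprocessing to keep the (100, 3) shape by removing the reshape(-1) call in preprocess.py.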
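
The parameter count of the convolutional stack can be worked out by hand. The sketch below does that arithmetic, assuming the Keras defaults for Conv1D (padding="valid", strides=1); the helper `conv1d` is a name made up for this example.

```python
def conv1d(length, in_ch, filters, kernel):
    """Conv1D with 'valid' padding and stride 1: (output length, parameter count)."""
    return length - kernel + 1, (kernel * in_ch + 1) * filters


length, channels = 100, 3
length, p_conv1 = conv1d(length, channels, filters=8, kernel=5)   # 96 steps
length = length // 2                                              # MaxPooling1D(2): 48 steps
length, p_conv2 = conv1d(length, 8, filters=16, kernel=3)         # 46 steps
flat = length * 16                                                # Flatten: 736 values
p_dense = flat * 16 + 16                                          # Dense(16)
p_out = 16 * 3 + 3                                                # Dense(3)

total = p_conv1 + p_conv2 + p_dense + p_out
print(p_conv1, p_conv2, p_dense, p_out, total)   # 128 400 11792 51 12371
```

The convolutions themselves are tiny. Almost all of the weights sit in the first dense layer, so the length of the sequence entering Flatten is the main lever on model size.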

Checking Model Size Early

Don't wait until conversion to check size. Estimate during training:

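def model_size_bytes(model):
    """Return the bytes used by all weights at their current dtype (float32 here)."""
    # model.weights also includes non-trainable weights, which
    # trainable_weights would miss.
    return sum(w.numpy().nbytes for w in model.weights)

float32_bytes = model_size_bytes(model)
print(f"Float32 size: {float32_bytes / 1024:.1f} KB")
# int8 stores one byte per weight instead of four
print(f"Int8 size (est.): {float32_bytes / 4 / 1024:.1f} KB")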

If the int8 estimate is over 200 KB, the model probably won't fit alongside the firmware. Reduce layer sizes before converting.

Common Pitfalls

Not saving the normalization parameters. The device must apply the same mean/std normalization to each input before inference. If you forget to save them, you'll have to recompute them from exactly the same training split, or retrain.
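
One way to get the saved parameters onto the device is to turn them into a C header that the firmware includes. The following is a sketch of that idea, not the tutorial's own tooling: the identifiers `kFeatureMean` and `kFeatureStd` and the helper names are hypothetical, and the small arrays stand in for training/mean.npy and training/std.npy.

```python
import numpy as np


def to_c_array(name, values):
    """Format a 1-D float array as a C `const float` array definition."""
    values = np.asarray(values, dtype=np.float32)
    body = ", ".join(f"{float(v):.8e}f" for v in values)
    return f"const float {name}[{len(values)}] = {{ {body} }};"


def make_header(mean, std):
    """Build the text of a header holding both normalization arrays."""
    lines = [
        "#pragma once",
        to_c_array("kFeatureMean", mean),   # hypothetical identifier
        to_c_array("kFeatureStd", std),     # hypothetical identifier
    ]
    return "\n".join(lines) + "\n"


# Stand-in values; in practice pass np.load("training/mean.npy") and the std array
header = make_header(np.array([0.5, -1.25]), np.array([1.0, 2.0]))
print(header)
```

On the device, each input feature `i` would then be normalized as `(x[i] - kFeatureMean[i]) / kFeatureStd[i]` before it is copied into the input tensor.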

Using model.save("model.h5") instead of the Keras format. HDF5 is the older, legacy format; the .keras format handles subclassed models and custom layers correctly. Use it unless you have a specific reason not to.

Training on raw floats, expecting int8 accuracy. Post-training quantization introduces error. If your float model has 92% accuracy and you need 90% after quantization, test conversion early. Some models degrade more than expected.
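
To see where that error comes from, the sketch below quantizes a stand-in weight matrix to int8 with a single symmetric scale and measures the round-trip error. It is a simplified illustration in NumPy only: the real converter's scheme differs in details (for example, it calibrates activations on sample data), and the random weights are not from a trained model.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (int8 values, scale)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(300, 32)).astype(np.float32)  # stand-in weights

q, scale = quantize_int8(w)
w_restored = q.astype(np.float32) * scale
max_err = float(np.abs(w - w_restored).max())

# Rounding moves each weight by at most half a quantization step
print(f"scale={scale:.6f}  max error={max_err:.6f}  bound={scale / 2:.6f}")
```

Each weight shifts by at most half a step, but those small shifts accumulate through the layers, which is why accuracy has to be measured on the converted model rather than inferred from the float one.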

Next Steps

Continue to 06-model-conversion.md to quantize the trained model and export it to the TFLite FlatBuffer format.