Pattern Recognition Through Time
Overview
Temporal pattern recognition covers signals that evolve over time:
Speech recognition: audio waveform → phonemes → words.
Handwriting recognition: pen stroke sequences → characters.
Video-based face recognition: temporal consistency improves accuracy over single frames.
These domains share a language-like structure: each observed sequence, whether discrete or continuous, is treated as generated by a sequence of underlying hidden states.
Dynamic Time Warping
DTW aligns two temporal sequences that may vary in speed or duration.
It finds the minimum-cost warping path by filling an accumulated cost matrix \(D(i, j)\) with dynamic programming:
\[D(i, j) = d(x_i, y_j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}\]
where \(d(x_i, y_j)\) is a local distance measure (e.g., Euclidean).
Constraints: monotonicity, continuity, boundary conditions (start and end must align).
Complexity: \(O(nm)\) time, and \(O(nm)\) memory when the full matrix is stored, for sequences of length \(n\) and \(m\).
Used in speech recognition (template matching), gesture recognition, and time-series classification.
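The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dtw`, the absolute-difference default for the local distance \(d\), and the convention of padding the matrix with an extra row and column of infinities (so that \(D(0, 0) = 0\) enforces the boundary condition) are all assumptions made for the example.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return (total_cost, warping_path) for two non-empty sequences.

    `dist` is the local distance d(x_i, y_j); absolute difference is an
    assumed default for 1-D numeric sequences.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # D[i][j] = accumulated cost of aligning x[:i] with y[:j].
    # Row 0 and column 0 are infinite except D[0][0] = 0, which forces
    # every path to start at the first element of both sequences.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Backtrack from the end to recover one optimal warping path.
    # Each step moves to the cheapest of the three allowed predecessors,
    # so the path is monotonic and continuous by construction.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))  # store 0-based indices into x and y
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[n][m], path


cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(cost, path)
```

In this toy call the second sequence repeats the value 2, so the path pairs `x[1]` with both `y[1]` and `y[2]` and the total cost is zero. For multi-dimensional feature vectors, a Euclidean `dist` would be passed in instead of the default.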
HMMs vs. DTW
DTW: deterministic alignment, no probabilistic model, good for template matching with few classes.
HMMs: probabilistic generative model, handles variability better, scales to large vocabularies.
For large-scale recognition, modern systems typically use HMMs or their neural successors, such as CTC and attention-based models.
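To make the contrast with DTW concrete, the sketch below shows how a discrete HMM scores an observation sequence: instead of a deterministic alignment cost, the forward algorithm sums over all hidden-state paths to give a likelihood. The function name `forward_likelihood` and the toy parameters are hypothetical, chosen only for illustration; a practical system would work in log space or rescale at each step to avoid underflow on long sequences.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)

    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]

    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]

    # Marginalise over the final state.
    return sum(alpha)


# Toy left-to-right model (hypothetical numbers): state 0 always emits
# symbol 0, state 1 always emits symbol 1, and state 0 may stay or advance.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[1.0, 0.0],
        [0.0, 1.0]]

print(forward_likelihood([0, 1], start, trans, emit))
```

With one such model per class, recognition amounts to choosing the class whose model assigns the observed sequence the highest likelihood. The model parameters are estimated from training data, whereas DTW compares the input directly against stored templates.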