Pattern Recognition Through Time

Overview

Temporal pattern recognition covers signals that evolve over time:

  • Speech recognition: audio waveform → phonemes → words.

  • Handwriting recognition: pen stroke sequences → characters.

  • Video-based face recognition: temporal consistency improves accuracy over single frames.

  • These domains share a language-like structure — sequences of discrete or continuous observations generated by underlying hidden states.

Hidden Markov Models

Definition

An HMM is defined by:

  • \(S = \{s_1, \ldots, s_N\}\) — set of hidden states

  • \(O = \{o_1, \ldots, o_M\}\) — set of observable symbols

  • \(A = [a_{ij}]\) — state transition probabilities, \(a_{ij} = P(q_{t+1} = s_j | q_t = s_i)\)

  • \(B = [b_j(k)]\) — emission probabilities, \(b_j(k) = P(o_t = v_k | q_t = s_j)\)

  • \(\pi = [\pi_i]\) — initial state distribution, \(\pi_i = P(q_1 = s_i)\)

Three Fundamental Problems

  1. Evaluation\(P(O | \lambda)\): probability of observation sequence given model. Solved by the Forward algorithm (or Backward): dynamic programming over time steps.

    \[\alpha_t(j) = \left[ \sum_i \alpha_{t-1}(i) \, a_{ij} \right] b_j(o_t)\]
  2. Decoding — most likely state sequence given observations. Solved by the Viterbi algorithm: like Forward but replaces summation with max.

    \[\delta_t(j) = \max_i \left[ \delta_{t-1}(i) \, a_{ij} \right] b_j(o_t)\]

    Backtrack through stored argmax pointers to recover the optimal path.

  3. Learning — adjust model parameters \(\lambda = (A, B, \pi)\) to maximize \(P(O | \lambda)\). Solved by the Baum-Welch algorithm (special case of EM): iteratively re-estimates transition and emission probabilities using forward-backward statistics.

Dynamic Time Warping

DTW aligns two temporal sequences that may vary in speed or duration.

  • Computes an optimal warping path through a cost matrix \(D(i, j)\):

    \[D(i, j) = d(x_i, y_j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}\]

    where \(d(x_i, y_j)\) is a local distance measure (e.g., Euclidean).

  • Constraints: monotonicity, continuity, boundary conditions (start and end must align).

  • Complexity: \(O(nm)\) for sequences of length \(n\) and \(m\).

  • Used in speech recognition (template matching), gesture recognition, and time-series classification.

HMMs vs. DTW

  • DTW: deterministic alignment, no probabilistic model, good for template matching with few classes.

  • HMMs: probabilistic generative model, handles variability better, scales to large vocabularies.

  • Modern systems often use HMMs (or their neural successors like CTC/attention models) for large-scale recognition.