Generalized Hough Transform

Non-Analytic Models

The standard Hough transform requires an analytic equation (lines, circles). The Generalized Hough Transform (Ballard, 1981) extends voting to arbitrary shapes without analytic descriptions, using a displacement table (R-table).

R-Table Construction (Training)

Given a template shape with a designated reference point (e.g., centroid):

  1. For each boundary point \(P_i\), compute:

    • The gradient direction \(\theta_i\) at that point.

    • The displacement vector \(\mathbf{r}_i\) from \(P_i\) to the reference point.

  2. Store \(\mathbf{r}_i\) in a table indexed by \(\theta_i\).

Multiple boundary points sharing the same \(\theta\) contribute multiple displacement vectors to the same table entry.

Detection (Known Orientation and Scale)

When the object’s orientation and scale are fixed:

  1. For each edge point in the test image, compute gradient direction \(\theta\).

  2. Look up all displacement vectors \(\{\mathbf{r}\}\) stored under \(\theta\) in the R-table.

  3. Vote for the reference point location \((x, y)\) by adding each \(\mathbf{r}\) to the edge point’s position.

  4. Peaks in the 2D accumulator \(H[x, y]\) give the object location.

Handling Unknown Orientation

Add orientation \(\theta^*\) as a parameter. For each candidate \(\theta^*\):

  • Compute adjusted gradient: \(\theta' = \theta - \theta^*\)

  • Retrieve displacements for \(\theta'\) and vote.

The accumulator becomes 3D: \(H[x, y, \theta^*]\). Peaks give both location and orientation.

Handling Unknown Scale

Add scale \(s\) as a parameter. For each candidate \(s\):

  • Retrieve displacements for the edge gradient direction.

  • Vote with scaled displacement vectors \(s \cdot \mathbf{r}\).

The accumulator becomes 3D: \(H[x, y, s]\).

If both orientation and scale are unknown, the accumulator is 4D: \(H[x, y, \theta^*, s]\)\(O(k^4)\) bins, which is very expensive.

Visual Codeword Approach

A modern extension (Leibe et al., ECCV 2004) replaces edge gradients with visual codewords — learned local feature patches.

Training

  1. Run an interest point detector on training images to extract local patches.

  2. Cluster patches to form a codebook; cluster centers are the visual codewords.

  3. For each interest point in training images:

    • Assign it to the nearest codeword.

    • Record the displacement vector from that point to the object’s reference point, indexed by codeword.

Detection

  1. Detect interest points in the test image and match each to a codeword.

  2. For each matched codeword, retrieve all associated displacement vectors and vote for the reference point.

  3. Peaks in the accumulator indicate object locations.

This approach is more discriminative than edge-based voting and forms the basis of methods like Hough Forests, which combine displacement voting with random forest classifiers.