Generalized Hough Transform¶
Non-Analytic Models¶
The standard Hough transform requires an analytic equation (lines, circles). The Generalized Hough Transform (Ballard, 1981) extends voting to arbitrary shapes without analytic descriptions, using a displacement table (R-table).
R-Table Construction (Training)¶
Given a template shape with a designated reference point (e.g., centroid):
For each boundary point \(P_i\), compute:
The gradient direction \(\theta_i\) at that point.
The displacement vector \(\mathbf{r}_i\) from \(P_i\) to the reference point.
Store \(\mathbf{r}_i\) in a table indexed by \(\theta_i\).
Multiple boundary points sharing the same \(\theta\) contribute multiple displacement vectors to the same table entry.
Detection (Known Orientation and Scale)¶
When the object’s orientation and scale are fixed:
For each edge point in the test image, compute gradient direction \(\theta\).
Look up all displacement vectors \(\{\mathbf{r}\}\) stored under \(\theta\) in the R-table.
Vote for the reference point location \((x, y)\) by adding each \(\mathbf{r}\) to the edge point’s position.
Peaks in the 2D accumulator \(H[x, y]\) give the object location.
Handling Unknown Orientation¶
Add orientation \(\theta^*\) as a parameter. For each candidate \(\theta^*\):
Compute adjusted gradient: \(\theta' = \theta - \theta^*\)
Retrieve displacements for \(\theta'\) and vote.
The accumulator becomes 3D: \(H[x, y, \theta^*]\). Peaks give both location and orientation.
Handling Unknown Scale¶
Add scale \(s\) as a parameter. For each candidate \(s\):
Retrieve displacements for the edge gradient direction.
Vote with scaled displacement vectors \(s \cdot \mathbf{r}\).
The accumulator becomes 3D: \(H[x, y, s]\).
If both orientation and scale are unknown, the accumulator is 4D: \(H[x, y, \theta^*, s]\) — \(O(k^4)\) bins, which is very expensive.
Visual Codeword Approach¶
A modern extension (Leibe et al., ECCV 2004) replaces edge gradients with visual codewords — learned local feature patches.
Training¶
Run an interest point detector on training images to extract local patches.
Cluster patches to form a codebook; cluster centers are the visual codewords.
For each interest point in training images:
Assign it to the nearest codeword.
Record the displacement vector from that point to the object’s reference point, indexed by codeword.
Detection¶
Detect interest points in the test image and match each to a codeword.
For each matched codeword, retrieve all associated displacement vectors and vote for the reference point.
Peaks in the accumulator indicate object locations.
This approach is more discriminative than edge-based voting and forms the basis of methods like Hough Forests, which combine displacement voting with random forest classifiers.