Supervised Learning¶

Definition - Supervised learning: learn from labeled examples (each input has a known “right answer”). - Algorithm generalizes to new inputs.

Regression (continuous output) - Example: predict housing price from size (Portland dataset). - Fit linear vs quadratic model → different predictions for new house (e.g. 750 sq ft). - Regression problem: predict a continuous-valued output (price as real/scalar).

Classification (discrete output) - Example: breast tumor malignant (1) vs benign (0) from tumor size. - Can extend to multi-class (0 = benign, 1/2/3 = cancer types). - Plot alternatives: malignant (X) vs benign (O) on a line; or age + tumor size in 2D with a separating line. - Real datasets use many features (clump thickness, cell size uniformity, cell shape, …). - Some problems need an infinite number of features; SVM uses a math trick so the computer need not store them all.

Recap - Supervised: every training example has a label (price, malignant/benign, …). - Regression → continuous output; classification → discrete output.

Problem type quiz - Sell count of identical items in 3 months → regression (continuous count). - Per-account hacked or not → classification (0/1 discrete labels).

Unsupervised Learning¶

Definition - Data without labels (or all same label); algorithm finds structure in data. - Opposite of supervised: no per-example “correct answer” given.

Clustering - Partition data into groups (clusters). - Applications:

Google News: cluster articles on same story (e.g. BP oil spill).

Genomics / DNA microarrays: group individuals by gene expression patterns.

Data centers: which machines work together.

Social networks: cohesive friend groups.

Market segmentation: discover customer segments from unlabeled data.

Astronomy: galaxy-formation theories from clustering.

Cocktail party problem (source separation) - Multiple speakers, two microphones → overlapping recordings. - Unsupervised algorithm separates sources (e.g. English vs Spanish counting). - Implementable in one line in Octave/Matlab (after research); prototype in Octave, port to C++/Java once working. - Silicon Valley workflow: prototype in Octave/Matlab (fast iteration, built-in linear algebra e.g. SVD).

Unsupervised vs supervised (examples) - Spam with labeled spam/non-spam → supervised. - News clustering → unsupervised. - Market segments from customer data only → unsupervised. - Diabetes yes/no (like tumor labels) → supervised.

Supervised Learning — Reference Detail¶

Housing regression setup - Input $x$ = size (sq ft); output $y$ = price ($1000s). - Training set gives $(x^{(i)}, y^{(i)})$ with correct prices. - New house $x_{\mathrm{new}}$ → predict $y_{\mathrm{new}}$ via learned function (line, polynomial, etc.).

Classification setup - Output $y \in \{0,1\}$ (benign/malignant) or $y \in \{0,1,2,3\}$ (multi-class cancer types). - 1D: tumor size only; 2D: size + patient age; high-D: clump thickness, uniformity of cell size/shape, marginal adhesion, … - Support Vector Machine: kernel trick handles very large (conceptually infinite) feature spaces without explicit storage.

Infinite features - More features → more cues for prediction. - Challenge: storing infinite feature vectors. - SVM: mathematical trick (covered later) to work in rich feature spaces efficiently.

Unsupervised Learning — Reference Detail¶

Clustering vs classification - Clustering: no labels; discover groups (e.g. news topics, customer segments). - Classification: labeled training examples required.

Cocktail party / blind source separation - $m$ microphones, $n$ speakers → mixed recordings. - Unsupervised algorithm infers separate sources (ICA-style). - Octave prototype: research distilled to short code; production often reimplemented in C++/Java after validation.

When to use which - Labeled outcomes (price, spam, disease) → supervised. - Structure discovery only (segments, article groups, gene types) → unsupervised.