Features: input; labels: output
Trivial Algorithms:
Majority vote: always predict the most common label in the training set
Memorizer: if the features exactly match a training point, predict that point's label; otherwise fall back to majority vote
Decision stump: classify based on a single feature, predicting the most common label among training points that share that feature's value
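The three trivial algorithms above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names and the simple linear-scan lookup in the memorizer are my own choices.

```python
from collections import Counter

def majority_vote(train_y):
    """Predict the most common label in the training set, ignoring features."""
    return Counter(train_y).most_common(1)[0][0]

def memorizer_predict(train_X, train_y, x):
    """If x exactly matches a training feature vector, return its stored
    label; otherwise fall back to the majority-vote prediction."""
    for xn, yn in zip(train_X, train_y):
        if xn == x:
            return yn
    return majority_vote(train_y)

def decision_stump_predict(train_X, train_y, d, x):
    """Classify using only feature d: predict the most common label among
    training points whose value for feature d matches x's."""
    matching = [yn for xn, yn in zip(train_X, train_y) if xn[d] == x[d]]
    return Counter(matching).most_common(1)[0][0] if matching else majority_vote(train_y)
```

For example, on a dataset `train_X = [(1, 0), (1, 1), (0, 1)]` with `train_y = ['+', '+', '-']`, the memorizer returns the stored label for a seen point and the majority label `'+'` for an unseen one, while a stump on feature 0 predicts `'+'` whenever that feature equals 1.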
Notations:
\mathcal{X}: feature space
\mathcal{Y}: label space
h: \mathcal{X} \to \mathcal{Y}: the hypothesis (classifier) we are learning
c^*: \mathcal{X} \to \mathcal{Y}: the unknown target function we are trying to approximate
\ell: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}: loss function
\hat{y} = h(x): predicted label
y = c^*(x): true label
\mathcal{D} = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(N)}, y^{(N)})\}: training dataset
(x^{(n)}, y^{(n)}) = (x^{(n)}_1, x^{(n)}_2, ..., x^{(n)}_D, y^{(n)}): datapoint
N: total number of data points
D: number of features in a single data point
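Putting the notation together: a common choice of loss is the 0-1 loss, and averaging it over the training dataset gives the training error of a hypothesis h. A minimal sketch, assuming 0-1 loss (the notes only state that \ell maps \mathcal{Y} \times \mathcal{Y} to \mathbb{R}; the function names here are my own):

```python
def zero_one_loss(y_hat, y):
    """l(y_hat, y): 1 if the prediction is wrong, 0 if correct."""
    return 0 if y_hat == y else 1

def training_error(h, D):
    """Average loss of hypothesis h over dataset D = [(x, y), ...]."""
    return sum(zero_one_loss(h(x), y) for x, y in D) / len(D)
```

For instance, a hypothesis that always predicts '+' has training error 0.5 on a dataset where half the labels are '+'.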
Table of Contents