Features: input; labels: output
Trivial Algorithms:
Majority vote: always predict the most common label in the training set
Memorizer: if the features exactly match a training point, predict that point's label; otherwise fall back to majority vote
Decision stump: classify based on a single feature, predicting the most common label among training points that share that feature's value
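The three trivial algorithms above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names and the simple linear-scan lookup in the memorizer are my own choices.

```python
from collections import Counter

def majority_vote(train_y):
    """Predict the most common label in the training set, ignoring features."""
    return Counter(train_y).most_common(1)[0][0]

def memorizer_predict(train_X, train_y, x):
    """If x exactly matches a training feature vector, return its stored
    label; otherwise fall back to the majority-vote prediction."""
    for xn, yn in zip(train_X, train_y):
        if xn == x:
            return yn
    return majority_vote(train_y)

def decision_stump_predict(train_X, train_y, d, x):
    """Classify using only feature d: predict the most common label among
    training points whose value for feature d matches x's."""
    matching = [yn for xn, yn in zip(train_X, train_y) if xn[d] == x[d]]
    return Counter(matching).most_common(1)[0][0] if matching else majority_vote(train_y)
```

For example, on a dataset `train_X = [(1, 0), (1, 1), (0, 1)]` with `train_y = ['+', '+', '-']`, the memorizer returns the stored label for a seen point and the majority label `'+'` for an unseen one, while a stump on feature 0 predicts `'+'` whenever that feature equals 1.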
Notations:
\mathcal{X}: feature space
\mathcal{Y}: label space
h: \mathcal{X} \to \mathcal{Y}: the hypothesis (classifier) we are learning
c^*: \mathcal{X} \to \mathcal{Y}: the unknown target function we are trying to approximate
\ell: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}: loss function
\hat{y} = h(x): predicted label
y = c^*(x): true label
\mathcal{D} = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(N)}, y^{(N)})\}: training dataset
(x^{(n)}, y^{(n)}) = (x^{(n)}_1, x^{(n)}_2, ..., x^{(n)}_D, y^{(n)}): datapoint
N: total number of data points
D: number of features in a single data point
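Putting the notation together: a common choice of loss is the 0-1 loss, and averaging it over the training dataset gives the training error of a hypothesis h. A minimal sketch, assuming 0-1 loss (the notes only state that \ell maps \mathcal{Y} \times \mathcal{Y} to \mathbb{R}; the function names here are my own):

```python
def zero_one_loss(y_hat, y):
    """l(y_hat, y): 1 if the prediction is wrong, 0 if correct."""
    return 0 if y_hat == y else 1

def training_error(h, D):
    """Average loss of hypothesis h over dataset D = [(x, y), ...]."""
    return sum(zero_one_loss(h(x), y) for x, y in D) / len(D)
```

For instance, a hypothesis that always predicts '+' has training error 0.5 on a dataset where half the labels are '+'.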
Table of Contents