We want a classifier h(X) that minimizes the generalization error:

R(h) = Pr\{h(X) \neq Y\}

Bayes Classifier Given Distribution

Assume we know the joint distribution f_{X, Y}(x, y) = f_{X | Y = y}(x)f_{Y}(y) (equivalently, the likelihood f_{X | Y}(x) and the priors Pr\{Y = 1\}, Pr\{Y = 0\}). Binary classification is then equivalent to hypothesis testing with the MAP decision rule.

H_0: Y = 0, H_1: Y = 1

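Bayes Classifier: Since we know the joint p.d.f., we can compare the joint density f_{X | Y = 0}(x) Pr\{Y = 0\} with f_{X | Y = 1}(x) Pr\{Y = 1\} and predict the label with the larger value. This is the same as comparing the posteriors Pr\{Y = 0 | X = x\} and Pr\{Y = 1 | X = x\}.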
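
As a minimal sketch of the MAP rule, assume one-dimensional Gaussian class-conditional densities with known means and a shared standard deviation. The Gaussian model, the parameter values, and the names gaussian_pdf and bayes_classifier are illustrative assumptions, not part of these notes. The classifier compares f_{X | Y = 0}(x) Pr\{Y = 0\} with f_{X | Y = 1}(x) Pr\{Y = 1\} and returns the label with the larger value:

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_classifier(x, prior1=0.5, mean0=0.0, mean1=2.0, std=1.0):
    """MAP decision between H_0: Y = 0 and H_1: Y = 1.

    Assumed model (hypothetical): X | Y = 0 ~ N(mean0, std^2),
    X | Y = 1 ~ N(mean1, std^2), and Pr{Y = 1} = prior1.
    """
    joint0 = gaussian_pdf(x, mean0, std) * (1.0 - prior1)  # f_{X|Y=0}(x) Pr{Y=0}
    joint1 = gaussian_pdf(x, mean1, std) * prior1          # f_{X|Y=1}(x) Pr{Y=1}
    # Ties are broken in favor of H_0.
    return 1 if joint1 > joint0 else 0
```

With equal priors and these assumed parameters the decision boundary sits at the midpoint x = 1 of the two means. Raising prior1 moves the boundary toward mean0, so more points are labeled 1.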

This is random because the training data is random

Testing Error: to estimate the error, we need a test set that is independent of the training set. We hope the test error is a good estimate of the generalization error.
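
A rough sketch of this estimate, assuming a synthetic setting: labels are drawn with Pr\{Y = 1\} = 0.5 and X given Y is Gaussian with mean 0 or 2 and unit standard deviation. The sampler sample_xy, the fixed threshold classifier h, and the helper empirical_error are hypothetical names chosen for illustration. The test error is the fraction of independent test points with h(X) \neq Y:

```python
import random


def sample_xy(rng, prior1=0.5, mean0=0.0, mean1=2.0, std=1.0):
    """Draw one (x, y) pair from the assumed joint distribution."""
    y = 1 if rng.random() < prior1 else 0
    x = rng.gauss(mean1 if y == 1 else mean0, std)
    return x, y


def empirical_error(h, data):
    """Fraction of points in data that h misclassifies."""
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)


def h(x):
    """A fixed threshold classifier (assumed here, not learned from data)."""
    return 1 if x > 1.0 else 0


# The test set is drawn independently of anything used to choose h.
rng = random.Random(0)
test_set = [sample_xy(rng) for _ in range(20000)]
estimate = empirical_error(h, test_set)
```

Under these assumed parameters the generalization error of h is \Phi(-1), about 0.159, so the estimate should fall close to that value. The gap shrinks as the test set grows.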