Viewpoints of Linear Classifiers
- Algebraic Viewpoint
- Visual Viewpoint
- Geometric Viewpoint
Loss Functions
- How well does the classifier work?
Multi-Class SVM (Hinge Loss)
The score of the correct class should be higher than all the other scores by at least a fixed margin (1 in the formula below)
- Given example $(x_i, y_i)$, let $s = f(x_i, W)$ be scores, then SVM loss has the form:
$$
L_i=\sum_{j\neq y_i}\max(0,\, s_j - s_{y_i}+1)
$$
- When implementing a network, think about what loss value to expect when all scores are small and random: for this SVM loss with $C$ classes it should be about $C-1$ (see the sketch after this list).
- If the loss at initialization looks weird, there's probably a bug somewhere.
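A minimal NumPy sketch of the per-example SVM loss above (the function name is my own). With $C$ classes and small random scores, each of the $C-1$ margin terms is about 1, so the expected loss is roughly $C-1$, which makes a handy sanity check:

```python
import numpy as np

def svm_loss_single(scores, y):
    # scores: (C,) vector of class scores s = f(x_i, W); y: index of the correct class
    margins = np.maximum(0, scores - scores[y] + 1)  # hinge term for every class
    margins[y] = 0                                   # do not count j == y_i
    return margins.sum()

# Sanity check: with small random scores, the loss should be about C - 1
C = 10
print(svm_loss_single(0.01 * np.random.randn(C), y=3))  # approx 9
```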
Cross Entropy Loss
Want to interpret raw classifier scores as probabilities
- The only way to get 0 loss would be to assign probability 1 to the correct class, which requires infinitely large score gaps, so in practice the loss never reaches exactly 0.
- Use the Softmax function
$$
s = f(x_i;W),\quad P(Y=k\mid X=x_i)=\frac{e^{s_k}}{\sum_j e^{s_j}}
$$
| | Class A | Class B | Class C |
| --- | --- | --- | --- |
| Unnormalized log-probabilities (logits) | 3.2 | 5.1 | -1.7 |
| Unnormalized probabilities (exp) | 24.5 | 164.0 | 0.18 |
| Normalized probabilities (softmax) | 0.13 | 0.87 | 0.00 |
| Correct probabilities | 1 | 0 | 0 |
- Once we get the normalized probabilities, we can compute the loss (a code sketch follows below)
- If the correct class were Class A, the loss would be $L_i=-\log(0.13)=2.04$
$$
L_i=-\log P(Y=y_i\mid X=x_i)
$$
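A small sketch (helper name is my own) that reproduces the table above: exponentiate the logits, normalize, and take the negative log-probability of the correct class. Subtracting the max logit first is a standard numerical-stability trick and does not change the softmax output:

```python
import numpy as np

def cross_entropy_single(scores, y):
    # scores: (C,) unnormalized log-probabilities (logits); y: correct class index
    shifted = scores - scores.max()                    # stability; same softmax result
    probs = np.exp(shifted) / np.exp(shifted).sum()    # softmax
    return -np.log(probs[y])                           # L_i = -log P(Y = y_i | X = x_i)

scores = np.array([3.2, 5.1, -1.7])  # Class A, B, C from the table
print(cross_entropy_single(scores, y=0))  # approx 2.04 when Class A is correct
```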
Regularization
- A term added to the loss function to prevent the model from fitting the training data too closely
- Express preferences among models beyond “minimize training error”
- Avoid Overfitting: Prefer simple models that generalize better
- Improve optimization by adding curvature
$$
L(W)=\frac{1}{N}\sum^N_{i=1}L_i(f(x_i,W),y_i)+\lambda R(W)
$$
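A sketch of the full objective above, assuming a linear classifier $f(x,W)=xW$, the cross-entropy loss from the previous section, and L2 regularization $R(W)=\sum_{k,l} W_{k,l}^2$ (all names are illustrative):

```python
import numpy as np

def total_loss(W, X, y, lam):
    # X: (N, D) data, y: (N,) integer labels, W: (D, C) weights, lam: regularization strength
    scores = X @ W                                            # f(x_i, W) for all examples
    shifted = scores - scores.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # softmax per row
    data_loss = -np.log(probs[np.arange(len(y)), y]).mean()   # (1/N) sum of L_i
    reg_loss = lam * np.sum(W * W)                             # lambda * R(W), L2 penalty
    return data_loss + reg_loss

# Example usage with random data
N, D, C = 5, 4, 3
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = 0.01 * np.random.randn(D, C)
print(total_loss(W, X, y, lam=1e-3))
```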