Regularization

A unified treatment of regularization in deep learning. It runs from the bias-variance tradeoff through explicit penalties (L1, L2, weight decay), dropout and its variants (DropConnect, MC Dropout, Stochastic Depth), normalization layers (BatchNorm, LayerNorm, RMSNorm), early stopping, data augmentation (Mixup, Cutout, CutMix, RandAugment), and output regularizers (label smoothing, confidence penalty). It closes with implicit regularization from initialization and SGD noise, and with Lipschitz control via spectral normalization.

Intermediate · 6 h estimated · 7 readings · 2 quizzes · 2 labs · 2 drill decks

Readings

Overfitting and the Bias-Variance Tradeoff

Underfitting vs. overfitting, bias-variance decomposition, the definition of regularization, and the double-descent phenomenon in overparameterized models.

13 min

L1 and L2 Weight Penalties

L2 weight decay derivation, L1 sparsity geometry, elastic net, MAP estimation interpretation, and the AdamW decoupling fix.

15 min

Dropout and Stochastic Regularization

Inverted dropout, ensemble interpretation, co-adaptation prevention, DropConnect, MC Dropout for uncertainty, and Stochastic Depth / DropPath.

15 min

Normalization Layers