Normalization in Deep Learning
A comprehensive treatment of normalization techniques, from why they work to how to choose between them. Covers the internal covariate shift and loss-landscape smoothing explanations, BatchNorm internals (running statistics, train/eval modes, SyncBN), the LayerNorm family (RMSNorm, DeepNorm, pre-norm vs. post-norm placement), weight and spectral normalization, small-batch alternatives (GroupNorm, InstanceNorm), and adaptive/conditional normalization (AdaIN, SPADE, FiLM, adaLN-Zero in DiT).