Weight Initialization
A ground-up treatment of all PyTorch and TensorFlow/Keras weight initializers, from constant and random baselines to variance-scaling methods (Xavier/Glorot, He/Kaiming, LeCun) and orthogonal initialization. Covers the variance-propagation derivations, the default initialization behavior of common layers, and a practical selection guide by architecture and activation.