Neural Network Architectures
A bottom-up tour of the core neural network architectures — how they work, why they are designed the way they are, and when to use each. Covers MLPs, convolutional layers and ResNets, vanilla RNNs, LSTMs and GRUs, scaled dot-product and multi-head attention, and the transformer (encoder-only, decoder-only, encoder-decoder). Builds the architectural vocabulary assumed by every other supplement.