Supplement

Neural Network Architectures

A bottom-up tour of the core neural network architectures — how they work, why they are designed the way they are, and when to use each. Covers MLPs, convolutional layers and ResNets, vanilla RNNs, LSTMs and GRUs, scaled dot-product and multi-head attention, and the transformer (encoder-only, decoder-only, encoder-decoder). Builds the architectural vocabulary assumed by every other supplement.

Intermediate · 7h estimated · 7 readings · 2 quizzes · 2 labs · 2 drill decks

Readings

The MLP — Layers, Activations, and Universal Approximation

Affine + non-linearity, the MLP forward pass, Universal Approximation Theorem, depth vs. width, layer normalization, and dropout.

Convolutional Layers and the Vision Inductive Bias

Convolution operation, weight sharing, translation equivariance, kernel size / stride / padding, receptive fields, pooling variants, and depthwise separable convolutions.
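As a quick illustration of how kernel size, stride, and padding interact, the sketch below computes the output length of a convolution along one spatial axis using the standard formula floor((n + 2p - k) / s) + 1. The function name `conv_output_size` is a hypothetical helper, not taken from the reading, and dilation is assumed to be 1.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for input length n (dilation assumed to be 1)."""
    if n + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Standard formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - kernel) // stride + 1


# A 3x3 kernel with padding 1 and stride 1 preserves the spatial size.
same = conv_output_size(32, kernel=3, stride=1, padding=1)

# Raising the stride to 2 roughly halves it.
halved = conv_output_size(32, kernel=3, stride=2, padding=1)

print(same, halved)
```

Applying the same function once per axis gives the output height and width of a 2D convolution.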

Going Deeper — ResNets and Residual Connections

The degradation problem, residual blocks (F(x)+x), the gradient highway, basic vs. bottleneck blocks, projection shortcuts, and residual connections in transformers.

Vanilla RNNs and the Vanishing Gradient

Recurrent forward pass, weight sharing across time, BPTT, the vanishing and exploding gradient problems, gradient clipping, and RNN configurations (many-to-one, seq2seq).

LSTM and GRU — Gating Solutions to Long-Range Memory

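LSTM's input, forget, and output gates plus the candidate cell update (often counted together as four gates), cell state as a gradient highway, GRU's reset and update gates, and an LSTM vs. GRU comparison of parameter count and performance.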
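The parameter-count comparison can be sketched with simple arithmetic. The helper names below are hypothetical, and the count assumes one bias vector per weight block; some libraries (PyTorch, for example) keep two bias vectors per block, which changes the totals slightly but not the ratio of the weight matrices. The LSTM is counted with four weight blocks (input, forget, and output gates plus the candidate update) and the GRU with three (reset and update gates plus the candidate).

```python
def gated_rnn_params(input_size, hidden_size, n_blocks):
    """Parameters for n_blocks weight blocks, each acting on [x_t, h_{t-1}]."""
    # Each block: a (hidden x (input + hidden)) matrix plus one bias vector (assumption).
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block


def lstm_params(input_size, hidden_size):
    return gated_rnn_params(input_size, hidden_size, n_blocks=4)


def gru_params(input_size, hidden_size):
    return gated_rnn_params(input_size, hidden_size, n_blocks=3)


lstm = lstm_params(128, 256)
gru = gru_params(128, 256)
print(lstm, gru, gru / lstm)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer with the same input and hidden sizes.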

Attention Mechanisms

Query-key-value abstraction, scaled dot-product attention, the √d_k scaling, multi-head attention, self-attention vs. masked self-attention vs. cross-attention.
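A minimal sketch of scaled dot-product attention, softmax(QK^T / √d_k)V, written with plain Python lists so that it runs without any dependencies. The function names and the `causal` flag are illustrative assumptions; the flag masks out later positions to mimic masked self-attention. A real implementation would use batched tensor operations instead of loops.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Q, K, V are lists of row vectors; returns one output row per query."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Position i may only attend to positions j <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
full = scaled_dot_product_attention(Q, K, V)
masked = scaled_dot_product_attention(Q, K, V, causal=True)
print(full)
print(masked)
```

Self-attention corresponds to Q, K, and V all derived from the same sequence; cross-attention would take Q from one sequence and K, V from another.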

The Transformer

Encoder and decoder block structure, FFN sub-layer, positional encodings (sinusoidal, learned, RoPE), encoder-only / decoder-only / encoder-decoder families, and efficiency improvements (GQA, FlashAttention).

Quizzes

MLP, CNN, and ResNet

6 questions · 70% to pass

RNN, LSTM, Attention, and Transformer

6 questions · 70% to pass

Labs

Neural Network Architectures in PyTorch

Neural Network Architectures in TensorFlow

Practice

MLPs, CNNs, and ResNets

15 cards · 12 min

RNNs, Attention, and Transformers

15 cards · 14 min