Optimizers
A comprehensive guide to all 13 PyTorch optimizers and 15 learning rate schedulers — SGD, Adam, AdamW, adaptive methods, quasi-Newton, and the full lr_scheduler suite — with update-rule derivations, hyperparameter intuition, and side-by-side PyTorch and TensorFlow implementations.
Readings
1
What Is an Optimizer?
Gradient descent mechanics, the PyTorch optimizer interface, param groups, gradient clipping, and a practical selection guide.
12 min
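Here is a minimal pure-Python sketch of the mechanics this reading covers: repeated steps of theta <- theta - lr * grad(theta), with optional norm-based gradient clipping. The helper names `gradient_descent` and `clip_by_norm` are hypothetical. `clip_by_norm` only mimics the idea behind `torch.nn.utils.clip_grad_norm_`, which rescales the whole gradient when its L2 norm exceeds a threshold; it is not that function.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (the same idea as
    torch.nn.utils.clip_grad_norm_, applied to a plain list of floats)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grad]
    return grad

def gradient_descent(grad_fn, theta, lr=0.1, steps=100, max_norm=None):
    """Repeat theta <- theta - lr * grad(theta), optionally clipping the gradient."""
    theta = list(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        if max_norm is not None:
            g = clip_by_norm(g, max_norm)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# f(x, y) = (x - 3)^2 + 2 * (y + 1)^2 has its minimum at (3, -1).
def grad_f(theta):
    x, y = theta
    return [2 * (x - 3), 4 * (y + 1)]

print(gradient_descent(grad_f, [0.0, 0.0], lr=0.1, steps=200))
print(gradient_descent(grad_f, [0.0, 0.0], lr=0.1, steps=200, max_norm=1.0))
```

With clipping on, the early steps are capped at length lr * max_norm. Once the gradient norm falls below the threshold, the run behaves exactly like the unclipped one.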
2
SGD & Momentum Methods
Vanilla SGD, momentum, Nesterov acceleration, ASGD averaging, and Rprop's sign-based per-parameter step sizes.
14 min
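The torch.optim.SGD documentation writes momentum as a buffer update v <- momentum * v + g. Plain momentum then steps along v; with nesterov=True it steps along g + momentum * v instead. Below is a pure-Python sketch of that form, assuming dampening = 0 and no weight decay. `sgd_momentum` is a hypothetical helper, not the library code. It is run on an ill-conditioned quadratic, where momentum helps most.

```python
def sgd_momentum(grad_fn, theta, lr=0.01, momentum=0.9, nesterov=False, steps=500):
    """SGD with heavy-ball or Nesterov momentum (dampening=0, no weight decay).

        v     <- momentum * v + g
        step  =  g + momentum * v   if nesterov else v
        theta <- theta - lr * step
    """
    theta = list(theta)
    buf = [0.0] * len(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        buf = [momentum * b + gi for b, gi in zip(buf, g)]
        if nesterov:
            step = [gi + momentum * b for gi, b in zip(g, buf)]
        else:
            step = buf
        theta = [t - lr * s for t, s in zip(theta, step)]
    return theta

# Ill-conditioned quadratic f(x, y) = 0.5 * (x^2 + 100 * y^2), minimum at (0, 0).
def grad_q(theta):
    x, y = theta
    return [x, 100.0 * y]

print(sgd_momentum(grad_q, [1.0, 1.0]))                 # heavy-ball momentum
print(sgd_momentum(grad_q, [1.0, 1.0], nesterov=True))  # Nesterov momentum
print(sgd_momentum(grad_q, [1.0, 1.0], momentum=0.0))   # plain SGD
```

At this learning rate, plain SGD barely moves along the flat x direction in 500 steps. Both momentum variants reach the minimum to high precision.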
3
Adaptive Learning Rate Methods
Adagrad's accumulated squared gradients, RMSprop's exponential moving average fix, and Adadelta's learning-rate-free design.
14 min
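The difference between Adagrad and RMSprop is easiest to see with a constant gradient:

- **Adagrad** divides by the square root of an ever-growing sum of squared gradients.
- **RMSprop** divides by the square root of an exponential moving average.

This sketch uses hypothetical helpers. The update rules follow the PyTorch defaults (eps=1e-10 for Adagrad; alpha=0.99, eps=1e-8 for RMSprop) but omit lr_decay, momentum, and centering. It feeds both optimizers a gradient of 1.0 and records how far each step moves the parameter.

```python
import math

def adagrad_step(theta, grad, state, lr=0.01, eps=1e-10):
    """One Adagrad update; state["sum"] accumulates squared gradients forever."""
    state["sum"] = [s + g * g for s, g in zip(state["sum"], grad)]
    return [t - lr * g / (math.sqrt(s) + eps)
            for t, g, s in zip(theta, grad, state["sum"])]

def rmsprop_step(theta, grad, state, lr=0.01, alpha=0.99, eps=1e-8):
    """One RMSprop update; state["avg"] is an EMA of squared gradients."""
    state["avg"] = [alpha * a + (1 - alpha) * g * g for a, g in zip(state["avg"], grad)]
    return [t - lr * g / (math.sqrt(a) + eps)
            for t, g, a in zip(theta, grad, state["avg"])]

def step_sizes(step_fn, state, n):
    """Apply n updates with a constant gradient of 1.0 and record each step length."""
    theta, sizes = [0.0], []
    for _ in range(n):
        new = step_fn(theta, [1.0], state)
        sizes.append(theta[0] - new[0])
        theta = new
    return sizes

ada = step_sizes(adagrad_step, {"sum": [0.0]}, 1000)
rms = step_sizes(rmsprop_step, {"avg": [0.0]}, 1000)
print(ada[0], ada[-1])  # Adagrad: lr / sqrt(t), keeps shrinking toward 0
print(rms[0], rms[-1])  # RMSprop: large first steps (no bias correction), then levels off near lr
```

Adagrad's effective learning rate decays like 1/sqrt(t) no matter what, which is the problem RMSprop's moving average is meant to fix.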
4
The Adam Family
Adam, AdamW, Adamax, NAdam, RAdam, and SparseAdam — moment estimates, bias correction, decoupled weight decay, and variance rectification.
16 min
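Two ideas from this reading are sketched below in pure Python for a single scalar parameter:

- **Bias correction** of the moment estimates.
- **Decoupled weight decay:**
  - Adam with weight_decay adds weight_decay * theta to the gradient.
  - AdamW instead multiplies theta by (1 - lr * weight_decay) before the adaptive step.

These rules follow the torch.optim.Adam and AdamW documentation with amsgrad and maximize off. `adam_update` and `new_state` are hypothetical helpers.

```python
import math

def new_state():
    return {"t": 0, "m": 0.0, "v": 0.0}

def adam_update(theta, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                weight_decay=0.0, decoupled=False):
    """One Adam step (decoupled=False) or AdamW step (decoupled=True) on a scalar."""
    b1, b2 = betas
    if weight_decay and decoupled:
        theta = theta * (1 - lr * weight_decay)   # AdamW: shrink the weight directly
    elif weight_decay:
        grad = grad + weight_decay * theta        # Adam: L2 term folded into the gradient
    state["t"] += 1
    t = state["t"]
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # first-moment EMA
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad   # second-moment EMA
    m_hat = state["m"] / (1 - b1 ** t)                      # bias correction
    v_hat = state["v"] / (1 - b2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Bias correction makes the first step roughly lr in size, whatever the gradient scale.
print(adam_update(0.0, 1000.0, new_state()), adam_update(0.0, 0.001, new_state()))

# With a zero loss gradient, compare L2-in-Adam against decoupled decay.
print(adam_update(1.0, 0.0, new_state(), weight_decay=0.01))                  # Adam + L2
print(adam_update(1.0, 0.0, new_state(), weight_decay=0.01, decoupled=True))  # AdamW
```

The two variants treat the decay term very differently:

- **Adam + L2:** the decay term is divided by sqrt(v_hat), so the weight moves by roughly lr, to about 0.999.
- **AdamW:** the weight is scaled by 1 - lr * weight_decay, to about 0.99999.

That is the sense in which AdamW's weight decay is decoupled from the adaptive scaling.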
5
Second-Order & Advanced Techniques
LBFGS quasi-Newton with closure, param groups for layer-wise learning rates, gradient clipping, and fused optimizer kernels.
13 min
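At the heart of each L-BFGS step is the two-loop recursion. It approximates the inverse Hessian times the gradient from a short history of position differences s and gradient differences y, without ever forming a matrix. PyTorch's LBFGS calls the user's closure to re-evaluate the loss and gradient within a step, and this sketch leaves that part out. Below is a pure-Python sketch of the recursion alone; `lbfgs_direction` is a hypothetical helper, not the torch.optim.LBFGS code.

```python
def lbfgs_direction(grad, history):
    """Approximate H^{-1} @ grad with the L-BFGS two-loop recursion.

    history: list of (s, y) pairs, oldest first, where s = x_{k+1} - x_k and
    y = g_{k+1} - g_k. All vectors are plain lists of floats.
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    q = list(grad)
    coeffs = []                                   # (rho, alpha), newest pair first
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, alpha))

    # Initial inverse-Hessian guess gamma * I, scaled from the newest pair.
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    for (s, y), (rho, alpha) in zip(history, reversed(coeffs)):  # oldest pair first
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return r  # the update is x_new = x - lr * r

# Quadratic with A = diag(2, 8); curvature pairs taken along each axis.
history = [([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 8.0])]
print(lbfgs_direction([4.0, 16.0], history))
```

When the stored pairs capture the full curvature of a quadratic, as they do here, the recursion returns the exact Newton direction A^{-1} g = [2, 2].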
6
Learning Rate Schedulers
StepLR, MultiStepLR, ExponentialLR, PolynomialLR, ConstantLR, LinearLR, CosineAnnealingLR, ReduceLROnPlateau, LambdaLR, and MultiplicativeLR.
15 min
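Several of these schedules reduce to simple closed-form functions of the epoch index t. Here is a sketch of a few of them as plain Python functions. The function names are hypothetical; the PyTorch scheduler classes instead update the optimizer's lr in place each time .step() is called. Defaults match the documented ones for StepLR, MultiStepLR, PolynomialLR, and CosineAnnealingLR.

```python
import math

def step_lr(base_lr, t, step_size, gamma=0.1):
    """StepLR: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (t // step_size)

def multistep_lr(base_lr, t, milestones, gamma=0.1):
    """MultiStepLR: multiply by gamma at each milestone already reached."""
    return base_lr * gamma ** sum(1 for m in milestones if t >= m)

def exponential_lr(base_lr, t, gamma):
    """ExponentialLR: multiply by gamma every epoch."""
    return base_lr * gamma ** t

def polynomial_lr(base_lr, t, total_iters=5, power=1.0):
    """PolynomialLR: decay to 0 over total_iters epochs, then stay there."""
    return base_lr * (1 - min(t, total_iters) / total_iters) ** power

def cosine_annealing_lr(base_lr, t, T_max, eta_min=0.0):
    """CosineAnnealingLR closed form, valid for 0 <= t <= T_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

for t in (0, 30, 60, 90):
    print(t, step_lr(0.1, t, step_size=30), cosine_annealing_lr(0.1, t, T_max=90))
```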
7
Advanced Schedulers & Composition
CosineAnnealingWarmRestarts, CyclicLR, OneCycleLR super-convergence, SequentialLR, ChainedScheduler, and warmup-then-decay patterns.
14 min
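A common warmup-then-decay shape is a linear ramp followed by cosine annealing. It is often built by chaining LinearLR into CosineAnnealingLR with SequentialLR. The sketch below writes that shape as one plain function of the step index. `warmup_cosine_lr`, its parameter names, and the start_factor of 0.01 are illustrative assumptions. It is not guaranteed to reproduce SequentialLR's exact step-by-step values.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps,
                     start_factor=0.01, eta_min=0.0):
    """Linear warmup from base_lr * start_factor up to base_lr, then cosine decay to eta_min."""
    if step < warmup_steps:
        frac = step / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * frac)
    decay_steps = total_steps - warmup_steps
    t = min(step - warmup_steps, decay_steps)   # hold at eta_min after total_steps
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / decay_steps)) / 2

lrs = [warmup_cosine_lr(s, 0.1, warmup_steps=10, total_steps=100) for s in range(101)]
print(lrs[0], lrs[10], lrs[55], lrs[100])
```

Dividing the return value by base_lr gives a multiplicative factor. That is the form LambdaLR expects its lr_lambda callable to return.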
Quizzes
SGD, Adaptive Methods & Adam
6 questions · 70% to pass
Advanced Techniques & Schedulers
6 questions · 70% to pass
Labs
Optimizers in PyTorch
Implement all 13 optimizers, visualize trajectories on the Rosenbrock function, compare convergence on CIFAR-10, and benchmark OneCycleLR super-convergence.
55 min
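A possible starting point for the trajectory part of this lab is sketched below in pure Python:

- the Rosenbrock function f(x, y) = (a - x)^2 + b(y - x^2)^2, with the usual a = 1 and b = 100;
- its analytic gradient;
- a recorded plain gradient descent path.

The helper names, the start point (-1.5, 2), and the step size 2e-4 are illustrative assumptions, not the lab's prescribed settings.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function; global minimum f = 0 at (a, a^2)."""
    return (a - x) ** 2 + b * (y - x * x) ** 2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient (df/dx, df/dy)."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return dx, dy

def gd_trajectory(start, lr=2e-4, steps=5000):
    """Plain gradient descent on Rosenbrock, recording every visited point."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        path.append((x, y))
    return path

path = gd_trajectory((-1.5, 2.0))
print(path[-1], rosenbrock(*path[-1]))
```

Plotting `path` over a contour plot of `rosenbrock` shows plain gradient descent crawling along the curved valley. Swapping in a momentum or adaptive update rule gives the side-by-side comparison the lab describes.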
Optimizers in TensorFlow
TensorFlow/Keras optimizer equivalents, custom RAdam optimizer subclass, LR scheduling with callbacks, and parity check vs PyTorch.
40 min
Practice