Optimizers

A comprehensive guide to all 13 PyTorch optimizers and 15 learning rate schedulers — SGD, Adam, AdamW, adaptive methods, quasi-Newton, and the full lr_scheduler suite — with update-rule derivations, hyperparameter intuition, and side-by-side PyTorch and TensorFlow implementations.

Intermediate · 3h estimated · 7 readings · 2 quizzes · 2 labs · 2 drill decks

Readings

What Is an Optimizer?

Gradient descent mechanics, the PyTorch optimizer interface, param groups, gradient clipping, and a practical selection guide.

SGD & Momentum Methods

Vanilla SGD, momentum, Nesterov acceleration, ASGD averaging, and Rprop's sign-based per-parameter step sizes.

Adaptive Learning Rate Methods

Adagrad's accumulated squared gradients, RMSprop's exponential moving average fix, and Adadelta's learning-rate-free design.

The Adam Family

Adam, AdamW, Adamax, NAdam, RAdam, and SparseAdam — moment estimates, bias correction, decoupled weight decay, and variance rectification.

Second-Order & Advanced Techniques

The LBFGS quasi-Newton optimizer and its closure argument, param groups for layer-wise learning rates, gradient clipping, and fused optimizer kernels.

Learning Rate Schedulers

StepLR, MultiStepLR, ExponentialLR, PolynomialLR, ConstantLR, LinearLR, CosineAnnealingLR, ReduceLROnPlateau, LambdaLR, and MultiplicativeLR.

Advanced Schedulers & Composition

CosineAnnealingWarmRestarts, CyclicLR, OneCycleLR super-convergence, SequentialLR, ChainedScheduler, and warmup-then-decay patterns.

Quizzes

SGD, Adaptive Methods & Adam

6 questions · 70% to pass

Advanced Techniques & Schedulers

6 questions · 70% to pass

Labs

Optimizers in PyTorch

Implement all 13 optimizers, visualize trajectories on the Rosenbrock function, compare convergence on CIFAR-10, and benchmark OneCycleLR super-convergence.

Optimizers in TensorFlow

TensorFlow/Keras optimizer equivalents, a custom RAdam optimizer subclass, LR scheduling with callbacks, and a parity check against PyTorch.

Practice

SGD, Adaptive Methods & the Adam Family

12 cards · 10 min

Learning Rate Schedulers

12 cards · 10 min