Upskilled Consulting & Training

Deep technical courses for ML practitioners and engineers.

Supplements

Topic-focused mini-courses that fill in specific knowledge gaps for ML practitioners.

Supplement
Neural Network Architectures
A bottom-up tour of the core neural network architectures — MLPs, convolutional layers and ResNets, vanilla RNNs, LSTMs, GRUs, scaled dot-product attention, multi-head attention, and the transformer. Builds the architectural vocabulary assumed by every other supplement.
Deep Learning, CNNs, RNNs, Transformers, Attention
Supplement
Activation Functions
A comprehensive guide to all 31 PyTorch activation functions — ReLU variants, saturating activations, smooth modern activations, gating mechanisms, shrinkage functions, and advanced NLP activations — with equation breakdowns, gradient analysis, and side-by-side PyTorch and TensorFlow implementations.
Deep Learning, Neural Networks, PyTorch, TensorFlow
Supplement
Loss Functions
A ground-up tour of all 20 PyTorch loss functions — regression, classification, distribution, ranking, embedding, and metric learning — with equation breakdowns, derivations from probability theory, and side-by-side PyTorch and TensorFlow implementations.
Deep Learning, Optimization, PyTorch, TensorFlow
Supplement
Optimizers
A comprehensive guide to all 13 PyTorch optimizers and 15 learning rate schedulers — SGD, Adam, AdamW, adaptive methods, quasi-Newton, and the full lr_scheduler suite — with update-rule derivations, hyperparameter intuition, and side-by-side PyTorch and TensorFlow implementations.
Deep Learning, Optimization, PyTorch, TensorFlow
Supplement
Weight Initialization
A ground-up treatment of all PyTorch and TensorFlow/Keras weight initializers — from constant and random baselines to variance-scaling methods (Xavier/Glorot, He/Kaiming, LeCun) and orthogonal initialization. Covers variance-propagation derivations, default layer behaviors, and a practical selection guide by architecture and activation.
Deep Learning, Training, PyTorch, TensorFlow
Supplement
Normalization in Deep Learning
A comprehensive treatment of normalization techniques — from why they work to how to choose between them. Covers BatchNorm internals (running stats, train/eval modes, SyncBN), the LayerNorm family (RMSNorm, DeepNorm, pre/post-norm), weight and spectral normalization, small-batch alternatives (GroupNorm, InstanceNorm), and adaptive/conditional normalization (AdaIN, SPADE, FiLM, adaLN-Zero in DiT).
Deep Learning, Training, Transformers, GANs, Diffusion Models
Supplement
Regularization
A unified treatment of regularization in deep learning — bias-variance tradeoff, L1/L2 weight penalties, dropout and its variants (MC Dropout, Stochastic Depth), normalization layers (BatchNorm, LayerNorm, RMSNorm), early stopping, data augmentation (Mixup, Cutout, CutMix), label smoothing, and implicit regularization from initialization and SGD noise.
Deep Learning, Training, Generalization, CNNs, Transformers