Supplement
Activation Functions
A comprehensive guide to all 31 PyTorch activation functions — from foundational ReLU variants to smooth modern activations, gating mechanisms, and advanced NLP functions. Covers formulas, gradient analysis, and side-by-side PyTorch and TensorFlow implementations.
Readings
1
What Is an Activation Function?
The role of non-linearity, the chain rule in backpropagation, the vanishing-gradient and dying-neuron problems, and a practical selection guide.
12 min
2
The ReLU Family
ReLU, LeakyReLU, PReLU, RReLU, and ReLU6 — gradients, dying neurons, and quantization-friendly bounds.
15 min
3
Saturating Activations
Sigmoid, Tanh, Hardsigmoid, Hardtanh, Softsign, and LogSigmoid — vanishing gradients and zero-centering.
14 min
4
Smooth Modern Activations
GELU, SiLU/Swish, Mish, ELU, CELU, and SELU — smooth gates, self-normalization, and modern architecture choices.
16 min
5
Gating & Normalization
GLU, Hardswish, Softmax, LogSoftmax, Softmax2d, and Softmin — gating mechanisms and probability normalization.
15 min
6
Shrinkage & Threshold Functions
Hardshrink, Softshrink, Tanhshrink, Threshold, and Softplus — sparsity promotion and smooth ReLU approximations.
13 min
7
NLP & Advanced Activations
LogSigmoid, AdaptiveLogSoftmax, MultiheadAttention, and SwiGLU — large-vocabulary NLP and transformer architectures.
14 min
Quizzes
ReLU & Saturating Activations
6 questions · 70% to pass
Smooth, Gating & Specialized Activations
6 questions · 70% to pass
Labs
Activation Functions in PyTorch
Implement all activations from scratch, run the dying ReLU experiment, benchmark compute costs, and compare convergence on CIFAR-10.
50 min
Activation Functions in TensorFlow
Custom Keras layers for Mish, GLU, and SwiGLU, a parity check against PyTorch, and an activation comparison on Fashion-MNIST.
40 min
Practice