Supplement · Activation Functions
Activation Functions in PyTorch
Google Colab Notebook
Lab Objectives
1. Implement all 31 PyTorch activation functions from scratch using tensor operations and verify against nn.* equivalents
2. Visualize activation functions and their gradients side-by-side to build intuition for output range, smoothness, and saturation
3. Measure the dying ReLU phenomenon experimentally and compare with LeakyReLU, PReLU, and ELU
4. Implement and verify the SwiGLU feedforward block from scratch, then benchmark against a ReLU FFN on a small classification task
5. Profile the compute cost of smooth activations (GELU, Mish, SiLU) vs piecewise activations (ReLU, Hardswish) using torch.utils.benchmark
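As a rough illustration of the profiling objective, the sketch below times the forward pass of a few smooth and piecewise activations with `torch.utils.benchmark.Timer`. The input shape, the repetition count, and the choice of the functional API (`torch.nn.functional`) are assumptions made for this example, not settings taken from the notebook.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

# Hypothetical input size, chosen only so each call does measurable work.
x = torch.randn(256, 1024)

# Piecewise activations (relu, hardswish) vs smooth ones (gelu, silu, mish).
candidates = {
    "relu": F.relu,
    "hardswish": F.hardswish,
    "gelu": F.gelu,
    "silu": F.silu,
    "mish": F.mish,
}

results = {}
for name, fn in candidates.items():
    timer = benchmark.Timer(
        stmt="fn(x)",
        globals={"fn": fn, "x": x},
        label="activation forward",
        sub_label=name,
    )
    # timeit returns a Measurement; median is in seconds per call.
    results[name] = timer.timeit(50).median

# Print the activations from fastest to slowest on this machine.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>10s}: {seconds * 1e6:8.1f} us")
```

Timings depend heavily on hardware, tensor size, and thread count, so the ordering printed here should be read as one data point rather than a general ranking.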
Lab Overview
This notebook ties every formula from the readings to runnable, verifiable code. For each activation function you will:
- Implement from scratch using basic PyTorch tensor ops
- Verify numerically against the corresponding `torch.nn` module
- Inspect the gradient via `.backward()` and compare to the analytical derivative
- Visualize the function and its derivative over a range of inputs
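The first three steps of this workflow can be sketched for a single activation as follows. SiLU is used as the example; the helper names `silu_scratch` and `silu_grad_analytic`, the input range of -5 to 5, and the use of float64 are choices made for this illustration only.

```python
import torch
import torch.nn as nn

def silu_scratch(x: torch.Tensor) -> torch.Tensor:
    # SiLU / Swish: x * sigmoid(x), written with basic tensor ops only.
    return x / (1.0 + torch.exp(-x))

def silu_grad_analytic(x: torch.Tensor) -> torch.Tensor:
    # d/dx [x * s(x)] = s(x) + x * s(x) * (1 - s(x)), where s is the sigmoid.
    s = torch.sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

# Assumed evaluation grid; float64 keeps the comparison tolerances tight.
x = torch.linspace(-5.0, 5.0, steps=201, dtype=torch.float64, requires_grad=True)

# Step 1 and 2: from-scratch forward pass, checked against the nn module.
y = silu_scratch(x)
matches_module = torch.allclose(y, nn.SiLU()(x))

# Step 3: autograd gradient, checked against the analytical derivative.
y.sum().backward()
matches_derivative = torch.allclose(x.grad, silu_grad_analytic(x.detach()))

print(matches_module, matches_derivative)
```

Summing the output before calling `.backward()` works here because the activation is elementwise, so `x.grad` holds the derivative at each input point.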
Sections
| Section | Content |
|---|---|
| 1 | ReLU family: ReLU, LeakyReLU, PReLU, RReLU, ReLU6 — from scratch + gradient comparison |
| 2 | Saturating activations: Sigmoid, Tanh, Hardsigmoid, Hardtanh, Softsign, LogSigmoid |
| 3 | Smooth modern: GELU (exact and tanh approx), SiLU/Swish, Mish, ELU, CELU, SELU |
| 4 | Gating: GLU, Hardswish, Softmax, LogSoftmax, Softmax2d, Softmin |
| 5 | Shrinkage: Hardshrink, Softshrink, Tanhshrink, Threshold, Softplus |
| 6 | Dying ReLU experiment: track dead neuron count across training steps for ReLU vs LeakyReLU vs ELU |
| 7 | SwiGLU FFN: implement from scratch, verify against nn.SiLU + chunk, benchmark vs ReLU FFN |
| 8 | Compute benchmarks: torch.utils.benchmark for all activations on CPU and CUDA |
| 9 | End-to-end: train a small MLP on CIFAR-10 with 5 activation functions and compare convergence curves |
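To make Section 7 more concrete, here is a minimal sketch of a SwiGLU feedforward block, assuming the common formulation in which a SiLU-activated gate multiplies a linear value projection before the output projection. The class name `SwiGLUFFN`, the fused input projection, the absence of biases, and the layer sizes are all assumptions for illustration and may differ from the notebook's version.

```python
import torch
import torch.nn as nn

class SwiGLUFFN(nn.Module):
    """Hypothetical SwiGLU block: W_out(SiLU(gate) * value)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # One fused projection that is split into gate and value halves.
        self.w_in = nn.Linear(d_model, 2 * d_hidden, bias=False)
        self.w_out = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.w_in(x).chunk(2, dim=-1)
        # SiLU written from scratch: g * sigmoid(g).
        gate = gate / (1.0 + torch.exp(-gate))
        return self.w_out(gate * value)

torch.manual_seed(0)
ffn = SwiGLUFFN(d_model=16, d_hidden=32)
x = torch.randn(4, 16)

# Reference path built from nn.SiLU and chunk, reusing the same weights.
gate, value = ffn.w_in(x).chunk(2, dim=-1)
reference = ffn.w_out(nn.SiLU()(gate) * value)

is_close = torch.allclose(ffn(x), reference, atol=1e-6)
print(is_close)
```

Fusing the gate and value projections into one `nn.Linear` is a design convenience; two separate layers would compute the same function.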