Supplement · Activation Functions

Activation Functions in PyTorch

Google Colab Notebook · Python · ~50 min

Lab Objectives

Implement all 31 PyTorch activation functions from scratch using tensor operations and verify against nn.* equivalents

Visualize activation functions and their gradients side-by-side to build intuition for output range, smoothness, and saturation

Measure the dying ReLU phenomenon experimentally and compare with LeakyReLU, PReLU, and ELU

Implement and verify the SwiGLU feedforward block from scratch, then benchmark against a ReLU FFN on a small classification task

Profile the compute cost of smooth activations (GELU, Mish, SiLU) vs piecewise activations (ReLU, Hardswish) using torch.utils.benchmark
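As a rough illustration of the profiling objective, the sketch below times the forward pass of a few activations with `torch.utils.benchmark.Timer`. The tensor shape, the run count of 50, and the `candidates` and `results` names are assumptions chosen for illustration, not values prescribed by the lab. It runs on CPU only; timings depend heavily on hardware and PyTorch build, so no expected numbers are shown.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

# Hypothetical input size; the notebook may use different shapes.
x = torch.randn(256, 1024)

# Piecewise activations (relu, hardswish) vs smooth ones (gelu, silu, mish).
candidates = {
    "relu": F.relu,
    "hardswish": F.hardswish,
    "gelu": F.gelu,
    "silu": F.silu,
    "mish": F.mish,
}

results = {}
for name, fn in candidates.items():
    timer = benchmark.Timer(
        stmt="fn(x)",
        globals={"fn": fn, "x": x},
        label="activation forward",
        sub_label=name,
        num_threads=1,
    )
    # timeit returns a Measurement; .median is seconds per run.
    results[name] = timer.timeit(50)

for name, m in results.items():
    print(f"{name:10s} median {m.median * 1e6:8.1f} us")
```

On CUDA the same pattern should apply with `x` moved to the device; `Timer` is designed to handle synchronization for GPU statements, which is one reason to prefer it over a manual `time.perf_counter()` loop.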

Lab Overview

This notebook ties every formula from the readings to runnable, verifiable code. For each activation function you will:

Implement from scratch using basic PyTorch tensor ops
Verify numerically against the corresponding torch.nn module
Inspect the gradient via .backward() and compare to the analytical derivative
Visualize the function and its derivative over $[-5, 5]$
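The first three steps above can be sketched for a single activation. The example uses SiLU, $\mathrm{SiLU}(x) = x\,\sigma(x)$, whose derivative is $\sigma(x)\,(1 + x\,(1 - \sigma(x)))$. The helper names `silu_scratch` and `silu_grad_analytical`, the 201-point grid, and the use of `float64` are illustrative assumptions rather than the notebook's actual code.

```python
import torch
import torch.nn as nn


def silu_scratch(x: torch.Tensor) -> torch.Tensor:
    """SiLU from basic tensor ops: x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + torch.exp(-x))


def silu_grad_analytical(x: torch.Tensor) -> torch.Tensor:
    """Closed-form derivative: sigma(x) * (1 + x * (1 - sigma(x)))."""
    s = 1.0 / (1.0 + torch.exp(-x))
    return s * (1.0 + x * (1.0 - s))


# Evaluate on [-5, 5], the same range used for the plots.
x = torch.linspace(-5.0, 5.0, steps=201, dtype=torch.float64, requires_grad=True)

# Steps 1 and 2: implement, then verify against the torch.nn module.
y = silu_scratch(x)
matches_module = torch.allclose(y, nn.SiLU()(x))

# Step 3: autograd gradient vs the analytical derivative.
# Summing gives a scalar, so x.grad holds dy_i/dx_i elementwise.
y.sum().backward()
matches_grad = torch.allclose(x.grad, silu_grad_analytical(x.detach()))

print(matches_module, matches_grad)
```

The same pattern should carry over to the other activations by swapping the forward function, the analytical derivative, and the reference module. Activations with kinks, such as ReLU at 0, need care at the non-differentiable point, where autograd picks one subgradient by convention.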

| Section | Content |
|---|---|
| 1 | ReLU family: ReLU, LeakyReLU, PReLU, RReLU, ReLU6; from-scratch implementations plus gradient comparison |
| 2 | Saturating activations: Sigmoid, Tanh, Hardsigmoid, Hardtanh, Softsign, LogSigmoid |
| 3 | Smooth modern: GELU (exact and tanh approximation), SiLU/Swish, Mish, ELU, CELU, SELU |
| 4 | Gating: GLU, Hardswish, Softmax, LogSoftmax, Softmax2d, Softmin |
| 5 | Shrinkage: Hardshrink, Softshrink, Tanhshrink, Threshold, Softplus |
| 6 | Dying ReLU experiment: track dead neuron count across training steps for ReLU vs LeakyReLU vs ELU |
| 7 | SwiGLU FFN: implement from scratch, verify against `nn.SiLU` + `chunk`, benchmark vs ReLU FFN |
| 8 | Compute benchmarks: `torch.utils.benchmark` for all activations on CPU and CUDA |
| 9 | End-to-end: train a small MLP on CIFAR-10 with 5 activation functions and compare convergence curves |
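For Section 7, a minimal sketch of a SwiGLU feedforward block and its verification might look like the following. It assumes a fused input projection that is split with `chunk`, no bias terms, and the convention that the first half is the gate; published implementations differ on these details. The class name `SwiGLUFFN` and the dimensions 16 and 32 are hypothetical.

```python
import torch
import torch.nn as nn


class SwiGLUFFN(nn.Module):
    """SwiGLU feedforward block: W_out(SiLU(x W_gate) * (x W_value))."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # One fused projection producing gate and value halves (an assumption).
        self.w_in = nn.Linear(d_model, 2 * d_hidden, bias=False)
        self.w_out = nn.Linear(d_hidden, d_model, bias=False)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.w_in(x).chunk(2, dim=-1)
        return self.w_out(self.act(gate) * value)


torch.manual_seed(0)
d_model, d_hidden = 16, 32
ffn = SwiGLUFFN(d_model, d_hidden)
x = torch.randn(4, d_model)
out = ffn(x)

# From-scratch reference using explicit matmuls and a hand-written SiLU.
# nn.Linear stores weights as (out_features, in_features), so the first
# d_hidden rows correspond to the first chunk (the gate).
W_gate = ffn.w_in.weight[:d_hidden]
W_value = ffn.w_in.weight[d_hidden:]
g = x @ W_gate.T
v = x @ W_value.T
ref = ((g * torch.sigmoid(g)) * v) @ ffn.w_out.weight.T

print(torch.allclose(out, ref, atol=1e-5))
```

A ReLU FFN baseline for the benchmark would replace the gated product with `relu(x W_1)` followed by the output projection. Because SwiGLU carries an extra projection matrix, a fair comparison usually shrinks `d_hidden` so that both blocks have a similar parameter count.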
