Activation Functions

A comprehensive guide to all 31 PyTorch activation functions — from foundational ReLU variants to smooth modern activations, gating mechanisms, and advanced NLP functions. Covers formulas, gradient analysis, and side-by-side PyTorch and TensorFlow implementations.

Intermediate · 3h estimated · 7 readings · 2 quizzes · 2 labs · 2 drill decks

Readings

What Is an Activation Function?

The role of non-linearity, the chain rule in backpropagation, vanishing gradients and dying neurons, and a practical selection guide.

The ReLU Family

ReLU, LeakyReLU, PReLU, RReLU, and ReLU6 — gradients, dying neurons, and quantization-friendly bounds.

Saturating Activations

Sigmoid, Tanh, Hardsigmoid, Hardtanh, Softsign, and LogSigmoid — vanishing gradients and zero-centering.

Smooth Modern Activations

GELU, SiLU/Swish, Mish, ELU, CELU, and SELU — smooth gates, self-normalization, and modern architecture choices.

Gating & Normalization

GLU, Hardswish, Softmax, LogSoftmax, Softmax2d, and Softmin — gating mechanisms and probability normalization.

Shrinkage & Threshold Functions

Hardshrink, Softshrink, Tanhshrink, Threshold, and Softplus — sparsity promotion and smooth ReLU approximations.

NLP & Advanced Activations

LogSigmoid, AdaptiveLogSoftmax, MultiheadAttention, and SwiGLU — large-vocabulary NLP and transformer architectures.

Quizzes

ReLU & Saturating Activations

6 questions · 70% to pass

Smooth, Gating & Specialized Activations

6 questions · 70% to pass

Labs

Activation Functions in PyTorch

Implement all activations from scratch, run the dying ReLU experiment, benchmark compute costs, and compare convergence on CIFAR-10.

Activation Functions in TensorFlow

Build custom Keras layers for Mish, GLU, and SwiGLU, check parity against PyTorch, and compare activations on Fashion-MNIST.

Practice

ReLU Family & Saturating Activations

12 cards · 10 min

Smooth, Gating & Advanced Activations

12 cards · 10 min