Supplement · Activation Functions

Activation Functions in TensorFlow

Colab Notebook · Python · ~40 min

Lab Objectives

  1. Implement the core activation functions using tf.keras.activations, tf.nn, and custom tf.keras.layers.Layer subclasses
  2. Verify numerical parity between PyTorch and TensorFlow outputs for all shared activation functions
  3. Implement custom activations (Mish, Softshrink, GLU variants) as tf.keras.layers.Layer subclasses and use them in model.compile workflows
  4. Profile activation compute cost using tf.function and tf.test.Benchmark
  5. Train a Keras model end-to-end with five different activation functions and compare validation accuracy curves
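
As a rough illustration of objective 4, the sketch below times one activation in eager mode and again under tf.function. It substitutes Python's standard timeit module for tf.test.Benchmark to stay short, so it is a stand-in rather than the notebook's profiling code. The helper name swish_graph, the tensor shape, and the run count are arbitrary assumptions.

```python
import timeit

import tensorflow as tf


@tf.function
def swish_graph(x):
    """Graph-mode SiLU/Swish; the first call traces the function."""
    return tf.nn.silu(x)


# Arbitrary input size chosen for illustration only.
x = tf.random.normal([256, 1024])

# Warm-up call so that tracing time is not included in the measurement.
swish_graph(x)

n_runs = 20
# .numpy() forces the result to be materialised before the timer stops.
eager_s = timeit.timeit(lambda: tf.nn.silu(x).numpy(), number=n_runs) / n_runs
graph_s = timeit.timeit(lambda: swish_graph(x).numpy(), number=n_runs) / n_runs

print(f"eager: {eager_s * 1e3:.3f} ms/call, tf.function: {graph_s * 1e3:.3f} ms/call")
```

For a single elementwise op the two timings are often close; differences tend to show up once several ops are chained inside the traced function.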

Lab Overview

This notebook is the TensorFlow companion to the PyTorch lab. For every activation function implemented in PyTorch you will find the TensorFlow/Keras equivalent — or implement it from scratch when no built-in exists.

PyTorch vs TensorFlow API Differences

Concept       PyTorch               TensorFlow/Keras
ReLU          nn.ReLU()             tf.keras.layers.ReLU() or 'relu'
GELU          nn.GELU()             tf.keras.activations.gelu(x)
SiLU/Swish    nn.SiLU()             tf.keras.activations.swish(x)
Sigmoid       nn.Sigmoid()          tf.keras.activations.sigmoid(x)
Softmax       nn.Softmax(dim=-1)    tf.keras.activations.softmax(x)
PReLU         nn.PReLU()            tf.keras.layers.PReLU()
Hardsigmoid   nn.Hardsigmoid()      tf.keras.activations.hard_sigmoid(x)
Custom        nn.Module subclass    tf.keras.layers.Layer subclass
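
The TensorFlow column mixes three calling styles: a function, a layer object, and a string name. A minimal sketch of how they relate for ReLU is shown below; the input values are arbitrary, and tf.nn.relu is included only as the lower-level equivalent.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[-2.0, -0.5, 0.0, 0.5, 2.0]])

# 1) Functional form from tf.keras.activations.
y_fn = tf.keras.activations.relu(x)

# 2) Layer form, usable as a building block inside a model.
y_layer = tf.keras.layers.ReLU()(x)

# 3) String name, resolved here through the generic Activation layer.
#    The same string can be passed as activation='relu' to layers such as Dense.
y_str = tf.keras.layers.Activation("relu")(x)

# 4) Lower-level op in tf.nn.
y_nn = tf.nn.relu(x)

all_equal = all(
    np.allclose(y_fn.numpy(), other.numpy()) for other in (y_layer, y_str, y_nn)
)
print("all four forms agree:", all_equal)
```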

Key TF-Specific Notes

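  • Hardsigmoid formula differs: PyTorch uses $\text{clip}((x + 3)/6, 0, 1)$; TensorFlow with Keras 2 (the tf.keras bundled through TF 2.15) uses $\text{clip}(0.2x + 0.5, 0, 1)$. Both equal 0.5 at $x = 0$, but the slopes (1/6 vs 0.2) and saturation points (±3 vs ±2.5) differ. Keras 3 documents the PyTorch-style form instead, so verify numerically on your installed version.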
  • Mish, GLU, Softshrink, Hardshrink, Tanhshrink: No built-in TF equivalent — implement as custom tf.keras.layers.Layer subclasses.
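  • @tf.function for graph execution and JIT: Wrapping custom activations with @tf.function traces them into a TensorFlow graph. Passing jit_compile=True additionally requests XLA compilation, which can fuse the elementwise ops and may give a noticeable speedup.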
  • tf.GradientTape: Used to verify analytical gradients against tape.gradient().
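
Two of the notes above can be checked in a few lines. The sketch below first compares the two hard-sigmoid formulas, $\text{clip}((x + 3)/6, 0, 1)$ and $\text{clip}(0.2x + 0.5, 0, 1)$, in plain NumPy so that the result does not depend on the installed Keras version. It then defines a Mish layer and checks tape.gradient() against a hand-derived gradient. The helper names (hard_sigmoid_pt, hard_sigmoid_tf, mish_grad_analytic) and the test points are assumptions made for illustration, not code taken from the notebook.

```python
import numpy as np
import tensorflow as tf


# Part 1: the two hard-sigmoid formulas, written out in NumPy.
def hard_sigmoid_pt(x):
    """PyTorch-style formula: clip((x + 3) / 6, 0, 1)."""
    return np.clip((x + 3.0) / 6.0, 0.0, 1.0)


def hard_sigmoid_tf(x):
    """Keras-2-style formula: clip(0.2 * x + 0.5, 0, 1)."""
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


grid = np.linspace(-4.0, 4.0, 161)  # step 0.05, so x = -2.5 and 2.5 are on the grid
max_gap = np.max(np.abs(hard_sigmoid_pt(grid) - hard_sigmoid_tf(grid)))
print(f"max |PyTorch-style - Keras-2-style| on the grid: {max_gap:.4f}")


# Part 2: a custom Mish layer, mish(x) = x * tanh(softplus(x)).
class Mish(tf.keras.layers.Layer):
    def call(self, inputs):
        return inputs * tf.math.tanh(tf.math.softplus(inputs))


def mish_grad_analytic(x):
    """d/dx [x * tanh(sp(x))] = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)."""
    t = tf.math.tanh(tf.math.softplus(x))
    return t + x * (1.0 - t ** 2) * tf.math.sigmoid(x)


x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so it must be watched explicitly
    y = Mish()(x)
grad_tape = tape.gradient(y, x)

grad_ok = np.allclose(grad_tape.numpy(), mish_grad_analytic(x).numpy(), atol=1e-5)
print("tape gradient matches analytical gradient:", grad_ok)
```

The gap between the two hard-sigmoid formulas peaks at $x = \pm 2.5$, where it equals 1/12, which is large enough to fail a tight parity tolerance.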

Sections

Section   Content
1         ReLU family: layers.ReLU, layers.LeakyReLU, layers.PReLU + custom ReLU6
2         Saturating: sigmoid, tanh, hardsigmoid (TF vs PT formula diff), hardtanh custom
3         Smooth modern: gelu, swish, custom Mish layer, ELU/SELU
4         Gating: custom GLU layer, Softmax, LogSoftmax, Softmin
5         Shrinkage: custom Hardshrink, Softshrink, Tanhshrink, Softplus
6         Numerical parity check: PyTorch vs TensorFlow on all shared functions (requires import torch)
7         Custom activation as Keras layer: implement SwiGLU as tf.keras.layers.Layer with model.compile
8         End-to-end: train on CIFAR-10 with 5 activations using model.fit, compare learning curves
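
To give a feel for Section 7, here is one possible shape for a SwiGLU layer used with model.compile. It assumes the common formulation in which a single Dense projection is split into a gate half and a value half, with output swish(gate) * value; the notebook's own variant may differ. The layer sizes and the random training data are placeholders, not CIFAR-10.

```python
import numpy as np
import tensorflow as tf


class SwiGLU(tf.keras.layers.Layer):
    """Sketch of a gated unit: swish(gate) * value, both halves of one projection."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        # One projection producing both halves (an assumption of this sketch).
        self.proj = tf.keras.layers.Dense(2 * units)

    def call(self, inputs):
        gate, value = tf.split(self.proj(inputs), 2, axis=-1)
        return tf.nn.silu(gate) * value


model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    SwiGLU(32),
    tf.keras.layers.Dense(10),  # logits for 10 placeholder classes
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Random placeholder data, only to show that the layer trains end to end.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 16)).astype("float32")
labels = rng.integers(0, 10, size=64)

history = model.fit(features, labels, epochs=1, batch_size=16, verbose=0)
print("loss after one epoch:", history.history["loss"][0])
```

Saving and reloading a model that contains this layer would also need a get_config method returning units, which is omitted here for brevity.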