Supplement · Regularization

Regularization in TensorFlow

Google Colab Notebook · Python · ~45 min

Lab Objectives

Apply L1 and L2 penalties via kernel_regularizer and compare against optimizer-level weight decay in AdamW; verify parity with PyTorch results from Lab 1

Use tf.keras.layers.Dropout in a custom training loop with explicit training=True/False flags; implement MC Dropout by forcing training=True at inference time

Build a custom BatchNorm layer by subclassing tf.keras.layers.Layer and train it in a tf.GradientTape loop; demonstrate the difference between model(x, training=True) and model(x, training=False)

Implement Mixup and CutMix as custom tf.keras.layers.Layer subclasses applied to batches in the tf.data pipeline that feeds model.fit; train on CIFAR-10 and reproduce the Lab 1 accuracy comparison

Use the label_smoothing argument of CategoricalCrossentropy (with one-hot labels); implement temperature scaling as a post-training logit rescaling step and evaluate calibration using Expected Calibration Error

Apply tf.keras.layers.SpectralNormalization to a discriminator; use clipnorm and clipvalue on the optimizer and compare their effects on gradient magnitude distributions

Lab Overview

This notebook is the TensorFlow/Keras companion to the PyTorch lab. Every technique is re-implemented using the TF API, with emphasis on Keras-specific patterns: kernel_regularizer, training= flags, custom layer subclassing, and model.compile integration.

Key API Differences vs PyTorch

Concept	PyTorch	TensorFlow / Keras
L2 regularization	`optimizer weight_decay` or manual penalty	`kernel_regularizer=tf.keras.regularizers.L2(lam)`
AdamW	`torch.optim.AdamW`	`tf.keras.optimizers.AdamW`
Dropout training mode	`model.train()` / `model.eval()`	`layer(x, training=True/False)`
BatchNorm mode	`model.train()` / `model.eval()`	`layer(x, training=True/False)`
Label smoothing	custom or `nn.CrossEntropyLoss(label_smoothing=)`	`CategoricalCrossentropy(label_smoothing=)` (one-hot labels)
Spectral norm	`nn.utils.spectral_norm(layer)`	`tf.keras.layers.SpectralNormalization(layer)`
Gradient clipping	`clip_grad_norm_` before `optimizer.step()`	`optimizer = Adam(global_clipnorm=1.0)`

Sections

Section	Topic	Key experiment
1	`kernel_regularizer`, AdamW	L2 via regularizer vs optimizer weight_decay
2	Dropout, MC Dropout	`training=True` at inference for uncertainty
3	BatchNorm custom layer	Reproduce eval-mode bug; GradientTape training loop
4	Mixup & CutMix as Keras layers	CIFAR-10 accuracy comparison
5	Label smoothing, temperature scaling	ECE calibration curves
6	SpectralNormalization, gradient clipping	Lipschitz verification; clipnorm vs clipvalue

Section 1 — Weight Penalties in Keras

The cleanest Keras pattern uses kernel_regularizer at the layer level:

tf.keras.layers.Dense(64, kernel_regularizer=tf.keras.regularizers.L2(1e-4))

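Each regularizer's penalty appears as a scalar in model.losses, and model.fit adds their sum to the training loss automatically; in a custom tf.GradientTape loop you must add tf.add_n(model.losses) to the loss yourself. Contrast this with a manual penalty added to the loss, and with tf.keras.optimizers.AdamW(weight_decay=1e-4), which applies decay directly in the parameter update rather than through the gradient: the TF equivalent of PyTorch's AdamW.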
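As a sanity check on both mechanisms, here is a minimal sketch on a hypothetical toy model (the input width 32, λ = 1e-4, and the deliberately large lr = 0.1 and weight_decay = 0.5 are arbitrary values for the check, not Lab 1 settings). It confirms that model.losses matches a hand-computed penalty, and that one AdamW step with an all-zero gradient scales the weights by exactly (1 - lr * weight_decay), i.e. the decay acts on the weights directly instead of flowing through the gradient.

```python
import numpy as np
import tensorflow as tf

lam = 1e-4  # L2 strength (arbitrary value for this check)

# Hypothetical toy model, not the Lab 1 architecture
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.L2(lam)),
    tf.keras.layers.Dense(10, kernel_regularizer=tf.keras.regularizers.L2(lam)),
])

# (1) Penalty in the loss. L2(lam) contributes lam * sum(w**2) per kernel, with no
# 1/2 factor. model.losses holds one scalar per regularized layer; model.fit adds
# their sum to the data loss, while a custom GradientTape loop must add it itself.
reg_from_keras = float(tf.add_n(model.losses))
reg_by_hand = float(sum(lam * tf.reduce_sum(tf.square(layer.kernel))
                        for layer in model.layers))

# (2) Decoupled decay. AdamW subtracts lr * weight_decay * w from each weight on top
# of the Adam step. With an all-zero gradient the Adam step is zero, so one update
# should scale every weight by exactly (1 - lr * weight_decay).
lr, wd = 0.1, 0.5  # deliberately large so the effect is visible
opt = tf.keras.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
kernel_before = model.layers[0].kernel.numpy().copy()
zero_grads = [tf.zeros_like(v) for v in model.trainable_variables]
opt.apply_gradients(zip(zero_grads, model.trainable_variables))
kernel_after = model.layers[0].kernel.numpy()

print("penalty via model.losses:", reg_from_keras, "by hand:", reg_by_hand)
print("decayed by (1 - lr*wd):",
      np.allclose(kernel_after, kernel_before * (1 - lr * wd), rtol=1e-5))
```

For parity with Lab 1, two convention differences matter. PyTorch's weight_decay=wd on SGD or Adam adds wd * w to the gradient, which corresponds to a penalty of (wd / 2) * ||w||², so the matching Keras regularizer is L2(wd / 2); PyTorch also decays biases unless you use parameter groups, whereas kernel_regularizer touches only kernels. And when the L2 gradient passes through plain Adam it is rescaled per parameter by 1 / (sqrt(v̂) + ε), so weights with large historical gradients are regularized less; AdamW's decoupled decay bypasses that rescaling.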

Section 2 — Dropout and MC Dropout

Keras Dropout is controlled by the training argument, not a global mode flag. In a custom training loop:

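with tf.GradientTape() as tape:
    logits = model(x, training=True)   # dropout active
    loss = loss_fn(y, logits)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
preds = model(x_val, training=False)   # dropout inactive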

For MC Dropout, force training=True at inference, collect T=100 stochastic predictions, and compute the mean and variance of the softmax outputs; this is conceptually identical to the PyTorch version.
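A minimal sketch of MC Dropout, assuming a hypothetical toy classifier with a single Dropout layer and no BatchNorm. The helper name mc_dropout_predict is illustrative, and its num_samples argument plays the role of T.

```python
import tensorflow as tf

def mc_dropout_predict(model, x, num_samples=100):
    """MC Dropout: average softmax outputs over stochastic forward passes.

    training=True keeps Dropout active at inference. It would also switch any
    BatchNormalization layers to batch statistics (and update their moving
    averages), so this sketch assumes a model without BatchNorm.
    """
    probs = tf.stack([tf.nn.softmax(model(x, training=True), axis=-1)
                      for _ in range(num_samples)])        # (T, batch, classes)
    return tf.reduce_mean(probs, axis=0), tf.math.reduce_variance(probs, axis=0)

# Hypothetical toy classifier, for illustration only
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3),
])
x = tf.random.normal((8, 20))
mean_probs, var_probs = mc_dropout_predict(model, x, num_samples=100)

# The deterministic path (dropout off) should give identical outputs every time
det_a = model(x, training=False)
det_b = model(x, training=False)
```

The per-class variance (or the entropy of mean_probs) serves as the uncertainty score. Comparing two training=False calls, as above, is a quick way to confirm that dropout really is off in the deterministic path.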

Section 3 — Batch Normalization

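Implement MyBatchNorm(num_features) as a tf.keras.layers.Layer subclass with trainable weights self.gamma and self.beta and non-trainable weights self.running_mean and self.running_var (created with add_weight(..., trainable=False)). In call(self, x, training=False), branch on training: when it is true, normalize with batch statistics and update the running averages; otherwise, normalize with the running statistics.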
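One way to write MyBatchNorm, as a sketch for 2-D (batch, features) inputs. Here momentum=0.9 weights the old running value (the Keras convention, so PyTorch's momentum=0.1 corresponds to 0.9), and eps=1e-5 is an assumption. The batch variance from tf.nn.moments is the biased estimate; PyTorch stores an unbiased estimate in running_var, a small difference worth knowing when comparing against Lab 1.

```python
import tensorflow as tf

class MyBatchNorm(tf.keras.layers.Layer):
    """Minimal BatchNorm over axis 0 for (batch, features) inputs (a sketch)."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.momentum = momentum  # weight on the OLD running value (Keras convention)
        self.eps = eps
        self.gamma = self.add_weight(name="gamma", shape=(num_features,),
                                     initializer="ones")
        self.beta = self.add_weight(name="beta", shape=(num_features,),
                                    initializer="zeros")
        # Running statistics are state, not parameters: trainable=False keeps them
        # out of trainable_variables, so the optimizer never updates them.
        self.running_mean = self.add_weight(name="running_mean", shape=(num_features,),
                                            initializer="zeros", trainable=False)
        self.running_var = self.add_weight(name="running_var", shape=(num_features,),
                                           initializer="ones", trainable=False)

    def call(self, x, training=False):
        if training:
            # Batch statistics (biased variance), plus an exponential-moving-average update
            mean, var = tf.nn.moments(x, axes=[0])
            m = self.momentum
            self.running_mean.assign(m * self.running_mean + (1.0 - m) * mean)
            self.running_var.assign(m * self.running_var + (1.0 - m) * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / tf.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

layer = MyBatchNorm(3)
x = tf.random.normal((256, 3), mean=2.0, stddev=3.0)
y_train_mode = layer(x, training=True)   # batch stats; running stats move 10% toward them
y_eval_mode = layer(x, training=False)   # running stats, which are still far from the batch's
```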

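The TF-specific gotcha: model.fit passes training=True to every layer, while model.evaluate and model.predict pass training=False. In a custom tf.GradientTape loop you must pass the flag yourself. With the signature call(self, x, training=False), omitting it falls back to inference mode, so a train step that forgets training=True silently normalizes with running statistics and never updates them; passing training=True at evaluation time is the TF analogue of forgetting model.eval() in PyTorch.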
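The gotcha can be reproduced with Keras' built-in BatchNormalization in a hypothetical four-unit toy model: one GradientTape step with training=True moves the layer's moving_mean, while the same step with the flag omitted leaves it at its initial zeros because the layer runs in inference mode. The inputs are shifted to mean 5 only so that the batch mean is clearly non-zero.

```python
import tensorflow as tf

def make_model():
    # Hypothetical toy model: Dense followed by the built-in BatchNormalization
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(4),
        tf.keras.layers.BatchNormalization(),
    ])

def train_step(model, optimizer, x, pass_training_flag):
    with tf.GradientTape() as tape:
        # The bug: model(x) without training=True runs BatchNorm in inference
        # mode, so it normalizes with moving statistics and never updates them.
        out = model(x, training=True) if pass_training_flag else model(x)
        loss = tf.reduce_mean(tf.square(out))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

x = tf.random.normal((64, 4), mean=5.0)
moving_means = {}
for flag in (True, False):
    model = make_model()
    train_step(model, tf.keras.optimizers.SGD(learning_rate=0.01), x, flag)
    moving_means[flag] = model.layers[-1].moving_mean.numpy()

print("with training=True :", moving_means[True])
print("flag omitted       :", moving_means[False])
```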

Section 4 — Mixup and CutMix as Keras Preprocessing Layers

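Implement both as tf.keras.layers.Layer subclasses that operate on batched (image, label) pairs inside a tf.data pipeline. Labels must be one-hot so they can be mixed, and the map must come after batch so each call sees a whole batch (y_train as returned by tf.keras.datasets.cifar10.load_data() has shape (N, 1)):

dataset = tf.data.Dataset.from_tensor_slices((x_train.astype("float32") / 255.0, tf.one_hot(y_train[:, 0], 10)))
dataset = dataset.shuffle(50_000).batch(128).map(mixup_layer)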
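A sketch of both layers, under these assumptions: images are float (B, H, W, C) batches, labels are one-hot (B, K) floats, each batch is mixed with a shuffled copy of itself, and one mixing coefficient is drawn per batch. α = 0.2 for Mixup and α = 1.0 for CutMix are common defaults, not values taken from Lab 1. Since tf.random has no Beta sampler, λ ~ Beta(α, α) is drawn as G1 / (G1 + G2) from two Gamma(α) draws. The names MixupLayer, CutMixLayer and sample_beta are illustrative.

```python
import tensorflow as tf

def sample_beta(alpha):
    # Beta(alpha, alpha) via two Gamma(alpha) draws
    g1 = tf.random.gamma([], alpha)
    g2 = tf.random.gamma([], alpha)
    return g1 / (g1 + g2)

class MixupLayer(tf.keras.layers.Layer):
    """Mixup on a batch: convex combination of each example with a shuffled partner."""

    def __init__(self, alpha=0.2, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha

    def call(self, images, labels):
        # images: (B, H, W, C) float; labels: (B, K) one-hot float
        lam = sample_beta(self.alpha)
        idx = tf.random.shuffle(tf.range(tf.shape(images)[0]))
        mixed_images = lam * images + (1.0 - lam) * tf.gather(images, idx)
        mixed_labels = lam * labels + (1.0 - lam) * tf.gather(labels, idx)
        return mixed_images, mixed_labels

class CutMixLayer(tf.keras.layers.Layer):
    """CutMix on a batch: paste a random box from a shuffled partner image."""

    def __init__(self, alpha=1.0, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha

    def call(self, images, labels):
        shape = tf.shape(images)
        batch, h, w = shape[0], shape[1], shape[2]
        lam = sample_beta(self.alpha)
        # Box sides chosen so the box covers roughly (1 - lam) of the image
        ratio = tf.sqrt(1.0 - lam)
        cut_h = tf.cast(ratio * tf.cast(h, tf.float32), tf.int32)
        cut_w = tf.cast(ratio * tf.cast(w, tf.float32), tf.int32)
        cy = tf.random.uniform([], 0, h, dtype=tf.int32)
        cx = tf.random.uniform([], 0, w, dtype=tf.int32)
        y0 = tf.clip_by_value(cy - cut_h // 2, 0, h)
        y1 = tf.clip_by_value(cy + cut_h // 2, 0, h)
        x0 = tf.clip_by_value(cx - cut_w // 2, 0, w)
        x1 = tf.clip_by_value(cx + cut_w // 2, 0, w)
        rows, cols = tf.range(h), tf.range(w)
        in_box = (((rows >= y0) & (rows < y1))[:, None]
                  & ((cols >= x0) & (cols < x1))[None, :])
        mask = tf.cast(in_box, images.dtype)[None, :, :, None]     # (1, H, W, 1)
        idx = tf.random.shuffle(tf.range(batch))
        mixed_images = images * (1.0 - mask) + tf.gather(images, idx) * mask
        # Relabel with the area actually pasted (clipping at borders can shrink the box)
        area = tf.reduce_sum(mask) / tf.cast(h * w, images.dtype)
        mixed_labels = (1.0 - area) * labels + area * tf.gather(labels, idx)
        return mixed_images, mixed_labels

images = tf.random.uniform((8, 32, 32, 3))
labels = tf.one_hot(tf.range(8) % 10, 10)
mix_x, mix_y = MixupLayer(alpha=0.2)(images, labels)
cut_x, cut_y = CutMixLayer(alpha=1.0)(images, labels)
```

Because call takes images and labels as two positional arguments, an instance can be passed straight to Dataset.map after batching, e.g. mixup_layer = MixupLayer(alpha=0.2) followed by dataset.batch(128).map(mixup_layer). The mixed labels are soft, so train with CategoricalCrossentropy rather than the sparse loss.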

Compare CIFAR-10 validation accuracy after 30 epochs across the same four conditions as Lab 1 (baseline, flips+crops, +Mixup, +CutMix).

Section 5 — Label Smoothing and Temperature Scaling

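Label smoothing is an argument of CategoricalCrossentropy, which expects one-hot labels: CategoricalCrossentropy(label_smoothing=0.1) replaces each one-hot target y with y * (1 - 0.1) + 0.1 / K, where K is the number of classes. SparseCategoricalCrossentropy takes integer labels and has no label_smoothing argument, so convert labels with tf.one_hot first. Verify that the output logit gap is bounded after training, consistent with the theoretical bound from the readings.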
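A small check of both claims: Keras' smoothed loss equals cross-entropy against the target y * (1 - ε) + ε / K, and the optimal logit gap is finite. It assumes the bound from the readings is the gap at the minimizer of the smoothed loss, where softmax(z) equals the smoothed target, so the true-class logit exceeds every other logit by log((1 - ε + ε/K) / (ε/K)); with ε = 0.1 and K = 10 that is log(91) ≈ 4.51. The function name optimal_logit_gap and the random test logits are illustrative.

```python
import math
import tensorflow as tf

eps, num_classes = 0.1, 10

# Keras smooths one-hot targets as y * (1 - eps) + eps / K before the cross-entropy
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True, label_smoothing=eps)
logits = tf.random.normal((4, num_classes))
one_hot = tf.one_hot([0, 3, 7, 9], num_classes)
smoothed = one_hot * (1.0 - eps) + eps / num_classes
manual_loss = tf.reduce_mean(-tf.reduce_sum(smoothed * tf.nn.log_softmax(logits), axis=-1))
keras_loss = loss_fn(one_hot, logits)

def optimal_logit_gap(eps, k):
    """Gap between the true-class logit and each other logit when softmax(z)
    exactly matches the smoothed target (the minimizer of the smoothed loss)."""
    p_true = 1.0 - eps + eps / k
    p_other = eps / k
    return math.log(p_true / p_other)

gap = optimal_logit_gap(eps, num_classes)  # log(0.91 / 0.01) = log(91), about 4.51
print(float(keras_loss), float(manual_loss), gap)
```

After training, compare the observed gap (true-class logit minus the largest other logit) on training examples with this value. With hard one-hot targets, by contrast, the loss keeps decreasing as the gap grows, so nothing bounds it.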

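For temperature scaling, implement a thin calibration wrapper that divides the frozen model's logits by a single scalar T > 0, fitted after training by minimising negative log-likelihood on a held-out validation split: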

calibrated_logits = raw_logits / T   # T is a scalar you tune post-training
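A sketch of fitting T, assuming you have collected validation logits and integer labels from the frozen model. It uses a 1-D grid search over T on a log scale, a simple substitute for the gradient-based fit often used; the grid range [0.05, 20] is an arbitrary choice, and the helper names are illustrative. The synthetic check draws labels from softmax(z) but reports logits 3 * z, i.e. a hypothetical model overconfident by a factor of 3, so the fitted T should land near 3.

```python
import numpy as np
import tensorflow as tf

def mean_nll(logits, labels, T):
    # Mean negative log-likelihood of the temperature-scaled logits
    return float(tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits / T, from_logits=True)))

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the T > 0 that minimises validation NLL (grid search on a log scale)."""
    if grid is None:
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 400))
    losses = [mean_nll(val_logits, val_labels, float(T)) for T in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic (hypothetical) validation data: labels drawn from softmax(z),
# but the "model" reports 3 * z, so it is overconfident by a factor of 3.
rng = np.random.default_rng(0)
z = 2.0 * rng.normal(size=(5000, 5))
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
u = rng.random((5000, 1))
labels = np.minimum((u > probs.cumsum(axis=1)).sum(axis=1), 4)  # categorical sampling
val_logits = tf.constant(3.0 * z, dtype=tf.float32)
val_labels = tf.constant(labels)

T = fit_temperature(val_logits, val_labels)
print("fitted T:", T)
```

Dividing by T never changes the argmax, so accuracy is unchanged and only the confidences move. Fit T on one held-out split and report ECE on another; otherwise the calibration numbers are optimistic.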

Evaluate calibration with Expected Calibration Error (ECE) on a validation set before and after temperature scaling. Plot reliability diagrams (confidence vs accuracy per bin) to visualise miscalibration.
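A NumPy sketch of ECE with equal-width confidence bins (15 bins is a common choice, assumed here; the function name is illustrative). Each bin contributes |accuracy - confidence| weighted by the fraction of samples it holds, and the per-bin (confidence, accuracy) pairs it returns are the points of a reliability diagram.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width bins over the max-softmax confidence.

    Returns (ece, bin_confidence, bin_accuracy); the last two, for non-empty
    bins only, are the points of a reliability diagram.
    """
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(np.float64)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, bin_conf, bin_acc = 0.0, [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        conf, acc = confidences[in_bin].mean(), correct[in_bin].mean()
        ece += in_bin.mean() * abs(acc - conf)  # weight = fraction of samples in bin
        bin_conf.append(conf)
        bin_acc.append(acc)
    return ece, np.array(bin_conf), np.array(bin_acc)

# Toy case: always 90% confident but right only half the time, so ECE = |0.5 - 0.9| = 0.4
probs = np.tile([0.9, 0.1], (10, 1))
labels = np.array([0] * 5 + [1] * 5)
ece, conf, acc = expected_calibration_error(probs, labels)
```

To draw the diagram, plot acc against conf (for example as bars) alongside the diagonal y = x; bars below the diagonal indicate overconfidence. Passing tf.nn.softmax(logits / T).numpy() for the unscaled and fitted T gives the before/after comparison.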

Section 6 — Spectral Normalization and Gradient Clipping

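tf.keras.layers.SpectralNormalization(layer) is the TF equivalent of nn.utils.spectral_norm. Wrap every dense layer in a discriminator and verify after training that the spectral norm (largest singular value) of each wrapped kernel is approximately 1.0. The layer estimates it by power iteration, so expect small deviations rather than an exact bound.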

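For gradient clipping, compare the Keras optimizer modes on a deep model:

Adam(clipnorm=1.0)          # clips each variable's gradient norm separately
Adam(global_clipnorm=1.0)   # clips the joint norm of all gradients (like PyTorch's clip_grad_norm_)
Adam(clipvalue=0.5)         # clips each component to [-0.5, 0.5] independently

global_clipnorm preserves the direction of the full gradient and only scales its magnitude. clipnorm preserves direction within each variable, but the overall direction can shift when different variables are scaled by different factors. clipvalue can distort direction even within a single variable. Plot gradient norm distributions to see the difference.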
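To see how the three modes treat direction, this sketch applies tf.clip_by_norm, tf.clip_by_global_norm and tf.clip_by_value, which mirror the documented behavior of clipnorm, global_clipnorm and clipvalue, to two hypothetical gradient tensors and reports the cosine similarity between the concatenated clipped and original gradients (1.0 means the direction is preserved).

```python
import tensorflow as tf

# Hypothetical gradients for two variables (values chosen so the modes differ)
grads = [tf.constant([3.0, 4.0]), tf.constant([0.0, 0.6])]

def cosine_to_original(clipped):
    # Cosine similarity between the concatenated original and clipped gradients
    g = tf.concat(grads, axis=0)
    c = tf.concat(clipped, axis=0)
    return float(tf.reduce_sum(g * c) / (tf.norm(g) * tf.norm(c)))

per_variable = [tf.clip_by_norm(g, 1.0) for g in grads]          # like clipnorm=1.0
global_clipped, global_norm = tf.clip_by_global_norm(grads, 1.0)  # like global_clipnorm=1.0
per_value = [tf.clip_by_value(g, -0.5, 0.5) for g in grads]      # like clipvalue=0.5

cosines = {
    "clipnorm": cosine_to_original(per_variable),
    "global_clipnorm": cosine_to_original(global_clipped),
    "clipvalue": cosine_to_original(per_value),
}
for name, cos in cosines.items():
    print(f"{name:16s} cosine to original = {cos:.4f}")
```

With these values global_clipnorm keeps a cosine of 1.0, clipnorm about 0.913 (the first variable is scaled by 1/5 while the second is left alone), and clipvalue about 0.871.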
