Lab 2: Gradient Descent from Scratch
Gradient descent is the algorithm that makes neural network training possible. In this lab you will implement it without any framework machinery, using only the update rule applied repeatedly, and build the intuition for why the learning rate is often called the most important hyperparameter.
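As a rough preview of "the update rule applied repeatedly", here is a minimal sketch in plain Python. The function f(x) = (x - 3)^2, the helper name gradient_descent, the starting point, the learning rate, and the step count are all illustrative assumptions, not the lab's actual choices.

```python
def gradient_descent(grad, x0, lr, steps):
    """Repeatedly apply x <- x - lr * grad(x) and return every iterate."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # the update rule: step against the gradient
        xs.append(x)
    return xs


# Hypothetical example function (an assumption for illustration only).
def f(x):
    return (x - 3.0) ** 2


def grad_f(x):
    return 2.0 * (x - 3.0)


trajectory = gradient_descent(grad_f, x0=0.0, lr=0.1, steps=50)

# Log the first few steps, as one might before plotting the trajectory.
for step, x in enumerate(trajectory[:5]):
    print(f"step {step}: x = {x:.4f}, f(x) = {f(x):.4f}")
```

Returning the whole list of iterates, rather than only the final value, keeps the trajectory available for plotting over the function.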
What You'll Build
- A 1D gradient descent loop on a one-variable function f(x), logging x and f(x) at each step and plotting the convergence trajectory overlaid on the function
- A learning rate comparison: three runs with a learning rate that is too small, just right, and too large, with annotated plots showing slow convergence, clean convergence, and divergence or oscillation
- A 2D gradient descent on a quadratic bowl f(x, y), with contour plots and gradient arrows showing the path to the minimum at the origin
- A linear regression via GD: fit a line to noisy data by minimizing MSE, implementing the parameter updates for the slope and intercept by hand
- A loss curve comparison between the hand-rolled update and torch.optim.SGD, confirming they converge identically
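The linear regression item can be sketched without any framework. The example below is an assumption-laden illustration: the ground-truth line y = 2x + 1, the noise level, the learning rate, and the names w and b are hypothetical choices, not values taken from the lab. For MSE = (1/n) * sum((w*x + b - y)^2), the gradients are (2/n) * sum(error * x) with respect to w and (2/n) * sum(error) with respect to b.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data from an assumed ground-truth line y = 2x + 1 plus Gaussian noise.
n = 50
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]


def mse(w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


w, b = 0.0, 0.0  # slope and intercept, both started at zero
lr = 0.1
losses = []
for step in range(500):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Hand-derived gradients of the MSE with respect to w and b.
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(mse(w, b))

print(f"w = {w:.3f}, b = {b:.3f}, final MSE = {losses[-1]:.4f}")
```

The recorded losses list is what one would plot against the loss curve from torch.optim.SGD. With full-batch updates, no momentum, and the same learning rate and initial parameters, the two curves should agree up to floating-point differences.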
Key Concepts Practiced
By the end you will have built an intuition for the loss surface, understood why the negative gradient direction decreases the function (from the linear approximation argument), and seen directly how the learning rate determines whether training converges, diverges, or oscillates.
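The linear approximation argument says that for a small learning rate lr, f(x - lr * f'(x)) is approximately f(x) - lr * f'(x)^2, which is no larger than f(x). The approximation only holds for small steps, and the sketch below shows what happens when the step is not small. It assumes the example function f(x) = x^2 (gradient 2x), for which each update multiplies x by (1 - 2 * lr). The function name run and the specific learning rates are illustrative assumptions.

```python
def run(lr, x0=1.0, steps=20):
    """Gradient descent on the assumed example f(x) = x^2, whose gradient is 2x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * 2.0 * xs[-1])
    return xs


too_small = run(lr=0.01)   # each step multiplies x by 0.98: slow progress
reasonable = run(lr=0.3)   # each step multiplies x by 0.4: quick convergence
oscillating = run(lr=1.0)  # each step multiplies x by -1: bounces between +1 and -1
too_large = run(lr=1.1)    # each step multiplies x by -1.2: |x| grows every step

for name, xs in [("too small", too_small), ("reasonable", reasonable),
                 ("oscillating", oscillating), ("too large", too_large)]:
    print(f"{name:12s} final |x| = {abs(xs[-1]):.3e}")
```

For this particular function the boundary is lr = 1: below it the iterates shrink toward zero, at it they oscillate forever, and above it they diverge. Other functions have different thresholds, set by their curvature.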