Prerequisite · Linear Algebra

SVD, PCA, and the Geometry of Neural Representations

Colab Notebook · Python · ~60 min

Lab Objectives
1. Compute the full SVD of a GloVe embedding slice and plot the singular value spectrum.
2. Reconstruct the embedding matrix at ranks $k = 1, 5, 10, 25, 50$ and plot Frobenius-norm error against rank.
3. Implement PCA from scratch via SVD: centre the data, compute the SVD, and project onto the top-$k$ components. Verify that the result matches sklearn.decomposition.PCA.
4. Compute the effective rank (the number of singular values above $1\%$ of $\sigma_1$) and compare it to the nominal embedding dimension.
5. Decompose the same matrix with torch.linalg.svd in PyTorch and tf.linalg.svd in TensorFlow, then compare outputs and timing.
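
The spectrum, rank-k reconstruction, and effective-rank objectives can be sketched in a few lines of NumPy. This is a minimal illustration and not the lab's reference solution: it assumes a synthetic low-rank-plus-noise matrix as a stand-in for the GloVe slice, and the names `E`, `rank_k_error`, and `effective_rank` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a GloVe slice (assumption): 500 "words" x 50 dims,
# built as a rank-8 signal plus small noise so the spectrum decays quickly.
n_words, dim, true_rank = 500, 50, 8
E = rng.normal(size=(n_words, true_rank)) @ rng.normal(size=(true_rank, dim))
E += 0.01 * rng.normal(size=(n_words, dim))

# Thin SVD: U is (500, 50), s is (50,), Vt is (50, 50); s is sorted descending.
U, s, Vt = np.linalg.svd(E, full_matrices=False)


def rank_k_error(k):
    """Frobenius norm of E minus its rank-k truncated-SVD reconstruction."""
    E_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return np.linalg.norm(E - E_k, ord="fro")


ranks = [1, 5, 10, 25, 50]
errors = {k: rank_k_error(k) for k in ranks}

# Effective rank as defined in the objectives: count of sigma_i > 1% of sigma_1.
effective_rank = int(np.sum(s > 0.01 * s[0]))

for k in ranks:
    print(f"rank {k:2d}: Frobenius error = {errors[k]:.4f}")
print(f"effective rank = {effective_rank} (nominal dimension = {dim})")
```

Passing `s` to a log-scale line plot gives the singular value spectrum. With real embeddings, `E` would be replaced by the loaded GloVe slice and the rest of the sketch should carry over unchanged.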

Lab Overview

The singular value decomposition (SVD) is arguably the most important computational tool in applied linear algebra. It underlies PCA, matrix factorisation, and dimensionality reduction, and it features in the emerging theory of why large language models generalise. In this capstone lab you will compute SVDs on real data, perform low-rank approximation, verify the SVD–PCA equivalence, and analyse the spectral structure of a pre-trained embedding matrix.
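
Two standard identities sit behind the low-rank approximation and SVD–PCA equivalence mentioned above. They are stated here as a brief reminder; the symbols $A$, $X_c$ (the centred $n \times d$ data matrix), $C$, and $A_k$ are notation assumed for this note, not taken from the lab.

```latex
% Low-rank approximation (Eckart–Young): with A = U \Sigma V^\top and
% singular values \sigma_1 \ge \sigma_2 \ge \dots \ge 0,
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^\top
\quad\text{minimises } \lVert A - B \rVert_F \text{ over all } B \text{ with } \operatorname{rank}(B) \le k,
\qquad
\lVert A - A_k \rVert_F = \sqrt{\sum_{i > k} \sigma_i^2}.

% SVD–PCA equivalence: with X_c = U \Sigma V^\top (thin SVD),
C = \frac{1}{n-1} X_c^\top X_c = V \, \frac{\Sigma^2}{n-1} \, V^\top,
\qquad
\lambda_i = \frac{\sigma_i^2}{n-1},
\qquad
X_c V_k = U_k \Sigma_k.
```

In words: the columns of $V$ are the principal axes, the squared singular values (scaled by $n-1$) are the explained variances, and projecting onto the top-$k$ axes gives the same scores as $U_k \Sigma_k$.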

What You Will Build

A notebook that (1) reconstructs GloVe embeddings at varying ranks and measures Frobenius-norm error, (2) implements PCA from scratch via SVD and verifies it matches sklearn, (3) plots the singular value spectrum and identifies the effective rank, and (4) probes implicit low-rank structure in a pre-trained embedding matrix — connecting the theory of gradient descent to empirical observations in modern LLMs.
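
The PCA-from-scratch step can be sketched as follows. This is an illustrative version under stated assumptions: the data are synthetic, the helper `pca_via_svd` is a hypothetical name, and the check is made against an eigendecomposition of the sample covariance matrix rather than against sklearn, so that the sketch depends only on NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (assumption): 200 samples, 6 correlated features.
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
k = 3


def pca_via_svd(X, k):
    """Return (scores, components, explained_variance) for the top-k PCs."""
    X_c = X - X.mean(axis=0)                      # centre each feature
    U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
    components = Vt[:k]                           # rows are principal axes
    scores = X_c @ components.T                   # equals U[:, :k] * s[:k]
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


scores, components, explained_variance = pca_via_svd(X, k)

# Reference: eigendecomposition of the sample covariance matrix.
cov = np.cov(X, rowvar=False)                     # uses the n - 1 normalisation
eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][:k]             # indices of the k largest
ref_variance = eigvals[order]
ref_components = eigvecs[:, order].T

# Principal axes are defined only up to sign, so align signs before comparing.
signs = np.sign(np.sum(components * ref_components, axis=1))
ref_components = ref_components * signs[:, None]

variance_matches = np.allclose(explained_variance, ref_variance)
components_match = np.allclose(components, ref_components, atol=1e-6)
print("variances match:", variance_matches)
print("components match:", components_match)
```

A comparison against `sklearn.decomposition.PCA(n_components=k)` would follow the same pattern, using its `components_` and `explained_variance_` attributes. The sign alignment is still needed there, because each component is determined only up to a factor of -1.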