Prerequisite · Linear Algebra

Orthogonality and Projection

18 min read
By the end of this reading you will be able to:
  • Compute the orthogonal projection of a vector onto a line and verify that the residual is perpendicular to the line
  • Apply Gram-Schmidt orthogonalisation to produce an orthonormal basis from an arbitrary spanning set
  • Build the projection matrix P = QQ^T for a subspace and verify it is idempotent and symmetric
  • Derive the least-squares solution via the normal equations A^T A x = A^T b and interpret the residual geometrically
  • Interpret the hat matrix H = A(A^T A)^{-1} A^T as an orthogonal projector and explain what it means for fitted values and residuals

Orthogonality

Two vectors u,vRn\vec{u}, \vec{v} \in \mathbb{R}^n are orthogonal if their dot product is zero:

uv    uv=i=1nuivi=0\vec{u} \perp \vec{v} \iff \vec{u} \cdot \vec{v} = \sum_{i=1}^n u_i v_i = 0

The norm (length) of a vector is v=vv\|\vec{v}\| = \sqrt{\vec{v} \cdot \vec{v}}. A unit vector has v^=1\|\hat{v}\| = 1; any nonzero vector can be normalized: v^=v/v\hat{v} = \vec{v}/\|\vec{v}\|.

A set {q1,,qk}\{\vec{q}_1, \ldots, \vec{q}_k\} is orthonormal if qiqj=δij\vec{q}_i \cdot \vec{q}_j = \delta_{ij} (1 if i=ji=j, 0 otherwise). Orthonormal sets are automatically linearly independent.

Orthogonal Projection onto a Line

Given a line through the origin spanned by v\vec{v}, the orthogonal projection of u\vec{u} onto this line is the point on the line closest to u\vec{u}:

projv(u)=uvvvv\text{proj}_{\vec{v}}(\vec{u}) = \frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}\,\vec{v}

The scalar uvv2\frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2} is the signed component of u\vec{u} along v\vec{v}. The error (residual) uprojv(u)\vec{u} - \text{proj}_{\vec{v}}(\vec{u}) is orthogonal to v\vec{v} — this is the defining geometric property of orthogonal projection.

As a matrix: for a unit vector v^\hat{v}, the projection is v^(v^u)=(v^v^)u\hat{v}(\hat{v}'\vec{u}) = (\hat{v}\hat{v}')\vec{u}. The matrix P=v^v^P = \hat{v}\hat{v}' is the rank-1 projection matrix onto the line. It satisfies P2=PP^2 = P (idempotent) and P=PP = P' (symmetric).

Gram-Schmidt Orthogonalization

Gram-Schmidt converts any basis {v1,,vn}\{\vec{v}_1, \ldots, \vec{v}_n\} into an orthonormal basis {q1,,qn}\{\vec{q}_1, \ldots, \vec{q}_n\} spanning the same space:

Step 1: u1=v1\vec{u}_1 = \vec{v}_1, then q1=u1/u1\vec{q}_1 = \vec{u}_1/\|\vec{u}_1\|

Step kk (for k=2,3,,nk = 2, 3, \ldots, n): subtract the projections onto all previous qi\vec{q}_i:

uk=vki=1k1(vkqi)qi,qk=uk/uk\vec{u}_k = \vec{v}_k - \sum_{i=1}^{k-1} (\vec{v}_k \cdot \vec{q}_i)\,\vec{q}_i, \qquad \vec{q}_k = \vec{u}_k / \|\vec{u}_k\|

Each uk\vec{u}_k is the component of vk\vec{v}_k orthogonal to the subspace spanned by v1,,vk1\vec{v}_1, \ldots, \vec{v}_{k-1}.

QR decomposition: Gram-Schmidt applied to the columns of AA produces A=QRA = QR where QQ has orthonormal columns and RR is upper triangular. This is used in least-squares solvers and eigenvalue algorithms.

Projection into a Subspace

Let WW be a subspace of Rn\mathbb{R}^n with orthonormal basis {q1,,qk}\{\vec{q}_1, \ldots, \vec{q}_k\}. The orthogonal projection of u\vec{u} onto WW is:

projW(u)=i=1k(uqi)qi=QQu\text{proj}_W(\vec{u}) = \sum_{i=1}^k (\vec{u} \cdot \vec{q}_i)\,\vec{q}_i = QQ'\vec{u}

where Q=[q1qk]Q = [\vec{q}_1 \mid \cdots \mid \vec{q}_k]. The projection matrix PW=QQP_W = QQ' satisfies:

  • PW2=PWP_W^2 = P_W (projecting twice does nothing)
  • PW=PWP_W' = P_W (it is symmetric)
  • rank(PW)=k=dim(W)\text{rank}(P_W) = k = \dim(W)

The complement uPWu\vec{u} - P_W \vec{u} is orthogonal to every vector in WW.

Least Squares: The Geometry

When the system Ax=bA\vec{x} = \vec{b} has no solution (as happens in overdetermined systems with more equations than unknowns), the best we can do is find x^\hat{x} minimizing Ax^b2\|A\hat{x} - \vec{b}\|^2.

Geometrically: Ax^A\hat{x} is the projection of b\vec{b} onto the column space of AA. The residual bAx^\vec{b} - A\hat{x} must be orthogonal to all columns of AA:

A(bAx^)=0    AAx^=AbA'(\vec{b} - A\hat{x}) = \vec{0} \implies A'A\hat{x} = A'\vec{b}

These are the normal equations. When AA has full column rank, AAA'A is invertible and the unique least-squares solution is:

x^=(AA)1Ab\hat{x} = (A'A)^{-1}A'\vec{b}

The matrix P=A(AA)1AP = A(A'A)^{-1}A' is the projection matrix onto the column space of AA — the hat matrix familiar from regression. This is the linear algebra foundation of ordinary least squares, and by extension of any optimization problem whose solution is a projection.

References
Hefferon 2020 — Linear Algebra, Ch. Three §VI: Projection, Gram-Schmidt