Prerequisite · Linear Algebra

Norms, Dot Products, and Angles


By the end of this reading you will be able to:

Compute the $\ell^1$, $\ell^2$, and $\ell^\infty$ norms of a vector and explain what each penalises geometrically
Calculate the dot product of two vectors and use it to find the angle between them
State the Cauchy-Schwarz inequality and explain why it justifies defining angle via the dot product
Compute cosine similarity as a normalised dot product and interpret the score for embedding retrieval tasks
Identify orthogonality from a zero dot product and explain its significance for projections
Contrast $\ell^1$ and $\ell^2$ regularisation in terms of sparsity, geometry, and gradient behaviour

Length of a Vector

Given a vector $\vec{v} = (v_1, v_2, \ldots, v_n)^T \in \mathbb{R}^n$, its Euclidean length (or $\ell^2$ norm) is:

$|\vec{v}| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$

This generalises the familiar Pythagorean theorem to $n$ dimensions. A vector $\vec{v}$ has $|\vec{v}| = 0$ if and only if $\vec{v} = \vec{0}$.

Scaling and the Unit Vector

For any scalar $c$, scaling a vector scales its length by $|c|$: $|c\vec{v}| = |c|\,|\vec{v}|$. A unit vector has length exactly 1. Given any nonzero $\vec{v}$, the vector

$\hat{v} = \frac{\vec{v}}{|\vec{v}|}$

is the unit vector in the same direction — this process is called normalisation. Unit vectors matter because they carry only directional information, with magnitude removed.
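As a minimal sketch of normalisation and the scaling rule, the snippet below uses a hypothetical helper `normalise` with an assumed tolerance `eps` (neither name comes from the text) to guard against dividing by a zero length. It also scales by a negative number to show that the length grows by the absolute value of the scalar.

```python
import numpy as np

def normalise(v, eps=1e-12):
    """Return v scaled to unit Euclidean length (hypothetical helper).

    eps is an assumed tolerance: vectors shorter than this are treated
    as the zero vector, which has no direction to preserve.
    """
    norm = np.linalg.norm(v)
    if norm < eps:
        raise ValueError("cannot normalise the zero vector")
    return v / norm

v = np.array([3.0, 4.0])       # length 5
v_hat = normalise(v)
scaled = -2.0 * v              # scale by c = -2

print(v_hat)                                   # [0.6 0.8]
print(np.linalg.norm(v_hat))                   # length of the unit vector
print(np.linalg.norm(scaled), 2.0 * np.linalg.norm(v))  # both sides of |cv| = |c||v|
```

Raising an error for the zero vector is one possible design choice; some code bases return the zero vector unchanged instead.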

The Dot Product

The dot product of $\vec{u}, \vec{v} \in \mathbb{R}^n$ is:

$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i$

Notice that $\vec{v} \cdot \vec{v} = |\vec{v}|^2$: the dot product of a vector with itself is the square of its length.

Algebraic Properties

For all $\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^n$ and $c \in \mathbb{R}$:

Property	Statement
Commutativity	$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$
Distributivity	$(\vec{u} + \vec{v}) \cdot \vec{w} = \vec{u} \cdot \vec{w} + \vec{v} \cdot \vec{w}$
Scalar pull-out	$(c\vec{u}) \cdot \vec{v} = c(\vec{u} \cdot \vec{v})$
Positive-definiteness	$\vec{v} \cdot \vec{v} \geq 0$, with equality iff $\vec{v} = \vec{0}$

Any operation on a real vector space that satisfies these four properties is called an inner product. The Euclidean dot product is the canonical example.
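The properties in the table can be spot-checked numerically. The sketch below draws three random vectors (the seed, the dimension 5, and the scalar 2.5 are arbitrary choices for illustration) and compares both sides of each identity up to floating-point tolerance. A numerical check like this illustrates the properties but does not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
u, v, w = rng.normal(size=(3, 5))     # three random vectors in R^5
c = 2.5

# Each flag compares the two sides of one identity from the table.
commutative = np.isclose(u @ v, v @ u)
distributive = np.isclose((u + v) @ w, u @ w + v @ w)
scalar_pullout = np.isclose((c * u) @ v, c * (u @ v))
positive = (v @ v) > 0                # v is nonzero here
self_dot_is_sq_norm = np.isclose(v @ v, np.linalg.norm(v) ** 2)

print(commutative, distributive, scalar_pullout, positive, self_dot_is_sq_norm)
```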

The Cauchy–Schwarz Inequality

For any $\vec{u}, \vec{v} \in \mathbb{R}^n$:

$|\vec{u} \cdot \vec{v}| \leq |\vec{u}|\,|\vec{v}|$

Equality holds if and only if one vector is a scalar multiple of the other (they are parallel). Cauchy–Schwarz is one of the most widely used inequalities in linear algebra: it underlies the triangle inequality, the definition of angles, and the theory of Fourier series.
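To make the equality condition concrete, the following sketch measures the gap between the two sides of the inequality for a non-parallel pair and for a parallel pair. The helper name `cs_gap` and the example vectors are made up for illustration.

```python
import numpy as np

def cs_gap(a, b):
    """|a||b| - |a . b|: never negative, and zero when a and b are parallel."""
    return np.linalg.norm(a) * np.linalg.norm(b) - abs(a @ b)

u = np.array([1.0, 2.0, 2.0])      # |u| = 3
v = np.array([2.0, 0.0, 1.0])      # not a multiple of u
w = -1.5 * u                       # a scalar multiple of u

gap_general = cs_gap(u, v)         # strictly positive
gap_parallel = cs_gap(u, w)        # zero up to rounding

# The triangle inequality follows from Cauchy-Schwarz; check it on the same pair.
triangle_ok = np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)

print(gap_general, gap_parallel, triangle_ok)
```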

Angles Between Vectors

Because $|\vec{u} \cdot \vec{v}| \leq |\vec{u}||\vec{v}|$, the ratio $\frac{\vec{u} \cdot \vec{v}}{|\vec{u}||\vec{v}|}$ always lies in $[-1, 1]$ whenever both vectors are nonzero. The angle $\theta$ between two nonzero vectors is the unique $\theta \in [0, \pi]$ satisfying:

$\cos\theta = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}|\,|\vec{v}|}$

Key cases:

$\theta = 0$: vectors point in the same direction, $\vec{u} \cdot \vec{v} = |\vec{u}||\vec{v}|$
$\theta = \pi/2$: vectors are orthogonal (perpendicular), $\vec{u} \cdot \vec{v} = 0$
$\theta = \pi$: vectors point in opposite directions, $\vec{u} \cdot \vec{v} = -|\vec{u}||\vec{v}|$
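The three key cases can be reproduced with a small function. This is a sketch under the assumption that both inputs are nonzero; the name `angle_between` is hypothetical. The cosine is clipped to $[-1, 1]$ because rounding can push it slightly outside that range, which would make `np.arccos` return `nan`.

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two nonzero vectors (hypothetical helper)."""
    cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

same = angle_between(e1, 3.0 * e1)               # same direction: 0
perp = angle_between(e1, e2)                     # orthogonal: pi/2
opposite = angle_between(e1, -2.0 * e1)          # opposite direction: pi
diagonal = angle_between(e1, np.array([1.0, 1.0]))  # in between: pi/4

print(np.degrees([same, perp, opposite, diagonal]))
```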

Orthogonality

Two vectors are orthogonal if $\vec{u} \cdot \vec{v} = 0$. Orthogonality is the algebraic definition of perpendicularity: it requires no notion of angle, just the dot product. A set of vectors is orthogonal if every pair is orthogonal, and orthonormal if additionally every vector has unit length.

The standard basis $\{\vec{e}_1, \ldots, \vec{e}_n\}$ is orthonormal: $\vec{e}_i \cdot \vec{e}_j = \delta_{ij}$ (the Kronecker delta).
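Orthogonality is what makes projections work. The sketch below uses the standard projection formula $\frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}\vec{v}$, which is not stated in the text above and is brought in here as an assumption, to split $\vec{u}$ into a part along $\vec{v}$ and a remainder. The remainder is orthogonal to $\vec{v}$, so the two parts obey the Pythagorean theorem. The last lines check the Kronecker delta relation for the standard basis.

```python
import numpy as np

u = np.array([3.0, 1.0, -2.0])
v = np.array([1.0, 4.0, 1.0])

proj = (u @ v) / (v @ v) * v     # component of u along v
residual = u - proj              # what is left after removing that component

residual_dot_v = residual @ v    # zero up to rounding: residual is orthogonal to v

# Orthogonal pieces satisfy |u|^2 = |proj|^2 + |residual|^2.
pythagoras_ok = np.isclose(u @ u, proj @ proj + residual @ residual)

# Rows of the identity matrix are e_1..e_n; E @ E.T collects every e_i . e_j.
E = np.eye(3)
basis_is_orthonormal = np.allclose(E @ E.T, np.eye(3))

print(residual_dot_v, pythagoras_ok, basis_is_orthonormal)
```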

General Norms

Beyond Euclidean length, ML regularly uses:

Norm	Formula	Intuition
$\ell^1$ (Manhattan)	$\|\vec{v}\|_1 = \sum_i |v_i|$	Sum of absolute values; sparsity-promoting
$\ell^2$ (Euclidean)	$\|\vec{v}\|_2 = \sqrt{\sum_i v_i^2}$	Geometric length
$\ell^\infty$ (Max)	$\|\vec{v}\|_\infty = \max_i |v_i|$	Largest absolute component
$\ell^p$	$\|\vec{v}\|_p = \left(\sum_i |v_i|^p\right)^{1/p}$	Continuous family for $p \geq 1$; $\ell^2$ is $p=2$

All norms satisfy the triangle inequality $\|\vec{u}+\vec{v}\| \leq \|\vec{u}\| + \|\vec{v}\|$.
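The norm family can be explored directly from the formula. In this sketch the helper `lp_norm` and the example vectors are illustrative assumptions. It shows the norm shrinking towards the largest absolute component as $p$ grows, and spot-checks the triangle inequality for each $p$.

```python
import numpy as np

def lp_norm(x, p):
    """l^p norm computed from the formula, for finite p >= 1 (hypothetical helper)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, -2.0, 2.0, 4.0])
y = np.array([0.5, 3.0, -1.0, 2.0])

ps = (1, 2, 4, 16)
norms = [lp_norm(x, p) for p in ps]        # starts at 9.0 and 5.0, then approaches 4
max_norm = np.max(np.abs(x))               # l^inf norm: 4.0

# The hand-written formula should agree with NumPy's built-in.
matches_numpy = all(np.isclose(lp_norm(x, p), np.linalg.norm(x, ord=p)) for p in ps)

# Triangle inequality for every p in the list.
triangle_ok = all(lp_norm(x + y, p) <= lp_norm(x, p) + lp_norm(y, p) for p in ps)

print(norms, max_norm, matches_numpy, triangle_ok)
```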

ML Connections

Cosine Similarity

The cosine of the angle between two vectors depends only on direction, not magnitude:

$\text{cosine\_sim}(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}||\vec{v}|} = \hat{u} \cdot \hat{v}$

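In embedding models (word2vec, BERT, CLIP), each token or image is encoded as a high-dimensional vector. Cosine similarity is commonly used as a proxy for semantic relatedness: a score near $+1$ means nearly identical direction (closely related items), a score near $0$ means nearly orthogonal vectors (unrelated items), and a score near $-1$ means opposite directions. In practice, learned embeddings often place antonyms close together because they occur in similar contexts, so a negative score should not be read as "opposite meaning" without checking the model.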
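A retrieval step built on cosine similarity can be sketched as follows. The 4-dimensional "embeddings" are made-up numbers, not output from any real model, and `cosine_scores` is a hypothetical helper. Row 2 is exactly twice row 0, which shows that magnitude does not affect the score.

```python
import numpy as np

# Toy embedding matrix: one row per stored item (values invented for illustration).
docs = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [1.8, 0.2, 0.0, 0.4],   # exactly 2x row 0: same direction, larger magnitude
    [0.0, 0.1, 0.9, 0.3],
])
query = np.array([1.0, 0.0, 0.1, 0.2])

def cosine_scores(query, matrix):
    """Cosine similarity between one query and every row of matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q            # dot products of unit vectors

scores = cosine_scores(query, docs)
ranking = np.argsort(-scores)   # indices from most to least similar

print(np.round(scores, 3))
print(ranking)
```

Rows 0 and 2 receive the same score, so their relative order in `ranking` is not meaningful. If magnitude should matter for a task, a raw dot product is the usual alternative.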

Regularisation

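$\ell^2$ regularisation (weight decay) adds the penalty $\lambda\|\vec{w}\|_2^2$ to the loss, which discourages large weights by pulling each weight towards zero in proportion to its size. $\ell^1$ regularisation adds $\lambda\|\vec{w}\|_1$ and promotes sparsity (many weights exactly zero), because its pull towards zero has the same strength however small the weight already is. Both are norm-based penalties that shape the geometry of the weight space.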
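The difference in gradient behaviour can be seen on a small weight vector. This sketch is not a training loop: it compares the penalty gradients and then applies one proximal step for each penalty, a technique brought in here for illustration (soft-thresholding for $\ell^1$, uniform shrinkage for $\ell^2$). The weights and $\lambda = 0.1$ are arbitrary.

```python
import numpy as np

w = np.array([2.0, 0.5, -0.05, 0.0, -1.0])
lam = 0.1

grad_l2 = 2 * lam * w            # gradient of lam * ||w||_2^2: proportional to w
subgrad_l1 = lam * np.sign(w)    # a subgradient of lam * ||w||_1: constant size lam

def soft_threshold(w, t):
    """Proximal step for t * ||w||_1: move each weight t towards 0, stopping at 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w_l1 = soft_threshold(w, lam)    # small weights land exactly on zero
w_l2 = w / (1 + 2 * lam)         # proximal step for lam * ||w||_2^2: rescales, never zeroes

print(grad_l2)
print(subgrad_l1)
print(w_l1, np.count_nonzero(w_l1))
print(w_l2, np.count_nonzero(w_l2))
```

The $\ell^2$ gradient fades as a weight approaches zero, so the weight gets small but stays nonzero. The $\ell^1$ pull keeps its size, which is why the weight of $-0.05$ is set exactly to zero.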

Distance Metrics

Euclidean distance between two points $\vec{x}, \vec{y}$ is $|\vec{x} - \vec{y}|$. $k$-NN, $k$-means, and many other algorithms are expressed in terms of Euclidean or $\ell^p$ distances, which is why feature scaling (for example, standardising each column to zero mean and unit variance) matters: it prevents one feature's scale from dominating the norm.
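The effect of feature scaling on nearest neighbours can be illustrated with a small invented data set, where column 0 is an income in dollars and column 1 is an age in years. The data and the helper `pairwise_dist` are assumptions made for this sketch. Before scaling, income dominates the distance; after standardising, both features contribute.

```python
import numpy as np

# Hypothetical rows: [income in dollars, age in years]
X = np.array([
    [50_000.0, 25.0],
    [50_500.0, 60.0],   # similar income to row 0, very different age
    [52_000.0, 26.0],   # slightly different income, almost the same age
    [90_000.0, 45.0],
])

def pairwise_dist(A):
    """Matrix of Euclidean distances between all pairs of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.linalg.norm(diff, axis=-1)

raw = pairwise_dist(X)

# Standardise each column to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = pairwise_dist(X_std)

# Position 0 of each argsort is the row itself (distance 0), so take position 1.
nearest_raw = np.argsort(raw[0])[1]
nearest_scaled = np.argsort(scaled[0])[1]

print(nearest_raw, nearest_scaled)   # 1 2
```

On the raw data, row 0's nearest neighbour is row 1 despite the 35-year age gap, because a 500-dollar difference is numerically small next to the other income gaps. After scaling, row 2 becomes the nearest neighbour.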

Computing Norms and Similarities

import numpy as np

u = np.array([3., 1., -2.])
v = np.array([1., 4.,  1.])

# L2 norm
norm_u = np.linalg.norm(u)          # sqrt(9+1+4) = sqrt(14)
norm_v = np.linalg.norm(v)          # sqrt(1+16+1) = sqrt(18)
print(f'|u| = {norm_u:.4f}')        # 3.7417
print(f'|v| = {norm_v:.4f}')        # 4.2426

# Dot product
dot = np.dot(u, v)                  # 3*1 + 1*4 + (-2)*1 = 5
print(f'u · v = {dot}')             # 5.0 (float inputs give a float result)

# Angle
cosine = dot / (norm_u * norm_v)
angle_rad = np.arccos(np.clip(cosine, -1, 1))
angle_deg = np.degrees(angle_rad)
print(f'cos θ = {cosine:.4f}')      # 0.3150  (5 / sqrt(252))
print(f'θ     = {angle_deg:.2f}°') # 71.64°

# Unit vectors (normalisation)
u_hat = u / norm_u
v_hat = v / norm_v
print(f'|û| = {np.linalg.norm(u_hat):.6f}')  # 1.000000

# Cosine similarity via unit-vector dot product
cos_sim = np.dot(u_hat, v_hat)
print(f'cosine similarity = {cos_sim:.4f}')  # 0.3150 (same as cosine above)

# L1 and Linf norms
print(f'||u||_1   = {np.linalg.norm(u, ord=1)}')   # 6.0
print(f'||u||_inf = {np.linalg.norm(u, ord=np.inf)}') # 3.0

# Cauchy-Schwarz check: |u · v| <= |u| * |v|
print(f'|u · v| = {abs(dot):.4f}  <=  |u||v| = {norm_u*norm_v:.4f}')  # 5.0000 <= 15.8745

References

Hefferon 2020 — Linear Algebra, Ch. One §II: Linear Geometry of n-Space
