Norms, Dot Products, and Angles
- Compute the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms of a vector and explain what each penalises geometrically
- Calculate the dot product of two vectors and use it to find the angle between them
- State the Cauchy–Schwarz inequality and explain why it justifies defining angle via the dot product
- Compute cosine similarity as a normalised dot product and interpret the score for embedding retrieval tasks
- Identify orthogonality from a zero dot product and explain its significance for projections
- Contrast $\ell_1$ and $\ell_2$ regularisation in terms of sparsity, geometry, and gradient behaviour
Length of a Vector
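Given a vector $\mathbf{v} = (v_1, v_2, \dots, v_n) \in \mathbb{R}^n$, its Euclidean length (or norm) is:
$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$
This generalises the Pythagorean theorem to $n$ dimensions. A vector has $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.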
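As a quick sanity check, here is a minimal sketch that computes the length directly as the square root of the sum of squared components and compares it with NumPy's built-in `np.linalg.norm`. The helper name `euclidean_norm` is hypothetical, introduced only for illustration.

```python
import math

import numpy as np


def euclidean_norm(v):
    """Hypothetical helper: length of v from the definition sqrt(sum of v_i^2)."""
    return math.sqrt(sum(x * x for x in v))


# The 3-4-5 right triangle: the familiar 2-D Pythagorean case.
print(euclidean_norm([3.0, 4.0]))        # 5.0

# The same formula works unchanged in any number of dimensions.
w = [1.0, 2.0, 2.0, 4.0]
print(euclidean_norm(w))                 # 5.0  (sqrt(1 + 4 + 4 + 16) = sqrt(25))
print(np.linalg.norm(w))                 # 5.0  (NumPy agrees)

# Only the zero vector has length 0.
print(euclidean_norm([0.0, 0.0, 0.0]))   # 0.0
```

In practice `np.linalg.norm` is preferable; the hand-written version only makes the formula explicit.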
Scaling and the Unit Vector
For any scalar $c \in \mathbb{R}$, scaling a vector scales its length: $\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$. A unit vector has length exactly 1. Given any nonzero $\mathbf{v}$, the vector
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
is the unit vector in the same direction; this process is called normalisation. Unit vectors matter because they carry only directional information, with magnitude removed.
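A small sketch of normalisation, assuming NumPy. The helper `normalise` and its `eps` threshold are illustrative choices, not a standard API; the guard reflects the fact that the zero vector has no direction to preserve. The same snippet checks the scaling rule $\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$ for a negative $c$.

```python
import numpy as np


def normalise(v, eps=1e-12):
    """Hypothetical helper: return v / ||v||, refusing (near-)zero vectors."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    if n < eps:
        raise ValueError("cannot normalise the zero vector")
    return v / n


v = np.array([2.0, -1.0, 2.0])   # ||v|| = sqrt(4 + 1 + 4) = 3
c = -4.0

# Scaling property: ||c v|| = |c| ||v||
print(np.linalg.norm(c * v), abs(c) * np.linalg.norm(v))  # 12.0 12.0

v_hat = normalise(v)
print(v_hat)                   # components 2/3, -1/3, 2/3
print(np.linalg.norm(v_hat))   # approximately 1.0
```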
The Dot Product
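The dot product of $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ is:
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$
Notice that $\mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2$: the dot product of a vector with itself is the square of its length.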
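The definition translates directly into a sum of componentwise products. This sketch (the function name `dot` is just an illustrative choice) compares it with `np.dot` and confirms that $\mathbf{v} \cdot \mathbf{v}$ matches the squared length up to floating-point rounding.

```python
import numpy as np


def dot(u, v):
    """Illustrative dot product: sum of componentwise products u_i * v_i."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


u = [1.0, 2.0, 3.0]
v = [4.0, -5.0, 6.0]
print(dot(u, v))       # 1*4 + 2*(-5) + 3*6 = 12.0
print(np.dot(u, v))    # 12.0

# v . v equals ||v||^2 (16 + 25 + 36 = 77), up to rounding in the norm
print(dot(v, v), np.linalg.norm(v) ** 2)
```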
Algebraic Properties
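For all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ and $c \in \mathbb{R}$:
| Property | Statement |
|---|---|
| Commutativity | $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ |
| Distributivity | $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ |
| Scalar pull-out | $(c\mathbf{u}) \cdot \mathbf{v} = c\,(\mathbf{u} \cdot \mathbf{v})$ |
| Positive-definiteness | $\mathbf{v} \cdot \mathbf{v} \ge 0$, with equality iff $\mathbf{v} = \mathbf{0}$ |
These four properties are the defining axioms of an inner product on a real vector space. The Euclidean dot product is the canonical example.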
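The four properties can be spot-checked numerically on random vectors. This is a sketch, not a proof: it only confirms the identities for one random draw (seed and dimension chosen arbitrarily), using NumPy's `@` operator for the dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 5))   # three random vectors in R^5
c = 2.5
zero = np.zeros(5)

checks = {
    "commutativity": np.isclose(u @ v, v @ u),
    "distributivity": np.isclose(u @ (v + w), u @ v + u @ w),
    "scalar pull-out": np.isclose((c * u) @ v, c * (u @ v)),
    # u is nonzero, so u . u > 0; the zero vector gives exactly 0
    "positive-definiteness": bool(u @ u > 0) and bool(np.isclose(zero @ zero, 0.0)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")   # each line should report True
```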
The Cauchy–Schwarz Inequality
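For any $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$:
$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$$
Equality holds if and only if one vector is a scalar multiple of the other (the vectors are parallel, or one of them is zero). Cauchy–Schwarz is one of the foundational inequalities of linear algebra: it underlies the triangle inequality, the definition of angles, and the theory of Fourier series.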
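An empirical sketch of the inequality, assuming NumPy: over many random pairs in $\mathbb{R}^4$ the ratio $|\mathbf{u} \cdot \mathbf{v}| / (\|\mathbf{u}\|\,\|\mathbf{v}\|)$ never exceeds 1, and it reaches 1 when one vector is a multiple of the other. The seed, dimension, and number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Random pairs: |u . v| never exceeds ||u|| ||v||
worst_ratio = 0.0
for _ in range(10_000):
    u, v = rng.standard_normal((2, 4))
    ratio = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    worst_ratio = max(worst_ratio, ratio)
print(f"largest |u.v| / (||u|| ||v||) seen: {worst_ratio:.4f}")  # always <= 1

# Equality case: v is a scalar multiple of u
u = np.array([1.0, -2.0, 0.5, 3.0])
v = -3.0 * u
print(np.isclose(abs(u @ v), np.linalg.norm(u) * np.linalg.norm(v)))  # True
```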
Angles Between Vectors
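Because $|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$, the ratio $\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$ always lies in $[-1, 1]$. The angle between two nonzero vectors is the unique $\theta \in [0, \pi]$ satisfying:
$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$$
Key cases:
- $\theta = 0$: vectors point in the same direction, $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|$
- $\theta = \pi/2$: vectors are orthogonal (perpendicular), $\mathbf{u} \cdot \mathbf{v} = 0$
- $\theta = \pi$: vectors point in opposite directions, $\mathbf{u} \cdot \mathbf{v} = -\|\mathbf{u}\|\,\|\mathbf{v}\|$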
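A minimal sketch of the angle formula in NumPy, exercising the three key cases. The function name `angle_deg` is hypothetical. The `np.clip` call guards against rounding pushing the cosine slightly outside $[-1, 1]$, which would make `np.arccos` return `nan`.

```python
import numpy as np


def angle_deg(u, v):
    """Hypothetical helper: angle between nonzero vectors u and v, in degrees."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Rounding can push cos_theta a hair outside [-1, 1]; clip before arccos.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


a = np.array([1.0, 2.0])
print(f"{angle_deg(a, 3 * a):.1f}")         # same direction:     0.0
print(f"{angle_deg(a, [-2.0, 1.0]):.1f}")   # orthogonal:         90.0
print(f"{angle_deg(a, -0.5 * a):.1f}")      # opposite direction: 180.0
```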
Orthogonality
Two vectors are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$. Orthogonality is the algebraic definition of perpendicularity: it requires no notion of angle, just the dot product. A set of vectors is orthogonal if every pair is orthogonal, and orthonormal if additionally every vector has unit length.
The standard basis $\mathbf{e}_1, \dots, \mathbf{e}_n$ is orthonormal: $\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}$ (the Kronecker delta, equal to 1 if $i = j$ and 0 otherwise).
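A compact way to test orthonormality is to stack the vectors as rows of a matrix and compute all pairwise dot products at once (the Gram matrix): the set is orthonormal exactly when that matrix is the identity. The sketch below applies this to the standard basis and, as a second example, to the rows of a 2-D rotation matrix (the 30° angle is arbitrary).

```python
import numpy as np

n = 4
E = np.eye(n)          # rows are the standard basis vectors e_1, ..., e_n

# Gram matrix: entry (i, j) is e_i . e_j
G = E @ E.T
print(np.array_equal(G, np.eye(n)))      # True: e_i . e_j = delta_ij

# Another orthonormal set: the rows of a 2-D rotation matrix
t = np.radians(30)
Q = np.array([[np.cos(t), np.sin(t)],
              [-np.sin(t), np.cos(t)]])
print(np.allclose(Q @ Q.T, np.eye(2)))   # True, up to floating-point rounding
```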
General Norms
Beyond Euclidean length, ML regularly uses:
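| Norm | Formula | Intuition |
|---|---|---|
| $\ell_1$ (Manhattan) | $\lVert \mathbf{v} \rVert_1 = \sum_i \lvert v_i \rvert$ | Sum of absolute values; sparsity-promoting |
| $\ell_2$ (Euclidean) | $\lVert \mathbf{v} \rVert_2 = \sqrt{\sum_i v_i^2}$ | Geometric length |
| $\ell_\infty$ (Max) | $\lVert \mathbf{v} \rVert_\infty = \max_i \lvert v_i \rvert$ | Largest absolute component |
| $\ell_p$ ($p \ge 1$) | $\lVert \mathbf{v} \rVert_p = \left(\sum_i \lvert v_i \rvert^p\right)^{1/p}$ | Continuous family; $p = 2$ is the Euclidean norm, and $\ell_\infty$ is the limit as $p \to \infty$ |
All norms satisfy the triangle inequality $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.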
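The whole $\ell_p$ family fits in one short function. This sketch uses a hypothetical helper `p_norm`, compares it with `np.linalg.norm(v, ord=p)`, and spot-checks the triangle inequality on one random pair for several values of $p$ (a numerical check, not a proof).

```python
import numpy as np


def p_norm(v, p):
    """Hypothetical helper: (sum |v_i|^p)^(1/p) for p >= 1; p = np.inf gives max |v_i|."""
    a = np.abs(np.asarray(v, dtype=float))
    if p == np.inf:
        return a.max()
    if p < 1:
        raise ValueError("p must be >= 1 for a norm")
    return (a ** p).sum() ** (1.0 / p)


v = np.array([3.0, -4.0, 1.0])
# l1 = 8, l2 = sqrt(26), linf = 4; each pair of columns should agree
for p in (1, 2, 3, np.inf):
    print(p, round(p_norm(v, p), 4), round(np.linalg.norm(v, ord=p), 4))

# Triangle inequality spot check: ||x + y||_p <= ||x||_p + ||y||_p
rng = np.random.default_rng(1)
x, y = rng.standard_normal((2, 6))
for p in (1, 2, 3, np.inf):
    assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
print("triangle inequality held for p = 1, 2, 3, inf")
```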
ML Connections
Cosine Similarity
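The cosine of the angle between two vectors depends only on direction, not magnitude:
$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}}$$
where the hats denote the corresponding unit vectors.
In embedding models (word2vec, BERT, CLIP), each token or image is encoded as a high-dimensional vector, and cosine similarity is used as a measure of semantic relatedness. A score near $1$ means the vectors point in nearly the same direction (closely related items, such as near-synonyms), a score near $0$ means they are roughly orthogonal (unrelated), and $-1$ means they point in opposite directions. Do not read a negative score as "opposite meaning": antonyms such as *hot* and *cold* occur in similar contexts, so word embeddings often give them a fairly high positive similarity.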
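A toy retrieval sketch: the 4-dimensional "embeddings" below are invented numbers, not outputs of any real model, chosen only to show the mechanics. Normalising the document vectors and the query once turns cosine similarity into a single matrix-vector product, after which ranking is a sort.

```python
import numpy as np

# Invented 4-D "embeddings" for illustration only; real models use
# hundreds of dimensions and learned values.
docs = {
    "cat":    np.array([0.9, 0.1, 0.0, 0.2]),
    "kitten": np.array([0.8, 0.2, 0.1, 0.3]),
    "car":    np.array([0.1, 0.9, 0.3, 0.0]),
    "banana": np.array([0.0, 0.1, 0.9, 0.1]),
}
query = np.array([1.0, 0.1, 0.0, 0.25])   # pretend embedding of a cat-like query

names = list(docs)
M = np.stack([docs[k] for k in names])    # one row per document

# Normalise rows and query; cosine similarity is then a plain dot product.
M_hat = M / np.linalg.norm(M, axis=1, keepdims=True)
q_hat = query / np.linalg.norm(query)
scores = M_hat @ q_hat

for i in np.argsort(-scores):             # highest similarity first
    print(f"{names[i]:>7s}  {scores[i]:.3f}")
```

With these made-up vectors the ranking comes out as cat, kitten, car, banana. Precomputing normalised document vectors is the usual design choice when the same collection is queried many times.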
Regularisation
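$\ell_2$ regularisation adds $\lambda \|\mathbf{w}\|_2^2$ to the loss to discourage large weights; under plain SGD it is equivalent to weight decay. $\ell_1$ regularisation adds $\lambda \|\mathbf{w}\|_1$ and promotes sparsity (many weights exactly zero). Both are norm-based geometric constraints on the weight space. Geometrically, the $\ell_1$ ball has corners on the coordinate axes, so the constrained optimum tends to land where some coordinates are zero, whereas the round $\ell_2$ ball has no preferred direction. In gradient terms, the $\ell_2$ penalty pulls each weight towards zero with force $2\lambda w_i$, which fades as $w_i$ shrinks, while the $\ell_1$ (sub)gradient $\lambda\,\operatorname{sign}(w_i)$ keeps a constant magnitude and can push weights all the way to zero.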
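To see the sparsity difference concretely, this sketch solves a one-dimensional toy problem per coordinate: minimise $\tfrac{1}{2}(x - a)^2$ plus the penalty, with $a$ set to some made-up weights and an arbitrary $\lambda = 0.5$. The $\ell_2$ minimiser is $a / (1 + 2\lambda)$, which shrinks but never reaches zero; the $\ell_1$ minimiser is the soft-thresholding operator, which sets every coordinate with $|a| \le \lambda$ exactly to zero.

```python
import numpy as np

lam = 0.5                                  # regularisation strength (arbitrary)
w = np.array([2.0, -0.3, 0.05, -1.2])      # made-up weights

l2_penalty = lam * np.sum(w ** 2)          # lam * ||w||_2^2
l1_penalty = lam * np.sum(np.abs(w))       # lam * ||w||_1
print(f"l2 penalty = {l2_penalty:.4f}, l1 penalty = {l1_penalty:.4f}")

# Gradients: l2 is proportional to w; l1 has constant magnitude lam
grad_l2 = 2 * lam * w
grad_l1 = lam * np.sign(w)                 # a subgradient; |w| has a kink at 0

# Toy problem per coordinate: minimise 0.5 * (x - a)^2 + penalty(x), with a = w
shrink_l2 = w / (1 + 2 * lam)                                      # never exactly 0
shrink_l1 = np.where(np.abs(w) > lam, w - lam * np.sign(w), 0.0)   # soft-thresholding

print("l2 solution:", shrink_l2, "zeros:", np.count_nonzero(shrink_l2 == 0))
print("l1 solution:", shrink_l1, "zeros:", np.count_nonzero(shrink_l1 == 0))
```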
Distance Metrics
Euclidean distance between two points is $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$. $k$-NN, $k$-means, and many other algorithms are expressed in terms of Euclidean or $\ell_1$ distances. This is why feature scaling (standardising each column to zero mean and unit variance) matters: it prevents a feature with a large numeric range from dominating the norm.
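A small sketch of the scaling effect, using invented data: column 0 is income (tens of thousands) and column 1 is age (tens). On raw features the income column dominates the Euclidean distance; after standardisation both columns contribute and the nearest neighbour changes. The helper `nearest` is hypothetical.

```python
import numpy as np

# Invented toy data: column 0 = income, column 1 = age in years
X = np.array([
    [49_000.0, 26.0],
    [50_600.0, 60.0],
    [58_000.0, 26.0],
])
query = np.array([50_500.0, 27.0])


def nearest(X, q):
    """Hypothetical helper: index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))


print("raw features:  nearest row =", nearest(X, query))   # 1 (income dominates)

# Standardise each column to zero mean and unit variance, using statistics of X
mu, sigma = X.mean(axis=0), X.std(axis=0)
print("standardised:  nearest row =", nearest((X - mu) / sigma, (query - mu) / sigma))  # 0
```

In a real pipeline the mean and standard deviation would be estimated on training data only and then reused for queries.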
Computing Norms and Similarities
```python
import numpy as np
u = np.array([3., 1., -2.])
v = np.array([1., 4., 1.])
# L2 norm
norm_u = np.linalg.norm(u) # sqrt(9+1+4) = sqrt(14)
norm_v = np.linalg.norm(v) # sqrt(1+16+1) = sqrt(18)
print(f'|u| = {norm_u:.4f}') # 3.7417
print(f'|v| = {norm_v:.4f}') # 4.2426
# Dot product
dot = np.dot(u, v) # 3*1 + 1*4 + (-2)*1 = 5
print(f'u · v = {dot}') # 5.0
# Angle
cosine = dot / (norm_u * norm_v)
angle_rad = np.arccos(np.clip(cosine, -1, 1)) # clip guards against rounding outside [-1, 1]
angle_deg = np.degrees(angle_rad)
print(f'cos θ = {cosine:.4f}') # 0.3150
print(f'θ = {angle_deg:.2f}°') # 71.64°
# Unit vectors (normalisation)
u_hat = u / norm_u
v_hat = v / norm_v
print(f'|û| = {np.linalg.norm(u_hat):.6f}') # 1.000000
# Cosine similarity via unit-vector dot product
cos_sim = np.dot(u_hat, v_hat)
print(f'cosine similarity = {cos_sim:.4f}') # 0.3150 (same as cosine above)
# L1 and Linf norms
print(f'||u||_1 = {np.linalg.norm(u, ord=1)}') # 6.0
print(f'||u||_inf = {np.linalg.norm(u, ord=np.inf)}') # 3.0
# Cauchy-Schwarz check: |u · v| <= |u| * |v|
print(f'|u · v| = {abs(dot):.4f} <= |u||v| = {norm_u*norm_v:.4f}: {abs(dot) <= norm_u*norm_v}') # 5.0000 <= 15.8745: True
```