Integration and the Fundamental Theorem
- Explain what a definite integral computes geometrically and how it is defined as a limit of Riemann sums
- Apply the Fundamental Theorem of Calculus to evaluate definite integrals using antiderivatives
- Connect integration to ML: interpret probability density functions as distributions whose integral equals 1, and express expected value and entropy as integrals
The Accumulation Problem
Derivatives answer the question: how fast is a quantity changing? Integration answers the complementary question: given a rate of change, how much has accumulated over an interval?
If you know a car's speed at every moment, integration tells you the total distance traveled. If you know the density of a probability distribution at every point, integration tells you the total probability in any region. Both are the same mathematical operation.
Area Under a Curve
The definite integral of $f(x)$ from $a$ to $b$ is written:
$$\int_a^b f(x)\,dx$$
Geometrically, it equals the signed area between the curve $y = f(x)$ and the $x$-axis over $[a, b]$. Regions above the axis contribute positive area; regions below contribute negative area.
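A small case shows the sign convention using nothing but triangle areas. Take $f(x) = x$ on $[-1, 1]$. The triangle over $[0, 1]$ lies above the axis and the one over $[-1, 0]$ lies below it. Each has area $\tfrac{1}{2}$, so they cancel. Taking the absolute value counts both as positive:
$$\int_{-1}^{1} x\,dx = \tfrac{1}{2} - \tfrac{1}{2} = 0, \qquad \int_{-1}^{1} \lvert x \rvert\,dx = \tfrac{1}{2} + \tfrac{1}{2} = 1$$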
Riemann Sums
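The precise definition: divide $[a, b]$ into $n$ subintervals of width $\Delta x = \frac{b - a}{n}$. In the $i$-th subinterval, evaluate $f$ at some sample point $x_i^*$ and form a rectangle of height $f(x_i^*)$ and width $\Delta x$.
The sum of all rectangle areas approximates the integral:
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$
As $n \to \infty$ (rectangles get infinitely thin), the approximation becomes exact:
$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$
For a function continuous on $[a, b]$ this limit always exists. It is the same whichever sample points are chosen.
The notation $dx$ in the integral is the limiting version of $\Delta x$: an infinitesimally thin strip width. It signals which variable we are integrating over.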
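Here is a minimal plain-Python sketch of the definition. The name `riemann_sum` and its `rule` parameter are illustrative, not a library API. It approximates $\int_0^1 x^2\,dx = \tfrac{1}{3}$ with left-endpoint and midpoint sample points for growing $n$.

```python
def riemann_sum(f, a, b, n, rule="midpoint"):
    """Approximate the integral of f over [a, b] with n equal-width rectangles.

    `rule` picks the sample point x_i* inside each subinterval:
    "left", "right", or "midpoint".
    """
    dx = (b - a) / n                                   # Δx = (b - a) / n
    offset = {"left": 0.0, "right": 1.0, "midpoint": 0.5}[rule]
    # Rectangle i spans [a + i*dx, a + (i+1)*dx]; its height is f(x_i*).
    return sum(f(a + (i + offset) * dx) for i in range(n)) * dx


square = lambda x: x ** 2          # exact integral over [0, 1] is 1/3
for n in (4, 16, 64, 256):
    left = riemann_sum(square, 0.0, 1.0, n, rule="left")
    mid = riemann_sum(square, 0.0, 1.0, n, rule="midpoint")
    print(f"n={n:4d}  left={left:.6f}  midpoint={mid:.6f}  exact={1/3:.6f}")
```

Both columns approach $1/3$ as $n$ grows. For this smooth integrand, the midpoint choice of $x_i^*$ converges noticeably faster than the left endpoint.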
Antiderivatives
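An antiderivative of $f$ is any function $F$ such that $F'(x) = f(x)$. We can reverse the power rule:
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \qquad (n \neq -1)$$
The constant $C$, called the constant of integration, appears because differentiation destroys constant terms: any $F(x) + C$ has the same derivative $f(x)$.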
Key antiderivatives:
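| Function $f(x)$ | Antiderivative $F(x)$ |
|---|---|
| $x^n$ ($n \neq -1$) | $\dfrac{x^{n+1}}{n+1} + C$ |
| $\dfrac{1}{x}$ | $\ln\lvert x\rvert + C$ |
| $e^x$ | $e^x + C$ |
| $\sin x$ | $-\cos x + C$ |
| $\cos x$ | $\sin x + C$ |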
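The power rule can be reversed mechanically. This is a minimal sketch, assuming polynomials are stored as coefficient lists where `coeffs[k]` multiplies $x^k$. The helpers `antiderivative` and `derivative` are hypothetical names. Exact `Fraction` arithmetic keeps the $1/(n+1)$ factors exact. Differentiating two antiderivatives that differ only in $C$ returns the same $f$.

```python
from fractions import Fraction


def antiderivative(coeffs, C=0):
    """Reverse power rule for a polynomial.

    coeffs[k] is the coefficient of x**k. Returns the coefficients of
    F(x) = C + sum_k coeffs[k] * x**(k + 1) / (k + 1).
    """
    return [Fraction(C)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


def derivative(coeffs):
    """Ordinary power rule: d/dx of c * x**k is k * c * x**(k - 1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


f = [1, 0, 3]                  # f(x) = 1 + 3x^2
F = antiderivative(f)          # F(x) = x + x^3          (C = 0)
G = antiderivative(f, C=7)     # G(x) = 7 + x + x^3
print([str(c) for c in F])     # ['0', '1', '0', '1']
print([str(c) for c in G])     # ['7', '1', '0', '1']
print(derivative(F) == f, derivative(G) == f)   # True True: C disappears
```

A polynomial contains only non-negative powers, so the excluded case $n = -1$ (whose antiderivative is $\ln\lvert x\rvert$) never arises here.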
The Fundamental Theorem of Calculus
The most important result in calculus connects differentiation and integration:
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
Here $F$ is any antiderivative of $f$ ($F' = f$), and $f$ is continuous on $[a, b]$. To compute a definite integral, you do not need to set up and take a limit of Riemann sums. Instead, you find an antiderivative, evaluate it at the two endpoints, and subtract.
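Example. Evaluate $\int_1^3 x\,dx$.
$F(x) = \frac{x^2}{2}$ is an antiderivative of $f(x) = x$ since $F'(x) = x$, so
$$\int_1^3 x\,dx = F(3) - F(1) = \frac{9}{2} - \frac{1}{2} = 4$$
Geometrically: the area under $y = x$ from 1 to 3 is the trapezoid with vertices at $(1, 0)$, $(3, 0)$, $(3, 3)$, $(1, 1)$. Its area is $\frac{1 + 3}{2} \cdot 2 = 4$. ✓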
Example.
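For instance, since $(-\cos x)' = \sin x$, the fundamental theorem gives $\int_0^{\pi} \sin x\,dx = -\cos\pi - (-\cos 0) = 2$. The plain-Python sketch below compares that one-line answer with a brute-force midpoint Riemann sum over 1,000 rectangles. The helper `midpoint_sum` is an illustrative name, not a library function.

```python
import math


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal-width rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


F = lambda x: -math.cos(x)              # antiderivative of sin, since (-cos)' = sin
via_ftc = F(math.pi) - F(0.0)           # F(b) - F(a)
via_riemann = midpoint_sum(math.sin, 0.0, math.pi, 1000)
print(via_ftc, via_riemann)             # both ≈ 2
```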
Why This Matters for ML
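Probability density functions. The probability foundation module (r1) stated that a PDF must satisfy $\int_{-\infty}^{\infty} p(x)\,dx = 1$. This normalization condition is an integral. For the Gaussian it cannot be checked with the fundamental theorem directly, because $e^{-x^2/2}$ has no elementary antiderivative. The standard proof squares the integral and switches to polar coordinates. That gives $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$, which explains the $1/\sqrt{2\pi}$ factor in the standard normal density.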
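Because $e^{-x^2/2}$ has no elementary antiderivative, a quick sanity check of the normalization is numerical. This is a minimal sketch with a hand-written composite trapezoidal rule; `trapezoid` is a hypothetical helper name. Cutting the infinite range off at $\pm 10$ is an assumption, justified because the Gaussian tails beyond that point are smaller than $10^{-20}$.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


gauss = lambda x: math.exp(-x * x / 2)            # unnormalized N(0, 1) shape
area = trapezoid(gauss, -10.0, 10.0, 2000)        # stands in for (-inf, inf)
print(area, math.sqrt(2 * math.pi))               # these agree closely

pdf = lambda x: gauss(x) / math.sqrt(2 * math.pi)
print(trapezoid(pdf, -10.0, 10.0, 2000))          # ≈ 1.0
```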
For any PDF $p$, the probability that $X$ falls in $[a, b]$ is:
$$P(a \le X \le b) = \int_a^b p(x)\,dx$$
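This is the fundamental theorem again. The cumulative distribution function $F_X(x) = P(X \le x)$ is an antiderivative of the density, so $P(a \le X \le b) = F_X(b) - F_X(a)$. For the standard normal, the CDF can be written with the error function, $\Phi(x) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$, which Python's `math.erf` provides. The sketch below compares both routes on $[-1, 1]$; the helper names are illustrative.

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2, and Phi' = pdf: an antiderivative
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))


def trapezoid(f, a, b, n=1000):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


a, b = -1.0, 1.0
via_ftc = std_normal_cdf(b) - std_normal_cdf(a)    # F(b) - F(a)
via_sum = trapezoid(std_normal_pdf, a, b)          # area under the density
print(via_ftc, via_sum)                            # both ≈ 0.6827
```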
Expected value. The expected value of a continuous random variable $X$ with density $p$ is an integral:
$$\mathbb{E}[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$$
This is the continuous analogue of the discrete weighted average $\mathbb{E}[X] = \sum_x x\,P(X = x)$. Here a continuum of values is summed, each weighted by its probability density.
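Here is a sketch of that integral for an exponential distribution with rate $\lambda$. Its density is $p(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and its mean is known to be $1/\lambda$. The choice of distribution, the value $\lambda = 2$, and the truncation of $[0, \infty)$ at $40/\lambda$ are illustrative assumptions. `trapezoid` is a hypothetical helper.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)          # Exponential(rate=lam), x >= 0

# E[X] = ∫ x p(x) dx; beyond x = 40/lam the remaining tail is negligible
mean = trapezoid(lambda x: x * pdf(x), 0.0, 40.0 / lam, 20_000)
print(mean, 1 / lam)                              # numeric vs closed form
```

Each strip contributes roughly $x_i\,p(x_i)\,\Delta x$, so the quadrature is itself a discrete weighted average with weights $p(x_i)\,\Delta x$. As the strips shrink, it converges to the integral.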
Entropy. The (differential) entropy of a continuous distribution $p$:
$$H(p) = -\int p(x)\log p(x)\,dx$$
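For a Gaussian $\mathcal{N}(0, \sigma^2)$ the integral has the closed form $H = \tfrac{1}{2}\log(2\pi e \sigma^2)$. With the natural log, the units are nats. The sketch below uses an arbitrary $\sigma = 1.5$ and a hypothetical `trapezoid` helper to evaluate the defining integral numerically and compare. It computes $\log p(x)$ analytically rather than as `log(pdf(x))`, so the far tails cannot hit `log(0)`.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


sigma = 1.5


def log_pdf(x):
    """log of the N(0, sigma^2) density, computed in closed form."""
    return -x * x / (2 * sigma**2) - math.log(sigma * math.sqrt(2 * math.pi))


# H(p) = -∫ p(x) log p(x) dx, truncated to ±12 sigma
H_numeric = trapezoid(lambda x: -math.exp(log_pdf(x)) * log_pdf(x),
                      -12 * sigma, 12 * sigma, 4000)
H_exact = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(H_numeric, H_exact)
```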
KL divergence between distributions $p$ and $q$:
$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx$$
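For two Gaussians, the KL integral has a known closed form:
$$D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu_p, \sigma_p^2) \,\|\, \mathcal{N}(\mu_q, \sigma_q^2)\bigr) = \log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2}$$
That makes it a convenient check on a numerical evaluation. The parameter values below are arbitrary, and the helpers are hypothetical.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """log of the N(mu, sigma^2) density, computed in closed form."""
    return -((x - mu) ** 2) / (2 * sigma**2) - math.log(sigma * math.sqrt(2 * math.pi))


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


mu_p, s_p = 0.0, 1.0   # p = N(0, 1)
mu_q, s_q = 1.0, 2.0   # q = N(1, 4)


def integrand(x):
    lp = log_normal_pdf(x, mu_p, s_p)
    lq = log_normal_pdf(x, mu_q, s_q)
    return math.exp(lp) * (lp - lq)          # p(x) * log(p(x) / q(x))


kl_numeric = trapezoid(integrand, -15.0, 15.0, 6000)
kl_exact = math.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q) ** 2) / (2 * s_q**2) - 0.5
print(kl_numeric, kl_exact)
```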
In the continuous setting, all the probabilistic quantities that appear in the probability foundations module are, at their core, integrals. The notation $\sum_x$ in the discrete case and $\int (\cdot)\,dx$ in the continuous case express the same concept: summing a quantity weighted by probability, just over different kinds of domains.
The Relationship Between Derivatives and Integrals
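The Fundamental Theorem has a second part that makes the inverse relationship explicit. For $f$ continuous on $[a, b]$ and $x$ in $[a, b]$:
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$$
Differentiating the accumulated-area function recovers the original function. In the other direction, integrating a derivative recovers the original function up to an additive constant. Differentiation and integration therefore undo each other, much as multiplication and division do.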
Both operations are indispensable in ML:
- Differentiation supplies the gradients that gradient descent uses to navigate the loss surface
- Integration turns probability densities into the probabilities and expectations that describe uncertainty
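The second part of the Fundamental Theorem can be checked numerically. In this sketch (helper names are illustrative), $A(x) = \int_0^x \cos t\,dt$ is computed with the trapezoidal rule. A central finite difference of $A$ is then compared with the integrand $\cos x$.

```python
import math


def trapezoid(f, a, b, n=2000):
    """Composite trapezoidal rule with n strips of equal width on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


def accumulated(x):
    """A(x) = ∫_0^x cos(t) dt, the accumulated-area function."""
    return trapezoid(math.cos, 0.0, x)


x, h = 1.2, 1e-3
dA_dx = (accumulated(x + h) - accumulated(x - h)) / (2 * h)   # central difference
print(dA_dx, math.cos(x))   # differentiating A recovers the integrand
```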
PyTorch and TensorFlow
Numerical integration appears in ML for computing normalizing constants, estimating expectations via Monte Carlo, and evaluating metrics.
```python
import torch

# Numerical integration: P(X in [-1, 1]) for X ~ N(0, 1),
# using the trapezoidal rule
x = torch.linspace(-1, 1, 1000)
f = torch.exp(-x**2 / 2) / (2 * torch.pi) ** 0.5  # N(0, 1) PDF
prob = torch.trapezoid(f, x)
print(prob.item())  # ≈ 0.6827 (the 68% rule)

# Monte Carlo expected value: E[X²] for X ~ U(0, 1)
samples = torch.rand(100_000)
print((samples**2).mean().item())  # ≈ 1/3 (exact: ∫₀¹ x² dx = 1/3)
```
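```python
import math

import tensorflow as tf

# Numerical integration: P(X in [-1, 1]) for X ~ N(0, 1),
# with the trapezoidal rule written out in TensorFlow ops
x = tf.linspace(-1.0, 1.0, 1000)
f = tf.exp(-x**2 / 2) / math.sqrt(2 * math.pi)  # N(0, 1) PDF
# Sum of trapezoids: mean of adjacent heights times strip width
prob = tf.reduce_sum(0.5 * (f[1:] + f[:-1]) * (x[1:] - x[:-1]))
print(prob.numpy())  # ≈ 0.6827

# Monte Carlo estimate of E[X²] for X ~ U(0, 1)
samples = tf.random.uniform((100_000,))
print(tf.reduce_mean(samples**2).numpy())  # ≈ 0.333
```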
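Monte Carlo integration, which estimates an integral by averaging function values at random points, is one of the most widely used techniques in probabilistic ML. It rewrites an expectation $\mathbb{E}_{p}[g(X)] = \int g(x)\,p(x)\,dx$ as a sample average $\frac{1}{N}\sum_{i=1}^{N} g(x_i)$ with $x_i \sim p$. Its standard error shrinks like $1/\sqrt{N}$ whatever the dimension, with a constant set by the variance of $g(X)$. This keeps continuous expectations tractable in high dimensions, where grid-based rules such as the trapezoidal rule need exponentially many points.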
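Here is a plain-Python sketch of that idea in 20 dimensions. It assumes $X$ is uniform on the unit cube $[0, 1]^{20}$ and $g(x) = \lVert x \rVert^2$, for which $\mathbb{E}[g(X)] = 20/3$ exactly. A grid with only 10 points per axis would already need $10^{20}$ evaluations here. By contrast, the Monte Carlo error should shrink roughly by a factor of $\sqrt{10}$ for every tenfold increase in $N$, though any single run is noisy. The helper `mc_estimate` and the seed are illustrative choices.

```python
import random


def mc_estimate(g, sample, n, rng):
    """Monte Carlo: average g over n independent draws from sample(rng)."""
    return sum(g(sample(rng)) for _ in range(n)) / n


d = 20                                                  # dimension of the domain
sample = lambda rng: [rng.random() for _ in range(d)]   # X ~ Uniform([0, 1]^d)
g = lambda x: sum(xi * xi for xi in x)                  # E[||X||^2] = d/3 exactly

rng = random.Random(0)
for n in (100, 1_000, 10_000):
    est = mc_estimate(g, sample, n, rng)
    print(f"N={n:6d}  estimate={est:.4f}  |error|={abs(est - d / 3):.4f}")
```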