Prerequisite · Calculus Foundations

Integration and the Fundamental Theorem

By the end of this reading you will be able to:

Explain what a definite integral computes geometrically and how it is defined as a limit of Riemann sums
Apply the Fundamental Theorem of Calculus to evaluate definite integrals using antiderivatives
Connect integration to ML: interpret probability density functions as distributions whose integral equals 1, and express expected value and entropy as integrals

The Accumulation Problem

Derivatives answer the question of rates of change. Integration answers the complementary question: given a rate of change, how much has accumulated over an interval?

If you know a car's speed at every moment, integration tells you the total distance traveled. If you know the density of a probability distribution at every point, integration tells you the total probability in any region. Both are the same mathematical operation.

Area Under a Curve

The definite integral of $f$ from $a$ to $b$ is written:

$\int_a^b f(x)\, dx$

Geometrically, it equals the signed area between the curve $f(x)$ and the $x$-axis over $[a, b]$. Regions above the axis contribute positive area; regions below contribute negative area.
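To make "signed area" concrete, here is a minimal sketch that integrates $\sin x$ numerically over one full period. The helper name `midpoint_integral` and the choice of $\sin x$ are illustrative assumptions, not part of the reading; the midpoint rule is used only as a convenient approximation.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# sin(x) is above the axis on [0, pi] and below it on [pi, 2*pi].
above = midpoint_integral(math.sin, 0.0, math.pi)
below = midpoint_integral(math.sin, math.pi, 2 * math.pi)
total = midpoint_integral(math.sin, 0.0, 2 * math.pi)

print("area above the axis:", above)
print("area below the axis:", below)
print("signed total:", total)
```

The first hump contributes about $+2$, the second about $-2$, and the two cancel in the signed total.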

Riemann Sums

The precise definition: divide $[a, b]$ into $n$ subintervals of width $\Delta x = (b-a)/n$. In each subinterval, evaluate $f$ at some sample point $x_i^*$ and form a rectangle of height $f(x_i^*)$ and width $\Delta x$.

The sum of all rectangle areas approximates the integral, and the integral is defined as the limit of that sum:

$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\,\Delta x$

As $n \to \infty$ the rectangles become arbitrarily thin and the sum converges to the integral. For any function that is continuous on $[a, b]$, this limit exists and does not depend on which sample points $x_i^*$ are chosen.

The notation $dx$ in the integral is the limiting version of $\Delta x$ — an infinitesimally thin strip width. It signals which variable we are integrating over.
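The limit definition can be watched in action. The sketch below is illustrative only: the function name `riemann_sum`, the `rule` parameter, and the test function $x^2$ are assumptions made for this example. It computes left, midpoint, and right Riemann sums for increasing $n$.

```python
def riemann_sum(f, a, b, n, rule="midpoint"):
    """Sum of n rectangle areas f(x_i*) * dx over [a, b].

    rule selects the sample point x_i* inside each subinterval:
    its left end, its midpoint, or its right end.
    """
    dx = (b - a) / n
    shift = {"left": 0.0, "midpoint": 0.5, "right": 1.0}[rule]
    return sum(f(a + (i + shift) * dx) for i in range(n)) * dx

def square(x):
    return x * x

# The exact value of the integral of x^2 over [0, 1] is 1/3.
for n in (10, 100, 1000):
    left = riemann_sum(square, 0.0, 1.0, n, "left")
    mid = riemann_sum(square, 0.0, 1.0, n, "midpoint")
    right = riemann_sum(square, 0.0, 1.0, n, "right")
    print(n, left, mid, right)
```

Because $x^2$ is increasing on $[0, 1]$, the left sums stay below $1/3$ and the right sums stay above it, while all three columns close in on the same value as $n$ grows.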

Antiderivatives

An antiderivative of $f(x)$ is any function $F(x)$ such that $F'(x) = f(x)$. For example, reversing the power rule for derivatives gives:

$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \qquad (n \neq -1)$

The constant $C$, called the constant of integration, appears because differentiation destroys constant terms: any $F + C$ has the same derivative $f$.

Key antiderivatives:

$f(x)$	$F(x) = \int f(x)\,dx$
$x^n$ ($n \neq -1$)	$\frac{x^{n+1}}{n+1} + C$
$e^x$	$e^x + C$
$\frac{1}{x}$	$\ln|x| + C$
$\cos x$	$\sin x + C$
$\sin x$	$-\cos x + C$
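Each row of the table can be sanity-checked by differentiating $F$ numerically and comparing with $f$. The following is a rough sketch, not a proof: the helper names, the sample points, and the choice $n = 3$ for the power rule are assumptions made for illustration, and $C$ is set to 0.

```python
import math

def central_difference(F, x, h=1e-5):
    """Numerical estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (f, F) pairs from the table, with C = 0 and n = 3 in the power rule.
pairs = {
    "x^3": (lambda x: x**3, lambda x: x**4 / 4),
    "e^x": (math.exp, math.exp),
    "1/x": (lambda x: 1 / x, lambda x: math.log(abs(x))),
    "cos x": (math.cos, math.sin),
    "sin x": (math.sin, lambda x: -math.cos(x)),
}

def max_mismatch(f, F, points=(-2.0, -0.5, 0.7, 1.5)):
    """Largest gap between the numerical F' and f over the sample points."""
    return max(abs(central_difference(F, x) - f(x)) for x in points)

for name, (f, F) in pairs.items():
    print(name, max_mismatch(f, F))
```

The negative sample points are included on purpose: they show why the antiderivative of $1/x$ is $\ln|x|$ rather than $\ln x$, which is undefined for $x < 0$.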

The Fundamental Theorem of Calculus

The most important result in calculus connects differentiation and integration:

$\int_a^b f(x)\,dx = F(b) - F(a)$

where $F$ is any antiderivative of $f$ ($F' = f$). To compute a definite integral, you do not need to set up and take a limit of Riemann sums: you find an antiderivative, evaluate it at the two endpoints, and subtract.

Example. $\int_1^3 2x\,dx$

$F(x) = x^2$ is an antiderivative of $2x$ since $(x^2)' = 2x$. Therefore $\int_1^3 2x\,dx = F(3) - F(1) = 9 - 1 = 8$.

Geometrically: the area under $2x$ from 1 to 3 is the trapezoid with vertices at $(1,2)$, $(3,6)$, $(3,0)$, $(1,0)$, which has area $\frac{1}{2}(2+6)(2) = 8$. ✓

Example. $\int_0^1 e^x\,dx = e^1 - e^0 = e - 1 \approx 1.718$
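Both worked examples can be cross-checked against a direct numerical approximation. This is a small sketch under stated assumptions: the `trapezoid` helper and its default `n` are illustrative choices, and the numerical value is only an approximation of what the Fundamental Theorem gives exactly.

```python
import math

def trapezoid(f, a, b, n=10_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    interior = sum(f(a + i * dx) for i in range(1, n))
    return (0.5 * f(a) + interior + 0.5 * f(b)) * dx

# Example 1: f(x) = 2x on [1, 3], with antiderivative F(x) = x^2.
exact_linear = 3**2 - 1**2
approx_linear = trapezoid(lambda x: 2 * x, 1.0, 3.0)

# Example 2: f(x) = e^x on [0, 1], with antiderivative F(x) = e^x.
exact_exp = math.exp(1) - math.exp(0)
approx_exp = trapezoid(math.exp, 0.0, 1.0)

print("2x on [1, 3]:  F(b) - F(a) =", exact_linear, " numerical =", approx_linear)
print("e^x on [0, 1]: F(b) - F(a) =", exact_exp, " numerical =", approx_exp)
```

The antiderivative route needs two function evaluations; the numerical route needs thousands and still only agrees to several decimal places.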

Why This Matters for ML

Probability density functions. The probability foundation module (r1) stated that a PDF $f(x)$ must satisfy $\int_{-\infty}^{\infty} f(x)\,dx = 1$. This normalization condition is an integral. For some distributions it can be verified directly with the Fundamental Theorem. For the Gaussian it cannot, because $e^{-x^2}$ has no elementary antiderivative; its normalizing constant comes instead from the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.

For any PDF, the probability of $X$ falling in $[a,b]$ is: $P(a \leq X \leq b) = \int_a^b f(x)\,dx$
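As an illustration of a density where the Fundamental Theorem applies directly, consider the exponential distribution $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, whose antiderivative is $F(x) = -e^{-\lambda x}$. The distribution, the value `RATE = 2.0`, and all function names below are assumptions chosen for this sketch; they do not come from the reading.

```python
import math

RATE = 2.0  # hypothetical rate parameter, chosen only for illustration

def pdf(x):
    """Exponential density: RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def antiderivative(x):
    """F(x) = -exp(-RATE * x), so F'(x) = pdf(x) for x >= 0."""
    return -math.exp(-RATE * x)

def interval_probability(a, b):
    """P(a <= X <= b) = F(b) - F(a), valid for 0 <= a <= b."""
    return antiderivative(b) - antiderivative(a)

def trapezoid(f, a, b, n=100_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    interior = sum(f(a + i * dx) for i in range(1, n))
    return (0.5 * f(a) + interior + 0.5 * f(b)) * dx

# Normalization: F tends to 0 at infinity, so the total is 0 - F(0) = 1.
total_exact = 0.0 - antiderivative(0.0)
# Numerically, 20 stands in for infinity; the tail beyond it is about exp(-40).
total_numeric = trapezoid(pdf, 0.0, 20.0)

prob_exact = interval_probability(0.5, 1.5)
prob_numeric = trapezoid(pdf, 0.5, 1.5)

print("total probability:", total_exact, total_numeric)
print("P(0.5 <= X <= 1.5):", prob_exact, prob_numeric)
```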

Expected value. The expected value of a continuous random variable is an integral: $\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$

This is the continuous analogue of the discrete weighted average: infinitely many values, each weighted by its probability density.
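The "weighted average" reading of the integral can be sketched numerically. The example below assumes an exponential density with a hypothetical rate of 2 (not from the reading), for which the known exact values are $\mathbb{E}[X] = 1/\lambda = 0.5$ and $\mathbb{E}[X^2] = 2/\lambda^2 = 0.5$. The function names and the truncation point are likewise illustrative.

```python
import math

RATE = 2.0  # hypothetical rate parameter, chosen only for illustration

def pdf(x):
    """Exponential density on x >= 0."""
    return RATE * math.exp(-RATE * x)

def expectation(g, upper=30.0, n=100_000):
    """Midpoint-rule estimate of E[g(X)], the integral of g(x) * pdf(x).

    The density is zero below 0 and negligible above `upper`, so the
    integral is truncated to [0, upper].
    """
    dx = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += g(x) * pdf(x) * dx  # value, weighted by density times width
    return total

mean = expectation(lambda x: x)
second_moment = expectation(lambda x: x * x)

print("E[X]   ~", mean)
print("E[X^2] ~", second_moment)
```

Each term in the loop is one value of $g(x)$ multiplied by the probability mass $f(x)\,\Delta x$ of a thin strip, which is exactly the discrete weighted average carried to a fine grid.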

Entropy. The entropy of a continuous distribution (its differential entropy): $H = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$

KL divergence between distributions $p$ and $q$: $D_{\text{KL}}(p \| q) = \int_{-\infty}^{\infty} p(x) \ln \frac{p(x)}{q(x)}\,dx$

All the probabilistic quantities that appear in the probability foundations module are, at their core, integrals. The $\sum$ of the discrete case and the $\int$ of the continuous case express the same concept, summing a quantity weighted by probability, just over different kinds of domains.
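The entropy and KL divergence integrals can be evaluated numerically and compared with the standard closed forms for Gaussians. This is a sketch under assumptions: the two Gaussians, the integration range $[-12, 12]$, and the helper names are illustrative, and the closed-form expressions are well-known results quoted here rather than derived in the reading.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

def p(x):
    return gaussian_pdf(x, 0.0, 1.0)  # p = N(0, 1)

def q(x):
    return gaussian_pdf(x, 1.0, 2.0)  # q = N(1, 2^2), a hypothetical comparison

LO, HI = -12.0, 12.0  # stands in for (-inf, inf); the tails beyond are negligible

entropy_numeric = -integrate(lambda x: p(x) * math.log(p(x)), LO, HI)
kl_numeric = integrate(lambda x: p(x) * math.log(p(x) / q(x)), LO, HI)

# Closed forms for Gaussians (standard results, quoted for comparison):
#   H(N(mu, s^2)) = 0.5 * ln(2 * pi * e * s^2)
#   KL(N(m1, s1^2) || N(m2, s2^2)) = ln(s2/s1) + (s1^2 + (m1-m2)^2) / (2 s2^2) - 1/2
entropy_exact = 0.5 * math.log(2 * math.pi * math.e)
kl_exact = math.log(2.0 / 1.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5

print("entropy:", entropy_numeric, entropy_exact)
print("KL(p || q):", kl_numeric, kl_exact)
```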

The Relationship Between Derivatives and Integrals

The Fundamental Theorem has a second part that makes explicit the inverse relationship:

$\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$

Differentiating the accumulated area function recovers the original function. Differentiation and integration undo each other — they are inverse operations, like multiplication and division.

The two operations play complementary roles in ML:

Gradient descent (differentiation) navigates the loss surface
Probability densities (integration) describe uncertainty

Both are indispensable tools, and the Fundamental Theorem is what links them.
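The second part of the Fundamental Theorem can be checked numerically: build the accumulated-area function $A(x) = \int_a^x f(t)\,dt$, differentiate it with a finite difference, and compare with $f(x)$. The function names, step sizes, and the choice $f = \cos$ below are assumptions made for this sketch.

```python
import math

def accumulated_area(f, a, x, n=20_000):
    """A(x): midpoint-rule approximation of the integral of f from a to x."""
    dt = (x - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt

def derivative_of_area(f, a, x, h=1e-3):
    """Central-difference estimate of dA/dx."""
    upper = accumulated_area(f, a, x + h)
    lower = accumulated_area(f, a, x - h)
    return (upper - lower) / (2 * h)

f = math.cos
for x in (0.5, 1.0, 2.0):
    print(x, derivative_of_area(f, 0.0, x), f(x))
```

At each point the two printed columns agree to several decimal places: differentiating the accumulated area gives back the integrand.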

PyTorch and TensorFlow

Numerical integration appears in ML for computing normalizing constants, estimating expectations via Monte Carlo, and evaluating metrics.

import torch

# Numerical integration: P(X in [-1, 1]) for N(0,1)
# using the trapezoidal rule
x = torch.linspace(-1, 1, 1000)
f = torch.exp(-x**2 / 2) / (2 * torch.pi)**0.5  # N(0,1) PDF
prob = torch.trapezoid(f, x)
print(prob.item())  # ≈ 0.6827  (the 68% rule)

# Monte Carlo expected value: E[X²] for X ~ U(0,1)
samples = torch.rand(100_000)
print((samples**2).mean().item())  # ≈ 1/3  (exact: ∫₀¹ x² dx = 1/3)

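import math
import tensorflow as tf

# Numerical integration: P(X in [-1, 1]) for N(0,1)
# using the trapezoidal rule written out with TF ops
x = tf.linspace(-1.0, 1.0, 1000)
f = tf.exp(-x**2 / 2) / math.sqrt(2 * math.pi)  # N(0,1) PDF
# Sum of trapezoid areas: average of adjacent heights times strip width
prob = tf.reduce_sum((f[1:] + f[:-1]) / 2 * (x[1:] - x[:-1]))
print(prob.numpy())  # ≈ 0.6827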

# Monte Carlo estimate of E[X²] for X ~ U(0,1)
samples = tf.random.uniform((100_000,))
print(tf.reduce_mean(samples**2).numpy())  # ≈ 0.333

Monte Carlo integration, which estimates an integral by averaging function values at random points, is one of the most powerful and widely used techniques in probabilistic ML. It replaces the expectation $\mathbb{E}_p[f(x)] = \int f(x)\,p(x)\,dx$ with the sample average $\frac{1}{N}\sum_{i=1}^N f(x_i)$ over draws $x_i \sim p$, making continuous expectations tractable even in high dimensions.
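A dependency-free sketch of the same idea, using only Python's standard library, also reports a standard error so the accuracy of the estimate can be judged. The function name `monte_carlo_mean`, the seed, and the sample sizes are illustrative assumptions.

```python
import math
import random

def monte_carlo_mean(g, sampler, n, rng):
    """Estimate E[g(X)] from n draws; return (estimate, standard error)."""
    values = [g(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

def uniform_sampler(rng):
    """One draw from U(0, 1)."""
    return rng.random()

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for n in (100, 10_000, 100_000):
    estimate, std_err = monte_carlo_mean(lambda x: x * x, uniform_sampler, n, rng)
    print(n, estimate, std_err)  # the exact value of E[X^2] is 1/3
```

The standard error shrinks roughly like $1/\sqrt{N}$, so each extra digit of accuracy costs about 100 times more samples. That rate does not depend on the dimension of $x$, which is why the method stays usable where grid-based rules do not.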
