Probabilistic Regression: Poisson & Gaussian NLL
- Derive any regression loss from the negative log-likelihood of an assumed output distribution
- Apply PoissonNLLLoss to count-valued targets and explain when the log_input flag must be set
- Use GaussianNLLLoss to train a model that predicts both mean and variance and interpret the heteroscedastic output
- Explain why predicting variance alongside mean improves model calibration and enables uncertainty quantification
From Point Estimates to Distributions
L1 and MSE assume the model outputs a single number — a point estimate of the target. Probabilistic losses go further: the model outputs the parameters of a probability distribution over the target, and the loss is the negative log-likelihood (NLL) of the observed target under that distribution.
This lets the model express uncertainty: when the target is ambiguous, the predicted variance should be large.
Deriving a Loss from Maximum Likelihood
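Suppose the target $y$ is drawn from a distribution $p(y \mid \theta)$ parameterised by the model output $\theta$. Over $N$ independent training samples with targets $y_i$ and model outputs $\theta_i$, maximum likelihood estimation (MLE) maximises
$$\prod_{i=1}^{N} p(y_i \mid \theta_i)$$
Taking the logarithm (which is monotone, so maximising the log-likelihood is equivalent) and negating to turn maximisation into minimisation:
$$\mathcal{L} = -\sum_{i=1}^{N} \log p(y_i \mid \theta_i)$$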
This is the negative log-likelihood objective. PyTorch's NLL-based regression losses, including nn.PoissonNLLLoss and nn.GaussianNLLLoss, are special cases obtained by choosing the distribution $p$.
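As one concrete instance of this recipe, the sketch below assumes a Gaussian output distribution with a fixed unit variance (an assumption chosen purely for illustration) and checks numerically that the summed NLL equals half the summed squared error plus a constant. In other words, under that assumption minimising the NLL is the same as minimising squared error. The tensors are random stand-ins for real targets and model outputs.

```python
import math
import torch
from torch.distributions import Normal

torch.manual_seed(0)
y = torch.randn(8)     # observed targets (synthetic)
mu = torch.randn(8)    # stand-in for the model outputs

# NLL under N(mu, 1): minus the sum of per-sample log-densities
nll = -Normal(loc=mu, scale=torch.ones_like(mu)).log_prob(y).sum()

# The same quantity written out: 0.5 * sum of squared errors + constant
sse = ((y - mu) ** 2).sum()
const = 0.5 * len(y) * math.log(2 * math.pi)

print(nll.item(), (0.5 * sse + const).item())  # the two values agree
```

The constant does not depend on the model output, so it has no effect on the gradients.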
nn.PoissonNLLLoss — Count Data
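When targets are non-negative integer counts (word frequencies, photon arrivals, events per interval), the natural model is a Poisson distribution with rate $\lambda > 0$:
$$P(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}$$
The network predicts $\log \lambda$ rather than $\lambda$ itself (to ensure $\lambda > 0$). The NLL is:
$$-\log P(y \mid \lambda) = \lambda - y \log \lambda + \log(y!)$$
Dropping the constant $\log(y!)$ (it does not depend on the model) and writing the network output as $x = \log \lambda$ gives the default loss:
$$\ell(x, y) = e^{x} - y x$$
With full=True, PyTorch adds Stirling's approximation for the factorial term: $\log(y!) \approx y \log y - y + \tfrac{1}{2} \log(2 \pi y)$.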
import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss(log_input=True)  # input is log(λ)
x = torch.tensor([0.5, 1.2])              # log(λ)
y = torch.tensor([1.0, 3.0])              # count targets
output = loss(x, y)                       # mean over samples (default reduction)
# ℓ₁ = exp(0.5) − 1·0.5 = 1.649 − 0.5 = 1.149
# ℓ₂ = exp(1.2) − 3·1.2 = 3.320 − 3.6 = −0.280
# output = (1.149 − 0.280) / 2 ≈ 0.434
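The formula behind the example above can be checked by hand. The following sketch recomputes the default loss manually, compares Stirling's approximation with the exact value of log(y!) obtained from torch.lgamma, and confirms that full=True adds the Stirling term. The targets are deliberately chosen greater than 1, because the PyTorch documentation states that the approximation is applied only for such targets. The tensor values are illustrative.

```python
import math
import torch
import torch.nn as nn

log_rate = torch.tensor([0.5, 1.2, 2.0])   # network output, read as log(lambda)
y = torch.tensor([3.0, 5.0, 10.0])         # count targets, all > 1

# Default loss: exp(log_rate) - y * log_rate, averaged over samples
manual = (torch.exp(log_rate) - y * log_rate).mean()
builtin = nn.PoissonNLLLoss(log_input=True)(log_rate, y)

# Stirling's approximation to log(y!) versus the exact value lgamma(y + 1)
stirling = y * torch.log(y) - y + 0.5 * torch.log(2 * math.pi * y)
exact = torch.lgamma(y + 1)

# full=True adds the Stirling term to each per-sample loss
manual_full = (torch.exp(log_rate) - y * log_rate + stirling).mean()
builtin_full = nn.PoissonNLLLoss(log_input=True, full=True)(log_rate, y)

print(manual.item(), builtin.item())
print(stirling, exact)   # close, with the gap shrinking as y grows
```

Because the factorial term does not depend on the network output, full=True changes the reported loss value but not the gradients.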
When to use: NLP word counts; medical event rates; any target that is a non-negative integer following a Poisson process.
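The log_input flag tells the loss how to read the network output. A minimal sketch of the two options follows: with log_input=True the output is treated as log(λ) and exponentiated inside the loss, while with log_input=False the caller must supply a positive rate λ directly, and the loss is computed as input − target·log(input + eps). The softplus mapping in the last line is an assumption added for illustration, one of several ways to produce a positive rate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

raw = torch.tensor([0.5, 1.2])   # unconstrained network output
y = torch.tensor([1.0, 3.0])     # count targets

# Option A: treat the raw output as log(lambda); the loss exponentiates it
loss_log = nn.PoissonNLLLoss(log_input=True)(raw, y)

# Option B: map the raw output to a positive rate first, then pass lambda itself
rate = torch.exp(raw)
loss_rate = nn.PoissonNLLLoss(log_input=False)(rate, y)

# A different positive mapping (softplus) also works with log_input=False,
# but it defines a different model, so the loss value differs
loss_softplus = nn.PoissonNLLLoss(log_input=False)(F.softplus(raw), y)

print(loss_log.item(), loss_rate.item(), loss_softplus.item())
```

Options A and B describe the same model, so their losses agree up to the small eps inside the logarithm. The flag must match what the final layer of the network actually emits.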
nn.GaussianNLLLoss — Heteroscedastic Regression
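When the target is real-valued but the observation noise varies across samples, model the target as
$$y \sim \mathcal{N}(\mu, \sigma^{2})$$
The network predicts both $\mu$ (the mean, called input) and $\sigma^{2}$ (the variance, called var). The NLL is:
$$-\log p(y \mid \mu, \sigma^{2}) = \frac{1}{2} \log \sigma^{2} + \frac{(y - \mu)^{2}}{2 \sigma^{2}} + \frac{1}{2} \log(2\pi)$$
Dropping the constant $\tfrac{1}{2} \log(2\pi)$:
$$\ell(\mu, \sigma^{2}, y) = \frac{1}{2} \left( \log \sigma^{2} + \frac{(y - \mu)^{2}}{\sigma^{2}} \right)$$
The model learns a trade-off: if it is very uncertain ($\sigma^{2}$ large), the squared-error term is down-weighted but the log-variance term increases. If it is very confident ($\sigma^{2}$ small), the log-variance term is small but the squared error is amplified.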
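The trade-off has a closed-form balance point. For a fixed residual, setting the derivative of the loss with respect to the variance to zero gives a variance equal to the squared residual, $(y - \mu)^2$. The sketch below checks this numerically by scanning a grid of candidate variances; the residual of 1.5 and the grid are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

loss_fn = nn.GaussianNLLLoss(reduction="none")

mu = torch.tensor(0.0)
y = torch.tensor(1.5)                        # residual 1.5, squared residual 2.25
variances = torch.linspace(0.1, 10.0, 991)   # candidate sigma^2 values, step 0.01

# Evaluate the loss for the same (mu, y) pair under every candidate variance
losses = loss_fn(mu.expand_as(variances), y.expand_as(variances), variances)
best_var = variances[losses.argmin()]

print(best_var.item())   # close to 1.5 ** 2
```

Variances below the squared residual are penalised by the amplified error term, and variances above it by the log-variance term.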
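import torch
import torch.nn as nn

loss = nn.GaussianNLLLoss()
mean = torch.tensor([1.0, 2.0])    # predicted μ
var = torch.tensor([0.5, 2.0])     # predicted σ² (must be > 0)
target = torch.tensor([1.2, 1.0])
output = loss(mean, target, var)   # argument order: input, target, var
# per-sample 0.5·(log σ² + (y − μ)²/σ²): −0.307 and 0.597; output (mean) ≈ 0.145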
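PyTorch clamps var from below at a small eps (default 1e-6) for numerical stability, so a predicted variance at or near zero cannot produce a division by zero or log(0).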
When to use: Uncertainty-aware regression; weather forecasting; any setting where prediction confidence should be data-driven.
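One way to obtain the two inputs that GaussianNLLLoss expects is a network with two output heads. The sketch below is a minimal illustration under stated assumptions: the class name MeanVarianceNet is hypothetical, the softplus used to keep the variance positive is one common choice among several, and the data are synthetic with noise that grows with |x|.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class MeanVarianceNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, 1)
        self.var_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        mean = self.mean_head(h)
        # softplus keeps the variance strictly positive (an assumed design choice)
        var = F.softplus(self.var_head(h)) + 1e-6
        return mean, var

# Synthetic data: noise standard deviation grows with |x|
x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = torch.sin(x) + 0.1 * (1 + x.abs()) * torch.randn_like(x)

model = MeanVarianceNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.GaussianNLLLoss()

history = []
for step in range(200):
    mean, var = model(x)
    loss = loss_fn(mean, y, var)   # argument order: input, target, var
    opt.zero_grad()
    loss.backward()
    opt.step()
    history.append(loss.item())

mean, var = model(x)   # per-sample mean and variance after training
```

Both heads share the same hidden features, so the variance estimate can depend on the input in the same way the mean does.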
Why Predict Variance?
MSE corresponds to a Gaussian NLL in which the variance is held fixed, so the model always acts as though it is equally confident about every prediction. With GaussianNLLLoss, a well-trained model learns:
- For easy, predictable samples → small $\sigma^{2}$ → tight distribution
- For ambiguous, noisy samples → large $\sigma^{2}$ → wide distribution
The calibrated uncertainty can then be used for downstream decisions (e.g., active learning, safety-critical rejection).
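A simple way to test whether predicted variances are calibrated is to measure how often targets fall inside the Gaussian interval the model implies: for a calibrated model, roughly 95% of targets should lie within mean ± 1.96·σ. The sketch below uses synthetic data in place of real model outputs. The helper interval_coverage and the 90% acceptance quantile are hypothetical choices for illustration.

```python
import torch

def interval_coverage(mean, var, target, z=1.96):
    """Fraction of targets inside mean ± z * sqrt(var)."""
    std = var.sqrt()
    inside = (target - mean).abs() <= z * std
    return inside.float().mean().item()

torch.manual_seed(0)
n = 20000
true_std = torch.rand(n) * 2 + 0.1            # per-sample noise level
mean = torch.zeros(n)
target = mean + true_std * torch.randn(n)

# Variances matching the true noise give coverage near the nominal 95%
calibrated = interval_coverage(mean, true_std ** 2, target)

# Variances that are too small (overconfident) give much lower coverage
overconfident = interval_coverage(mean, (0.5 * true_std) ** 2, target)

# Rejection rule: abstain on the 10% of samples with the largest predicted variance
threshold = torch.quantile(true_std ** 2, 0.9)
accept = true_std ** 2 <= threshold

print(calibrated, overconfident, accept.float().mean().item())
```

Coverage well below the nominal level signals overconfidence, and coverage well above it signals variances that are too large.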