Metric Learning: Triplet Losses
- State the triplet constraint: the anchor-positive distance must be smaller than the anchor-negative distance by at least a margin $\alpha$
- Apply TripletMarginLoss with a chosen p-norm and margin, and verify that a satisfied constraint contributes zero loss
- Use TripletMarginWithDistanceLoss to plug in a custom distance function such as cosine distance
- Explain easy, hard, and semi-hard triplet mining and why random triplet selection yields a vanishing training signal
- Compare triplet loss, contrastive loss, and NT-Xent in terms of the number of negatives used per update and training efficiency
What Is Metric Learning?
The goal of metric learning is to train an embedding function such that semantically similar inputs map to nearby points in the embedding space, and dissimilar inputs map far apart.
Unlike classification losses, metric learning losses operate on triplets or pairs of samples — they encode relative rather than absolute correctness.
The Triplet Constraint
A triplet consists of:
- Anchor $a$: the reference sample
- Positive $p$: a sample from the same class/cluster as $a$
- Negative $n$: a sample from a different class/cluster
The embedding space should satisfy $d(a, p) < d(a, n)$, i.e. the anchor is closer to the positive than to the negative. The triplet loss enforces this with a margin $\alpha$:
$$L(a, p, n) = \max\big(d(a, p) - d(a, n) + \alpha,\ 0\big)$$
The loss is zero when the anchor-negative distance exceeds the anchor-positive distance by at least $\alpha$. Otherwise it penalises the shortfall linearly.
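As a worked example, the hinge form max(d(a, p) - d(a, n) + margin, 0) can be computed by hand for two small triplets. This is a minimal sketch: the helper name `triplet_hinge` and the 2-D points are made up for illustration, and Euclidean distance is assumed.

```python
import torch

def triplet_hinge(anchor, positive, negative, margin):
    """Per-triplet loss max(d(a,p) - d(a,n) + margin, 0) with Euclidean d."""
    d_ap = torch.linalg.norm(anchor - positive, dim=1)
    d_an = torch.linalg.norm(anchor - negative, dim=1)
    return torch.clamp(d_ap - d_an + margin, min=0.0)

# Two hand-picked triplets in 2-D that share the same anchor and positive.
anchor = torch.tensor([[0.0, 0.0], [0.0, 0.0]])
positive = torch.tensor([[1.0, 0.0], [1.0, 0.0]])   # d(a, p) = 1
negative = torch.tensor([[3.0, 0.0], [1.5, 0.0]])   # d(a, n) = 3 and 1.5

losses = triplet_hinge(anchor, positive, negative, margin=1.0)
# Triplet 0: 1 - 3 + 1 = -1, clamped to 0 (constraint satisfied).
# Triplet 1: 1 - 1.5 + 1 = 0.5 (negative is 0.5 short of the margin).
print(losses)  # tensor([0.0000, 0.5000])
```

The second triplet already has the negative farther away than the positive, yet it still incurs a loss because the gap is smaller than the margin.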
nn.TripletMarginLoss
Default distance is the $L_2$ norm: $d(x, y) = \lVert x - y \rVert_2$.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)  # Euclidean distance
anchor = torch.randn(32, 128, requires_grad=True)     # 32 samples, 128-dim
positive = torch.randn(32, 128, requires_grad=True)
negative = torch.randn(32, 128, requires_grad=True)
loss = triplet_loss(anchor, positive, negative)
# loss is the batch mean; it is > 0 when d(a,p) - d(a,n) + 1 > 0 for at least one triplet
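To check that a satisfied constraint contributes zero loss, it helps to pass `reduction="none"`, which returns one value per triplet instead of the batch mean. The sketch below uses hand-picked points (an assumption for illustration) and compares `p=2` with `p=1`.

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[0.0, 0.0], [0.0, 0.0]])
positive = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
negative = torch.tensor([[4.0, 4.0], [1.0, 1.5]])   # far negative, then a close one

# reduction="none" keeps one loss value per triplet.
loss_l2 = nn.TripletMarginLoss(margin=1.0, p=2, reduction="none")
loss_l1 = nn.TripletMarginLoss(margin=1.0, p=1, reduction="none")

per_triplet_l2 = loss_l2(anchor, positive, negative)
per_triplet_l1 = loss_l1(anchor, positive, negative)

# Triplet 0: the negative is far beyond the margin under both norms, so the loss is 0.
# Triplet 1: under L1, d(a,p) = 2 and d(a,n) = 2.5, so the loss is about 2 - 2.5 + 1 = 0.5.
print(per_triplet_l2)
print(per_triplet_l1)
```

The built-in loss adds a small `eps` inside the distance computation for numerical stability, so values can differ from a hand calculation in the last few decimal places.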
The swap parameter enables the distance swap heuristic: replace $d(a, n)$ with $\min(d(a, n), d(p, n))$, which tightens the constraint when the negative is closer to the positive than to the anchor.
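The effect of `swap` is easiest to see on a single triplet where the negative is far from the anchor but close to the positive. The points below are chosen for illustration only.

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[0.0, 0.0]])
positive = torch.tensor([[2.0, 0.0]])   # d(a, p) = 2
negative = torch.tensor([[4.0, 0.0]])   # d(a, n) = 4, but d(p, n) = 2

no_swap = nn.TripletMarginLoss(margin=1.0, p=2, swap=False)
with_swap = nn.TripletMarginLoss(margin=1.0, p=2, swap=True)

# Without swap: 2 - 4 + 1 = -1, clamped to 0.
loss_no_swap = no_swap(anchor, positive, negative)
# With swap: the negative distance becomes min(4, 2) = 2, so the loss is about 2 - 2 + 1 = 1.
loss_with_swap = with_swap(anchor, positive, negative)
```

With `swap=True` the triplet that looked easy from the anchor's point of view now produces a gradient, because the negative sits as close to the positive as the anchor does.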
When to use: Face recognition (FaceNet); image retrieval (Pinterest, Google Images); few-shot learning (prototypical networks need similar training signal).
nn.TripletMarginWithDistanceLoss
Same triplet formulation but accepts any differentiable distance function:
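import torch.nn as nn
import torch.nn.functional as F

# Cosine distance: 1 - cosine_similarity
def cosine_dist(u, v):
    return 1.0 - F.cosine_similarity(u, v)

triplet_loss = nn.TripletMarginWithDistanceLoss(
    distance_function=cosine_dist,
    margin=0.2,
)
# anchor, positive, negative: the (32, 128) tensors from the previous example
loss = triplet_loss(anchor, positive, negative)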
The custom distance function must accept two tensors of shape $(N, D)$ and return a tensor of shape $(N)$. It must be differentiable (autograd-compatible).
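A quick way to validate a custom distance is to check its output shape, compare the loss against the hinge formula computed by hand, and confirm that gradients reach the inputs. The following is a sketch with random data; the batch size and dimension are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cosine_dist(u, v):
    # One distance per row: 0 for identical directions, 2 for opposite ones.
    return 1.0 - F.cosine_similarity(u, v, dim=1)

torch.manual_seed(0)
anchor = torch.randn(8, 16, requires_grad=True)
positive = torch.randn(8, 16)
negative = torch.randn(8, 16)

d_ap = cosine_dist(anchor, positive)
d_an = cosine_dist(anchor, negative)
print(d_ap.shape)  # one distance per sample in the batch

margin = 0.2
manual = torch.clamp(d_ap - d_an + margin, min=0.0).mean()

loss_fn = nn.TripletMarginWithDistanceLoss(
    distance_function=cosine_dist, margin=margin
)
loss = loss_fn(anchor, positive, negative)
loss.backward()  # gradients flow back through the custom distance
```

If the shapes or the manual value disagree, the distance function is usually reducing over the wrong dimension.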
When to use: Sentence embeddings (cosine distance more natural than Euclidean); normalised embeddings on a unit hypersphere; any domain where $L_p$ distance is inappropriate.
Triplet Mining
Naively sampling random triplets is inefficient — most will produce zero loss because random negatives are already far from the anchor. In practice, triplets are mined:
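| Strategy | Description | Effect on training |
|---|---|---|
| Easy triplets | $d(a, n) > d(a, p) + \alpha$: constraint already satisfied | No gradient; skip |
| Semi-hard triplets | $d(a, p) < d(a, n) < d(a, p) + \alpha$: negative is farther than the positive but inside the margin | Moderate gradient; recommended |
| Hard triplets | $d(a, n) < d(a, p)$: negative is closer than positive | Large gradient; can destabilise early training |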
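The FaceNet authors trained with semi-hard negatives, reporting that selecting the hardest negatives can lead to bad local minima (a collapsed embedding) early in training.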
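The semi-hard condition, d(a, p) < d(a, n) < d(a, p) + margin, can be turned into a batch-level mask. The sketch below is a brute-force illustration and not FaceNet's actual procedure: `mine_semi_hard` is a hypothetical helper, and the 1-D embeddings are chosen so the distances are easy to read.

```python
import torch
import torch.nn as nn

def mine_semi_hard(embeddings, labels, margin):
    """Return index tensors (a, p, n) of all semi-hard triplets in the batch.

    Hypothetical helper: builds an (N, N, N) mask, so it only suits small batches.
    """
    dist = torch.cdist(embeddings, embeddings, p=2)              # (N, N) distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)            # same-class pairs
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    d_ap = dist.unsqueeze(2)                                     # indexed [a, p, 1]
    d_an = dist.unsqueeze(1)                                     # indexed [a, 1, n]
    semi_hard = (d_an > d_ap) & (d_an < d_ap + margin)           # [a, p, n]

    # p must share the anchor's class (and differ from it); n must not.
    valid = (same & not_self).unsqueeze(2) & (~same).unsqueeze(1)
    return torch.nonzero(semi_hard & valid, as_tuple=True)

embeddings = torch.tensor([[0.0], [1.0], [1.5], [5.0]])
labels = torch.tensor([0, 0, 1, 1])

a_idx, p_idx, n_idx = mine_semi_hard(embeddings, labels, margin=1.0)
triplets = list(zip(a_idx.tolist(), p_idx.tolist(), n_idx.tolist()))
print(triplets)  # [(0, 1, 2), (3, 2, 1)]

# Every mined triplet violates the margin, so each one contributes a non-zero loss.
loss = nn.TripletMarginLoss(margin=1.0)(
    embeddings[a_idx], embeddings[p_idx], embeddings[n_idx]
)
```

Here the triplet (1, 0, 2) is excluded because its negative is closer than the positive (a hard triplet), and (0, 1, 3) is excluded because the negative lies beyond the margin (an easy triplet).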
Comparison: Triplet vs. Contrastive vs. NT-Xent
| Loss | Inputs per step | Key property |
|---|---|---|
| TripletMarginLoss | Anchor, Positive, Negative | Relative ordering; easy to interpret |
| Contrastive (nn.HingeEmbeddingLoss) | Single distance + label | Simpler; only binary similar/dissimilar labels |
| NT-Xent (SimCLR) | Full batch, all pairs | Uses every other sample in the batch as a negative; widely used for self-supervised learning |
For supervised metric learning with labelled data, TripletMarginWithDistanceLoss with cosine distance is a strong baseline.
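PyTorch has no built-in NT-Xent module, so the table's last row can be illustrated with a short sketch built on `F.cross_entropy`. The function name `nt_xent`, the temperature of 0.5, and the toy inputs are assumptions for illustration; for a batch of N positive pairs, each sample is contrasted against the 2(N - 1) other samples that are not its partner.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent over N positive pairs (z1[i], z2[i]); a sketch, not a reference implementation."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, D), unit length
    sim = z @ z.t() / temperature                            # cosine-similarity logits
    # A sample must not count as its own negative, so mask the diagonal.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # Row i's positive is its other view: i + N in the first half, i - N in the second.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

z1 = torch.eye(4)                                            # four orthogonal "embeddings"
aligned = nt_xent(z1, z1.clone())                            # each pair is identical
mismatched = nt_xent(z1, torch.roll(z1, shifts=1, dims=0))   # partners are orthogonal
print(aligned.item(), mismatched.item())
```

The aligned batch gives the lower loss, since every positive pair has the highest similarity in its row. Compared with a triplet loss, one forward pass here scores each anchor against every other sample in the batch.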