Supplement · Loss Functions

Multi-label & Margin Losses

12 min read

By the end of this reading you will be able to:

Distinguish the {0, 1} and {-1, +1} label conventions and select the correct margin or logistic loss for each
Apply SoftMarginLoss and MultiLabelSoftMarginLoss for single-label and multi-label binary classification respectively
Use MultiMarginLoss as a multi-class hinge loss and explain the margin constraint it enforces between the correct and incorrect class scores
Select among SoftMarginLoss, MultiLabelSoftMarginLoss, MultiMarginLoss, and MultiLabelMarginLoss using the decision guide for label type and output cardinality

Label Conventions: {0,1} vs {−1,+1}

Classification losses use two incompatible label conventions:

{0, 1} — used by BCE-family losses. $y = 1$ means "positive", $y = 0$ means "negative".
{−1, +1} — used by margin-based losses. $y = +1$ means "positive", $y = -1$ means "negative".

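Mixing these up is a silent bug: the loss still returns finite, plausible-looking values, but the training signal is wrong. For example, passing {0, 1} labels to SoftMarginLoss makes every negative sample contribute a constant $\log 2$ with zero gradient, so the model is never pushed to lower the scores of negatives.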
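
To make the failure mode concrete, here is a minimal sketch that feeds the same logits to SoftMarginLoss with both label conventions. The helpers `to_pm1` and `to_01` are hypothetical names introduced for illustration; they are not part of PyTorch.

```python
import math

import torch
import torch.nn as nn


def to_pm1(y01: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: map {0, 1} labels to {-1, +1}."""
    return 2.0 * y01 - 1.0


def to_01(ypm1: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: map {-1, +1} labels to {0, 1}."""
    return (ypm1 + 1.0) / 2.0


logits = torch.tensor([3.0, -3.0])
y01 = torch.tensor([1.0, 0.0])  # the second sample is a negative

per_sample = nn.SoftMarginLoss(reduction="none")

# Wrong: {0, 1} labels passed straight into a {-1, +1} loss.
wrong = per_sample(logits, y01)

# Right: remap the labels first.
right = per_sample(logits, to_pm1(y01))

print("wrong convention:", wrong)
print("right convention:", right)
print("log 2 =", math.log(2))
```

With the wrong convention the negative sample's loss equals $\log 2$ no matter how confidently the model scores it, whereas with the remapped labels the same confident negative receives a loss close to zero.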

nn.SoftMarginLoss — Binary Logistic Loss with {−1,+1} Labels

SoftMarginLoss implements the logistic loss for labels $y \in \{-1, +1\}$ :

$\ell_i = \log\bigl(1 + e^{-y_i \cdot x_i}\bigr)$

This is equivalent to BCEWithLogitsLoss under the label remapping $y_{\{0,1\}} = (y_{\{-1,+1\}} + 1)/2$ . When $y_i = +1$ and $x_i$ is large, $e^{-x_i} \approx 0$ and $\ell_i \approx 0$ (correct and confident). When $y_i = +1$ and $x_i$ is very negative, $e^{-x_i}$ is huge and the loss is large. For $y_i = -1$ the roles are mirrored: a very negative $x_i$ gives a loss near zero, and a large positive $x_i$ is penalised heavily.

import torch
import torch.nn as nn

loss = nn.SoftMarginLoss()
x = torch.tensor([2.0, -1.0, 0.5])  # raw scores (logits)
y = torch.tensor([1.0, -1.0, 1.0])  # labels in {-1, +1}
output = loss(x, y)                 # scalar: mean over the three samples

When to use: Binary classification where labels are naturally {−1, +1} (e.g., SVM-style data); direct replacement for BCEWithLogitsLoss when switching from {0,1} to {−1,+1} labeling.
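
The claimed equivalence with BCEWithLogitsLoss can be checked numerically. The sketch below reuses the logits and labels from the example, remaps the labels to {0, 1}, and compares three quantities: the built-in SoftMarginLoss, BCEWithLogitsLoss on the remapped labels, and the formula written out by hand. It assumes the default mean reduction for both losses.

```python
import torch
import torch.nn as nn

x = torch.tensor([2.0, -1.0, 0.5])       # logits
y_pm1 = torch.tensor([1.0, -1.0, 1.0])   # labels in {-1, +1}
y_01 = (y_pm1 + 1.0) / 2.0               # same labels remapped to {0, 1}

soft_margin = nn.SoftMarginLoss()(x, y_pm1)
bce = nn.BCEWithLogitsLoss()(x, y_01)

# The formula from the text, averaged over samples: log(1 + exp(-y * x)).
manual = torch.log1p(torch.exp(-y_pm1 * x)).mean()

print(soft_margin.item(), bce.item(), manual.item())
```

All three printed values should agree up to floating-point error.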

nn.MultiLabelSoftMarginLoss — Independent Binary Classifiers

For multi-label problems where each sample can belong to any subset of $C$ classes, this loss applies a binary logistic loss to each class independently and averages the result over the $C$ classes:

$\ell = -\frac{1}{C}\sum_{j=0}^{C-1} \left[ y_j \log \sigma(x_j) + (1 - y_j) \log(1 - \sigma(x_j)) \right]$

This is equivalent to averaging $C$ independent BCEWithLogitsLoss values, one per class. Labels are multi-hot vectors with $y_j \in \{0, 1\}$ , not {−1, +1}, despite the "Soft Margin" name.

import torch
import torch.nn as nn

loss = nn.MultiLabelSoftMarginLoss()
x = torch.randn(3, 5)                      # 3 samples, 5 classes (logits)
y = torch.randint(0, 2, (3, 5)).float()    # multi-hot targets in {0, 1}
output = loss(x, y)                        # scalar: mean over classes, then samples

When to use: Multi-label image classification (e.g., an image can be both "dog" and "outdoor"); tagging tasks; any setting where classes are not mutually exclusive.
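
As a sanity check on the "independent binary classifiers" reading, the following sketch compares the built-in loss with per-element binary cross-entropy averaged over classes and then over samples. The seed and tensor shapes are arbitrary choices for illustration, and default reductions are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 5)                      # 3 samples, 5 classes (logits)
y = torch.randint(0, 2, (3, 5)).float()    # multi-hot targets in {0, 1}

builtin = nn.MultiLabelSoftMarginLoss()(x, y)

# One BCE term per (sample, class), averaged over classes, then over samples.
per_element = F.binary_cross_entropy_with_logits(x, y, reduction="none")
manual = per_element.mean(dim=1).mean()

# Every sample has the same number of classes, so a plain mean over all
# elements (the BCEWithLogitsLoss default) gives the same number.
bce_mean = nn.BCEWithLogitsLoss()(x, y)

print(builtin.item(), manual.item(), bce_mean.item())
```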

nn.MultiMarginLoss — Multi-class Hinge Loss

Hinge loss for single-label $C$ -class classification. Encourages the correct class score to exceed all incorrect class scores by a margin $m$ :

$\ell = \frac{1}{C} \sum_{j \ne c} \max\bigl(0,\, m - x_c + x_j\bigr)^p$

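where $c$ is the correct class, $p \in \{1, 2\}$ is a power exponent, and $m = 1$ by default. This is a multi-class SVM-style hinge loss. Because it sums the hinge terms over all incorrect classes rather than keeping only the largest violation, it corresponds to the Weston & Watkins formulation rather than the Crammer & Singer one, up to the $1/C$ scaling.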

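The loss is zero when $x_c \ge x_j + m$ for all wrong classes $j$ , that is, when the correct class has a sufficient margin over all others. Otherwise each violating class contributes its shortfall $m - x_c + x_j$ (for $p = 1$ ) or the square of that shortfall (for $p = 2$ ).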

import torch
import torch.nn as nn

loss = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.randn(4, 10)            # logits: 4 samples, 10 classes
y = torch.tensor([0, 3, 7, 2])    # correct class indices
output = loss(x, y)               # scalar: mean over the 4 samples

When to use: When you want an SVM-style margin objective instead of softmax cross-entropy; structured prediction where inter-class margins matter.
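
The hinge formula can be reproduced by hand on a small input. In the sketch below, `multi_margin_manual` is a hypothetical reference function written for this reading, not a PyTorch API; it assumes a 2-D input and the default mean reduction over the batch.

```python
import torch
import torch.nn as nn


def multi_margin_manual(x, target, p=1, margin=1.0):
    """Hypothetical reference implementation of the multi-class hinge formula."""
    n, c = x.shape
    correct = x.gather(1, target.unsqueeze(1))           # (n, 1): score of the true class
    hinge = (margin - correct + x).clamp(min=0) ** p     # (n, c): hinge term for every class
    mask = torch.ones_like(x)
    mask.scatter_(1, target.unsqueeze(1), 0.0)           # drop the j == c term
    return (hinge * mask).sum(dim=1).div(c).mean()       # 1/C per sample, mean over batch


x = torch.tensor([[2.0, 0.5, 1.5],
                  [0.0, 3.0, 1.0]])
y = torch.tensor([0, 1])

builtin = nn.MultiMarginLoss(p=1, margin=1.0)(x, y)
manual = multi_margin_manual(x, y)

print(builtin.item(), manual.item())
```

Only one pair violates the margin here: in the first sample, class 2 scores 1.5 against a correct-class score of 2.0, a shortfall of 0.5. Dividing by $C = 3$ and averaging over the two samples gives $0.5 / 3 / 2 = 1/12 \approx 0.0833$.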

nn.MultiLabelMarginLoss — Multi-label Pairwise Ranking

Use this loss for multi-label problems where the target is given as a set of correct class indices. It enforces that each positive class scores higher than each negative class by a margin of 1:

$\ell = \frac{1}{C} \sum_{i \in \mathcal{Y}^+} \sum_{j \in \mathcal{Y}^-} \max\bigl(0, 1 - x_i + x_j\bigr)$

where $\mathcal{Y}^+$ is the set of positive class indices, $\mathcal{Y}^-$ is the set of negative class indices, and $C$ is the total number of classes. Note that PyTorch normalises by $C$ , not by the number of positives.

Target format: a LongTensor with the same shape as the input. Each row lists the positive class indices first and is then padded with $-1$ ; only the indices before the first $-1$ are read.

import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # 1 sample, 4 classes
y = torch.tensor([[3, 0, -1, -1]])         # classes 3 and 0 are positive; -1 is padding
output = loss(x, y)
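
The example above can be worked through pair by pair. The sketch below uses a hypothetical helper, `multilabel_margin_manual`, which handles a single sample and follows the normalisation given in the PyTorch documentation: the sum over positive/negative pairs is divided by the number of classes.

```python
import torch
import torch.nn as nn


def multilabel_margin_manual(scores, target_row):
    """Hypothetical helper: pairwise hinge for one sample, divided by C."""
    c = scores.numel()
    positives = []
    for idx in target_row.tolist():
        if idx < 0:          # the first -1 ends the list of positives
            break
        positives.append(idx)
    negatives = [j for j in range(c) if j not in positives]
    total = 0.0
    for i in positives:
        for j in negatives:
            total += max(0.0, 1.0 - (scores[i].item() - scores[j].item()))
    return total / c


x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[3, 0, -1, -1]])

manual = multilabel_margin_manual(x[0], y[0])
builtin = nn.MultiLabelMarginLoss()(x, y).item()

print(manual, builtin)
```

The four pairs contribute $0.4$ and $0.6$ (positive class 3 against negatives 1 and 2) and $1.1$ and $1.3$ (positive class 0 against negatives 1 and 2). Their sum is $3.4$ , and dividing by $C = 4$ gives $0.85$ .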

When to use: Document retrieval or recommendation where you have a small set of relevant items and many irrelevant ones; ranking-oriented multi-label tasks.
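
Datasets often store multi-label targets as multi-hot {0, 1} vectors rather than padded index lists. The following sketch shows one way to convert between the two. `multihot_to_padded_indices` is a hypothetical helper written for this reading, not a PyTorch function, and it assumes a 2-D integer or float input of shape (N, C).

```python
import torch


def multihot_to_padded_indices(multi_hot: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (N, C) multi-hot tensor -> (N, C) index tensor padded with -1."""
    n, c = multi_hot.shape
    out = torch.full((n, c), -1, dtype=torch.long)
    for row in range(n):
        pos = torch.nonzero(multi_hot[row]).flatten()   # indices of the positive classes
        out[row, : pos.numel()] = pos                   # positives first, -1 padding after
    return out


multi_hot = torch.tensor([[1, 0, 0, 1],
                          [0, 1, 0, 0]])
padded = multihot_to_padded_indices(multi_hot)
print(padded)
# tensor([[ 0,  3, -1, -1],
#         [ 1, -1, -1, -1]])
```

The same multi-hot tensor, cast to float, can be passed directly to MultiLabelSoftMarginLoss, while the padded form is what MultiLabelMarginLoss expects.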

Decision Guide

Setting	Labels	Recommended Loss
Binary, one label per sample	$\{0,1\}$	BCEWithLogitsLoss
Binary, labels are ±1	$\{-1,+1\}$	SoftMarginLoss
Multi-label, independent per class	$\{0,1\}^C$	MultiLabelSoftMarginLoss
Multi-class, one label, SVM-style	Integer $c$	MultiMarginLoss
Multi-label, ranking objective	Positive index list	MultiLabelMarginLoss
