Supplement · Loss Functions

Multi-label & Margin Losses

12 min read
By the end of this reading you will be able to:
  • Distinguish the {0, 1} and {-1, +1} label conventions and select the correct margin or logistic loss for each
  • Apply SoftMarginLoss and MultiLabelSoftMarginLoss for single-label and multi-label binary classification respectively
  • Use MultiMarginLoss as a multi-class hinge loss and explain the margin constraint it enforces between the correct and incorrect class scores
  • Select among SoftMarginLoss, MultiLabelSoftMarginLoss, MultiMarginLoss, and MultiLabelMarginLoss using the decision guide for label type and output cardinality

Label Conventions: {0,1} vs {−1,+1}

Classification losses use two incompatible label conventions:

  • {0, 1}: used by BCE-family losses. $y = 1$ means "positive", $y = 0$ means "negative".
  • {−1, +1}: used by margin-based losses. $y = +1$ means "positive", $y = -1$ means "negative".

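Mixing these up is a silent bug: the wrong convention produces valid-looking numbers while optimising the wrong objective. For example, a label of 0 passed to a {−1, +1} logistic loss yields a constant $\log 2$ with zero gradient, so negative examples are never learned.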
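The following is a minimal sketch of converting between the two conventions and of what the silent bug looks like in practice. The tensor values and variable names are illustrative assumptions, not taken from any particular dataset.

```python
import math
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)

y01 = torch.tensor([1.0, 0.0, 1.0])   # {0, 1} convention
ypm = 2.0 * y01 - 1.0                 # {-1, +1} convention: 0 -> -1, 1 -> +1
y01_back = (ypm + 1.0) / 2.0          # inverse mapping

soft_margin = nn.SoftMarginLoss(reduction="none")

correct = soft_margin(logits, ypm)    # labels match the loss's convention
wrong = soft_margin(logits, y01)      # {0, 1} labels fed to a {-1, +1} loss

# Gradient of the mis-labelled loss with respect to the logits.
(grad_wrong,) = torch.autograd.grad(wrong.sum(), logits)

print(correct.detach())
print(wrong.detach())   # middle entry is log(2), because exp(-0 * x) = 1
print(grad_wrong)       # middle entry is zero: the negative sample is ignored
```

No error is raised in the mis-labelled case, which is why checking the label range before choosing a loss is worthwhile.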


nn.SoftMarginLoss — Binary Logistic Loss with {−1,+1} Labels

SoftMarginLoss implements the logistic loss for labels $y \in \{-1, +1\}$:

$$\ell_i = \log\bigl(1 + e^{-y_i \cdot x_i}\bigr)$$

This is equivalent to BCEWithLogitsLoss under the label remapping $y_{\{0,1\}} = (y_{\{-1,+1\}} + 1)/2$. When $y_i = +1$ and $x_i$ is large and positive, $e^{-x_i} \approx 0$ and $\ell_i \approx 0$ (correct and confident). When $y_i = +1$ and $x_i$ is very negative, $e^{-x_i}$ is huge and the loss grows roughly linearly in $|x_i|$. With the default reduction, the per-element losses are averaged.

import torch
import torch.nn as nn

loss = nn.SoftMarginLoss()
x = torch.tensor([2.0, -1.0, 0.5])   # raw scores (logits)
y = torch.tensor([1.0, -1.0, 1.0])   # labels in {-1, +1}
output = loss(x, y)                  # scalar: mean over the 3 elements

When to use: Binary classification where labels are naturally {−1, +1} (e.g., SVM-style data); direct replacement for BCEWithLogitsLoss when switching from {0,1} to {−1,+1} labeling.
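The stated equivalence with BCEWithLogitsLoss can be checked numerically. The sketch below reuses the example values and remaps the labels with $(y + 1)/2$; the variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.tensor([2.0, -1.0, 0.5])       # raw scores (logits)
y_pm = torch.tensor([1.0, -1.0, 1.0])    # labels in {-1, +1}
y_01 = (y_pm + 1.0) / 2.0                # same labels remapped to {0, 1}

soft_margin = nn.SoftMarginLoss()(x, y_pm)
bce = nn.BCEWithLogitsLoss()(x, y_01)

# Direct evaluation of mean(log(1 + exp(-y * x))).
manual = torch.log1p(torch.exp(-y_pm * x)).mean()

print(soft_margin.item(), bce.item(), manual.item())
```

All three numbers should agree up to floating-point error, since $-\log \sigma(x) = \log(1 + e^{-x})$ and $-\log(1 - \sigma(x)) = \log(1 + e^{x})$.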


nn.MultiLabelSoftMarginLoss — Independent Binary Classifiers

For multi-label problems where each sample can belong to any subset of $C$ classes, this loss applies a binary logistic loss to each class independently and averages over the classes:

$$\ell = -\frac{1}{C}\sum_{j=0}^{C-1} \left[ y_j \log \sigma(x_j) + (1 - y_j) \log(1 - \sigma(x_j)) \right]$$

This is equivalent to averaging $C$ independent BCEWithLogitsLoss values. Labels are $y \in \{0, 1\}$ (not {−1, +1}, despite the "Soft Margin" name).

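import torch
import torch.nn as nn

loss = nn.MultiLabelSoftMarginLoss()
x = torch.randn(3, 5)                      # 3 samples, 5 classes (logits)
y = torch.randint(0, 2, (3, 5)).float()    # binary multi-label targets in {0, 1}
output = loss(x, y)                        # scalar: mean over classes, then samples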

When to use: Multi-label image classification (e.g., an image can be both "dog" and "outdoor"); tagging tasks; any setting where classes are not mutually exclusive.
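As a quick sanity check of the claim that this loss is an average of per-class BCEWithLogitsLoss values, the sketch below compares the two on random data. The seed and shapes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                       # arbitrary seed, for repeatability
x = torch.randn(3, 5)                      # 3 samples, 5 classes (logits)
y = torch.randint(0, 2, (3, 5)).float()    # multi-hot targets in {0, 1}

# Per-sample losses: one value per row, already averaged over the 5 classes.
per_sample = nn.MultiLabelSoftMarginLoss(reduction="none")(x, y)

# Element-wise BCE: one value per (sample, class) pair.
bce_elem = nn.BCEWithLogitsLoss(reduction="none")(x, y)

multilabel = nn.MultiLabelSoftMarginLoss()(x, y)
bce_mean = nn.BCEWithLogitsLoss()(x, y)

print(per_sample, bce_elem.mean(dim=1))
print(multilabel.item(), bce_mean.item())
```

One practical difference worth noting: with `reduction="none"`, MultiLabelSoftMarginLoss returns one value per sample, whereas BCEWithLogitsLoss returns one value per sample and class.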


nn.MultiMarginLoss — Multi-class Hinge Loss

Hinge loss for single-label $C$-class classification. It encourages the correct class score to exceed every incorrect class score by a margin $m$:

$$\ell = \frac{1}{C} \sum_{j \ne c} \max\bigl(0,\, m - x_c + x_j\bigr)^p$$

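where $c$ is the correct class, $p \in \{1, 2\}$ is a power exponent, and $m = 1$ by default. This is a multi-class SVM loss in the sum-over-classes form usually attributed to Weston & Watkins; the Crammer & Singer variant penalises only the single largest violation.

The loss is zero when $x_c \ge x_j + m$ for all wrong classes $j$, i.e., when the correct class has a sufficient margin over all others. Otherwise each violating class contributes its shortfall $m - x_c + x_j$, linearly for $p = 1$ and quadratically for $p = 2$.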

import torch
import torch.nn as nn

loss = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.randn(4, 10)            # logits: 4 samples, 10 classes
y = torch.tensor([0, 3, 7, 2])    # correct class indices
output = loss(x, y)               # scalar: mean over the 4 samples

When to use: When you want an SVM-style margin objective instead of softmax cross-entropy; structured prediction where inter-class margins matter.
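To make the margin constraint concrete, here is a sketch that evaluates the hinge formula by hand and compares it with the built-in module. The seed, the masking approach, and the small `x_ok` example are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                  # arbitrary seed, for repeatability
x = torch.randn(4, 10)                # logits: 4 samples, 10 classes
y = torch.tensor([0, 3, 7, 2])        # correct class indices
margin, p = 1.0, 1

x_c = x.gather(1, y.unsqueeze(1))                   # (4, 1) correct-class scores
violation = (margin - x_c + x).clamp(min=0) ** p    # (4, 10) hinge terms
# Mask out the j == c term, which would otherwise contribute margin ** p.
mask = torch.ones_like(x).scatter_(1, y.unsqueeze(1), 0.0)
manual = ((violation * mask).sum(dim=1) / x.size(1)).mean()

builtin = nn.MultiMarginLoss(p=p, margin=margin)(x, y)
print(manual.item(), builtin.item())

# A sample whose correct score beats every other score by at least the margin.
x_ok = torch.tensor([[3.0, 1.0, 0.5]])
y_ok = torch.tensor([0])
zero_loss = nn.MultiMarginLoss(p=1, margin=1.0)(x_ok, y_ok)
print(zero_loss.item())
```

The second example shows the zero-loss region: once every wrong class trails the correct class by at least the margin, the sample stops contributing to the gradient.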


nn.MultiLabelMarginLoss — Multi-label Pairwise Ranking

Use this loss for multi-label problems where each sample comes with a set of correct class indices. It encourages every positive class to outscore every negative class by a margin of 1:

$$\ell = \frac{1}{C} \sum_{i \in \mathcal{Y}^+} \sum_{j \in \mathcal{Y}^-} \max\bigl(0,\, 1 - x_i + x_j\bigr)$$

where $\mathcal{Y}^+$ is the set of positive class indices, $\mathcal{Y}^-$ is the set of negative class indices, and $C$ is the total number of classes (the sum is divided by the number of classes, not by the number of positives).

Target format: a LongTensor with the same shape as the input. Each row lists the positive class indices first and is padded with $-1$; everything after the first $-1$ is ignored.

import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # 1 sample, 4 classes
y = torch.tensor([[3, 0, -1, -1]])         # classes 3 and 0 are positive; -1 is padding
output = loss(x, y)
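The same example can be worked by hand. The sketch below enumerates the positive/negative pairs explicitly and divides by the number of classes, which is how PyTorch normalises this loss; the loop and variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # 1 sample, 4 classes
y = torch.tensor([[3, 0, -1, -1]])         # positives: 3 and 0

positives = [3, 0]
negatives = [1, 2]                         # every class not listed as positive
num_classes = x.size(1)

total = 0.0
for i in positives:
    for j in negatives:
        # Hinge on the score gap between positive class i and negative class j.
        total += max(0.0, 1.0 - (x[0, i].item() - x[0, j].item()))

manual = total / num_classes               # (0.4 + 0.6 + 1.1 + 1.3) / 4 = 0.85
builtin = nn.MultiLabelMarginLoss()(x, y)
print(manual, builtin.item())
```

The pair (class 0, class 2) contributes the most, 1.3, because the positive class 0 scores well below the negative class 2.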

When to use: Document retrieval or recommendation where you have a small set of relevant items and many irrelevant ones; ranking-oriented multi-label tasks.
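Datasets often store multi-label targets as multi-hot vectors rather than padded index lists. The sketch below shows one way to convert between them; `multi_hot_to_padded_indices` is a hypothetical helper written for this illustration, not part of PyTorch.

```python
import torch
import torch.nn as nn


def multi_hot_to_padded_indices(multi_hot: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (N, C) multi-hot -> (N, C) index targets padded with -1."""
    n, c = multi_hot.shape
    target = torch.full((n, c), -1, dtype=torch.long)
    for row in range(n):
        idx = torch.nonzero(multi_hot[row], as_tuple=True)[0]  # positive class indices
        target[row, : idx.numel()] = idx                       # positives first, then -1
    return target


multi_hot = torch.tensor([[1, 0, 0, 1],
                          [0, 0, 0, 0],
                          [1, 1, 1, 1]])
target = multi_hot_to_padded_indices(multi_hot)
print(target)

# The first row encodes the same positive set {0, 3} as the example above.
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
loss_value = nn.MultiLabelMarginLoss()(x, target[:1])
print(loss_value.item())
```

The order of the positive indices within a row does not affect the loss, so the helper simply emits them in ascending order.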


Decision Guide

Setting                            | Labels              | Recommended Loss
-----------------------------------|---------------------|-------------------------
Binary, one label per sample       | $\{0,1\}$           | BCEWithLogitsLoss
Binary, labels are ±1              | $\{-1,+1\}$         | SoftMarginLoss
Multi-label, independent per class | $\{0,1\}^C$         | MultiLabelSoftMarginLoss
Multi-class, one label, SVM-style  | Integer $c$         | MultiMarginLoss
Multi-label, ranking objective     | Positive index list | MultiLabelMarginLoss