1 Introduction
Deep learning models exhibit high vulnerability to adversarial examples, potentially leading to severe security issues [1–3]. Among various defenses, adversarial training is the most effective one that stands the test of time [4,5]. Incorporating adversarial examples into the training process can significantly enhance model robustness. However, existing adversarial training algorithms [6–8] mainly focus on the average (overall) robustness of all classes without considering class-wise robustness. Recent research has demonstrated that adversarially trained models exhibit unbalanced class-wise robustness across various settings, even on balanced datasets [9,10]. For instance, a robust model may demonstrate a robust error rate of 35.3% for class 1, in contrast to 83.8% for class 3 (Fig.1(a)). This unfair phenomenon, termed robust fairness [10], poses two serious problems:
Fig.1 Demonstration of unfairness in adversarial training with a robust PreAct ResNet18 on CIFAR10. We show the class-wise/average robust errors of (a) PGD Adv. Training [6] and (b) our FairAT. Larger deviations in class-wise robustness relative to average robustness indicate more serious unfairness. The robust errors are evaluated by the PGD-20 attack following [10]
● The barrel principle: The least robust class might become the primary target for attackers, as it represents the model’s weakest link. In this case, its robustness could be regarded as the actual robustness of the model. Given its high error rate, for example, 83.8%, the adversarially trained model is substantially vulnerable.
● Ethical concerns: Unfairness in deep learning can lead to serious social problems [11–13]. Unbalanced class-wise robustness may also raise ethical concerns among different demographic groups, such as low-income populations, due to the differing protection levels afforded to them.
Therefore, unfair robustness is critical, potentially compromising the ethical and practical efficacy of adversarial training in real-world applications. Considering its harmful impact, researchers have recently begun to study this issue. Tian et al. [9] conducted a detailed analysis, confirming that it is a widespread issue across various datasets and model structures. Simultaneously, researchers have proposed algorithms from diverse perspectives to enhance robust fairness, notably FRL [10] and FAT [14]. FRL, for instance, adjusts the weights and perturbation sizes for various classes, thereby achieving enhanced robust fairness compared to earlier adversarial training algorithms. FAT, on the other hand, introduces an additional loss term into existing adversarial training algorithms, aiming to reduce the variance among different classes. While these methods can slightly enhance robust fairness, they still fall short of achieving true fairness. Consequently, there is an urgent need for more effective algorithms.
Previous solutions (FRL [10] and FAT [14]) focus on hard classes, i.e., the classes with inferior robustness. By contrast, we explore enhancing robust fairness in a more fine-grained way, focusing on individual examples rather than classes. Our approach is based on the concept of “hard examples”, which refer to individual examples that are particularly hard for models to classify accurately, irrespective of their classes. This novel perspective brings us to two key challenges: 1) how to identify hard examples? 2) how to utilize them to improve robust fairness?
For the first question, hard classes are by definition the classes whose robustness is relatively lower than that of other classes [10]. In other words, a model is more uncertain about them. By applying cross-entropy as a metric for assessing uncertainty, we can observe that the cross-entropy values calculated exclusively from the clean examples of each class exhibit a positive correlation with their robust errors. Therefore, we can use the cross-entropy values of clean examples to identify hard classes without needing to generate adversarial examples. This observation further inspires us to explore the cross-entropy values of individual examples. Interestingly, our findings reveal that examples with larger cross-entropy values belong not only to hard classes but also to other classes. As these examples are identified by the metric used to identify hard classes, we hold that they are examples related to robust fairness. Additionally, higher cross-entropy values indicate that robust models struggle to learn these examples effectively. Therefore, we can categorize these examples as hard examples.
For the second question, hard examples are, by definition, difficult for models to learn. We explore two approaches to mitigate this learning difficulty. The first is to provide more information about the data distribution of hard classes, and the second is to directly remove the hardest examples. Owing to the limited dataset size, the first approach is effective, whereas the second is not. However, the first approach is implemented by collecting extra examples for hard classes, which is costly in practice. Considering that data augmentation can serve a similar purpose at a much smaller cost [15–17], we opt to apply data augmentation to these hard examples. By employing a carefully selected data augmentation method, Cutout [15], we can enhance robust fairness significantly.
Building on these findings, we propose Fair Adversarial Training (FairAT), a method that dynamically identifies and augments hard examples to enhance robust fairness. Extensive experiments on benchmark datasets and attacks validate the superior performance of FairAT. To summarize, our primary contributions are as follows:
● We investigate the relationship between class-wise robustness and the uncertainty of robust models about examples, indicating that hard examples with higher uncertainty could serve as more precise indicators of robust fairness.
● We discover that increasing the diversity of hard examples can improve robust fairness, leading us to propose FairAT, a more fine-grained method compared to traditional class-level solutions.
● We demonstrate the superiority of FairAT through extensive experiments on benchmark datasets and various attacks. Experimental results demonstrate that it outperforms state-of-the-art methods in terms of both overall robustness and fairness.
2 Related work
2.1 Adversarial training
Adversarial training [6–8,18], widely recognized as the most effective defense against adversarial examples [1,19,20], has stood the test of time [5,21]. It involves adversarial examples in the process of training. Following [6], given a training set $\mathcal{D}=\{(x_{i}, y_{i})\}_{i=1}^{n}$, adversarial training for a model $f_{\theta}$ can be defined as a min-max game:
$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[\max_{\|\delta\|_{p}\leq\epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big], \tag{1}$$
where $y$ is the ground-truth label of a clean example $x$, $\delta$ is an adversarial perturbation constrained by $\epsilon$ under the $\ell_{p}$ norm, $\theta$ denotes the parameters of the model $f$, and $\mathcal{L}$ is the loss function. The inner maximization is the process of finding adversarial examples. The outer minimization trains a model to have the minimum adversarial risk.
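For concreteness, the inner maximization of Eq. (1) is commonly approximated by projected gradient descent (PGD). The following is a minimal PyTorch-style sketch under an $\ell_{\infty}$ constraint; the function name and the default hyperparameters are our own illustrative assumptions, not settings taken from this paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    # Approximate the inner maximization of Eq. (1) with l_inf-bounded PGD.
    # The defaults are common CIFAR10 choices, assumed here for illustration.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()
```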
Although adversarial training performs well regarding common metrics, it overly pursues the average robustness without examining the class-wise robustness. There exists a significant discrepancy in robustness among different classes, as shown in Fig.1(a), a phenomenon termed robust fairness [10]. The least robust class could become the actual vulnerability of a robust model, i.e., the barrel principle. Besides, it may also cause fairness issues among different groups. Considering its harmful impact, this issue has drawn great attention from researchers recently [9,10,14]. Tian et al. [9] analyzed it across different datasets and adversarial training algorithms, validating that it is a common phenomenon in current adversarial training algorithms. Concurrently, Xu et al. [10] also discovered this phenomenon and proposed an algorithm to finetune robust models with adaptive weights and perturbations, alleviating the unbalanced class-wise robustness. More recently, Ma et al. [14] studied the relationship between robust fairness and perturbation radii, and then proposed a loss term that explicitly alleviates the unfairness by minimizing the variance of class-wise robustness. Although previous methods have improved robust fairness significantly, the discrepancy in class-wise robustness is still severe. More importantly, the least robust class is still quite vulnerable. Therefore, there is an urgent need for more effective methods to enhance the robustness of the least robust class.
2.2 Fairness in deep learning
Fairness issues in deep learning have drawn much attention for a long time [11–13,22,23]. While machines do not experience fatigue or boredom, they are susceptible to biases in various attributes, such as gender and race, potentially leading to serious ethical issues. In the context of decision-making, fairness is the absence of any prejudice or favoritism toward an individual or group based on their inherent or acquired characteristics [24]. In line with this definition, we define robust fairness in the context of deep learning as the absence of prejudice towards any specific class, manifesting as similar robustness across different classes. However, this issue has not attracted much attention in the field of adversarial robustness until recently. In this work, we aim to minimize the discrepancy between class-wise robustness and improve robust fairness. More importantly, we focus on the least robust class to actually reduce the vulnerability. Contrary to previous methods [10,14] that consider countermeasures at the class level, we explore more fine-grained countermeasures at the example level and propose an adversarial training algorithm that can train a fair and robust model from scratch.
3 Problem statement
Our objective is to design an adversarial training algorithm that not only enhances model robustness but also ensures fairness. Formally, our goal is to train a model with minimal average robust error ($\mathcal{R}_{rob}$) and minimal disparity in robustness across classes. The objective of our design is as follows:
$$\min_{\theta}\ \mathcal{R}_{rob}(f_{\theta}) \quad \mathrm{s.t.}\quad \big|\mathcal{R}_{rob}(c) - \mathcal{R}_{rob}(f_{\theta})\big| \leq \tau,\ \ \forall\, c \in \mathcal{Y}, \tag{2}$$
where $\mathcal{Y}$ is the set of classes, $\mathcal{R}_{rob}(c)$ is the robust error of class $c$, and $\tau$ is a very small positive value. The constraint of Eq. (2) aims to ensure that the robust error of each class closely aligns with the average robust error, thereby achieving robust fairness.
It is important to note that robust errors are associated with natural errors ($\mathcal{R}_{nat}$) [7]. Following [7], we separate robust errors into the sum of natural errors and boundary errors as:
$$\mathcal{R}_{rob}(c) = \mathcal{R}_{nat}(c) + \mathcal{R}_{bndy}(c), \qquad \mathcal{R}_{bndy}(c) = \Pr\big[\exists\, \delta,\ \|\delta\|_{p}\leq\epsilon:\ f_{\theta}(x+\delta)\neq y \,\wedge\, f_{\theta}(x)=y \,\big|\, y=c\big], \tag{3}$$
where $\mathcal{R}_{bndy}(c)$ is the probability that correctly classified examples from class $c$ can be attacked. Equation (3) allows for a more explicit understanding of the source of robust errors for each class. Note that Xu et al. [10] also provide a similar definition. However, their second term considers only the event $f_{\theta}(x+\delta)\neq y$ without the condition $f_{\theta}(x)=y$. Thus, their second term contains partial results of the first term, for which the second equal sign is not valid (Eq. (7) in [10]). We have corrected this in Eq. (3), aligning it with the approach in [7].
Inspired by Eq. (3), our objective can be reformulated as:
$$\min_{\theta}\ \mathcal{R}_{rob}(f_{\theta}) \quad \mathrm{s.t.}\quad \big|\mathcal{R}_{nat}(c) - \mathcal{R}_{nat}(f_{\theta})\big| \leq \tau_{1},\ \ \big|\mathcal{R}_{bndy}(c) - \mathcal{R}_{bndy}(f_{\theta})\big| \leq \tau_{2},\ \ \forall\, c \in \mathcal{Y}, \tag{4}$$
where $\tau_{1}$ and $\tau_{2}$ are both very small positive values, indicating that the class-wise natural/boundary error should be close to the average natural/boundary error.
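To make the decomposition concrete, the class-wise quantities in Eqs. (3) and (4) can be estimated empirically from clean and adversarial predictions. The sketch below is our own illustration (the function name and the `attack` argument, e.g., the PGD sketch above, are assumptions), not part of the paper's implementation.

```python
import torch

def classwise_errors(model, loader, attack, num_classes=10):
    # Estimate per-class natural, boundary, and robust errors as in Eq. (3):
    # robust = natural + boundary, where the boundary error counts examples
    # that are correctly classified on clean inputs but flipped by the attack.
    nat_err = torch.zeros(num_classes)
    bndy_err = torch.zeros(num_classes)
    count = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x_adv = attack(model, x, y)
        with torch.no_grad():
            clean_pred = model(x).argmax(1)
            adv_pred = model(x_adv).argmax(1)
        for c in range(num_classes):
            mask = (y == c)
            count[c] += mask.sum()
            nat_err[c] += (clean_pred[mask] != c).sum()
            bndy_err[c] += ((clean_pred[mask] == c) & (adv_pred[mask] != c)).sum()
    return nat_err / count, bndy_err / count, (nat_err + bndy_err) / count
```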
Based on Eq. (4), we can decompose a robust error into a natural error and a boundary error, with which we can better analyze the source of robust errors and robust fairness. In the following, we explore and design fair adversarial training algorithms from the perspective of hard examples. Our design builds upon the well-known adversarial training algorithm TRADES [7], an effective implementation of Eq. (1). Its loss function is as follows:
$$\mathcal{L}_{\mathrm{TRADES}} = \mathcal{L}_{CE}\big(f_{\theta}(x),\, y\big) + \beta \cdot \max_{\|\delta\|_{p}\leq\epsilon} \mathcal{L}_{KL}\big(f_{\theta}(x)\,\|\, f_{\theta}(x+\delta)\big), \tag{5}$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss, $\mathcal{L}_{KL}$ is the KL divergence loss, and $\beta$ is a constant trading off these two loss terms.
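Below is a minimal PyTorch-style sketch of the TRADES objective in Eq. (5). The KL-driven inner maximization follows the common implementation of [7]; the default values of $\beta$, the perturbation size, and the step size are assumptions for illustration, not settings reported here.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, beta=6.0, eps=8/255, step_size=2/255, steps=10):
    # Eq. (5): cross-entropy on clean inputs plus beta times the KL divergence
    # between clean and adversarial predictions.
    model.eval()                                   # freeze BN statistics while attacking
    with torch.no_grad():
        clean_probs = F.softmax(model(x), dim=1)
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1).detach()
    for _ in range(steps):                         # inner maximization on the KL term
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), clean_probs,
                      reduction='batchmean')
        grad = torch.autograd.grad(kl, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    model.train()
    clean_logits = model(x)
    ce = F.cross_entropy(clean_logits, y)
    kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                  F.softmax(clean_logits, dim=1), reduction='batchmean')
    return ce + beta * kl
```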
4 Insight: from hard class to hard example
In response to the unbalanced robustness among different classes, one direct countermeasure involves adopting methods used in long-tail distribution learning problems [25]. In long-tail distribution scenarios, some classes have fewer training examples than others, resulting in unbalanced performance across classes. While the datasets we consider, like CIFAR10, are not always unbalanced, the phenomenon of unbalanced performance is analogous. One popular method in long-tail distribution learning is reweighting (Baseline Reweight) [26–28]. Additionally, in the context of robust fairness, Tian et al. [9] explored reducing the weights of other classes to enhance the robustness of the least robust class. Moreover, FRL proposed by Xu et al. [10] and FAT proposed by Ma et al. [14] can also be considered well-designed class-level reweighting algorithms for improving robust fairness. However, although these methods can improve robust fairness compared with standard PGD Adv. Training [6], their worst-class robust errors are still significantly larger than the average (Tab.1). Such results indicate that more effective methods for reducing the worst-class robust error are urgently needed.
Tab.1 Robust errors (%) of different reweighting algorithms. Avg. Rob. refers to the average robust error. Worst. Rob. refers to the worst-class robust error
Method | Avg. Rob. | Worst. Rob. |
PGD Adv. Training [6] | 57.00 | 83.80 |
Baseline Reweight | 57.44 | 82.90 |
Tian et al. [9] | 51.19 | 79.60 |
FRL [10] | 54.13 | 70.10 |
FAT [14] | 48.24 | 69.80 |
The previously mentioned methods primarily focus on class-level adjustments, involving modifications to the weights of different classes. Specifically, Xu et al. [10] hold that hard classes with larger robust errors are intrinsically hard to learn, and adversarial training intensifies this phenomenon. While this explanation aligns with experimental results, it remains somewhat generalized. Manipulating a whole class may overly affect fairness-unrelated examples of this class and ignore fairness-related examples in other classes. Consequently, we posit that finer-grained measures targeting individual examples hold more promise. Inspired by the “hard examples” of active learning [29–31], we decide to make a fine-grained inspection of training examples. We use the cross-entropy loss to measure the uncertainty of clean examples, with the insight that larger uncertainty indicates less-learned features and larger errors. The cross-entropy loss is calculated between the output logits of clean examples and their ground-truth labels for a robust model. This approach is cost-effective, as it involves testing on clean examples, whereas generating adversarial examples is more resource-intensive. Note that this loss is not directly related to robust errors, as no adversarial examples are involved in the calculation. As shown in Fig.2(a), hard classes with larger robust errors have larger cross-entropy loss values. Thus, we can utilize the cross-entropy loss calculated exclusively from clean examples to identify hard classes. Since this loss is calculated over training examples, we can reasonably consider it as a potential metric for finding hard examples.
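As an illustration of this identification step, the per-example cross-entropy of clean training examples can be computed and ranked as follows. This is a minimal sketch with our own function name and default ratio, not the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def hard_example_indices(model, loader, ratio=0.3):
    # Rank clean training examples by the cross-entropy between their output
    # logits and ground-truth labels; the top `ratio` fraction are treated as
    # hard examples. No adversarial examples are generated here.
    model.eval()
    losses = []
    for x, y in loader:                   # loader must iterate in a fixed order
        losses.append(F.cross_entropy(model(x), y, reduction='none'))
    losses = torch.cat(losses)
    k = int(ratio * losses.numel())
    return torch.topk(losses, k).indices  # dataset indices of the hardest examples
```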
Fig.2 Analyses of the relationship between cross-entropy loss values and (a) hard classes and (b) hard examples. (a) The class-wise robust error of the test set and the average cross-entropy loss value calculated over the training examples of each class. (b) We select the most robust classes (Class 9 and 1) and the least robust classes (Class 3 and 2). For hard examples with the largest cross-entropy loss values (from top 10% to top 90%), we plot the proportions of them belonging to these four classes
Examples with larger cross-entropy loss values mostly belong to hard classes (Fig.2(b)), further validating our view that we can use the cross-entropy loss to identify hard examples. Since not all examples in a hard class are challenging, directly manipulating hard classes could negatively impact non-hard examples. Besides, since previous research demonstrates that excessive perturbations may hurt robustness [32,33], it is inappropriate to put strong constraints on non-hard examples. We should focus our efforts on examples that have a greater impact on fairness. Intuitively, by identifying hard examples, it is possible to design more effective algorithms than just reweighting different classes, which may decrease the robust errors of some classes but increase others’. Although hard examples are unevenly distributed across classes, they provide fair information about all classes. Focusing on hard examples may help all classes accordingly. It is important to note that we currently use a robust model to calculate the cross-entropy loss. However, in the following section, we will remove this condition to reduce the cost.
5 FairAT: strategy and optimization
In this section, drawing inspiration from hard examples, we propose two intuitive methods aimed at enhancing robust fairness. After weighing their advantages and disadvantages, we then develop a practical strategy to increase the diversity of hard examples. Through the dynamic augmentation of hard examples, we achieve improved robust fairness at a small cost. Additionally, by implementing adaptive early stopping, we can identify the optimal epoch without needing to access the test set. Our ultimate strategy operates within a self-contained training process, independent of pre-trained robust models.
5.1 Intuitive strategy
Drawing inspiration from hard examples, we posit that manipulating individual examples is more effective and propose two strategies aimed at mitigating the impact of hard examples. Firstly, we can augment hard classes by adding training examples to these classes. The motivation is that hard classes have more hard examples that are hard to learn. If we add more examples to them, more information about their distribution will be provided. Thus, their robustness is likely to be improved [34–37]. We extract examples from the “80 Million Tiny Images” dataset (80M-TI) [38], which includes some offensive images; their use here is solely to demonstrate the effectiveness of this intuitive strategy. For the three least robust classes, we randomly choose some examples for each to enlarge their training sets. This approach can effectively improve robust fairness (Fig.3(a)). It can more effectively decrease the worst-class robust error (Class 3) and slightly decrease or increase other classes’ robust errors. However, collecting extra training examples is quite costly in practice. Besides, we do not know which class is not robust before training. Therefore, obtaining additional training examples specifically for hard classes would significantly elevate the training costs.
Fig.3 (a) Adding extra training examples from 80M-TI to the three least robust classes (Class 3, 2, and 4). We add 10% (500 of 5,000), 20%, and 30% examples to these classes respectively; (b) removing hard examples from the training set. We directly remove the top 10% to 90% hard examples of the whole training set
Secondly, we remove the top 10% to 90% hard examples in the training set. Our insight is that training examples may become easier to learn since hard examples are removed. Thus, robust fairness may be improved. However, the worst-class robust error and average robust error are both becoming larger as the proportion of removed examples increases (Fig.3(b)). This outcome may be attributed to the fact that each CIFAR10 class contains only 5,000 training examples, a relatively small number. Removing some training examples will obviously reduce the information of its distribution. Therefore, the robustness of the trained model will be worse.
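For completeness, this removal experiment can be sketched as follows, reusing the hypothetical `hard_example_indices` helper from the previous section; the function below is our illustration, not the authors' script.

```python
from torch.utils.data import Subset

def remove_hard_examples(train_set, model, eval_loader, ratio=0.3):
    # Drop the top `ratio` hardest examples (ranked by clean cross-entropy)
    # and train on the remainder. As reported above, this hurts robustness,
    # since each CIFAR10 class has only 5,000 training examples.
    hard = set(hard_example_indices(model, eval_loader, ratio).tolist())
    keep = [i for i in range(len(train_set)) if i not in hard]
    return Subset(train_set, keep)
```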
5.2 Increasing diversity by data augmentation
The previously mentioned two methods are either costly or ineffective. Inspired by the performance of adding extra training examples, we decide to design a method that achieves similar results but at a low cost. Firstly, we should consider a question: why can extra training examples improve the robustness of hard classes? Our explanation is intuitive. More examples provide more information about the data distribution of these classes, thus making the model learn more. Therefore, if we can provide more information about hard examples, we may also improve robust fairness.
Data augmentation can provide more diversity to data and more information about the data distribution [15–17,39]. We consider using data augmentation as a substitute. Notice that the augmentation should be low-cost and focused on hard examples. Therefore, Cutout [15] is a good candidate since it only needs a simple operation on individual images and does not need interaction between different examples. Other data augmentation methods for improving generalization, like Cutmix [17] and Mixup [16], are inappropriate since they need interaction between examples, which may overly disturb the features. We test them and find that they indeed decrease robust fairness (detailed in Section 7).
We use Cutout to augment hard examples with the proportions from the top 5% to 100% (of the whole training set). Combined with Baseline Reweight, it can greatly decrease the worst-class robust error compared with the original TRADES (percentage 0% in Fig.4(a)). Notice that larger proportions of hard examples do not always lead to lower robust errors. The top 10% may contain enough hard examples. When we increase the proportion, the additional examples may not be hard anymore. In this case, we reach our method’s upper limit. An appropriate proportion would lead to a lower worst-class robust error and also a lower average robust error. In summary, augmenting hard examples by Cutout always leads to lower worst-class robust errors than the original TRADES, validating the effectiveness of our strategy.
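A minimal sketch of Cutout applied only to the identified hard examples is given below. The patch size and the way hard indices are passed in are assumptions for illustration rather than the paper's exact configuration.

```python
import torch

def cutout(x, size=8):
    # Zero out a randomly placed square patch in each image (Cutout).
    # `size` is an assumed patch length, not the paper's reported setting.
    n, _, h, w = x.shape
    x = x.clone()
    for i in range(n):
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
        x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
        x[i, :, y1:y2, x1:x2] = 0.0
    return x

def augment_hard(x, idx, hard_idx_set):
    # Apply Cutout only to examples whose dataset index is marked as hard.
    mask = torch.tensor([int(i) in hard_idx_set for i in idx], dtype=torch.bool)
    if mask.any():
        x[mask] = cutout(x[mask])
    return x
```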
Fig.4 (a) Using Cutout to augment hard examples with different proportions. 0% is the original TRADES. 10% indicates 5,000 examples of the training set are identified as hard examples and augmented; (b) the training process of our FairAT evaluated by a hold-out validation set and the test set. We report worst-class robust errors and average robust errors by them
5.3 Minimizing training cost
In the experiments above, the identification of hard examples is based on a pre-trained robust PreAct ResNet18 model, which is used to calculate cross-entropy values. However, training a robust model is very costly as it requires extensive gradient calculations to generate adversarial examples. If we still use this method to identify hard examples, our cost will be twice as much as common adversarial training. To minimize the burden of our method, it is natural to raise a question: can we identify hard examples without needing to train a robust model in advance?
Note that the accuracy and robustness gradually increase during adversarial training if we do not consider overfitting [8]. In other words, a model has learned some knowledge about the data distribution since the training started. Therefore, we can use the model at the previous epoch as an auxiliary model to identify hard examples for the current epoch. For example, if the current epoch is 80, the previous epoch is 79. To validate our idea, we choose three models at epochs 10, 45, and 80 of the same run of TRADES. We use them to select the top 50% hard examples and analyze their consistency. Surprisingly, the selection of hard examples among the three models is largely consistent (Fig.5). A large overlap exists among these three results. Since the model at Epoch 10 can select hard examples similar to those selected by the model at Epoch 80 (a robust model), we can speculate that using the model at the previous epoch can play a similar role as a pre-trained robust model. Additionally, we also verify the consistency of hard examples between the model in training and a pre-trained robust model. The proportions of identical hard examples selected by the models at epochs 10, 45, and 80 and the pre-trained robust model are 79.64%, 84.42%, and 91.68%, respectively. These results also support our speculation. In this way, we can save costs and utilize the training process itself to identify hard examples. Based on the above findings, we propose Fair Adversarial Training (FairAT): dynamically identifying hard examples by the model at the previous epoch and augmenting them by Cutout (Fig.6). Notice that hard examples are not equal to incorrectly classified examples. Instead, they are examples that the model does not learn well. These two concepts are different. Therefore, although the selection of hard examples among the three models is similar, these three models have different robust errors (Fig.5).
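Putting the pieces together, the FairAT training loop of Fig.6 can be sketched as follows. The helper names (`hard_example_indices`, `augment_hard`, `trades_loss`) refer to the hypothetical sketches given earlier, and the assumption that the training loader also yields dataset indices is ours; this is a simplified illustration, not the authors' released code.

```python
def train_fairat(model, train_loader, eval_loader, optimizer, scheduler,
                 epochs=105, ratio=0.3):
    # FairAT sketch: before each epoch, the model from the previous epoch
    # identifies the hardest `ratio` of clean training examples (by
    # cross-entropy); those examples are augmented with Cutout, and the model
    # is then trained with the TRADES-style loss on the mixed batches.
    for epoch in range(epochs):
        hard = set(hard_example_indices(model, eval_loader, ratio).tolist())
        model.train()
        for idx, x, y in train_loader:   # train_loader is assumed to yield indices as well
            x = augment_hard(x, idx, hard)
            loss = trades_loss(model, x, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```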
Fig.5 The bar plots show the hard examples selected by models at different epochs of the same training run: a PreAct ResNet18 model trained by TRADES. The training example indexes are re-ordered to show contiguous blocks. The plots show significant consistency in the individual selection of hard examples across the models at different epochs
Fig.6 Training the model of Epoch M for FairAT. It firstly inputs the original training dataset into the model of Epoch M-1 to calculate the cross-entropy values for each example. It then uses these values to differentiate hard examples from non-hard examples. After augmenting hard examples, it combines augmented examples and non-hard examples as the training dataset to train the model of Epoch M
5.4 Identifying the best model
Considering the overfitting phenomenon and the unstable training process of adversarial training [8], directly using the model of the final epoch may not be optimal. We plot worst-class robust errors and average robust errors evaluated by the test set in Fig.4(b). Their curves are unstable during training, especially for worst-class robust errors. The best model with the smallest errors is not at the final epoch. However, the test set should be unknown and cannot be used to test robust errors in practice. Therefore, we should find an alternative way. Inspired by early stopping [8], we can randomly hold out 300 examples of each class from the training set to test robustness. These examples constitute a validation set that does not participate in training. Different from the original early stopping [8] that considers average robust errors, we consider worst-class robust errors to improve robust fairness. Thus, we modify its metric to adapt to our settings as follows:
$$\mathcal{R}_{worst}^{val} = \max_{c\in\mathcal{Y}}\ \mathcal{R}_{rob}^{val}(c), \tag{6}$$
where $\mathcal{R}_{worst}^{val}$ is the worst-class robust error, evaluated on the hold-out validation set; the epoch whose model attains the lowest value of this metric is selected.
With this metric, we can obtain a model that achieves an average robust error of 47.66% and a worst-class robust error of 66.00% on the test set without looking at the test set. It is exactly the same as using the test set to find the best epoch. Besides, the validation curves during training closely match the test curves (Fig.4(b)), further validating the feasibility of using this small hold-out validation set to identify the best model.
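A sketch of this adaptive model selection is given below; `classwise_errors` refers to the hypothetical helper from Section 3, and the checkpointing logic is our own illustration. In the training loop sketched earlier, this check would run once per epoch.

```python
def select_best(model, val_loader, attack, best_state, best_worst_err, num_classes=10):
    # Evaluate Eq. (6) on the hold-out validation set (300 examples per class)
    # and keep the checkpoint with the lowest worst-class robust error.
    _, _, rob = classwise_errors(model, val_loader, attack, num_classes)
    worst_err = rob.max().item()
    if worst_err < best_worst_err:
        best_worst_err = worst_err
        best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    return best_state, best_worst_err
```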
6 Experiment
In this section, we conduct extensive experiments on benchmark datasets and attacks to demonstrate the superiority of FairAT.
6.1 Experimental settings
Datasets and models Following [10], we conduct experiments on two benchmark datasets and two popular models to objectively compare our FairAT and the baselines. Concretely, the datasets include CIFAR10 [40] and SVHN [41]. Note that CIFAR10 is a balanced dataset while SVHN is imbalanced. For both datasets, we test two models, including PreAct ResNet18 [42] and WRN-28-10 [43]. Additionally, we also consider the TinyImageNet-200 dataset [44] for comparison.
Baselines We consider eight baseline methods for a comprehensive comparison. Firstly, we present the original performance of three popular adversarial training algorithms, including PGD Adv. Training [6] and two variants of TRADES [7]. Besides, there are several unfairness debiasing algorithms for traditional fairness issues. We adopt a typical one (Baseline Reweight) [27] and apply it to increase the weight of the class with the highest robust error in the training set, which is also considered as a baseline in [10]. Moreover, we also provide the results of two state-of-the-art solutions for robust fairness: FRL [10] and FAT [14]. For FRL, we consider all its variants: FRL (Reweight), FRL (Remargin), and FRL (Reweight+Remargin).
Metrics Following [10], we consider six metrics, i.e., average natural/boundary/robust errors and worst-class natural/boundary/robust errors, to preliminarily analyze the source of robust fairness (defined in Section 3). We abbreviate them as Avg. Nat./Bndy./Rob. and Worst. Nat./Bndy./Rob. More importantly, to objectively compare different methods, we consider four popular attacks to evaluate robust errors: FGSM [2], PGD [6], C&W [3], and AutoAttack [4].
Implementation details For our FairAT, the implementation is based on TRADES [7]. We set the initial learning rate to 0.1 and then decay it by 0.1 at epochs 75, 90, and 100. The training budget is 105 epochs. During training, we use PGD-10 (10 steps) with a fixed perturbation size and step size. The proportion of hard examples is set to 30% by default. For all baseline methods, we always adopt their default settings from the corresponding papers. The evaluation is conducted with the four attacks under a fixed perturbation size. Experiments are conducted on a GeForce RTX 3090.
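The optimizer and learning-rate schedule described above can be set up as in the sketch below. The milestone epochs and learning-rate values are those stated in this subsection; SGD with momentum 0.9 and weight decay 5e-4 are assumed defaults on our part.

```python
import torch

def build_optimizer(model):
    # Initial learning rate 0.1, decayed by a factor of 0.1 at epochs 75, 90,
    # and 100, for a 105-epoch training budget.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[75, 90, 100], gamma=0.1)
    return optimizer, scheduler
```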
6.2 How does FairAT work
FairAT achieves a lower worst-class robust error than all baselines (Tab.2). According to Eq. (3), robust errors can be decomposed into natural errors and boundary errors. FairAT does not have the lowest worst-class natural error. In fact, its worst-class natural error is 6.60% larger than that of FRL (Reweight+Remargin). Its reduction in worst-class robust errors comes from boundary errors. Compared with FRL (Reweight+Remargin), it significantly reduces the worst-class boundary error by 10.70%. Such a reduction means that FairAT better prevents those correctly classified natural examples from being misclassified when combined with adversarial perturbations. Therefore, FairAT improves robust fairness by improving the robustness of those correctly classified natural examples instead of reducing natural errors.
Tab.2 Average & worst-class natural error, boundary error, and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by PGD-20. The best results are in bold
Method | Avg. Nat. | Worst. Nat. | Avg. Bndy. | Worst. Bndy. | Avg. Rob. | Worst. Rob. |
PGD Adv. Training | 15.52 | 32.20 | 41.48 | 51.60 | 57.00 | 83.80 |
TRADES () | 12.49 | 27.50 | 43.44 | 58.30 | 56.93 | 85.80 |
TRADES () | 19.38 | 37.20 | 29.26 | 40.30 | 48.64 | 77.50 |
Baseline Reweight | 15.21 | 30.30 | 42.23 | 52.60 | 57.44 | 82.90 |
FRL (Reweight) | 16.20 | 28.80 | 38.06 | 44.50 | 54.26 | 73.30 |
FRL (Remargin) | 15.61 | 27.40 | 37.48 | 48.90 | 53.09 | 76.30 |
FRL (Reweight+Remargin) | 16.68 | 26.20 | 37.45 | 43.90 | 54.13 | 70.10 |
FAT | 20.08 | 33.70 | 28.16 | 36.10 | 48.24 | 69.80 |
FairAT (Ours) | 19.16 | 32.80 | 28.50 | 33.20 | 47.66 | 66.00 |
6.3 Robust fairness on CIFAR10
In this subsection, we evaluate FairAT and baselines with four popular attacks on CIFAR10. The experiments are conducted on PreAct ResNet18 and WRN-28-10, corresponding to small and large models in the setting of adversarial training.
PreAct ResNet18 Firstly, popular adversarial training algorithms such as PGD Adv. Training and TRADES have higher worst-class robust errors (Tab.3). This makes sense since they do not consider robust fairness. Besides, although Baseline Reweight has a lower worst-class natural error than that of the more robust TRADES variant, its worst-class robust error is higher. Thus, this reweighting method from traditional fairness does not perform well regarding robust fairness.
Tab.3 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by four popular attacks, including FGSM, PGD-20, C&W, and AutoAttack. The best results are in bold
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 15.52 | 32.20 | | 46.40 | 74.50 | | 57.00 | 83.80 | | 56.19 | 82.60 | | 58.67 | 85.50 |
TRADES () | 12.49 | 27.50 | | 47.00 | 75.90 | | 56.93 | 85.80 | | 57.50 | 86.30 | | 59.36 | 88.20 |
TRADES () | 19.38 | 37.20 | | 44.00 | 72.10 | | 48.64 | 77.50 | | 51.35 | 81.40 | | 52.27 | 82.70 |
Baseline Reweight | 15.21 | 30.30 | | 47.02 | 74.50 | | 57.44 | 82.90 | | 56.75 | 82.10 | | 59.30 | 85.00 |
FRL (Reweight) | 16.20 | 28.80 | | 46.34 | 65.10 | | 54.26 | 73.30 | | 54.82 | 74.30 | | 56.56 | 77.50 |
FRL (Remargin) | 15.61 | 27.40 | | 45.40 | 67.50 | | 53.09 | 76.30 | | 53.64 | 76.90 | | 55.36 | 79.90 |
FRL (Reweight+Remargin) | 16.68 | 26.20 | | 46.82 | 62.10 | | 54.13 | 70.10 | | 54.98 | 71.80 | | 56.74 | 74.60 |
FAT | 20.08 | 33.70 | | 43.81 | 64.80 | | 48.24 | 69.80 | | 51.13 | 76.20 | | 52.33 | 77.90 |
FairAT (Ours) | 19.16 | 32.80 | | 42.32 | 61.40 | | 47.66 | 66.00 | | 50.18 | 71.60 | | 51.33 | 73.30 |
For methods designed for robust fairness, FRL (Reweight+Remargin) achieves lower worst-class robust errors against these four attacks than the other two variants of FRL. Besides, its worst-class robust error is 8.10% lower than that of the more robust TRADES variant when evaluated against AutoAttack, significantly improving the robustness of the least robust class. Unfortunately, its average robust error is 4.47% higher than that of the same TRADES variant. Therefore, it is likely that FRL sacrifices the average robustness to improve the worst-class robustness. For FAT, although it is a newer work than FRL, its worst-class robust error is higher than that of FRL (Reweight+Remargin). The reason might be that it focuses on minimizing the variance of class-wise robust errors while overlooking the worst-class robust error.
FairAT always achieves the lowest worst-class robust error. When faced with PGD, its worst-class robust error is 3.80% lower than the second-best one. Such a reduction is significant considering the difficulty of improving the robustness of adversarial training algorithms [4,45]. Note that FairAT also achieves lower average robust errors than all baselines. In other words, FairAT does not sacrifice the average robustness to improve the worst-class robustness. Moreover, the robust errors of each class are all lower when compared with FRL (Reweight+Remargin) (Fig.7(a)). Most classes of FairAT have lower robust errors than those of FAT. Therefore, FairAT is indeed more robust and fairer.
Fig.7 Comparison of different adversarial training algorithms with regard to class-wise robust errors. The robust errors are evaluated by PGD-20. FRL is FRL (Reweight+Remargin). PGD AT. is PGD Adv. Training. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
WRN-28-10 For this large model, the trend is similar to that of PreAct ResNet18, but the improvement of FairAT is more significant (Tab.4). On the one hand, FairAT always achieves a larger reduction in terms of average robust errors. For example, the reduction is 1.00% for PreAct ResNet18 but 2.13% for WRN-28-10 when against AutoAttack. On the other hand, FairAT achieves a larger reduction against strong attacks like C&W and AutoAttack in terms of worst-class robust errors. For example, the reduction is 1.30% for PreAct ResNet18 but 3.50% for WRN-28-10 when against AutoAttack. Furthermore, its natural errors are closer to those of FRL (Reweight+Remargin) for this model than for PreAct ResNet18. Such phenomena indicate that FairAT could better demonstrate its superiority when evaluated with large models on CIFAR10. Note that the natural errors of FairAT are not the lowest. However, when considering adversarial attacks, robust errors are the actual errors. FairAT is more secure and fairer in this case.
Tab.4 Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on CIFAR10
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 13.37 | 26.10 | | 42.49 | 68.70 | | 53.00 | 79.00 | | 52.39 | 78.40 | | 54.39 | 80.30 |
TRADES () | 11.15 | 21.40 | | 40.75 | 66.40 | | 51.74 | 76.90 | | 51.00 | 76.50 | | 53.23 | 78.60 |
TRADES () | 13.78 | 26.20 | | 38.43 | 62.60 | | 47.04 | 72.50 | | 47.17 | 72.20 | | 49.24 | 74.80 |
Baseline Reweight | 13.28 | 25.40 | | 42.58 | 67.20 | | 53.08 | 78.00 | | 52.47 | 77.60 | | 54.42 | 79.20 |
FRL (Reweight) | 14.55 | 27.90 | | 42.37 | 65.70 | | 53.40 | 76.80 | | 52.42 | 75.50 | | 55.07 | 78.60 |
FRL (Remargin) | 15.59 | 24.40 | | 48.26 | 62.30 | | 56.36 | 71.10 | | 58.75 | 75.20 | | 60.18 | 77.30 |
FRL (Reweight+Remargin) | 15.66 | 30.10 | | 45.09 | 63.90 | | 51.77 | 67.70 | | 53.79 | 70.10 | | 55.17 | 71.30 |
FAT | 15.90 | 30.90 | | 40.24 | 58.30 | | 45.53 | 64.10 | | 47.94 | 68.90 | | 49.55 | 72.30 |
FairAT (Ours) | 15.35 | 28.30 | | 38.10 | 58.00 | | 44.59 | 63.20 | | 46.02 | 65.70 | | 47.42 | 67.80 |
6.4 Robust fairness on SVHN and TinyImageNet-200
In this subsection, we evaluate FairAT and baselines with four popular attacks on SVHN. The experiments are conducted on PreAct ResNet18 and WRN-28-10 with the same settings as CIFAR10.
PreAct ResNet18 on SVHN FairAT achieves surprisingly better results with this model on SVHN (Tab.5). Firstly, its reductions in average and worst-class robust errors are both more significant than those of the same settings on CIFAR10. For example, when evaluated against AutoAttack, the reductions in average and worst-class robust errors are 5.48% and 2.23%, respectively. These values are 1.00% and 1.30% with the same settings on CIFAR10 (Tab.3). For the other three attacks, the trend is similar. Such reductions are non-trivial. The reasons are two-fold: FairAT has only a small increase in the worst-class natural error but a large decrease in the worst-class boundary error. Secondly, the gap between FairAT and the best baseline regarding natural errors is also smaller than that of the same settings on CIFAR10. The gap is about 2% here but about 6% for the latter.
Tab.5 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on SVHN
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 9.16 | 23.86 | | 33.68 | 63.49 | | 46.47 | 74.28 | | 49.24 | 74.10 | | 53.33 | 78.43 |
TRADES () | 7.59 | 17.59 | | 33.12 | 57.47 | | 48.21 | 72.89 | | 49.38 | 72.89 | | 53.63 | 76.02 |
TRADES () | 9.07 | 19.34 | | 32.44 | 55.48 | | 44.37 | 69.52 | | 47.21 | 71.39 | | 51.92 | 76.20 |
Baseline Reweight | 9.35 | 18.13 | | 34.15 | 58.61 | | 48.25 | 72.17 | | 49.61 | 72.41 | | 54.28 | 75.54 |
FRL (Reweight) | 9.23 | 16.36 | | 33.65 | 45.96 | | 46.67 | 70.06 | | 46.96 | 69.76 | | 59.33 | 76.33 |
FRL (Remargin) | 9.61 | 15.71 | | 32.25 | 42.76 | | 46.87 | 67.89 | | 46.54 | 67.29 | | 62.54 | 76.33 |
FRL (Reweight+Remargin) | 9.64 | 15.73 | | 32.33 | 44.54 | | 44.60 | 65.18 | | 44.31 | 64.58 | | 64.19 | 75.90 |
FAT | 10.21 | 19.81 | | 33.39 | 46.64 | | 42.83 | 57.26 | | 48.51 | 62.22 | | 52.40 | 65.00 |
FairAT (Ours) | 9.63 | 17.14 | | 26.69 | 40.31 | | 38.92 | 53.23 | | 42.89 | 56.75 | | 46.92 | 62.77 |
WRN-28-10 on SVHN For this model, although FairAT always achieves the lowest average and worst-class robust errors (Tab.6), its reduction compared with the second-best method is not as significant as that of PreAct ResNet18. This phenomenon is different from that of CIFAR10. We speculate that the reason may be that SVHN only contains digit images and is easier to learn than CIFAR10. Therefore, the performance of PreAct ResNet18 is already good, and there is little room for improvement.
Based on the results above, we can preliminarily conclude that FairAT is more robust and fairer on both balanced datasets (CIFAR10) and unbalanced datasets (SVHN).
PreAct ResNet18 on TinyImageNet-200 The conclusion on TinyImageNet-200 is similar to that on CIFAR10 and SVHN (Tab.7). Our FairAT always achieves the lowest average and worst-class robust errors. At the same time, FairAT also achieves natural errors comparable to the lowest ones. For example, its worst-class natural error is just 0.35% higher than that of FRL (Reweight+Remargin). This value is marginal, especially when considering the error is still far lower than the upper bound (100%).
Tab.6 Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on SVHN
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 7.70 | 16.39 | | 33.84 | 55.12 | | 46.96 | 71.27 | | 49.56 | 73.49 | | 54.11 | 77.29 |
TRADES () | 8.97 | 20.36 | | 36.76 | 63.49 | | 48.57 | 72.89 | | 50.90 | 74.46 | | 55.30 | 76.75 |
TRADES () | 10.64 | 21.74 | | 19.34 | 41.02 | | 43.72 | 71.20 | | 46.97 | 73.92 | | 54.90 | 78.61 |
Baseline Reweight | 7.19 | 13.92 | | 37.61 | 59.52 | | 50.21 | 72.29 | | 51.70 | 73.43 | | 55.50 | 75.60 |
FRL (Reweight) | 8.08 | 14.53 | | 24.09 | 40.06 | | 54.38 | 71.08 | | 52.93 | 69.76 | | 63.67 | 76.39 |
FRL (Remargin) | 7.78 | 13.26 | | 22.64 | 35.72 | | 49.21 | 66.02 | | 47.88 | 64.88 | | 59.18 | 71.81 |
FRL (Reweight+Remargin) | 7.85 | 15.18 | | 24.93 | 36.45 | | 47.41 | 63.86 | | 46.27 | 63.07 | | 55.54 | 69.58 |
FAT | 7.28 | 12.48 | | 22.75 | 35.00 | | 38.01 | 52.17 | | 41.73 | 54.32 | | 51.86 | 63.67 |
FairAT (Ours) | 7.38 | 14.82 | | 17.08 | 31.12 | | 34.84 | 51.04 | | 37.79 | 53.64 | | 49.36 | 63.13 |
Tab.7 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on TinyImageNet-200
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 59.21 | 83.40 | | 81.92 | 97.50 | | 83.71 | 98.60 | | 88.34 | 99.00 | | 88.92 | 99.25 |
TRADES () | 58.53 | 82.75 | | 82.43 | 97.70 | | 83.27 | 98.65 | | 87.11 | 99.05 | | 87.63 | 99.20 |
TRADES () | 59.42 | 83.15 | | 80.20 | 96.90 | | 81.82 | 97.50 | | 86.10 | 98.90 | | 86.55 | 99.00 |
Baseline Reweight | 59.13 | 82.90 | | 82.08 | 96.70 | | 83.96 | 98.35 | | 88.45 | 99.10 | | 88.73 | 99.15 |
FRL (Reweight) | 59.62 | 82.75 | | 81.52 | 90.35 | | 82.34 | 96.85 | | 87.02 | 98.70 | | 87.56 | 98.95 |
FRL (Remargin) | 59.58 | 82.60 | | 81.19 | 91.60 | | 82.71 | 97.20 | | 86.33 | 98.75 | | 87.34 | 99.00 |
FRL (Reweight+Remargin) | 59.78 | 82.30 | | 81.53 | 89.90 | | 82.06 | 96.45 | | 86.49 | 98.40 | | 87.49 | 98.85 |
FAT | 59.93 | 83.80 | | 79.13 | 90.20 | | 80.38 | 96.30 | | 84.76 | 98.90 | | 85.40 | 99.10 |
FairAT (Ours) | 58.97 | 82.65 | | 78.00 | 85.65 | | 79.71 | 95.90 | | 83.52 | 98.05 | | 84.15 | 98.30 |
6.5 Ablation study
Identification of hard examples In our FairAT, we use cross-entropy values between the output logits of clean examples and their ground-truth labels to measure uncertainty. Larger cross-entropy values indicate harder examples. To demonstrate the effectiveness of our design, we also consider three different metrics to identify hard examples. Concretely, we consider three popular methods: Margin [41], LeastConf [46], and Maximum Entropy [47]. Margin selects examples whose two most likely labels in the confidence vectors have smaller differences. LeastConf selects examples whose most likely label has smaller confidence. Maximum Entropy selects examples whose output distribution has larger entropy (instead of cross-entropy). As shown in Tab.8, our default setting (Cross Entropy) achieves the lowest average and worst-class robust errors at the same time. The improvement is more significant for average robust errors. Therefore, Cross Entropy is a better choice.
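The four scoring functions compared in Tab.8 can be implemented as simple per-example scores; the sketch below is our own illustration (larger scores mean harder examples).

```python
import torch
import torch.nn.functional as F

def uncertainty_scores(logits, labels, metric="cross_entropy"):
    # Per-example scores used to rank hard examples; larger means harder.
    probs = F.softmax(logits, dim=1)
    if metric == "cross_entropy":      # our default: CE w.r.t. ground-truth labels
        return F.cross_entropy(logits, labels, reduction="none")
    if metric == "margin":             # small gap between the top-2 classes => hard
        top2 = probs.topk(2, dim=1).values
        return -(top2[:, 0] - top2[:, 1])
    if metric == "least_conf":         # low confidence of the predicted label => hard
        return -probs.max(dim=1).values
    if metric == "max_entropy":        # high predictive entropy => hard
        return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    raise ValueError(metric)
```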
Tab.8 Comparison of different metrics about hard examples for PreAct ResNet18 on CIFAR10. The robust errors are evaluated by PGD-20
Method | Avg. Rob. | Worst. Rob. |
Margin [40] | 49.11 | 67.00 |
LeastConf [45] | 48.83 | 66.80 |
Maximum Entropy [46] | 47.93 | 66.10 |
Cross Entropy (Default) | 47.66 | 66.00 |
Proportions of Cutout In the above experiments, we set 30% as the default proportion of hard examples. To test the rationality of this setting, we test different proportions from the top 5% to 100% in Fig.8. The proportion of 30% achieves the lowest average and worst-class robust errors for PreAct ResNet18 on CIFAR10. Although 30% is not always the best proportion for the other three experiments, its performance is still comparable to that of the best proportion. Besides, the range of effective proportions is wide. In fact, proportions of 10% to 100% all achieve satisfactory results. This is because the top 10% may contain enough hard examples. Larger proportions have little impact since the additional examples might not be hard. Therefore, even if we do not know the best setting, just a moderate value can work well.
Fig.8 Analyses of the effect of the proportions of hard examples on FairAT. We test different proportions from 5% to 100% on two datasets and two models. The proportion of 0% corresponds to the original TRADES. The robust errors are evaluated by PGD-20. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
7 Discussion
In this section, we discuss three interesting questions about FairAT to deepen the understanding of our work.
The choice of data augmentation methods Readers may wonder why we choose Cutout [15] instead of other well-performing data augmentation methods. For example, Cutmix [17] and Mixup [16] are both popular and well-performing data augmentation methods. The reason we do not consider them is that they need interaction between examples. Since we differentiate hard examples from other examples, the interaction is hard to define. Moreover, they may overly disturb the features of hard examples by inserting the features of other examples. We have tried two kinds of interaction: between a hard example and a non-hard example, and between a hard example and another hard example. Unfortunately, they both increase the worst-class robust error by more than 2% compared with TRADES. Therefore, they are not good solutions for robust fairness. Note that our focus is a more fine-grained method for improving robust fairness from the perspective of hard examples. Therefore, although there are lots of data augmentation methods, we have not traversed them all. Instead, we choose Cutout [15] as it already fulfills our requirements regarding diversity.
Absolute fairness vs. high worst-class robustness In the pursuit of true robustness in reality, which objective is more practical? In our humble view, the latter better describes the real robustness. For example, if the robust error of each class is always 100%, absolute fairness is achieved. However, the model loses robustness completely. In reality, the weakest part is more likely to be attacked. In other words, the least robust class could become the attackers’ target, and its robustness is the true robustness in this case. Therefore, we hold that improving the worst-class robustness is the core problem of robust fairness and focus on this metric in Section 6.
Achieving both high fairness and high average robustness After reading the above content, a natural question emerges: is it possible to achieve high fairness and high average robustness at the same time? That is, can a model have state-of-the-art robustness for the whole dataset and equal robustness for each class? To the best of our knowledge, the answer may be negative. Previous research has demonstrated that the higher robust errors of hard classes are caused by the similarity between these classes [9,10]. Considering the limited number of training examples for many datasets like CIFAR10, it is quite difficult to effectively differentiate these hard classes with a vanilla deep learning model. The robust errors of hard classes will be higher than those of non-hard classes. Therefore, without extra training examples, it is difficult to achieve both high average robustness similar to the state-of-the-art methods [6,7] and high fairness at the same time.
Broader impacts Adversarial training has become one of the most reliable empirical defenses against adversarial attacks. However, its robust fairness issue significantly impacts its fundamental robustness, which would cause the barrel principle and ethical issues. For the barrel principle, less robust classes are serious vulnerabilities. For example, if a deep learning model in autonomous driving exhibits high average robustness but lacks robustness to pedestrians, it could pose a danger to them. For ethical issues, different people or groups are protected with various levels of robustness. Considering the pursuit of equality in today’s society, people who have less or no protection are clearly being discriminated against. Such a situation should be avoided. In this work, we propose FairAT to alleviate this problem, enhancing the applicability of adversarially robust models in reality.
8 Conclusion
In this paper, we explore more fine-grained methods to enhance robust fairness from the perspective of hard examples. First, through an analysis of the relationship between class-wise robustness and the uncertainty of robust models regarding individual examples, we find that hard examples with greater uncertainty could serve as more precise indicators of robust fairness. Subsequently, we find that enhancing the diversity of hard examples, achievable through data augmentation, can improve robust fairness. Building on these insights, we propose Fair Adversarial Training (FairAT): dynamically identifying hard examples and augmenting them to provide more distribution information. Experimental results demonstrate that FairAT outperforms state-of-the-art methods in terms of both average robust errors and worst-class robust errors.