1 Introduction
Deep learning models exhibit high vulnerability to adversarial examples, potentially leading to severe security issues [1–3]. Among various defenses, adversarial training is the most effective one that stands the test of time [4,5]. Incorporating adversarial examples into the training process can significantly enhance model robustness. However, existing adversarial training algorithms [6–8] mainly focus on the average (overall) robustness of all classes without considering class-wise robustness. Recent research has demonstrated that adversarially trained models exhibit unbalanced class-wise robustness across various settings, even on balanced datasets [9,10]. For instance, a robust model may demonstrate a robust error rate of 35.3% for class 1, in contrast to 83.8% for class 3 (Fig.1(a)). This unfair phenomenon, termed robust fairness [10], poses two serious problems:
Fig.1 Demonstration of unfairness in adversarial training with a robust PreAct ResNet18 on CIFAR10. We show the class-wise/average robust errors of (a) PGD Adv. Training [6] and (b) our FairAT. Larger deviations in class-wise robustness relative to average robustness indicate more serious unfairness. The robust errors are evaluated by the PGD-20 attack following [10]
● The barrel principle: The least robust class might become the primary target for attackers, as it represents the model’s weakest link. In this case, its robustness could be regarded as the actual robustness of the model. Given its high error rate, for example, 83.8%, the adversarially trained model is substantially vulnerable.
● Ethical concerns: Unfairness in deep learning can lead to serious social problems [11–13]. Unbalanced class-wise robustness may also raise ethical concerns among different demographic groups, such as low-income populations, due to the differing protection levels afforded to them.
Therefore, unfair robustness is critical, potentially compromising the ethical and practical efficacy of adversarial training in real-world applications. Considering its harmful impact, researchers have recently begun to study this issue. Tian et al. [9] conducted a detailed analysis, confirming that it is a widespread issue across various datasets and model structures. Simultaneously, researchers have proposed algorithms from diverse perspectives to enhance robust fairness, notably FRL [10] and FAT [14]. FRL, for instance, adjusts the weights and perturbation sizes for various classes, thereby achieving enhanced robust fairness compared to earlier adversarial training algorithms. FAT, on the other hand, introduces an additional loss term into existing adversarial training algorithms, aiming to reduce the variance among different classes. While these methods can slightly enhance robust fairness, they still fall short of achieving true fairness. Consequently, there is an urgent need for more effective algorithms.
Previous solutions (FRL [10] and FAT [14]) focus on hard classes, i.e., the classes with inferior robustness. By contrast, we explore enhancing robust fairness in a more fine-grained way, focusing on individual examples rather than classes. Our approach is based on the concept of “hard examples”, which refer to individual examples that are particularly hard for models to classify accurately, irrespective of their classes. This novel perspective brings us to two key challenges: 1) how to identify hard examples? 2) how to utilize them to improve robust fairness?
For the first question, hard classes are by definition the classes whose robustness is relatively lower than that of other classes [10]. In other words, a model is more uncertain about them. By applying cross-entropy as a metric for assessing uncertainty, we can observe that the cross-entropy values calculated exclusively from the clean examples of each class exhibit a positive correlation with their robust errors. Therefore, we can use the cross-entropy values of clean examples to identify hard classes without needing to generate adversarial examples. This observation further inspires us to explore the cross-entropy values of individual examples. Interestingly, our findings reveal that examples with larger cross-entropy values belong not only to hard classes but also to other classes. As these examples are identified by the metric used to identify hard classes, we hold that they are examples related to robust fairness. Additionally, higher cross-entropy values indicate that robust models struggle to learn these examples effectively. Therefore, we can categorize these examples as hard examples.
For the second question, hard examples are, by definition, difficult for models to learn. We explore two approaches to mitigate this learning difficulty. The first is to provide more information about the data distribution of hard classes, and the second is to directly remove the hardest examples. Owing to the limited dataset size, the first approach is effective, whereas the second is not. However, the first approach is implemented by collecting extra examples for hard classes, which is costly in practice. Considering that data augmentation can serve a similar purpose at a much smaller cost [15–17], we opt to apply data augmentation to these hard examples. By employing a carefully selected data augmentation method, Cutout [15], we can enhance robust fairness significantly.
Building on these findings, we propose Fair Adversarial Training (FairAT), a method that dynamically identifies and augments hard examples to enhance robust fairness. Extensive experiments on benchmark datasets and attacks validate the superior performance of FairAT. To summarize, our primary contributions are as follows:
● We investigate the relationship between class-wise robustness and the uncertainty of robust models about examples, indicating that hard examples with higher uncertainty could serve as more precise indicators of robust fairness.
● We discover that increasing the diversity of hard examples can improve robust fairness, leading us to propose FairAT, a more fine-grained method compared to traditional class-level solutions.
● We demonstrate the superiority of FairAT through extensive experiments on benchmark datasets and various attacks. Experimental results demonstrate that it outperforms state-of-the-art methods in terms of both overall robustness and fairness.
2 Related work
2.1 Adversarial training
Adversarial training [6–8,18], widely recognized as the most effective defense against adversarial examples [1,19,20], has stood the test of time [5,21]. It involves adversarial examples in the process of training. Following [6], given a training set $\mathcal{D}=\{(x_{i}, y_{i})\}_{i=1}^{n}$, adversarial training for a model $f_{\theta}$ can be defined as a min-max game:
$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[\max_{\|\delta\|_{p}\leq\epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big], \tag{1}$$
where $y$ is the ground-truth label of a clean example $x$, $\delta$ is an adversarial perturbation constrained by $\epsilon$ under the $\ell_{p}$ norm, $\theta$ denotes the parameters of the model $f$, and $\mathcal{L}$ is the loss function. The inner maximization is the process of finding adversarial examples. The outer minimization trains a model to have the minimum adversarial risk.
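For concreteness, the inner maximization of Eq. (1) is commonly approximated by projected gradient descent (PGD). The following is a minimal PyTorch-style sketch under an $\ell_{\infty}$ constraint; the function name and the default hyperparameters are our own illustrative assumptions, not settings taken from this paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    # Approximate the inner maximization of Eq. (1) with l_inf-bounded PGD.
    # The defaults are common CIFAR10 choices, assumed here for illustration.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()
```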
Although adversarial training performs well regarding common metrics, it overly pursues the average robustness without examining the class-wise robustness. There exists a significant discrepancy in robustness among different classes, as shown in Fig.1(a), a phenomenon termed robust fairness [10]. The least robust class could become the actual vulnerability of a robust model, i.e., the barrel principle. Besides, it may also cause fairness issues among different groups. Considering its harmful impact, this issue has drawn great attention from researchers recently [9,10,14]. Tian et al. [9] analyzed it across different datasets and adversarial training algorithms, validating that it is a common phenomenon in current adversarial training algorithms. Concurrently, Xu et al. [10] also discovered this phenomenon and proposed an algorithm to finetune robust models with adaptive weights and perturbations, alleviating the unbalanced class-wise robustness. More recently, Ma et al. [14] studied the relationship between robust fairness and perturbation radii, and then proposed a loss term that explicitly alleviates the unfairness by minimizing the variance of class-wise robustness. Although previous methods have improved robust fairness significantly, the discrepancy in class-wise robustness is still severe. More importantly, the least robust class is still quite vulnerable. Therefore, there is an urgent need for more effective methods to enhance the robustness of the least robust class.
2.2 Fairness in deep learning
Fairness issues in deep learning have drawn much attention for a long time [11–13,22,23]. While machines do not experience fatigue or boredom, they are susceptible to biases in various attributes, such as gender and race, potentially leading to serious ethical issues. In the context of decision-making, fairness is the absence of any prejudice or favoritism toward an individual or group based on their inherent or acquired characteristics [24]. In line with this definition, we define robust fairness in the context of deep learning as the absence of prejudice towards any specific class, manifesting as similar robustness across different classes. However, this issue has not attracted much attention in the field of adversarial robustness until recently. In this work, we aim to minimize the discrepancy between class-wise robustness and improve robust fairness. More importantly, we focus on the least robust class to actually reduce the vulnerability. Contrary to previous methods [10,14] that consider countermeasures at the class level, we explore more fine-grained countermeasures at the example level and propose an adversarial training algorithm that can train a fair and robust model from scratch.
3 Problem statement
Our objective is to design an adversarial training algorithm that not only enhances model robustness but also ensures fairness. Formally, our goal is to train a model with minimal average robust error ($\mathcal{R}_{rob}$) and minimal disparity in robustness across classes. The objective of our design is as follows:
$$\min_{\theta}\ \mathcal{R}_{rob}(f_{\theta}) \quad \mathrm{s.t.}\quad \big|\mathcal{R}_{rob}(c) - \mathcal{R}_{rob}(f_{\theta})\big| \leq \tau,\ \ \forall\, c \in \mathcal{Y}, \tag{2}$$
where $\mathcal{Y}$ is the set of classes, $\mathcal{R}_{rob}(c)$ is the robust error of class $c$, and $\tau$ is a very small positive value. The constraint of Eq. (2) aims to ensure that the robust error of each class closely aligns with the average robust error, thereby achieving robust fairness.
It is important to note that robust errors are associated with natural errors ($\mathcal{R}_{nat}$) [7]. Following [7], we separate robust errors into the sum of natural errors and boundary errors as:
$$\mathcal{R}_{rob}(c) = \mathcal{R}_{nat}(c) + \mathcal{R}_{bndy}(c), \qquad \mathcal{R}_{bndy}(c) = \Pr\big[\exists\, \delta,\ \|\delta\|_{p}\leq\epsilon:\ f_{\theta}(x+\delta)\neq y \,\wedge\, f_{\theta}(x)=y \,\big|\, y=c\big], \tag{3}$$
where $\mathcal{R}_{bndy}(c)$ is the probability that correctly classified examples from class $c$ can be attacked. Equation (3) allows for a more explicit understanding of the source of robust errors for each class. Note that Xu et al. [10] also provide a similar definition. However, their second term considers only the event $f_{\theta}(x+\delta)\neq y$ without the condition $f_{\theta}(x)=y$. Thus, their second term contains partial results of the first term, for which the second equal sign is not valid (Eq. (7) in [10]). We have corrected this in Eq. (3), aligning it with the approach in [7].
Inspired by Eq. (3), our objective can be reformulated as:
$$\min_{\theta}\ \mathcal{R}_{rob}(f_{\theta}) \quad \mathrm{s.t.}\quad \big|\mathcal{R}_{nat}(c) - \mathcal{R}_{nat}(f_{\theta})\big| \leq \tau_{1},\ \ \big|\mathcal{R}_{bndy}(c) - \mathcal{R}_{bndy}(f_{\theta})\big| \leq \tau_{2},\ \ \forall\, c \in \mathcal{Y}, \tag{4}$$
where $\tau_{1}$ and $\tau_{2}$ are both very small positive values, indicating that the class-wise natural/boundary error should be close to the average natural/boundary error.
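To make the decomposition concrete, the class-wise quantities in Eqs. (3) and (4) can be estimated empirically from clean and adversarial predictions. The sketch below is our own illustration (the function name and the `attack` argument, e.g., the PGD sketch above, are assumptions), not part of the paper's implementation.

```python
import torch

def classwise_errors(model, loader, attack, num_classes=10):
    # Estimate per-class natural, boundary, and robust errors as in Eq. (3):
    # robust = natural + boundary, where the boundary error counts examples
    # that are correctly classified on clean inputs but flipped by the attack.
    nat_err = torch.zeros(num_classes)
    bndy_err = torch.zeros(num_classes)
    count = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x_adv = attack(model, x, y)
        with torch.no_grad():
            clean_pred = model(x).argmax(1)
            adv_pred = model(x_adv).argmax(1)
        for c in range(num_classes):
            mask = (y == c)
            count[c] += mask.sum()
            nat_err[c] += (clean_pred[mask] != c).sum()
            bndy_err[c] += ((clean_pred[mask] == c) & (adv_pred[mask] != c)).sum()
    return nat_err / count, bndy_err / count, (nat_err + bndy_err) / count
```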
Based on Eq. (4), we can decompose a robust error into a natural error and a boundary error, with which we can better analyze the source of robust errors and robust fairness. In the following, we explore and design fair adversarial training algorithms from the perspective of hard examples. Our design builds upon the well-known adversarial training algorithm TRADES [7], an effective implementation of Eq. (1). Its loss function is as follows:
$$\mathcal{L}_{\mathrm{TRADES}} = \mathcal{L}_{CE}\big(f_{\theta}(x),\, y\big) + \beta \cdot \max_{\|\delta\|_{p}\leq\epsilon} \mathcal{L}_{KL}\big(f_{\theta}(x)\,\|\, f_{\theta}(x+\delta)\big), \tag{5}$$
where $\mathcal{L}_{CE}$ is the cross-entropy loss, $\mathcal{L}_{KL}$ is the KL divergence loss, and $\beta$ is a constant trading off these two loss terms.
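Below is a minimal PyTorch-style sketch of the TRADES objective in Eq. (5). The KL-driven inner maximization follows the common implementation of [7]; the default values of $\beta$, the perturbation size, and the step size are assumptions for illustration, not settings reported here.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, beta=6.0, eps=8/255, step_size=2/255, steps=10):
    # Eq. (5): cross-entropy on clean inputs plus beta times the KL divergence
    # between clean and adversarial predictions.
    model.eval()                                   # freeze BN statistics while attacking
    with torch.no_grad():
        clean_probs = F.softmax(model(x), dim=1)
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1).detach()
    for _ in range(steps):                         # inner maximization on the KL term
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), clean_probs,
                      reduction='batchmean')
        grad = torch.autograd.grad(kl, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    model.train()
    clean_logits = model(x)
    ce = F.cross_entropy(clean_logits, y)
    kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                  F.softmax(clean_logits, dim=1), reduction='batchmean')
    return ce + beta * kl
```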
4 Insight: from hard class to hard example
In response to the unbalanced robustness among different classes, one direct countermeasure involves adopting methods used in long-tail distribution learning problems [25]. In long-tail distribution scenarios, some classes have fewer training examples than others, resulting in unbalanced performance across classes. While the datasets we consider, like CIFAR10, are not always unbalanced, the phenomenon of unbalanced performance is analogous. One popular method in long-tail distribution learning is reweighting (Baseline Reweight) [26–28]. Additionally, in the context of robust fairness, Tian et al. [9] explored reducing the weights of other classes to enhance the robustness of the least robust class. Moreover, FRL proposed by Xu et al. [10] and FAT proposed by Ma et al. [14] can also be considered well-designed class-level reweighting algorithms for improving robust fairness. However, although these methods can improve robust fairness compared with standard PGD Adv. Training [6], their worst-class robust errors are still significantly larger than the average (Tab.1). Such results indicate that more effective methods for reducing the worst-class robust error are urgently needed.
Tab.1 Robust errors (%) of different reweighting algorithms. Avg. Rob. refers to the average robust error. Worst. Rob. refers to the worst-class robust error
Method | Avg. Rob. | Worst. Rob. |
PGD Adv. Training [6] | 57.00 | 83.80 |
Baseline Reweight | 57.44 | 82.90 |
Tian et al. [9] | 51.19 | 79.60 |
FRL [10] | 54.13 | 70.10 |
FAT [14] | 48.24 | 69.80 |
The previously mentioned methods primarily focus on class-level adjustments, involving modifications to the weights of different classes. Specifically, Xu et al. [10] hold that hard classes with larger robust errors are intrinsically hard to learn, and adversarial training intensifies this phenomenon. While this explanation aligns with experimental results, it remains somewhat generalized. Manipulating a whole class may overly affect fairness-unrelated examples of this class and ignore fairness-related examples in other classes. Consequently, we posit that finer-grained measures targeting individual examples hold more promise. Inspired by the “hard examples” of active learning [29–31], we decide to make a fine-grained inspection of training examples. We use the cross-entropy loss to measure the uncertainty of clean examples, with the insight that larger uncertainty indicates less-learned features and larger errors. The cross-entropy loss is calculated between the output logits of clean examples and their ground-truth labels for a robust model. This approach is cost-effective, as it involves testing on clean examples, whereas generating adversarial examples is more resource-intensive. Note that this loss is not directly related to robust errors, as no adversarial examples are involved in the calculation. As shown in Fig.2(a), hard classes with larger robust errors have larger cross-entropy loss values. Thus, we can utilize the cross-entropy loss calculated exclusively from clean examples to identify hard classes. Since this loss is calculated over training examples, we can reasonably consider it as a potential metric for finding hard examples.
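As an illustration of this identification step, the per-example cross-entropy of clean training examples can be computed and ranked as follows. This is a minimal sketch with our own function name and default ratio, not the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def hard_example_indices(model, loader, ratio=0.3):
    # Rank clean training examples by the cross-entropy between their output
    # logits and ground-truth labels; the top `ratio` fraction are treated as
    # hard examples. No adversarial examples are generated here.
    model.eval()
    losses = []
    for x, y in loader:                   # loader must iterate in a fixed order
        losses.append(F.cross_entropy(model(x), y, reduction='none'))
    losses = torch.cat(losses)
    k = int(ratio * losses.numel())
    return torch.topk(losses, k).indices  # dataset indices of the hardest examples
```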
Fig.2 Analyses of the relationship between cross-entropy loss values and (a) hard classes and (b) hard examples. (a) The class-wise robust error of the test set and the average cross-entropy loss value calculated over the training examples of each class. (b) We select the most robust classes (Class 9 and 1) and the least robust classes (Class 3 and 2). For hard examples with the largest cross-entropy loss values (from top 10% to top 90%), we plot the proportions of them belonging to these four classes
Examples with larger cross-entropy loss values mostly belong to hard classes (Fig.2(b)), further validating our view that we can use the cross-entropy loss to identify hard examples. Since not all examples in a hard class are challenging, directly manipulating hard classes could negatively impact non-hard examples. Besides, since previous research demonstrates that excessive perturbations may hurt robustness [32,33], it is inappropriate to put strong constraints on non-hard examples. We should focus our efforts on examples that have a greater impact on fairness. Intuitively, by identifying hard examples, it is possible to design more effective algorithms than just reweighting different classes, which may decrease the robust errors of some classes but increase others’. Although hard examples are unevenly distributed across classes, they provide fair information about all classes. Focusing on hard examples may help all classes accordingly. It is important to note that we currently use a robust model to calculate the cross-entropy loss. However, in the following section, we will remove this condition to reduce the cost.
5 FairAT: strategy and optimization
In this section, drawing inspiration from hard examples, we propose two intuitive methods aimed at enhancing robust fairness. After weighing their advantages and disadvantages, we then develop a practical strategy to increase the diversity of hard examples. Through the dynamic augmentation of hard examples, we achieve improved robust fairness at a small cost. Additionally, by implementing adaptive early stopping, we can identify the optimal epoch without needing to access the test set. Our ultimate strategy operates within a self-contained training process, independent of pre-trained robust models.
5.1 Intuitive strategy
Drawing inspiration from hard examples, we posit that manipulating individual examples is more effective and propose two strategies aimed at mitigating the impact of hard examples. Firstly, we can augment hard classes by adding training examples to these classes. The motivation is that hard classes have more hard examples that are hard to learn. If we add more examples to them, more information about their distribution will be provided. Thus, their robustness is likely to be improved [34–37]. We extract examples from the “80 Million Tiny Images” dataset (80M-TI) [38], which includes some offensive images; their use here is solely to demonstrate the effectiveness of this intuitive strategy. For the three least robust classes, we randomly choose some examples for each to enlarge their training sets. This approach can effectively improve robust fairness (Fig.3(a)). It can more effectively decrease the worst-class robust error (Class 3) and slightly decrease or increase other classes’ robust errors. However, collecting extra training examples is quite costly in practice. Besides, we do not know which class is not robust before training. Therefore, obtaining additional training examples specifically for hard classes would significantly elevate the training costs.
Fig.3 (a) Adding extra training examples from 80M-TI to the three least robust classes (Class 3, 2, and 4). We add 10% (500 of 5,000), 20%, and 30% examples to these classes respectively; (b) removing hard examples from the training set. We directly remove the top 10% to 90% hard examples of the whole training set
Secondly, we remove the top 10% to 90% hard examples in the training set. Our insight is that training examples may become easier to learn since hard examples are removed. Thus, robust fairness may be improved. However, the worst-class robust error and average robust error are both becoming larger as the proportion of removed examples increases (Fig.3(b)). This outcome may be attributed to the fact that each CIFAR10 class contains only 5,000 training examples, a relatively small number. Removing some training examples will obviously reduce the information of its distribution. Therefore, the robustness of the trained model will be worse.
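For completeness, this removal experiment can be sketched as follows, reusing the hypothetical `hard_example_indices` helper from the previous section; the function below is our illustration, not the authors' script.

```python
from torch.utils.data import Subset

def remove_hard_examples(train_set, model, eval_loader, ratio=0.3):
    # Drop the top `ratio` hardest examples (ranked by clean cross-entropy)
    # and train on the remainder. As reported above, this hurts robustness,
    # since each CIFAR10 class has only 5,000 training examples.
    hard = set(hard_example_indices(model, eval_loader, ratio).tolist())
    keep = [i for i in range(len(train_set)) if i not in hard]
    return Subset(train_set, keep)
```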
5.2 Increasing diversity by data augmentation
The previously mentioned two methods are either costly or ineffective. Inspired by the performance of adding extra training examples, we decide to design a method that achieves similar results but at a low cost. Firstly, we should consider a question: why can extra training examples improve the robustness of hard classes? Our explanation is intuitive. More examples provide more information about the data distribution of these classes, thus making the model learn more. Therefore, if we can provide more information about hard examples, we may also improve robust fairness.
Data augmentation can provide more diversity to data and more information about the data distribution [15–17,39]. We consider using data augmentation as a substitute. Notice that the augmentation should be low-cost and focused on hard examples. Therefore, Cutout [15] is a good candidate since it only needs a simple operation on individual images and does not need interaction between different examples. Other data augmentation methods for improving generalization, like Cutmix [17] and Mixup [16], are inappropriate since they need interaction between examples, which may overly disturb the features. We test them and find that they indeed decrease robust fairness (detailed in Section 7).
We use Cutout to augment hard examples with the proportions from the top 5% to 100% (of the whole training set). Combined with Baseline Reweight, it can greatly decrease the worst-class robust error compared with the original TRADES (percentage 0% in Fig.4(a)). Notice that larger proportions of hard examples do not always lead to lower robust errors. The top 10% may contain enough hard examples. When we increase the proportion, the additional examples may not be hard anymore. In this case, we reach our method’s upper limit. An appropriate proportion would lead to a lower worst-class robust error and also a lower average robust error. In summary, augmenting hard examples by Cutout always leads to lower worst-class robust errors than the original TRADES, validating the effectiveness of our strategy.
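A minimal sketch of Cutout applied only to the identified hard examples is given below. The patch size and the way hard indices are passed in are assumptions for illustration rather than the paper's exact configuration.

```python
import torch

def cutout(x, size=8):
    # Zero out a randomly placed square patch in each image (Cutout).
    # `size` is an assumed patch length, not the paper's reported setting.
    n, _, h, w = x.shape
    x = x.clone()
    for i in range(n):
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
        x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
        x[i, :, y1:y2, x1:x2] = 0.0
    return x

def augment_hard(x, idx, hard_idx_set):
    # Apply Cutout only to examples whose dataset index is marked as hard.
    mask = torch.tensor([int(i) in hard_idx_set for i in idx], dtype=torch.bool)
    if mask.any():
        x[mask] = cutout(x[mask])
    return x
```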
Fig.4 (a) Using Cutout to augment hard examples with different proportions. 0% is the original TRADES. 10% indicates 5,000 examples of the training set are identified as hard examples and augmented; (b) the training process of our FairAT evaluated by a hold-out validation set and the test set. We report worst-class robust errors and average robust errors by them
5.3 Minimizing training cost
In the experiments above, the identification of hard examples is based on a pre-trained robust PreAct ResNet18 model, which is used to calculate cross-entropy values. However, training a robust model is very costly as it requires extensive gradient calculations to generate adversarial examples. If we still use this method to identify hard examples, our cost will be twice as much as common adversarial training. To minimize the burden of our method, it is natural to raise a question: can we identify hard examples without needing to train a robust model in advance?
Note that the accuracy and robustness gradually increase during adversarial training if we do not consider overfitting [8]. In other words, a model has learned some knowledge about the data distribution since the training started. Therefore, we can use the model at the previous epoch as an auxiliary model to identify hard examples for the current epoch. For example, if the current epoch is 80, the previous epoch is 79. To validate our idea, we choose three models at epochs 10, 45, and 80 of the same run of TRADES. We use them to select the top 50% hard examples and analyze their consistency. Surprisingly, the selection of hard examples among the three models is largely consistent (Fig.5). A large overlap exists among these three results. Since the model at Epoch 10 can select hard examples similar to those selected by the model at Epoch 80 (a robust model), we can speculate that using the model at the previous epoch can play a similar role as a pre-trained robust model. Additionally, we also verify the consistency of hard examples between the model in training and a pre-trained robust model. The proportions of identical hard examples selected by the models at epochs 10, 45, and 80 and the pre-trained robust model are 79.64%, 84.42%, and 91.68%, respectively. These results also support our speculation. In this way, we can save costs and utilize the training process itself to identify hard examples. Based on the above findings, we propose Fair Adversarial Training (FairAT): dynamically identifying hard examples by the model at the previous epoch and augmenting them by Cutout (Fig.6). Notice that hard examples are not equal to incorrectly classified examples. Instead, they are examples that the model does not learn well. These two concepts are different. Therefore, although the selection of hard examples among the three models is similar, these three models have different robust errors (Fig.5).
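Putting the pieces together, the FairAT training loop of Fig.6 can be sketched as follows. The helper names (`hard_example_indices`, `augment_hard`, `trades_loss`) refer to the hypothetical sketches given earlier, and the assumption that the training loader also yields dataset indices is ours; this is a simplified illustration, not the authors' released code.

```python
def train_fairat(model, train_loader, eval_loader, optimizer, scheduler,
                 epochs=105, ratio=0.3):
    # FairAT sketch: before each epoch, the model from the previous epoch
    # identifies the hardest `ratio` of clean training examples (by
    # cross-entropy); those examples are augmented with Cutout, and the model
    # is then trained with the TRADES-style loss on the mixed batches.
    for epoch in range(epochs):
        hard = set(hard_example_indices(model, eval_loader, ratio).tolist())
        model.train()
        for idx, x, y in train_loader:   # train_loader is assumed to yield indices as well
            x = augment_hard(x, idx, hard)
            loss = trades_loss(model, x, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```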
Fig.5 The bar plots show the hard examples selected by models at different epochs of the same training run: a PreAct ResNet18 model trained by TRADES. The training example indexes are re-ordered to show contiguous blocks. The plots show significant consistency in the individual selection of hard examples across the models at different epochs
Fig.6 Training the model of Epoch M for FairAT. It firstly inputs the original training dataset into the model of Epoch M-1 to calculate the cross-entropy values for each example. It then uses these values to differentiate hard examples from non-hard examples. After augmenting hard examples, it combines augmented examples and non-hard examples as the training dataset to train the model of Epoch M
5.4 Identifying the best model
Considering the overfitting phenomenon and the unstable training process of adversarial training [8], directly using the model of the final epoch may not be optimal. We plot worst-class robust errors and average robust errors evaluated by the test set in Fig.4(b). Their curves are unstable during training, especially for worst-class robust errors. The best model with the smallest errors is not at the final epoch. However, the test set should be unknown and cannot be used to test robust errors in practice. Therefore, we should find an alternative way. Inspired by early stopping [8], we can randomly hold out 300 examples of each class from the training set to test robustness. These examples constitute a validation set that does not participate in training. Different from the original early stopping [8] that considers average robust errors, we consider worst-class robust errors to improve robust fairness. Thus, we modify its metric to adapt to our settings as follows:
$$\mathcal{R}_{worst}^{val} = \max_{c\in\mathcal{Y}}\ \mathcal{R}_{rob}^{val}(c), \tag{6}$$
where $\mathcal{R}_{worst}^{val}$ is the worst-class robust error, evaluated on the hold-out validation set; the epoch whose model attains the lowest value of this metric is selected.
With this metric, we can obtain a model that achieves an average robust error of 47.66% and a worst-class robust error of 66.00% on the test set without looking at the test set. It is exactly the same as using the test set to find the best epoch. Besides, the validation curves during training closely match the test curves (Fig.4(b)), further validating the feasibility of using this small hold-out validation set to identify the best model.
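A sketch of this adaptive model selection is given below; `classwise_errors` refers to the hypothetical helper from Section 3, and the checkpointing logic is our own illustration. In the training loop sketched earlier, this check would run once per epoch.

```python
def select_best(model, val_loader, attack, best_state, best_worst_err, num_classes=10):
    # Evaluate Eq. (6) on the hold-out validation set (300 examples per class)
    # and keep the checkpoint with the lowest worst-class robust error.
    _, _, rob = classwise_errors(model, val_loader, attack, num_classes)
    worst_err = rob.max().item()
    if worst_err < best_worst_err:
        best_worst_err = worst_err
        best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    return best_state, best_worst_err
```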
6 Experiment
In this section, we conduct extensive experiments on benchmark datasets and attacks to demonstrate the superiority of FairAT.
6.1 Experimental settings
Datasets and models Following [10], we conduct experiments on two benchmark datasets and two popular models to objectively compare our FairAT and the baselines. Concretely, the datasets include CIFAR10 [40] and SVHN [41]. Note that CIFAR10 is a balanced dataset while SVHN is imbalanced. For both datasets, we test two models, including PreAct ResNet18 [42] and WRN-28-10 [43]. Additionally, we also consider the TinyImageNet-200 dataset [44] for comparison.
Baselines We consider eight baseline methods for a comprehensive comparison. Firstly, we present the original performance of three popular adversarial training algorithms, including PGD Adv. Training [6] and two variants of TRADES [7]. Besides, there are several unfairness debiasing algorithms for traditional fairness issues. We adopt a typical one (Baseline Reweight) [27] and apply it to increase the weight of the class with the highest robust error in the training set, which is also considered as a baseline in [10]. Moreover, we also provide the results of two state-of-the-art solutions for robust fairness: FRL [10] and FAT [14]. For FRL, we consider all its variants: FRL (Reweight), FRL (Remargin), and FRL (Reweight+Remargin).
Metrics Following [10], we consider six metrics, i.e., average natural/boundary/robust errors and worst-class natural/boundary/robust errors, to preliminarily analyze the source of robust fairness (defined in Section 3). We abbreviate them as Avg. Nat./Bndy./Rob. and Worst. Nat./Bndy./Rob. More importantly, to objectively compare different methods, we consider four popular attacks to evaluate robust errors: FGSM [2], PGD [6], C&W [3], and AutoAttack [4].
Implementation details For our FairAT, the implementation is based on TRADES [7]. We set the initial learning rate to 0.1 and then decay it by 0.1 at epochs 75, 90, and 100. The training budget is 105 epochs. During training, we use PGD-10 (10 steps) with a fixed perturbation size and step size. The proportion of hard examples is set to 30% by default. For all baseline methods, we always adopt their default settings from the corresponding papers. The evaluation is conducted with the four attacks under a fixed perturbation size. Experiments are conducted on a GeForce RTX 3090.
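The optimizer and learning-rate schedule described above can be set up as in the sketch below. The milestone epochs and learning-rate values are those stated in this subsection; SGD with momentum 0.9 and weight decay 5e-4 are assumed defaults on our part.

```python
import torch

def build_optimizer(model):
    # Initial learning rate 0.1, decayed by a factor of 0.1 at epochs 75, 90,
    # and 100, for a 105-epoch training budget.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[75, 90, 100], gamma=0.1)
    return optimizer, scheduler
```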
6.2 How does FairAT work
FairAT achieves a lower worst-class robust error than all baselines (Tab.2). According to Eq. (3), robust errors can be decomposed into natural errors and boundary errors. FairAT does not have the lowest worst-class natural error. In fact, its worst-class natural error is 6.60% larger than that of FRL (Reweight+Remargin). Its reduction in worst-class robust errors comes from boundary errors. Compared with FRL (Reweight+Remargin), it significantly reduces the worst-class boundary error by 10.70%. Such a reduction means that FairAT better prevents those correctly classified natural examples from being misclassified when combined with adversarial perturbations. Therefore, FairAT improves robust fairness by improving the robustness of those correctly classified natural examples instead of reducing natural errors.
Tab.2 Average & worst-class natural error, boundary error, and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by PGD-20. The best results are in bold
Method | Avg. Nat. | Worst. Nat. | Avg. Bndy. | Worst. Bndy. | Avg. Rob. | Worst. Rob. |
PGD Adv. Training | 15.52 | 32.20 | 41.48 | 51.60 | 57.00 | 83.80 |
TRADES () | 12.49 | 27.50 | 43.44 | 58.30 | 56.93 | 85.80 |
TRADES () | 19.38 | 37.20 | 29.26 | 40.30 | 48.64 | 77.50 |
Baseline Reweight | 15.21 | 30.30 | 42.23 | 52.60 | 57.44 | 82.90 |
FRL (Reweight) | 16.20 | 28.80 | 38.06 | 44.50 | 54.26 | 73.30 |
FRL (Remargin) | 15.61 | 27.40 | 37.48 | 48.90 | 53.09 | 76.30 |
FRL (Reweight+Remargin) | 16.68 | 26.20 | 37.45 | 43.90 | 54.13 | 70.10 |
FAT | 20.08 | 33.70 | 28.16 | 36.10 | 48.24 | 69.80 |
FairAT (Ours) | 19.16 | 32.80 | 28.50 | 33.20 | 47.66 | 66.00 |
6.3 Robust fairness on CIFAR10
In this subsection, we evaluate FairAT and baselines with four popular attacks on CIFAR10. The experiments are conducted on PreAct ResNet18 and WRN-28-10, corresponding to small and large models in the setting of adversarial training.
PreAct ResNet18 Firstly, popular adversarial training algorithms such as PGD Adv. Training and TRADES have higher worst-class robust errors (Tab.3). This makes sense since they do not consider robust fairness. Besides, although Baseline Reweight has a lower worst-class natural error than that of the more robust TRADES variant, its worst-class robust error is higher. Thus, this reweighting method from traditional fairness does not perform well regarding robust fairness.
Tab.3 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by four popular attacks, including FGSM, PGD-20, C&W, and AutoAttack. The best results are in bold
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 15.52 | 32.20 | | 46.40 | 74.50 | | 57.00 | 83.80 | | 56.19 | 82.60 | | 58.67 | 85.50 |
TRADES () | 12.49 | 27.50 | | 47.00 | 75.90 | | 56.93 | 85.80 | | 57.50 | 86.30 | | 59.36 | 88.20 |
TRADES () | 19.38 | 37.20 | | 44.00 | 72.10 | | 48.64 | 77.50 | | 51.35 | 81.40 | | 52.27 | 82.70 |
Baseline Reweight | 15.21 | 30.30 | | 47.02 | 74.50 | | 57.44 | 82.90 | | 56.75 | 82.10 | | 59.30 | 85.00 |
FRL (Reweight) | 16.20 | 28.80 | | 46.34 | 65.10 | | 54.26 | 73.30 | | 54.82 | 74.30 | | 56.56 | 77.50 |
FRL (Remargin) | 15.61 | 27.40 | | 45.40 | 67.50 | | 53.09 | 76.30 | | 53.64 | 76.90 | | 55.36 | 79.90 |
FRL (Reweight+Remargin) | 16.68 | 26.20 | | 46.82 | 62.10 | | 54.13 | 70.10 | | 54.98 | 71.80 | | 56.74 | 74.60 |
FAT | 20.08 | 33.70 | | 43.81 | 64.80 | | 48.24 | 69.80 | | 51.13 | 76.20 | | 52.33 | 77.90 |
FairAT (Ours) | 19.16 | 32.80 | | 42.32 | 61.40 | | 47.66 | 66.00 | | 50.18 | 71.60 | | 51.33 | 73.30 |
For methods designed for robust fairness, FRL (Reweight+Remargin) achieves lower worst-class robust errors against these four attacks than the other two variants of FRL. Besides, its worst-class robust error is 8.10% lower than that of the more robust TRADES variant when evaluated against AutoAttack, significantly improving the robustness of the least robust class. Unfortunately, its average robust error is 4.47% higher than that of the same TRADES variant. Therefore, it is likely that FRL sacrifices the average robustness to improve the worst-class robustness. For FAT, although it is a newer work than FRL, its worst-class robust error is higher than that of FRL (Reweight+Remargin). The reason might be that it focuses on minimizing the variance of class-wise robust errors while overlooking the worst-class robust error.
FairAT always achieves the lowest worst-class robust error. When faced with PGD, its worst-class robust error is 3.80% lower than the second-best one. Such a reduction is significant considering the difficulty of improving the robustness of adversarial training algorithms [4,45]. Note that FairAT also achieves lower average robust errors than all baselines. In other words, FairAT does not sacrifice the average robustness to improve the worst-class robustness. Moreover, the robust errors of each class are all lower when compared with FRL (Reweight+Remargin) (Fig.7(a)). Most classes of FairAT have lower robust errors than those of FAT. Therefore, FairAT is indeed more robust and fairer.
Fig.7 Comparison of different adversarial training algorithms with regard to class-wise robust errors. The robust errors are evaluated by PGD-20. FRL is FRL (Reweight+Remargin). PGD AT. is PGD Adv. Training. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
WRN-28-10 For this large model, the trend is similar to that of PreAct ResNet18, but the improvement of FairAT is more significant (Tab.4). On the one hand, FairAT always achieves a larger reduction in terms of average robust errors. For example, the reduction is 1.00% for PreAct ResNet18 but 2.13% for WRN-28-10 when against AutoAttack. On the other hand, FairAT achieves a larger reduction against strong attacks like C&W and AutoAttack in terms of worst-class robust errors. For example, the reduction is 1.30% for PreAct ResNet18 but 3.50% for WRN-28-10 when against AutoAttack. Furthermore, its natural errors are closer to those of FRL (Reweight+Remargin) for this model than for PreAct ResNet18. Such phenomena indicate that FairAT could better demonstrate its superiority when evaluated with large models on CIFAR10. Note that the natural errors of FairAT are not the lowest. However, when considering adversarial attacks, robust errors are the actual errors. FairAT is more secure and fairer in this case.
Tab.4 Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on CIFAR10
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 13.37 | 26.10 | | 42.49 | 68.70 | | 53.00 | 79.00 | | 52.39 | 78.40 | | 54.39 | 80.30 |
TRADES () | 11.15 | 21.40 | | 40.75 | 66.40 | | 51.74 | 76.90 | | 51.00 | 76.50 | | 53.23 | 78.60 |
TRADES () | 13.78 | 26.20 | | 38.43 | 62.60 | | 47.04 | 72.50 | | 47.17 | 72.20 | | 49.24 | 74.80 |
Baseline Reweight | 13.28 | 25.40 | | 42.58 | 67.20 | | 53.08 | 78.00 | | 52.47 | 77.60 | | 54.42 | 79.20 |
FRL (Reweight) | 14.55 | 27.90 | | 42.37 | 65.70 | | 53.40 | 76.80 | | 52.42 | 75.50 | | 55.07 | 78.60 |
FRL (Remargin) | 15.59 | 24.40 | | 48.26 | 62.30 | | 56.36 | 71.10 | | 58.75 | 75.20 | | 60.18 | 77.30 |
FRL (Reweight+Remargin) | 15.66 | 30.10 | | 45.09 | 63.90 | | 51.77 | 67.70 | | 53.79 | 70.10 | | 55.17 | 71.30 |
FAT | 15.90 | 30.90 | | 40.24 | 58.30 | | 45.53 | 64.10 | | 47.94 | 68.90 | | 49.55 | 72.30 |
FairAT (Ours) | 15.35 | 28.30 | | 38.10 | 58.00 | | 44.59 | 63.20 | | 46.02 | 65.70 | | 47.42 | 67.80 |
6.4 Robust fairness on SVHN and TinyImageNet-200
In this subsection, we evaluate FairAT and baselines with four popular attacks on SVHN. The experiments are conducted on PreAct ResNet18 and WRN-28-10 with the same settings as CIFAR10.
PreAct ResNet18 on SVHN FairAT achieves surprisingly better results with this model on SVHN (Tab.5). Firstly, its reductions in average and worst-class robust errors are both more significant than those of the same settings on CIFAR10. For example, when evaluated against AutoAttack, the reductions in average and worst-class robust errors are 5.48% and 2.23%, respectively. These values are 1.00% and 1.30% with the same settings on CIFAR10 (Tab.3). For the other three attacks, the trend is similar. Such reductions are non-trivial. The reasons are two-fold: FairAT has only a small increase in the worst-class natural error but a large decrease in the worst-class boundary error. Secondly, the gap between FairAT and the best baseline regarding natural errors is also smaller than that of the same settings on CIFAR10. The gap is about 2% here but about 6% for the latter.
Tab.5 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on SVHN
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 9.16 | 23.86 | | 33.68 | 63.49 | | 46.47 | 74.28 | | 49.24 | 74.10 | | 53.33 | 78.43 |
TRADES () | 7.59 | 17.59 | | 33.12 | 57.47 | | 48.21 | 72.89 | | 49.38 | 72.89 | | 53.63 | 76.02 |
TRADES () | 9.07 | 19.34 | | 32.44 | 55.48 | | 44.37 | 69.52 | | 47.21 | 71.39 | | 51.92 | 76.20 |
Baseline Reweight | 9.35 | 18.13 | | 34.15 | 58.61 | | 48.25 | 72.17 | | 49.61 | 72.41 | | 54.28 | 75.54 |
FRL (Reweight) | 9.23 | 16.36 | | 33.65 | 45.96 | | 46.67 | 70.06 | | 46.96 | 69.76 | | 59.33 | 76.33 |
FRL (Remargin) | 9.61 | 15.71 | | 32.25 | 42.76 | | 46.87 | 67.89 | | 46.54 | 67.29 | | 62.54 | 76.33 |
FRL (Reweight+Remargin) | 9.64 | 15.73 | | 32.33 | 44.54 | | 44.60 | 65.18 | | 44.31 | 64.58 | | 64.19 | 75.90 |
FAT | 10.21 | 19.81 | | 33.39 | 46.64 | | 42.83 | 57.26 | | 48.51 | 62.22 | | 52.40 | 65.00 |
FairAT (Ours) | 9.63 | 17.14 | | 26.69 | 40.31 | | 38.92 | 53.23 | | 42.89 | 56.75 | | 46.92 | 62.77 |
WRN-28-10 on SVHN For this model, although FairAT always achieves the lowest average and worst-class robust errors (Tab.6), its reduction compared with the second-best method is not as significant as that of PreAct ResNet18. This phenomenon is different from that of CIFAR10. We speculate that the reason may be that SVHN only contains digit images and is easier to learn than CIFAR10. Therefore, the performance of PreAct ResNet18 is already good, and there is little room for improvement.
Based on the results above, we can preliminarily conclude that FairAT is more robust and fairer on both balanced datasets (CIFAR10) and unbalanced datasets (SVHN).
PreAct ResNet18 on TinyImageNet-200 The conclusion on TinyImageNet-200 is similar to that on CIFAR10 and SVHN (Tab.7). Our FairAT always achieves the lowest average and worst-class robust errors. At the same time, FairAT also achieves natural errors comparable to the lowest ones. For example, its worst-class natural error is just 0.35% higher than that of FRL (Reweight+Remargin). This value is marginal, especially when considering the error is still far lower than the upper bound (100%).
Tab.6 Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on SVHN
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 7.70 | 16.39 | | 33.84 | 55.12 | | 46.96 | 71.27 | | 49.56 | 73.49 | | 54.11 | 77.29 |
TRADES () | 8.97 | 20.36 | | 36.76 | 63.49 | | 48.57 | 72.89 | | 50.90 | 74.46 | | 55.30 | 76.75 |
TRADES () | 10.64 | 21.74 | | 19.34 | 41.02 | | 43.72 | 71.20 | | 46.97 | 73.92 | | 54.90 | 78.61 |
Baseline Reweight | 7.19 | 13.92 | | 37.61 | 59.52 | | 50.21 | 72.29 | | 51.70 | 73.43 | | 55.50 | 75.60 |
FRL (Reweight) | 8.08 | 14.53 | | 24.09 | 40.06 | | 54.38 | 71.08 | | 52.93 | 69.76 | | 63.67 | 76.39 |
FRL (Remargin) | 7.78 | 13.26 | | 22.64 | 35.72 | | 49.21 | 66.02 | | 47.88 | 64.88 | | 59.18 | 71.81 |
FRL (Reweight+Remargin) | 7.85 | 15.18 | | 24.93 | 36.45 | | 47.41 | 63.86 | | 46.27 | 63.07 | | 55.54 | 69.58 |
FAT | 7.28 | 12.48 | | 22.75 | 35.00 | | 38.01 | 52.17 | | 41.73 | 54.32 | | 51.86 | 63.67 |
FairAT (Ours) | 7.38 | 14.82 | | 17.08 | 31.12 | | 34.84 | 51.04 | | 37.79 | 53.64 | | 49.36 | 63.13 |
Tab.7 Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on TinyImageNet-200
Method | Natural | | FGSM | | PGD | | C&W | | AutoAttack |
Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. | | Avg. | Worst. |
PGD Adv. Training | 59.21 | 83.40 | | 81.92 | 97.50 | | 83.71 | 98.60 | | 88.34 | 99.00 | | 88.92 | 99.25 |
TRADES () | 58.53 | 82.75 | | 82.43 | 97.70 | | 83.27 | 98.65 | | 87.11 | 99.05 | | 87.63 | 99.20 |
TRADES () | 59.42 | 83.15 | | 80.20 | 96.90 | | 81.82 | 97.50 | | 86.10 | 98.90 | | 86.55 | 99.00 |
Baseline Reweight | 59.13 | 82.90 | | 82.08 | 96.70 | | 83.96 | 98.35 | | 88.45 | 99.10 | | 88.73 | 99.15 |
FRL (Reweight) | 59.62 | 82.75 | | 81.52 | 90.35 | | 82.34 | 96.85 | | 87.02 | 98.70 | | 87.56 | 98.95 |
FRL (Remargin) | 59.58 | 82.60 | | 81.19 | 91.60 | | 82.71 | 97.20 | | 86.33 | 98.75 | | 87.34 | 99.00 |
FRL (Reweight+Remargin) | 59.78 | 82.30 | | 81.53 | 89.90 | | 82.06 | 96.45 | | 86.49 | 98.40 | | 87.49 | 98.85 |
FAT | 59.93 | 83.80 | | 79.13 | 90.20 | | 80.38 | 96.30 | | 84.76 | 98.90 | | 85.40 | 99.10 |
FairAT (Ours) | 58.97 | 82.65 | | 78.00 | 85.65 | | 79.71 | 95.90 | | 83.52 | 98.05 | | 84.15 | 98.30 |
6.5 Ablation study
Identification of hard examples In our FairAT, we use cross-entropy values between the output logits of clean examples and their ground-truth labels to measure uncertainty. Larger cross-entropy values indicate harder examples. To demonstrate the effectiveness of our design, we also consider three different metrics to identify hard examples. Concretely, we consider three popular methods: Margin [41], LeastConf [46], and Maximum Entropy [47]. Margin selects examples whose two most likely labels in the confidence vectors have smaller differences. LeastConf selects examples whose most likely label has smaller confidence. Maximum Entropy selects examples whose output distribution has larger entropy (instead of cross-entropy). As shown in Tab.8, our default setting (Cross Entropy) achieves the lowest average and worst-class robust errors at the same time. The improvement is more significant for average robust errors. Therefore, Cross Entropy is a better choice.
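The four scoring functions compared in Tab.8 can be implemented as simple per-example scores; the sketch below is our own illustration (larger scores mean harder examples).

```python
import torch
import torch.nn.functional as F

def uncertainty_scores(logits, labels, metric="cross_entropy"):
    # Per-example scores used to rank hard examples; larger means harder.
    probs = F.softmax(logits, dim=1)
    if metric == "cross_entropy":      # our default: CE w.r.t. ground-truth labels
        return F.cross_entropy(logits, labels, reduction="none")
    if metric == "margin":             # small gap between the top-2 classes => hard
        top2 = probs.topk(2, dim=1).values
        return -(top2[:, 0] - top2[:, 1])
    if metric == "least_conf":         # low confidence of the predicted label => hard
        return -probs.max(dim=1).values
    if metric == "max_entropy":        # high predictive entropy => hard
        return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    raise ValueError(metric)
```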
Tab.8 Comparison of different metrics about hard examples for PreAct ResNet18 on CIFAR10. The robust errors are evaluated by PGD-20
Method | Avg. Rob. | Worst. Rob. |
Margin [40] | 49.11 | 67.00 |
LeastConf [45] | 48.83 | 66.80 |
Maximum Entropy [46] | 47.93 | 66.10 |
Cross Entropy (Default) | 47.66 | 66.00 |
Proportions of Cutout In the above experiments, we set 30% as the default proportion of hard examples. To test the rationality of this setting, we test different proportions from the top 5% to 100% in Fig.8. The proportion of 30% achieves the lowest average and worst-class robust errors for PreAct ResNet18 on CIFAR10. Although 30% is not always the best proportion for the other three experiments, its performance is still comparable to that of the best proportion. Besides, the range of effective proportions is wide. In fact, proportions of 10% to 100% all achieve satisfactory results. This is because the top 10% may contain enough hard examples. Larger proportions have little impact since the additional examples might not be hard. Therefore, even if we do not know the best setting, just a moderate value can work well.
Fig.8 Analyses of the effect of the proportions of hard examples on FairAT. We test different proportions from 5% to 100% on two datasets and two models. The proportion of 0% corresponds to the original TRADES. The robust errors are evaluated by PGD-20. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
7 Discussion
In this section, we discuss three interesting questions about FairAT to deepen the understanding of our work.
The choice of data augmentation methods Readers may wonder why we choose Cutout [15] instead of other well-performing data augmentation methods. For example, Cutmix [17] and Mixup [16] are both popular and well-performing data augmentation methods. The reason we do not consider them is that they need interaction between examples. Since we differentiate hard examples from other examples, the interaction is hard to define. Moreover, they may overly disturb the features of hard examples by inserting the features of other examples. We have tried two kinds of interaction: between a hard example and a non-hard example, and between a hard example and another hard example. Unfortunately, they both increase the worst-class robust error by more than 2% compared with TRADES. Therefore, they are not good solutions for robust fairness. Note that our focus is a more fine-grained method for improving robust fairness from the perspective of hard examples. Therefore, although there are lots of data augmentation methods, we have not traversed them all. Instead, we choose Cutout [15] as it already fulfills our requirements regarding diversity.
Absolute fairness vs. high worst-class robustness In the pursuit of true robustness in reality, which objective is more practical? In our humble view, the latter better describes the real robustness. For example, if the robust error of each class is always 100%, absolute fairness is achieved. However, the model loses robustness completely. In reality, the weakest part is more likely to be attacked. In other words, the least robust class could become the attackers’ target, and its robustness is the true robustness in this case. Therefore, we hold that improving the worst-class robustness is the core problem of robust fairness and focus on this metric in Section 6.
Achieving both high fairness and high average robustness After reading the above content, a natural question emerges: is it possible to achieve high fairness and high average robustness at the same time? That is, can a model have state-of-the-art robustness for the whole dataset and equal robustness for each class? To the best of our knowledge, the answer may be negative. Previous research has demonstrated that the higher robust errors of hard classes are caused by the similarity between these classes [9,10]. Considering the limited number of training examples for many datasets like CIFAR10, it is quite difficult to effectively differentiate these hard classes with a vanilla deep learning model. The robust errors of hard classes will be higher than those of non-hard classes. Therefore, without extra training examples, it is difficult to achieve both high average robustness similar to the state-of-the-art methods [6,7] and high fairness at the same time.
Broader impacts Adversarial training has become one of the most reliable empirical defenses against adversarial attacks. However, its robust fairness issue significantly impacts its fundamental robustness, which would cause the barrel principle and ethical issues. For the barrel principle, less robust classes are serious vulnerabilities. For example, if a deep learning model in autonomous driving exhibits high average robustness but lacks robustness to pedestrians, it could pose a danger to them. For ethical issues, different people or groups are protected with various levels of robustness. Considering the pursuit of equality in today’s society, people who have less or no protection are clearly being discriminated against. Such a situation should be avoided. In this work, we propose FairAT to alleviate this problem, enhancing the applicability of adversarially robust models in reality.
8 Conclusion
In this paper, we explore more fine-grained methods to enhance robust fairness from the perspective of hard examples. First, through an analysis of the relationship between class-wise robustness and the uncertainty of robust models regarding individual examples, we find that hard examples with greater uncertainty could serve as more precise indicators of robust fairness. Subsequently, we find that enhancing the diversity of hard examples, achievable through data augmentation, can improve robust fairness. Building on these insights, we propose Fair Adversarial Training (FairAT): dynamically identifying hard examples and augmenting them to provide more distribution information. Experimental results demonstrate that FairAT outperforms state-of-the-art methods in terms of both average robust errors and worst-class robust errors.