Most explanation methods are designed empirically, so exploring whether there exists a first-principles explanation of a deep neural network (DNN) has become the next core scientific problem in explainable artificial intelligence (XAI). Although this is still an open problem, in this paper we discuss whether the interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system to quantify the decision-making logic of a DNN into a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical adversarial-transferability-boosting methods; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals the internal mechanism of why and how the generalization power and adversarial sensitivity of a DNN change during the learning process.
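For intuition, the interaction literature commonly quantifies the joint effect of a subset S of input variables with the Harsanyi dividend I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) is the network output when only the variables in T are kept. Below is a minimal Python sketch of this definition; the masking scheme behind v is a placeholder, not the paper's exact protocol:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the index set s."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi_interaction(v, S):
    """Harsanyi dividend I(S) = sum over T subseteq S of
    (-1)^(|S|-|T|) * v(T), where v(T) is the model output
    with only the variables in T kept (others masked)."""
    S = frozenset(S)
    return sum((-1) ** (len(S) - len(T)) * v(frozenset(T)) for T in subsets(S))

# Toy example: an AND-style model over variables {0, 1}.
v = lambda T: 1.0 if {0, 1} <= T else 0.0
print(harsanyi_interaction(v, {0, 1}))  # 1.0: the two variables interact
```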
Image generation models have made remarkable progress, and image evaluation is crucial for explaining and driving the development of these models. Previous studies have extensively explored human and automatic evaluations of image generation. Herein, these studies are comprehensively surveyed in two main parts: evaluation protocols and evaluation methods. First, 10 image generation tasks are summarized, with a focus on how they differ in the aspects that must be evaluated. Based on this, a novel protocol is proposed to cover the human and automatic evaluation aspects required for various image generation tasks. Second, automatic evaluation methods from the past five years are reviewed. To our knowledge, this paper presents the first comprehensive summary of human evaluation, encompassing evaluation methods, tools, details, and data analysis methods. Finally, the challenges and potential directions for image generation evaluation are discussed. We hope that this survey will help researchers develop a systematic understanding of image generation evaluation, stay updated with the latest advancements in the field, and encourage further research.
The task of recognizing Chinese variant characters aims to address the challenges of semantic ambiguity and confusion, which pose risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches prioritize the acquisition of contextual knowledge from Chinese corpora and vocabularies during pretraining, often overlooking the inherent phonological and morphological characteristics of the Chinese language. To address these issues, we propose a shared-weight multimodal translation model (SMTM) based on the multimodal information of Chinese characters, which integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant text. Specifically, we encode the Pinyin features of Chinese characters using an embedding layer, and extract the font features of Chinese characters directly with convolutional neural networks. Considering the multimodal similarity between the source and target sentences of the Chinese variant-character-recognition task, we design a shared-weight embedding mechanism that generates target sentences using heuristic information from the source sentences during training. Experimental results show that the proposed SMTM achieves remarkable performance of 89.550% and 79.480% on the bilingual evaluation understudy (BLEU) and F1 metrics respectively, a significant improvement over state-of-the-art baseline models.
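As a rough illustration of such a multimodal token encoder, the hypothetical PyTorch sketch below fuses a character embedding, a Pinyin embedding, and CNN features of a rendered glyph image into one representation. All layer sizes and names are assumptions rather than the SMTM architecture itself; in SMTM, the source and target sides would additionally share these embedding weights:

```python
import torch
import torch.nn as nn

class MultimodalCharEncoder(nn.Module):
    """Hypothetical sketch: fuse character, Pinyin (phonology), and
    glyph (morphology) features into one token representation."""
    def __init__(self, n_chars, n_pinyin, d_model=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model)
        self.pinyin_emb = nn.Embedding(n_pinyin, d_model)   # phonology branch
        self.glyph_cnn = nn.Sequential(                     # morphology branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d_model),
        )
        self.fuse = nn.Linear(3 * d_model, d_model)

    def forward(self, char_ids, pinyin_ids, glyph_imgs):
        # glyph_imgs: (batch, seq, 1, H, W) rendered font images
        b, s = char_ids.shape
        g = self.glyph_cnn(glyph_imgs.flatten(0, 1)).view(b, s, -1)
        x = torch.cat([self.char_emb(char_ids),
                       self.pinyin_emb(pinyin_ids), g], dim=-1)
        return self.fuse(x)

enc = MultimodalCharEncoder(n_chars=8000, n_pinyin=1400)
out = enc(torch.zeros(2, 5, dtype=torch.long),
          torch.zeros(2, 5, dtype=torch.long),
          torch.rand(2, 5, 1, 32, 32))
print(out.shape)  # torch.Size([2, 5, 256])
```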
On-orbit service is important for maintaining the sustainability of the space environment. A space-based visible camera is an economical and lightweight sensor for situational awareness during on-orbit service. However, it is easily affected by low-illumination environments. Recently, deep learning has achieved remarkable success in the enhancement of natural images, but it is seldom applied in space due to the data bottleneck. In this study, we first propose a dataset of BeiDou navigation satellites for on-orbit low-light image enhancement (LLIE). In the automatic data collection scheme, we focus on reducing the domain gap and improving the diversity of the dataset. We collect hardware-in-the-loop images on a robotic simulation testbed that imitates space lighting conditions. To evenly sample poses of different orientations and distances without collision, we propose a collision-free workspace and pose-stratified sampling. Subsequently, we develop a novel diffusion model. To enhance image contrast without over-exposure or blurred details, we design fused attention guidance to highlight the structure and the dark regions. Finally, a comparison of our method with previous methods indicates that our method has better on-orbit LLIE performance.
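To illustrate the sampling idea, the sketch below draws an equal number of poses from each (distance, orientation) stratum and keeps only those passing a collision check. The bin layout and the `collision_free` predicate are placeholders for the paper's workspace model:

```python
import random

def pose_stratified_sampling(n_per_bin, dist_bins, yaw_bins, collision_free):
    """Hypothetical sketch of pose-stratified sampling: draw poses evenly
    from each (distance, yaw) stratum, keeping only collision-free ones."""
    poses = []
    for d_lo, d_hi in dist_bins:
        for y_lo, y_hi in yaw_bins:
            kept = 0
            while kept < n_per_bin:
                pose = (random.uniform(d_lo, d_hi), random.uniform(y_lo, y_hi))
                if collision_free(pose):   # placeholder workspace check
                    poses.append(pose)
                    kept += 1
    return poses

# Toy usage: two distance bins x two yaw bins, trivially collision-free.
samples = pose_stratified_sampling(
    n_per_bin=3,
    dist_bins=[(5.0, 10.0), (10.0, 15.0)],    # metres (illustrative)
    yaw_bins=[(0.0, 180.0), (180.0, 360.0)],  # degrees (illustrative)
    collision_free=lambda p: True)
print(len(samples))  # 12 poses, 3 per stratum
```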
Hardware transient faults are proven to have a significant impact on deep neural networks (DNNs), increasing safety-critical misclassification (SCM) in autonomous vehicles, healthcare, and space applications by up to four times. However, evaluating this inaccuracy with precise fault injection is time-consuming, requiring several hours or even a couple of days on a complete simulation platform. To accelerate the evaluation of hardware transient faults on DNNs, we design a unified, end-to-end, automatic methodology, A-Mean, which uses the silent data corruption (SDC) rates of basic operations (such as convolution, addition, multiplication, ReLU, and max-pooling) and a static two-level mean calculation mechanism to rapidly compute the overall SDC rate, thereby estimating the general classification metric (accuracy) and the application-specific metric (SCM). More importantly, a max-policy is used to determine the SDC boundary of non-sequential structures in DNNs. A worst-case scheme is then used to calculate the enlarged SCM and halved accuracy under transient faults, by merging the static SDC results with the original data from a one-time dynamic fault-free execution. Furthermore, all of the above steps are implemented automatically, so this easy-to-use tool can be employed for prompt evaluation of transient faults on diverse DNNs. Meanwhile, a novel metric, “fault sensitivity,” is defined to characterize the variation in SCM and accuracy induced by transient faults. Comparative results with a state-of-the-art fault injection method, TensorFI+, on five DNN models and four datasets show that our proposed estimation method A-Mean achieves up to 922.80 times speedup, with just 4.20% SCM loss and 0.77% accuracy loss on average. The artifact of A-Mean is publicly available at https://github.com/breatrice321/A-Mean.
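To make the two-level idea concrete, the sketch below averages per-operation SDC rates within a layer (first level) and across layers (second level), and applies a max-policy over parallel branches such as residual paths. The structure and the numbers are illustrative assumptions, not A-Mean's exact formulas:

```python
def layer_sdc(op_sdc_rates):
    """First level: mean SDC rate over the basic operations in one layer."""
    return sum(op_sdc_rates) / len(op_sdc_rates)

def model_sdc(graph):
    """Second level: aggregate layer SDC rates along the network.
    Sequential layers are averaged; parallel branches are bounded
    by a max-policy, in the spirit of the abstract."""
    rates = []
    for node in graph:
        if node and isinstance(node[0], list):    # parallel branches
            rates.append(max(model_sdc(b) for b in node))
        else:                                     # one layer: op SDC rates
            rates.append(layer_sdc(node))
    return sum(rates) / len(rates)

# Toy network: a conv layer, a two-branch residual block, a dense layer.
# Per-operation SDC rates are illustrative numbers, not measured values.
net = [[0.02, 0.03],                  # conv ops
       [[[0.01]], [[0.04, 0.02]]],    # residual block: max over branches
       [0.05]]                        # dense ops
print(f"estimated overall SDC rate: {model_sdc(net):.4f}")  # 0.0350
```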
To address the challenge of cyberattacks, intrusion detection systems (IDSs) have been introduced to recognize intrusions and protect computer networks. Among these IDSs, conventional machine learning methods rely on shallow learning and perform unsatisfactorily. Deep learning methods, by contrast, have become mainstream because of their capability to handle massive data without prior domain expertise. Within deep learning, long short-term memory (LSTM) and temporal convolutional networks (TCNs) can extract temporal features from different angles, while convolutional neural networks (CNNs) are valuable for learning spatial properties. Based on the above, this paper proposes a novel interlaced spatiotemporal deep learning model called CRGT-SA, which combines a CNN with gated TCN and recurrent neural network (RNN) modules to learn spatiotemporal properties, and incorporates the self-attention mechanism to select significant features. More specifically, our proposed model splits feature extraction into multiple steps of gradually increasing granularity, and executes each step with a combined CNN, LSTM, and gated TCN module. The proposed CRGT-SA model is validated on the UNSW-NB15 dataset and compared with other compelling techniques, including traditional machine learning and deep learning models as well as state-of-the-art deep learning models. According to the results, our proposed model exhibits the highest accuracy and F1-score among all the compared methods: it achieves 91.5% and 90.5% accuracy for binary and multi-class classification respectively, demonstrating its ability to protect networks from complex cyberattacks. Moreover, another series of experiments on the NSL-KDD dataset, with the same comparisons, further confirms the generalization ability of our proposed model.
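As a loose illustration of one such combined step, the hypothetical PyTorch sketch below chains a 1D CNN (spatial features), a gated TCN block, an LSTM (temporal features), and self-attention over time steps. Layer sizes and the exact wiring are assumptions, not the CRGT-SA architecture:

```python
import torch
import torch.nn as nn

class GatedTCNBlock(nn.Module):
    """Gated temporal convolution: tanh(conv) * sigmoid(conv), causal padding."""
    def __init__(self, ch, k=3, d=1):
        super().__init__()
        self.pad = (k - 1) * d
        self.filt = nn.Conv1d(ch, ch, k, dilation=d)
        self.gate = nn.Conv1d(ch, ch, k, dilation=d)

    def forward(self, x):                        # x: (batch, ch, time)
        xp = nn.functional.pad(x, (self.pad, 0)) # causal left padding
        return torch.tanh(self.filt(xp)) * torch.sigmoid(self.gate(xp))

class CNNLSTMGatedTCN(nn.Module):
    """Hypothetical sketch of one combined CNN + LSTM + gated-TCN step,
    loosely following the CRGT-SA description; sizes are illustrative."""
    def __init__(self, n_feat, n_classes, ch=32):
        super().__init__()
        self.cnn = nn.Conv1d(n_feat, ch, 3, padding=1)   # spatial features
        self.tcn = GatedTCNBlock(ch)                     # temporal features
        self.lstm = nn.LSTM(ch, ch, batch_first=True)    # temporal features
        self.attn = nn.MultiheadAttention(ch, 4, batch_first=True)
        self.head = nn.Linear(ch, n_classes)

    def forward(self, x):                        # x: (batch, time, n_feat)
        h = torch.relu(self.cnn(x.transpose(1, 2)))
        h = self.tcn(h).transpose(1, 2)
        h, _ = self.lstm(h)
        h, _ = self.attn(h, h, h)                # self-attention over steps
        return self.head(h.mean(dim=1))

model = CNNLSTMGatedTCN(n_feat=42, n_classes=2)
print(model(torch.rand(8, 16, 42)).shape)  # torch.Size([8, 2])
```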
Metallic mesh is a transparent electromagnetic shielding film with a fine metal line structure. However, during production or actual use, it can develop defects that degrade its optoelectronic performance. The development of in situ non-destructive testing (NDT) devices for metallic mesh requires long working distances, a reflective optical path design, and miniaturization. To address the limitations of existing smartphone microscopes, which feature short working distances and transmission imaging inadequate for industrial in situ inspection, we propose a novel long-working-distance reflective smartphone microscopy (LD-RSM) system. LD-RSM comprises a 4f optical imaging system with external optical components and a smartphone. It uses a beam splitter to achieve reflective imaging, placing the illumination and imaging systems on the same side of the sample, and achieves an optical resolution of 4.92 μm and a working distance of up to 22.23 mm. Additionally, we introduce dual-prior weighted robust principal component analysis (DW-RPCA) for defect detection. This approach leverages spectral filter fusion and the Hough transform to model different defect types, which enhances the accuracy and efficiency of defect identification. Coupled with a double-threshold segmentation approach, the DW-RPCA method achieves pixel-level defect detection accuracies (f-values) of 0.856 and 0.848 on square and circular metallic mesh datasets, respectively. Our work shows strong potential in the field of in situ industrial product inspection.
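For background, plain robust PCA separates an image matrix into a low-rank background and a sparse component in which defects stand out. The sketch below is the standard principal component pursuit recipe via the inexact augmented Lagrange multiplier method, without the dual-prior weighting, spectral filter fusion, or Hough modeling that DW-RPCA adds on top:

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Plain robust PCA (principal component pursuit): M = L + S with
    L low rank and S sparse, solved by alternating shrinkage steps."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(M).sum()
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0)
    for _ in range(max_iter):
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(sig, 1 / mu)) @ Vt   # singular-value shrinkage
        S = shrink(M - L + Y / mu, lam / mu)        # entrywise shrinkage
        R = M - L - S
        Y += mu * R                                 # dual update
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy usage: a rank-1 "mesh background" plus two bright defect pixels.
rng = np.random.default_rng(0)
bg = np.outer(rng.random(40), rng.random(40))
defects = np.zeros((40, 40))
defects[5, 7] = defects[20, 30] = 5.0
L, S = rpca(bg + defects)
print(np.unravel_index(np.argmax(S), S.shape))  # one of the defect pixels
```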
Spatial crowdsourcing (SC), as an effective paradigm for accomplishing spatiotemporal tasks, has gradually attracted widespread attention from both industry and academia. With the advancement of mobile technology, the service modes of SC have become more diversified and flexible, aiming to better meet the variable requirements of users. However, most research has focused on homogeneous task allocation under a single service model, without considering individual differences among task requirements and workers; consequently, many of these studies fail to achieve satisfactory outcomes in real scenarios. Motivated by real service scenarios, in this study we investigate a heterogeneous multi-task allocation (HMTA) problem in hybrid scenarios and provide a formal description and definition of the problem. To solve it, we propose a role division approach embedded with an individual sorting model (RD-ISM). This approach operates in a batch-based mode (BBM) and consists of two parts. First, an individual sorting model determines the sequence of objects based on spatiotemporal attributes, prioritizing tasks and workers. Second, a role division model based on an attraction–repulsion mechanism matches tasks and workers. After several iterations over multiple batches, the approach obtains the final matching results. The effectiveness of the approach is verified on real and synthetic datasets, and its performance is demonstrated through comparisons with other algorithms. Additionally, the impact of different parameters within the approach is investigated, confirming its scalability.
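As an entirely illustrative sketch of one batch of score-based matching, the code below scores each task–worker pair with an "attraction" term (proximity) minus a "repulsion" term (current workload) and assigns pairs greedily; RD-ISM's actual sorting and role division models are more elaborate:

```python
from math import hypot

def batch_match(tasks, workers, w_dist=1.0, w_load=0.5):
    """Illustrative one-batch matching: higher attraction (closer, less
    loaded worker) wins; each task and worker is matched at most once."""
    pairs = []
    for ti, (tx, ty) in enumerate(tasks):
        for wi, (wx, wy, load) in enumerate(workers):
            score = -w_dist * hypot(tx - wx, ty - wy) - w_load * load
            pairs.append((score, ti, wi))
    pairs.sort(reverse=True)                    # best attraction first
    used_t, used_w, matches = set(), set(), []
    for score, ti, wi in pairs:
        if ti not in used_t and wi not in used_w:
            matches.append((ti, wi))
            used_t.add(ti)
            used_w.add(wi)
    return matches

# Toy usage: tasks at (x, y); workers at (x, y) with a current load.
print(batch_match(tasks=[(0, 0), (5, 5)],
                  workers=[(0, 1, 0), (4, 4, 2)]))  # [(0, 0), (1, 1)]
```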
The rapid growth and increasing complexity of Internet of Things (IoT) devices have made network intrusion detection a critical challenge, especially in edge computing environments where data privacy is a primary concern. Machine learning-based intrusion detection techniques enhance IoT network security but often require centralized network data, posing significant risks to data privacy and security. Although federated learning (FL)-based network intrusion detection methods have emerged in recent years to address privacy concerns, they have not fully leveraged the advantages of graph neural networks (GNNs) for intrusion detection. To address this issue, we propose a federated spatiotemporal graph convolutional network (FedSTGCN) model, which integrates the capabilities of spatiotemporal GNNs (STGNNs) and FL. This framework enables collaborative model training across distributed IoT devices without sharing raw data, thereby improving intrusion detection accuracy while preserving data privacy. Extensive experiments on two widely used IoT intrusion detection datasets evaluate the effectiveness of the proposed approach. The results demonstrate that FedSTGCN outperforms other methods in both binary and multiclass classification, achieving over 97% accuracy in binary classification and over 92% weighted F1-score in multiclass classification.
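The federated side of such a framework typically follows the FedAvg recipe: each device trains locally, and only model parameters leave the device, never raw traffic data. The sketch below shows size-weighted parameter averaging as the standard recipe, not FedSTGCN's exact aggregation rule:

```python
import torch

def fed_avg(client_states, client_sizes):
    """FedAvg-style aggregation: average each parameter across clients,
    weighted by the size of each client's local dataset."""
    total = sum(client_sizes)
    return {k: sum(s[k] * (n / total)
                   for s, n in zip(client_states, client_sizes))
            for k in client_states[0].keys()}

# Toy usage: two "clients" holding the same tiny model architecture.
def make(seed):
    torch.manual_seed(seed)
    return torch.nn.Linear(4, 2)

a, b = make(0), make(1)
global_state = fed_avg([a.state_dict(), b.state_dict()], [300, 100])
print(global_state["weight"].shape)  # torch.Size([2, 4])
```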
Predicting the remaining useful life (RUL) of bearings under scarce labeled data is significant for intelligent manufacturing. Current approaches typically encounter the challenge that different degradation stages show similar behaviors in multisensor scenarios. Given that cross-sensor similarity improves the discrimination of degradation features, we propose a multisensor contrast method for RUL prediction under scarce RUL-labeled data, which uses cross-sensor similarity to mine, from rich unlabeled sensor data, multisensor similar representations that indicate machine health condition in a co-occurrence space. Specifically, we use ResNet18 to project the features of different sensors into the co-occurrence space. We then obtain multisensor similar representations of the abundant unlabeled data through alternate contrast based on cross-sensor similarity in the co-occurrence space; these representations indicate the machine degradation stage. Finally, we fine-tune these representations to achieve RUL prediction with limited labeled sensor data. The proposed method is evaluated on a publicly available bearing dataset, and the results show that the mean absolute percentage error is reduced by at least 0.058 and the score is improved by at least 0.122 compared with state-of-the-art methods.
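A generic recipe in this spirit treats features of time-aligned windows from two sensors as positive pairs and contrasts them against all other windows. The sketch below uses a symmetric InfoNCE-style loss, which is an assumption standing in for the paper's alternate-contrast objective:

```python
import torch
import torch.nn.functional as F

def cross_sensor_infonce(z_a, z_b, tau=0.1):
    """InfoNCE-style contrastive loss across two sensors: windows recorded
    at the same time on sensors A and B are positives; all other windows
    in the batch are negatives."""
    z_a = F.normalize(z_a, dim=1)           # (batch, dim) sensor-A features
    z_b = F.normalize(z_b, dim=1)           # (batch, dim) sensor-B features
    logits = z_a @ z_b.t() / tau            # pairwise cosine similarities
    targets = torch.arange(z_a.size(0))     # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: embeddings of 8 time windows from two sensors (e.g.,
# ResNet18 features of horizontal/vertical vibration signals).
loss = cross_sensor_infonce(torch.rand(8, 128), torch.rand(8, 128))
print(loss.item())
```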
As the topology of DeviceNet in industrial automation systems grows more complex and the reliability requirements for industrial equipment and processes become more stringent, the importance of network troubleshooting is increasingly evident. Intermittent connection (IC) faults frequently occur in DeviceNet systems, impairing production performance and even operational safety. However, existing IC troubleshooting methods for DeviceNet, especially for complex topologies, cannot directly handle multi-fault scenarios and thus require human intervention for a full diagnosis. In this paper, a novel data-driven IC fault diagnosis method based on Bayesian inference is proposed for DeviceNet with complex topologies, which can accurately and efficiently localize all IC faults in the network without interrupting normal system operation. First, observation symptoms are defined by analyzing the data frames interrupted by IC faults, and the suspected IC faults are derived by integrating the observation symptoms with the network topology information. Second, a Bayesian inference-based approach is proposed to estimate the posterior probability of each suspected fault occurring in the network, using the quantity of observation symptoms and their causal relationships with the suspected faults. Finally, a maximum likelihood-based fast diagnosis algorithm is developed to rapidly identify the IC fault locations in various complex scenarios. A laboratory testbed is constructed, and case studies are conducted under various topologies and fault scenarios to demonstrate the effectiveness and advantages of the proposed method. Experimental results show that the IC fault locations diagnosed by the proposed method agree well with the experimental setup.
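To illustrate the Bayesian step, the sketch below computes a posterior over suspected fault locations from observed symptom counts, assuming conditionally independent symptoms; the paper's causal model over topology is richer, and the priors and likelihoods here are illustrative numbers:

```python
import numpy as np

def fault_posteriors(prior, likelihood, symptom_counts):
    """Posterior over suspected fault locations given symptom counts,
    under a naive conditional-independence assumption.
    likelihood[f][s] = P(symptom s | fault at location f)."""
    log_post = np.log(prior)
    for f in range(len(prior)):
        for s, count in symptom_counts.items():
            log_post[f] += count * np.log(likelihood[f].get(s, 1e-9))
    post = np.exp(log_post - log_post.max())   # stable normalization
    return post / post.sum()

# Toy usage: three suspected locations, two symptom types.
prior = np.array([1 / 3, 1 / 3, 1 / 3])
likelihood = [{"err_frame": 0.7, "retry": 0.1},
              {"err_frame": 0.2, "retry": 0.6},
              {"err_frame": 0.1, "retry": 0.3}]
post = fault_posteriors(prior, likelihood, {"err_frame": 5, "retry": 1})
print(post.argmax(), post.round(3))  # location 0 is the ML diagnosis
```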
Isolated data islands are prevalent in intelligent automated optical inspection (AOI) systems, limiting the full utilization of data resources and impeding the potential of AOI systems. Establishing a collaborative ecology involving software providers, hardware manufacturers, and factories offers a promising solution for building a closed-loop data flow and achieving optimal data resource utilization. However, concerns about privacy, rights infringement, and threats from other participants make it challenging to establish an efficient and effective community. In this paper, we propose a novel framework, AOI-OPEN, which first creates a trustworthy AOI ecology that gathers the related entities under decentralized autonomous organization (DAO) mechanisms. Then, a parallel data pipeline is proposed to generate large-scale virtual samples from small-scale real data for AOI systems. Finally, federated learning (FL) is adopted to leverage the distributed data resources among multiple entities and build privacy-preserving big models. Experiments on defect classification tasks show that, with privacy preserved, AOI-OPEN greatly strengthens the utilization of distributed data resources and improves the accuracy of inspection models.
Multi-objective games (MOGs), a class of games with vector payoffs, have received much attention in recent years. Based on the semi-tensor product (STP), this paper discusses MOGs from both static and dynamic perspectives, covering the existence, finite-step reachability, and finite-step controllability of the Pareto equilibrium of this model. First, the MOG concept is presented using multi-layer graphs, and the STP is used to convert the payoff function into its algebraic form. Then, from the static perspective, two necessary and sufficient conditions are proposed to verify whether all players can meet their expectations and whether a strategy profile is a Pareto equilibrium, respectively. Furthermore, from the dynamic perspective, a strategy updating rule is designed to investigate the finite-step reachability of the evolutionary MOG. Finally, the finite-step controllability of the evolutionary MOG is analyzed by adding pseudo-players, and a backward search algorithm is provided to find the shortest evolutionary process and control sequence.
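For reference, the STP underlying this algebraization is the standard Cheng product, which extends ordinary matrix multiplication to dimension-mismatched factors:

```latex
% Standard definition of the semi-tensor product (STP); with
% t = lcm(n, p) it reduces to ordinary matrix multiplication when n = p.
\[
  A \ltimes B \;=\; \bigl(A \otimes I_{t/n}\bigr)\bigl(B \otimes I_{t/p}\bigr),
  \qquad A \in \mathbb{R}^{m \times n},\; B \in \mathbb{R}^{p \times q},\;
  t = \operatorname{lcm}(n, p),
\]
where $\otimes$ denotes the Kronecker product, so that
$A \ltimes B \in \mathbb{R}^{(mt/n) \times (qt/p)}$.
```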