The robustness of deep neural networks (DNNs) has raised great concern in the academic and industrial communities, especially in safety-critical domains. Instead of verifying whether the robustness property holds in a given neural network, this paper focuses on training robust neural networks with respect to given perturbations. State-of-the-art training methods, interval bound propagation (IBP) and CROWN-IBP, perform well with respect to small perturbations, but their performance declines significantly in large perturbation cases, which is termed “drawdown risk” in this paper. Specifically, drawdown risk refers to the phenomenon that IBP-family training methods fail to deliver neural networks that are as robust under larger perturbations as they are under smaller perturbations. To alleviate the drawdown risk, we propose a global and monotonically decreasing robustness training strategy that takes multiple perturbations into account during each training epoch (global robustness training) and combines the corresponding robustness losses with monotonically decreasing weights (monotonically decreasing robustness training). Experiments show that the proposed strategy maintains performance on small perturbations and alleviates the drawdown risk on large perturbations to a great extent. It is also noteworthy that our training method achieves higher model accuracy than the original training methods, which means that the proposed training strategy gives more balanced consideration to robustness and accuracy.
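As a rough illustration of combining robustness losses from several perturbations with monotonically decreasing weights, the sketch below uses a geometric decay. The abstract does not specify the weighting scheme or its direction, so the function names, the `decay` parameter, and the choice to give the smallest perturbation the largest weight are all assumptions made for illustration.

```python
def decreasing_weights(n, decay=0.5):
    """Return n positive weights that strictly decrease and sum to 1.

    The geometric form decay**i is an assumption; the abstract only
    states that the weights decrease monotonically.
    """
    raw = [decay ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def combined_robust_loss(losses_by_eps, decay=0.5):
    """Combine per-perturbation robustness losses into one scalar.

    losses_by_eps maps a perturbation radius to the robustness loss
    computed at that radius (for example by IBP). Losses are ordered by
    increasing radius, so the smallest radius gets the largest weight
    (an assumed ordering).
    """
    ordered = [loss for _, loss in sorted(losses_by_eps.items())]
    weights = decreasing_weights(len(ordered), decay)
    return sum(w * loss for w, loss in zip(weights, ordered))


if __name__ == "__main__":
    # Hypothetical loss values for three perturbation radii.
    print(combined_robust_loss({0.1: 0.2, 0.2: 0.5, 0.3: 0.9}))
```

In a training loop, the returned scalar would be added to the usual classification loss once per batch, so that every epoch sees all perturbation radii rather than a single one.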
Federated learning (FL) is a novel technique in deep learning that enables clients to collaboratively train a shared model while retaining their decentralized data. However, researchers working on FL face several unique challenges, especially in the context of heterogeneity. Heterogeneity in data distributions, computational capabilities, and scenarios among clients necessitates the development of customized models and objectives in FL. Unfortunately, existing works such as FedAvg may not effectively accommodate the specific needs of each client. To address the challenges arising from heterogeneity in FL, we provide an overview of the heterogeneities in data, model, and objective (DMO). Furthermore, we propose a novel framework called federated mutual learning (FML), which enables each client to train a personalized model that accounts for the data heterogeneity (DH). A “meme model” serves as an intermediary between the personalized and global models to address model heterogeneity (MH). We introduce a knowledge distillation technique called deep mutual learning (DML) to transfer knowledge between these two models on local data. To overcome objective heterogeneity (OH), we design a shared global model that includes only certain parts, and the personalized model is task-specific and enhanced through mutual learning with the meme model. We evaluate the performance of FML in addressing DMO heterogeneities through experiments and compare it with other commonly used FL methods in similar scenarios. The results demonstrate that FML outperforms other methods and effectively addresses the DMO challenges encountered in the FL setting.
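The mutual learning step between the personalized model and the meme model can be sketched as a pair of losses, each mixing a supervised term with a KL term that pulls one model toward the other's predictions. This is a minimal pure-Python sketch of the general deep mutual learning idea for a single sample; the mixing weight `alpha` and the function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_losses(local_logits, meme_logits, label, alpha=0.5):
    """Return (loss_local, loss_meme) for one sample.

    Each model is trained on its own cross-entropy plus a KL term that
    treats the other model's prediction as the target. alpha is an
    assumed mixing weight.
    """
    p_local = softmax(local_logits)
    p_meme = softmax(meme_logits)
    loss_local = (1 - alpha) * cross_entropy(p_local, label) \
        + alpha * kl_divergence(p_meme, p_local)
    loss_meme = (1 - alpha) * cross_entropy(p_meme, label) \
        + alpha * kl_divergence(p_local, p_meme)
    return loss_local, loss_meme


if __name__ == "__main__":
    print(mutual_losses([2.0, 0.5, -1.0], [1.0, 1.5, -0.5], label=0))
```

Because the two models only exchange output distributions, they do not need to share an architecture, which is what allows a meme model to sit between differently shaped personalized and global models.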
Cross-modal retrieval tries to achieve mutual retrieval between modalities by establishing consistent alignment for different modal data. Currently, many cross-modal retrieval methods have been proposed and have achieved excellent results; however, these are trained with clean cross-modal pairs, which are semantically matched but costly, compared with easily available data with noise alignment (i.e., paired but mismatched in semantics). When training these methods with noise-aligned data, the performance degrades dramatically. Therefore, we propose a robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first conducts multi-task learning to slow down the overfitting to the noise to make data separable. Then, RCAR uses a two-component beta-mixture model to divide them into clean and noise alignments and refurbishes the label according to the posterior probability of the noise-alignment component. In addition, we define partial and complete noises in the noise-alignment paradigm. Experimental results show that, compared with the popular cross-modal retrieval methods, RCAR achieves more robust performance with both types of noise.
Traditional Chinese medicine (TCM), with a history spanning thousands of years in China, is an interesting research topic. With the recent advances in artificial intelligence technology, some researchers have started to focus on learning TCM prescriptions in a data-driven manner. This involves recommending an appropriate set of herbs based on patients’ symptoms. Most existing herb recommendation models disregard TCM domain knowledge, for example, the interactions between symptoms and herbs and the TCM-informed observations (i.e., TCM formulation of prescriptions). In this paper, we propose a knowledge-guided and TCM-informed approach for herb recommendation. The knowledge used includes path interactions and co-occurrence relationships among symptoms and herbs from a knowledge graph generated from TCM literature and prescriptions. This knowledge is used to obtain discriminative feature vectors of symptoms and herbs via a graph attention network. To improve herb prediction for the given symptoms, we introduce TCM-informed observations in the prediction layer. We apply our proposed model to a TCM prescription dataset, demonstrating significant improvements over state-of-the-art herb recommendation methods.
To balance the inference speed and detection accuracy of a grasp detection algorithm, which are both important for robot grasping tasks, we propose an encoder–decoder structured pixel-level grasp detection neural network named the attention-based efficient robot grasp detection network (AE-GDN). Three spatial attention modules are introduced in the encoder stages to enhance the detailed information, and three channel attention modules are introduced in the decoder stages to extract more semantic information. Several lightweight and efficient DenseBlocks are used to connect the encoder and decoder paths to improve the feature modeling capability of AE-GDN. A high intersection over union (IoU) value between the predicted grasp rectangle and the ground truth does not necessarily indicate a high-quality grasp configuration and may even correspond to a grasp that causes a collision. This is because traditional IoU loss calculation methods treat the center part of the predicted rectangle as having the same importance as the area around the grippers. We design a new IoU loss calculation method based on an hourglass box matching mechanism, which creates good correspondence between high IoUs and high-quality grasp configurations. AE-GDN achieves accuracies of 98.9% and 96.6% on the Cornell and Jacquard datasets, respectively. The inference speed reaches 43.5 frames per second with only about 1.2 × 10⁶ parameters. The proposed AE-GDN has also been deployed on a practical robotic arm grasping system and performs grasping well. Code is available at https://github.com/robvincen/robot_gradet.
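For readers unfamiliar with the IoU measure discussed above, the following sketch computes it for two axis-aligned boxes. It is a deliberate simplification: real grasp rectangles are rotated, and the hourglass box matching mechanism is not reproduced here because the abstract does not describe it. The function name and box format `(x1, y1, x2, y2)` are assumptions.

```python
def axis_aligned_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Every pixel of the overlap counts equally, which is the uniform
    treatment the abstract identifies as a weakness of plain IoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(axis_aligned_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Since the score depends only on overlapping area, a prediction that matches the center of the ground-truth rectangle but misplaces the gripper ends can score as well as one that places the grippers correctly.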
This paper introduces a novel framework, i.e., RFPose-OT, to enable three-dimensional (3D) human pose estimation from radio frequency (RF) signals. Different from existing methods that predict human poses from RF signals at the signal level directly, we consider the structure difference between the RF signals and the human poses, propose a transformation of the RF signals to the pose domain at the feature level based on the optimal transport (OT) theory, and generate human poses from the transformed features. To evaluate RFPose-OT, we build a radio system and a multi-view camera system to acquire the RF signal data and the ground-truth human poses. The experimental results in a basic indoor environment, an occlusion indoor environment, and an outdoor environment demonstrate that RFPose-OT can predict 3D human poses with higher precision than state-of-the-art methods.
Time delay and coupling strength are important factors that affect the synchronization of neural networks. In this study, a modular neural network containing subnetworks of different scales was constructed using the Hodgkin–Huxley (HH) neural model; i.e., a small-scale random network was unidirectionally connected to a large-scale small-world network through chemical synapses. Time delays were found to induce multiple synchronization transitions in the network. An increase in coupling strength also promoted synchronization of the network when the time delay was an integer multiple of the firing period of a single neuron. Considering that time delays at different locations in a modular network may have different effects, we explored the influence of time delays within each subnetwork and between two subnetworks on the synchronization of modular networks. We found that when the subnetworks were well synchronized internally, an increase in the time delay within both subnetworks induced multiple synchronization transitions of their own. In addition, the synchronization state of the small-scale network affected the synchronization of the large-scale network. It was surprising to find that an increase in the time delay between the two subnetworks caused the synchronization factor of the modular network to vary periodically, but it had essentially no effect on the synchronization within the receiving subnetwork. By analyzing the phase difference between the two subnetworks, we found that the mechanism of the periodic variation of the synchronization factor of the modular network was the periodic variation of the phase difference. Finally, the generality of the results was demonstrated by investigating modular networks at different scales.
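The abstract does not state how the synchronization factor is computed. One definition commonly used in studies of coupled neuron models is the variance of the population-averaged membrane potential divided by the average variance of the individual potentials; the sketch below assumes that definition, and the function names and trace layout are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance over time of one trace."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def synchronization_factor(traces):
    """Synchronization factor R for a list of equally long traces.

    traces[i][t] is the membrane potential of neuron i at time step t.
    R is close to 1 when all neurons fire together and close to 0 when
    their activity cancels out in the population average.
    """
    n_steps = len(traces[0])
    # Mean field: average over neurons at each time step.
    mean_field = [mean([trace[t] for trace in traces]) for t in range(n_steps)]
    mean_single_variance = mean([variance(trace) for trace in traces])
    return variance(mean_field) / mean_single_variance


if __name__ == "__main__":
    a = [0.0, 1.0, 0.0, -1.0] * 5
    b = [-x for x in a]
    print(synchronization_factor([a, a]), synchronization_factor([a, b]))
```

Under this definition, a phase difference between two internally synchronized subnetworks lowers the factor of the whole network without changing the factor inside either subnetwork, which is consistent with the periodic variation reported above.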
The BeiDou-3 navigation satellite system was officially commissioned in 2020. To bring high-performance services to people around the world, the navigation system requires well-designed BeiDou antennas. In this paper, we propose a wideband circularly polarized high-performance BeiDou antenna. The antenna realizes wideband circularly polarized radiation through a four-port sequential feed network, and the phase imbalance of the feed network from 1.05 to 1.80 GHz is less than 7°. The manufactured antenna demonstrates a return loss of more than 13 dB and an axial ratio of less than 3 dB over the entire global navigation satellite system (GNSS) frequency band. The right-handed circular polarization (RHCP) gain of the proposed antenna is greater than 4 dB in the GNSS low-frequency band and can reach more than 7.1 dB in the high-frequency band. The dimensions of the proposed antenna are 120 mm × 120 mm × 20 mm, i.e., 0.54λo × 0.54λo × 0.09λo, where λo is the wavelength at the center frequency. The proposed antenna connected to a GNSS receiver has tracked 12 BeiDou satellites with C/N0 ratios of GNSS signals greater than 30 dB. Such a high-performance antenna provides a basis for high-quality positioning services.
The deformability and high degree of freedom of mollusks bring challenges in mathematical modeling and synthesis of motions. Traditional analytical and statistical models are limited by either rigid skeleton assumptions or model capacity, and have difficulty generating realistic and multi-pattern mollusk motions. In this work, we present a large-scale dynamic pose dataset of Drosophila larvae and propose a motion synthesis model named Path2Pose to generate a pose sequence given the initial poses and the subsequent guiding path. The Path2Pose model is further used to synthesize long pose sequences of various motion patterns through a recursive generation method. Evaluation results demonstrate that our model synthesizes highly realistic mollusk motions and achieves state-of-the-art performance. Our work demonstrates the high performance of deep neural networks for mollusk motion synthesis and the feasibility of long pose sequence synthesis based on the customized body shape and guiding path.