The management of resource contention in shared clouds remains an open problem. The evolution and deployment of new application paradigms (e.g., deep learning training and microservices) and custom hardware (e.g., graphics processing units (GPUs) and tensor processing units (TPUs)) have posed new challenges in resource management system design. Current solutions tend to trade cluster efficiency for guaranteed application performance, e.g., through resource over-allocation, leaving substantial resources underutilized. Overcoming this dilemma is not easy, because components across the entire software stack are involved. Nevertheless, massive efforts have been devoted to seeking effective performance isolation and highly efficient resource scheduling. The goal of this paper is to systematically cover these aspects, presenting the techniques from a coordination perspective and identifying the trends they indicate. Briefly, four topics are involved. First, isolation mechanisms deployed at different levels (micro-architecture, system, and virtualization levels) are reviewed, including GPU multitasking methods. Second, resource scheduling techniques within an individual machine and at the cluster level are investigated. In particular, GPU scheduling for deep learning applications is described in detail. Third, adaptive resource management, including the latest microservice-related research, is thoroughly explored. Finally, future research directions are discussed in light of recent advances. We hope that this review will help researchers establish a global view of the landscape of resource management techniques in shared clouds, and see technology trends more clearly.
With the continuous improvement of supercomputer performance and the integration of artificial intelligence with traditional scientific computing, the scale of applications is gradually increasing from millions to tens of millions of computing cores, which makes it highly challenging to achieve scalability and efficiency for parallel applications on super-large-scale systems. Taking the Sunway exascale prototype system as an example, in this paper we first analyze the challenges to high scalability and high efficiency that parallel applications face in the exascale era. To overcome these challenges, we highlight the optimization technologies used in the parallel supporting environment software on the Sunway exascale prototype system, including the parallel operating system, input/output (I/O) optimization technology, ultra-large-scale parallel debugging technology, 10-million-core parallel algorithm, and mixed-precision method. The parallel operating system and I/O optimization technology mainly support large-scale system scaling, while the ultra-large-scale parallel debugging technology, 10-million-core parallel algorithm, and mixed-precision method mainly enhance the efficiency of large-scale applications. Finally, the contributions of these technologies to various applications running on the Sunway exascale prototype system are introduced, verifying the effectiveness of the parallel supporting environment design.
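Of the techniques listed, the mixed-precision method lends itself to a compact illustration. One common instance of the idea is mixed-precision iterative refinement, sketched below in Python under our own assumptions (this is not the Sunway implementation): solve in low precision, but accumulate residuals and corrections in high precision.

```python
# Mixed-precision iterative refinement (a generic illustration, not the
# Sunway code): solve in float32, refine with float64 residuals.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100.0 * np.eye(100)  # well conditioned
b = rng.standard_normal(100)

A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
for _ in range(3):
    r = b - A @ x                                 # residual in float64
    dx = np.linalg.solve(A32, r.astype(np.float32))
    x += dx.astype(np.float64)                    # low-precision correction
print(np.linalg.norm(b - A @ x))                  # near float64 accuracy
```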
Deep learning provides an effective way to automatically classify cardiac arrhythmias, but in clinical decision-making, purely data-driven methods acting as black boxes may lead to unsatisfactory results. A promising solution is to combine domain knowledge with deep learning. This paper develops a flexible and extensible framework for integrating domain knowledge with a deep neural network. The model consists of a deep neural network that captures the statistical relationship between the input data and the ground-truth label, and a knowledge module that guarantees consistency with the domain knowledge. The two components are trained interactively to bring together the best of both worlds. The experiments show that the domain knowledge is valuable in refining the neural network's predictions and thus improves accuracy.
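How a knowledge module can steer a network's predictions is easy to sketch. In the hypothetical Python example below, a penalty term discourages predictions that violate a simple domain rule; the rule, the RR-interval feature, and the weighting are our assumptions, not the authors' design.

```python
# Minimal sketch of combining a data-fit loss with a domain-knowledge
# penalty (illustrative only; the rule below is a made-up stand-in).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def knowledge_penalty(probs, rr_interval, tachy_class=2, rr_thresh=0.6):
    # Hypothetical rule: a long RR interval is inconsistent with
    # predicting the "tachycardia" class, so penalize that probability.
    violating = rr_interval > rr_thresh
    return probs[violating, tachy_class].sum() / max(violating.sum(), 1)

logits = np.random.randn(8, 4)          # network outputs for 8 beats
labels = np.random.randint(0, 4, 8)     # ground-truth arrhythmia labels
rr = np.random.uniform(0.4, 1.0, 8)     # RR intervals (s), a domain feature

probs = softmax(logits)
loss = cross_entropy(probs, labels) + 0.5 * knowledge_penalty(probs, rr)
print(loss)
```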
Session-based recommendation aims to predict the next item based on a user’s limited interactions within a short period. Existing approaches mainly use recurrent neural networks (RNNs) or graph neural networks (GNNs) to model the sequential patterns or the transition relationships between items. However, such models either ignore the over-smoothing issue of GNNs or directly use a cross-entropy loss with a softmax layer for model optimization, which easily leads to over-fitting. To tackle these issues, we propose a self-supervised graph learning with target-adaptive masking (SGL-TM) method. Specifically, we first construct a global graph based on all involved sessions and then capture self-supervised signals from the global connections between items, which helps supervise the model in generating accurate representations of the items in the ongoing session. After that, we calculate the main supervised loss by comparing the ground truth with the predicted scores of items, adjusted by our designed target-adaptive masking module. Finally, we combine the main supervised component with the auxiliary self-supervision module to obtain the final loss for optimizing the model parameters. Extensive experimental results on two benchmark datasets, Gowalla and Diginetica, indicate that SGL-TM outperforms state-of-the-art baselines in terms of Recall@20 and MRR@20, especially on short sessions.
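The loss construction can be outlined in a few lines. The Python sketch below applies a stand-in masking rule to the item scores before the softmax cross-entropy and then adds a weighted self-supervised term; the actual target-adaptive masking in SGL-TM is more involved than this placeholder.

```python
# Sketch of a masking step before the softmax cross-entropy, plus the
# combined loss (illustrative; not the SGL-TM masking rule itself).
import numpy as np

def softmax_ce(scores, target):
    e = np.exp(scores - scores.max())
    p = e / e.sum()
    return -np.log(p[target] + 1e-12)

scores = np.random.randn(100)   # predicted scores over the item catalog
target = 7                      # ground-truth next item

# Stand-in masking: damp the hardest negatives so optimization does not
# over-fit to pushing their scores down (assumed behavior).
mask = np.ones_like(scores)
hard_negs = np.argsort(scores)[-5:]
mask[hard_negs[hard_negs != target]] = 0.5
main_loss = softmax_ce(scores * mask, target)

ssl_loss = 0.31                 # auxiliary self-supervised loss (placeholder)
beta = 0.1                      # weighting coefficient (assumed)
print(main_loss + beta * ssl_loss)
```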
Image secret sharing (ISS) is gaining popularity due to the importance of digital images and its wide applicability to cloud-based distributed storage and multiparty secure computing. Shadow image authentication, which generally includes shadow image detection and identification, plays an important role in ISS. However, traditional dealer-participatory methods, which suffer from significant pixel expansion or require storing auxiliary information, authenticate the shadow image mainly during the decoding phase; this is known as unidirectional authentication. Authenticating the shadow image in the distribution (encoding) phase is also important for the participant. In this study, we introduce a public-key-based bidirectional shadow image authentication method for ISS without pixel expansion for a (k,n) threshold. When the dealer distributes each shadow image to the corresponding participant, the participant can authenticate the received shadow image with his/her private key. In the decoding phase, the dealer can authenticate each received shadow image with a secret key; in addition, the dealer can losslessly decode the secret image from any k or more shadow images. The proposed method is validated through theoretical analyses, illustrations, and comparisons.
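The bidirectional check rests on standard public-key signing and verification. As a toy illustration of one direction of such a protocol (textbook RSA with deliberately tiny keys; the paper's construction differs, and real systems would use a vetted signature library):

```python
# Toy public-key authentication of a shadow image: the signer commits
# to a hash; any tampering breaks verification. Illustrative only.
import hashlib

n, e, d = 3233, 17, 2753        # textbook RSA modulus and key pair (61*53)

def digest(shadow_bytes):
    # Reduce the shadow image hash into the toy modulus range.
    return int(hashlib.sha256(shadow_bytes).hexdigest(), 16) % n

def sign(shadow_bytes, priv):
    return pow(digest(shadow_bytes), priv, n)

def verify(shadow_bytes, sig, pub):
    return pow(sig, pub, n) == digest(shadow_bytes)

shadow = b"...shadow image pixels..."
tag = sign(shadow, d)                    # attached when the shadow is sent
print(verify(shadow, tag, e))            # receiver checks on receipt -> True
print(verify(shadow + b"x", tag, e))     # tampered shadow fails -> False
```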
As a wearable robot, an exoskeleton provides a direct transfer of mechanical power to assist or augment the wearer’s movement with an anthropomorphic configuration. When an exoskeleton is used to facilitate the wearer’s movement, a motion generation process often plays an important role in high-level control. One of the main challenges in this area is to generate, in real time, a reference trajectory that is consistent with human intention and can adapt to different situations. In this paper, we first describe a novel motion modeling method for a lower limb exoskeleton based on the probabilistic movement primitive (ProMP), a powerful representation for generating motion trajectories. To adapt the trajectory to the different situations that arise when the exoskeleton is used by different wearers, we propose a novel motion learning scheme that combines the black-box optimization (BBO) method PIBB with ProMP. The motion model is first learned offline by ProMP, which can then generate reference trajectories for use by exoskeleton controllers online. PIBB is adopted to learn and update the model for online trajectory generation, which gives the system adaptability and eliminates the effects of uncertainties. Simulations and experiments involving six subjects using the lower limb exoskeleton HEXO demonstrate the effectiveness of the proposed methods.
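The combination of a basis-function trajectory model with episodic black-box policy improvement can be illustrated compactly. The Python sketch below pairs a ProMP-style Gaussian-basis trajectory with a PIBB-style reward-weighted update; the basis widths, cost function, and target trajectory are our assumptions, not the HEXO setup.

```python
# ProMP-style trajectory model updated by a PIBB-style reward-weighted
# search (a stand-in illustration, not the authors' code).
import numpy as np

T, K = 100, 10
t = np.linspace(0, 1, T)
centers = np.linspace(0, 1, K)
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.05 ** 2))
Phi /= Phi.sum(axis=1, keepdims=True)   # normalized Gaussian basis

w = np.zeros(K)                         # trajectory weights (to be learned)
target = np.sin(np.pi * t)              # assumed desired joint trajectory

def cost(weights):
    return np.sum((Phi @ weights - target) ** 2)

sigma, lam, n_samples = 0.3, 10.0, 20
for _ in range(50):                     # PIBB-style episodic updates
    eps = sigma * np.random.randn(n_samples, K)
    costs = np.array([cost(w + e) for e in eps])
    # Exponentiate the normalized costs to get probability weights.
    p = np.exp(-lam * (costs - costs.min()) / (costs.max() - costs.min() + 1e-12))
    p /= p.sum()
    w = w + p @ eps                     # reward-weighted perturbation average

print(cost(w))                          # cost should have dropped sharply
```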
Recent progress in multi-agent deep reinforcement learning (MADRL) has made it more practical for real-world tasks, but its relatively poor scalability and the partial-observability constraint raise challenges for its performance and deployment. Based on the intuitive observation that human society can be regarded as a large-scale partially observable environment in which everyone communicates with neighbors and remembers his/her own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that learns stochastic policies with a configurable target action entropy. Based on these technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant called SAC-HGRN. Experimental results on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements over four MADRL baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model.
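One way to learn a stochastic policy with a configurable target action entropy is to adapt a softmax temperature until the policy's entropy matches the target, in the spirit of soft (maximum-entropy) RL. A stand-in Python sketch, not Soft-HGRN itself, with invented Q-values:

```python
# Tune a Boltzmann policy's temperature so its entropy tracks a target.
import numpy as np

q = np.array([1.0, 0.5, 0.2, -0.3])    # Q-values for 4 actions (assumed)
target_entropy = 0.8                    # configurable target (nats)

alpha, lr = 1.0, 0.1
for _ in range(200):
    p = np.exp(q / alpha); p /= p.sum()         # soft (Boltzmann) policy
    entropy = -(p * np.log(p + 1e-12)).sum()
    alpha += lr * (target_entropy - entropy)    # raise alpha if too greedy
    alpha = max(alpha, 1e-3)

print(entropy, p)                       # entropy ~= 0.8; stochastic policy
```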
Ensuring the safety of pedestrians is essential and challenging when autonomous vehicles are involved. Classical pedestrian avoidance strategies cannot handle uncertainty, and learning-based methods lack performance guarantees. In this paper we propose a hybrid reinforcement learning (HRL) approach that enables autonomous vehicles to interact safely with pedestrians whose behavior is uncertain. The method integrates a rule-based strategy and a reinforcement learning strategy. The confidence of each strategy is evaluated using the data recorded during training. We then design an activation function that selects as the final policy the strategy with higher confidence. In this way, we can guarantee that the final policy performs no worse than the rule-based policy. To demonstrate the effectiveness of the proposed method, we validate it in simulation, using an accelerated testing technique to generate stochastic pedestrians. The results indicate that the method increases the success rate of pedestrian avoidance to 98.8%, compared with 94.4% for the baseline method.
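The confidence-based switching described above reduces to a simple selection rule. A minimal Python sketch, in which the confidence estimates and toy policies are our assumptions:

```python
# Select whichever policy is more confident in the current state, so
# performance never falls below the rule-based baseline (illustrative).
import numpy as np

def select_action(state, rule_policy, rl_policy, conf_rule, conf_rl):
    # Fall back to the rule-based action unless the learned policy is
    # strictly more confident in this state.
    if conf_rl(state) > conf_rule(state):
        return rl_policy(state)
    return rule_policy(state)

# Toy usage with stand-in policies and confidences.
state = np.array([12.0, 1.5])          # e.g., gap (m), pedestrian speed (m/s)
rule = lambda s: "brake"
rl = lambda s: "yield_and_pass"
c_rule = lambda s: 0.90                # from training statistics (assumed)
c_rl = lambda s: 0.95
print(select_action(state, rule, rl, c_rule, c_rl))
```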
The inadequate geometric accuracy of cameras is the main constraint on improving the precision of infrared horizon sensors with a large field of view (FOV). An enormous FOV with a blind area in the center greatly limits the accuracy and feasibility of traditional geometric calibration methods. A novel camera calibration method for infrared horizon sensors is presented and validated in this paper. Three infrared targets are used as control points, and the camera is mounted on a rotary table. As the table rotates, these control points become evenly distributed across the entire FOV. Compared with traditional methods that combine a collimator and a rotary table, which cannot effectively cover a large FOV and require demanding experimental equipment, this method is easier to implement and less costly. A corresponding three-step parameter estimation algorithm is proposed to avoid having to precisely measure the positions of the camera and the control points. Experiments are conducted with 10 infrared horizon sensors to verify the effectiveness of the calibration method. The results show that the proposed method is highly stable and that its calibration accuracy is at least 30% higher than those of existing methods.
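The core geometric idea, a few fixed targets swept across the FOV by rotating the camera, can be sketched as follows in Python; the target positions are assumed for illustration, and the paper's three-step parameter estimation is not reproduced.

```python
# Geometry-only sketch: three fixed infrared targets, seen at many
# table angles, yield a dense, well-spread set of calibration rays.
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

targets = np.array([[2.0, 0.0, 0.3],
                    [2.0, 0.3, -0.2],
                    [2.0, -0.3, 0.0]])   # three IR control points (m), assumed

views = []
for theta in np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False):
    d = (rot_z(theta) @ targets.T).T     # targets in the rotated camera frame
    views.append(d / np.linalg.norm(d, axis=1, keepdims=True))

views = np.concatenate(views)            # 72 rays covering the annular FOV
print(views.shape, views[:2])
```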
The rapid development of the communications industry has spawned new services and applications. The sixth-generation (6G) wireless communication network faces more stringent and diverse requirements. While performance requirements such as high data rate and low latency must be ensured, the high energy consumption observed in fifth-generation (5G) networks also remains a problem to be solved in 6G. Wide-area coverage signaling cell technology conforms to the future development trend of radio access networks, and has the advantages of reducing network energy consumption and improving resource utilization. In wide-area coverage signaling cells, on-demand multi-dimensional resource allocation is an important technical means of guaranteeing users' performance requirements, and its effectiveness determines the efficiency of network resource utilization. This paper constructs a user-centric dynamic wireless resource allocation model and proposes a deep Q-network based dynamic resource allocation algorithm. The algorithm realizes dynamic and flexible admission control and multi-dimensional resource allocation in wide-area coverage signaling cells according to users' data rate and latency demands. Simulation results show that the proposed algorithm effectively improves the average user experience on a long time scale, providing network users with high data rates and low energy consumption.
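The deep Q-network training loop can be summarized in miniature. The following Python sketch uses a linear Q-function as a stand-in for the paper's network, and the state features, action set, and reward are invented placeholders rather than the authors' formulation.

```python
# Miniature Q-learning loop for resource allocation (illustrative).
import numpy as np

n_features, n_actions = 4, 3            # e.g., load/latency features; RB tiers
W = np.zeros((n_actions, n_features))   # Q(s, a) = W[a] @ s
gamma, lr, eps = 0.95, 0.01, 0.1

def step(state, action):
    # Placeholder environment: reward trades off rate against energy.
    reward = -abs(action - state[0] * (n_actions - 1))
    return np.random.rand(n_features), reward

state = np.random.rand(n_features)
for _ in range(5000):
    if np.random.rand() < eps:                    # epsilon-greedy exploration
        a = np.random.randint(n_actions)
    else:
        a = int(np.argmax(W @ state))
    nxt, r = step(state, a)
    td_target = r + gamma * np.max(W @ nxt)       # bootstrapped target
    W[a] += lr * (td_target - W[a] @ state) * state
    state = nxt
print(W @ state)                        # learned Q-values for current state
```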
In the underwater medium, the speed of sound varies with water depth, temperature, and salinity. This inhomogeneity bends sound rays, making existing localization algorithms based on straight-line propagation less precise. To realize high-precision node positioning in underwater acoustic sensor networks (UASNs), a multi-layer isogradient sound speed profile (SSP) model is developed using a linear segmentation approximation. The sound ray tracking problem is then converted into a polynomial root-searching problem. Based on the derived gradient of the signal’s Doppler shift at the sensor node, a novel underwater node localization algorithm is proposed that uses both the time difference of arrival (TDOA) and the frequency difference of arrival (FDOA). Simulations are conducted to illustrate the effectiveness of the proposed algorithm. Compared with the traditional straight-line propagation method, the proposed algorithm effectively handles the sound ray bending phenomenon. The estimation accuracy under different SSP modeling errors is also investigated. Overall, accurate and reliable node localization can be achieved.
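The ray-bending model can be made concrete with a small sketch. The Python snippet below traces a ray through a layered water column using the Snell invariant sin(θ)/c; for simplicity it treats the ray as piecewise straight within each layer, whereas the paper's isogradient layers yield circular arcs, and all depths and speeds are assumed example values.

```python
# Trace a sound ray through a layered sound speed profile via Snell's
# law (simplified to piecewise-straight segments; values are examples).
import numpy as np

depths = np.array([0.0, 50.0, 100.0, 150.0])   # layer boundaries (m)
speeds = np.array([1500.0, 1490.0, 1480.0])    # sound speed per layer (m/s)

def trace(theta0):
    """Horizontal range and travel time for launch angle theta0,
    measured from the vertical."""
    ray_const = np.sin(theta0) / speeds[0]     # Snell invariant sin(theta)/c
    x = t = 0.0
    for i in range(len(speeds)):
        h = depths[i + 1] - depths[i]
        sin_t = ray_const * speeds[i]
        cos_t = np.sqrt(1.0 - sin_t ** 2)
        x += h * sin_t / cos_t                  # horizontal advance in layer
        t += h / (speeds[i] * cos_t)            # time along the slant path
    return x, t

print(trace(np.radians(30)))
```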
A power amplifier’s linearity determines the quality of the transmitted signal and the efficiency of the system. Nonlinear distortion can result in system bit errors, out-of-band radiation, and interference with other channels, which severely degrade a communication system’s quality and reliability. Starting from the third-order intermodulation point of millimeter-wave (mm-Wave) power amplifiers, the circuit’s nonlinearity is compensated for. The analysis, design, and implementation of linear class AB mm-Wave power amplifiers based on GlobalFoundries 45 nm CMOS silicon-on-insulator (SOI) technology are presented. Three single-ended and differential stacked power amplifiers have been implemented based on cascode cells and triple-cascode cells operating at U-band frequencies. According to nonlinear analysis and on-wafer measurements, the designs based on triple-cascode cells outperform those based on cascode cells. In single-ended measurements, the differential power amplifier achieves a peak power-added efficiency (PAE) of 47.2% and a saturated output power (Psat) of 25.2 dBm at 44 GHz. The amplifier achieves a Psat higher than 23 dBm and a maximum PAE higher than 25% over the measured bandwidth from 44 GHz to 50 GHz.
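For context on the figure of merit behind this design approach: in a two-tone test, the output third-order intercept point (OIP3) can be extrapolated from the spacing between each carrier tone and its third-order intermodulation product, since IM3 grows 3 dB per 1 dB of tone power. A minimal worked example in Python, with illustrative numbers that are not measurements from this paper:

```python
# Two-tone OIP3 extrapolation: OIP3 = Pout + dIM3 / 2, where dIM3 is
# the carrier-to-IM3 spacing in dB (numbers are illustrative only).
pout_per_tone_dbm = 10.0        # output power per tone (dBm)
im3_dbm = -30.0                 # third-order intermodulation level (dBm)

delta_im3 = pout_per_tone_dbm - im3_dbm
oip3_dbm = pout_per_tone_dbm + delta_im3 / 2.0
print(oip3_dbm)                 # -> 30.0 dBm
```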