Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With distinguished performance in generating samples that resemble the observed data, diffusion models are now widely used in image, video, and text synthesis. In recent years, the concept of diffusion has been extended to time-series applications, and many powerful models have been developed. Because a methodical summary and discussion of these models is lacking, we provide this survey as an introductory resource for new researchers in this area and as inspiration for future research. For better understanding, we include an introduction to the basics of diffusion models. Beyond this, we focus primarily on diffusion-based methods for time-series forecasting, imputation, and generation, and present them in three separate sections. We also compare different methods for the same application and highlight their connections where applicable. Finally, we conclude with the common limitations of diffusion-based methods and highlight potential future research directions.
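As an illustration of the diffusion basics mentioned above, the following is a minimal sketch of a forward (noising) process applied to a short time series. It assumes the standard DDPM-style closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear noise schedule; the function names, schedule endpoints, and toy series are hypothetical and are not taken from the survey.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoints are illustrative assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

# Toy time series (assumed data, for illustration only).
series = [0.0, 0.5, 1.0, 0.5, 0.0]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
slightly_noisy = forward_diffuse(series, 0, alpha_bars, rng)   # close to the input
mostly_noise = forward_diffuse(series, 999, alpha_bars, rng)   # close to pure noise
```

Early steps keep almost all of the signal, while the final step retains almost none of it; a generative model is then trained to reverse this corruption.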
Prompt learning has attracted broad attention in computer vision since the rapid rise of large pre-trained vision-language models (VLMs). Building on the close relationship between vision and language information established by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods that sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach to improve sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering methods in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data to evaluate actions from the clusters in the action candidate set in the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate superior reinforcement learning performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method can reduce the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open-source for reproducibility. The code is available at https://github.com/Shihwin/SelectiveSampling.
Text generation is an essential research area in artificial intelligence (AI) technology and natural language processing and provides key technical support for the rapid development of AI-generated content (AIGC). It is based on technologies such as natural language processing, machine learning, and deep learning, which enable learning language rules through training models to automatically generate text that meets grammatical and semantic requirements. In this paper, we sort and systematically summarize the main research progress in text generation and review recent text generation papers, focusing on presenting a detailed understanding of the technical models. In addition, several typical text generation application systems are presented. Finally, we address some challenges and future directions in AI text generation. We conclude that improving the quality, quantity, interactivity, and adaptability of generated text can help fundamentally advance AI text generation development.
Convolutional neural networks (CNNs) have been developed quickly in many real-world fields. However, CNN’s performance depends heavily on its hyperparameters, while finding suitable hyperparameters for CNNs working in application fields is challenging for three reasons: (1) the problem of mixed-variable encoding for different types of hyperparameters in CNNs, (2) expensive computational costs in evaluating candidate hyperparameter configuration, and (3) the problem of ensuring convergence rates and model performance during hyperparameter search. To overcome these problems and challenges, a hybrid-model optimization algorithm is proposed in this paper to search suitable hyperparameter configurations automatically based on the Gaussian process and particle swarm optimization (GPPSO) algorithm. First, a new encoding method is designed to efficiently deal with the CNN hyperparameter mixed-variable problem. Second, a hybrid-surrogate-assisted model is proposed to reduce the high cost of evaluating candidate hyperparameter configurations. Third, a novel activation function is suggested to improve the model performance and ensure the convergence rate. Intensive experiments are performed on image-classification benchmark datasets to demonstrate the superior performance of GPPSO over state-of-the-art methods. Moreover, a case study on metal fracture diagnosis is carried out to evaluate the GPPSO algorithm performance in practical applications. Experimental results demonstrate the effectiveness and efficiency of GPPSO, achieving accuracy of 95.26% and 76.36% only through 0.04 and 1.70 GPU days on the CIFAR-10 and CIFAR-100 datasets, respectively.
Scholars are expected to continue enhancing the depth and breadth of theoretical research on reconfigurable intelligent surface (RIS) to provide a higher theoretical limit for RIS engineering applications. Notably, significant advancements have been achieved through both academic research breakthroughs and the promotion of engineering applications and industrialization. We provide an overview of RIS engineering applications, focusing primarily on their typical features, classifications, and deployment scenarios. Furthermore, we systematically and comprehensively analyze the challenges faced by RIS and propose potential solutions including addressing the beamforming issues through cascade channel decoupling, tackling the effects and resolutions of regulatory constraints on RIS, exploring the network-controlled mode for RIS system architecture, examining integrated channel regulation and information modulation, and investigating the use of the true time delay (TTD) mechanism for RIS. In addition, two key technical points, RIS-assisted non-orthogonal multiple access (NOMA) and RIS-based transmitter, are reviewed from the perspective of completeness. Finally, we discuss future trends and challenges in this field.
Reconfigurable intelligent surface (RIS) is widely accepted as a potential technology to assist in communication between base stations (BSs) and users in edge areas. We study the energy efficiency of a RIS-assisted multi-cell communication system with a realistic RIS power consumption model. With the goal of maximizing the energy efficiency of the system, we optimize the transmit beamforming vectors at the BS and the RIS phase shift matrix by a proposed alternative optimization algorithm. First, the transmit beamforming vector is optimized by solving the transformed weighted minimum mean square error (WMMSE) problem. Subsequently, to solve the inconvenience incurred by the discrete relationship between the RIS reflecting unit power consumption and its discrete phase shift, we use a continuous function to approximate their relationship. With this approximation, we can use the majorization minimization (MM) technique to optimize the continuous RIS phase shifts, and then quantize the obtained phase shifts to discrete ones. Simulation results demonstrate that the energy efficiency of the system is effectively optimized by the proposed algorithm.
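The final quantization step described above can be sketched as follows. This is a minimal illustration that assumes uniformly spaced discrete phase levels over [0, 2π) and nearest-level rounding; the function name and the bit widths used are hypothetical, and the paper's own quantizer may differ.

```python
import math

def quantize_phase(theta, bits):
    """Map a continuous phase shift (radians) to the nearest of 2**bits uniform levels.

    Assumption: the discrete levels are k * 2*pi / 2**bits for k = 0 .. 2**bits - 1.
    """
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    # Wrap into [0, 2*pi), round to the nearest level, and wrap the index so that
    # phases just below 2*pi map back to level 0.
    index = round((theta % (2.0 * math.pi)) / step) % levels
    return index * step

# Continuous phases (e.g. produced by an optimizer) quantized to 2-bit levels.
continuous = [0.1, 1.4, 3.0, 6.2]
discrete = [quantize_phase(theta, bits=2) for theta in continuous]
```

With 2 bits the available levels are 0, π/2, π, and 3π/2, so each optimized phase is snapped to whichever of these is closest on the circle.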
As a candidate technique to achieve sixth-generation wireless communication (6G), reconfigurable intelligent surface (RIS) has become popular in both academia and industry. For better exploration of the advantages of RIS, we compare the performances of RIS and network-controlled repeater (NCR) in 3GPP release-18. We first theoretically analyze the received signal power and signal-to-noise ratio performances for both RIS and NCR. Then, we simulate the reference signal received power and signal-to-interference-and-noise ratio performances at the system level for both RIS and NCR in the frequency range 1 and frequency range 2 bands. Finally, several insights on engineering applications are given based on the comparison between RIS and NCR.
Technological advancements continue to expand the communications industry’s potential. Images, which are an important component in strengthening communication, are widely available. Therefore, image quality assessment (IQA) is critical in improving content delivered to end users. Convolutional neural networks (CNNs) used in IQA face two common challenges. One issue is that these methods fail to provide the best representation of the image. The other issue is that the models have a large number of parameters, which easily leads to overfitting. To address these issues, the dense convolution network (DSC-Net), a deep learning model with fewer parameters, is proposed for no-reference image quality assessment (NR-IQA). Moreover, it is obvious that the use of multimodal data for deep learning has improved the performance of applications. As a result, multimodal dense convolution network (MDSC-Net) fuses the texture features extracted using the gray-level co-occurrence matrix (GLCM) method and spatial features extracted using DSC-Net and predicts the image quality. The performance of the proposed framework on the benchmark synthetic datasets LIVE, TID2013, and KADID-10k demonstrates that the MDSC-Net approach achieves good performance over state-of-the-art methods for the NR-IQA task.
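For readers unfamiliar with the gray-level co-occurrence matrix, the sketch below shows how a GLCM and one common texture statistic (contrast) can be computed. It is a generic illustration of the GLCM idea, not the MDSC-Net feature pipeline; the function names, the pixel offset, and the tiny test image are assumptions made for this example.

```python
def glcm(image, levels, offset=(0, 1)):
    """Count how often gray level i has gray level j at the given (row, col) offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The default offset (0, 1) pairs each pixel with its right-hand neighbor.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """GLCM contrast: sum of (i - j)**2 weighted by the normalized co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total

# Tiny 2x3 image with three gray levels (illustrative data).
toy_image = [[0, 0, 1],
             [1, 2, 2]]
toy_glcm = glcm(toy_image, levels=3)
toy_contrast = contrast(toy_glcm)
```

Statistics such as contrast summarize local texture in a handful of numbers, which is what makes GLCM features a compact complement to learned spatial features.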
The rise of artificial intelligence generated content (AIGC) has been remarkable in the language and image fields, but artificial intelligence (AI) generated three-dimensional (3D) models are still under-explored due to their complex nature and lack of training data. The conventional approach of creating 3D content through computer-aided design (CAD) is labor-intensive and requires expertise, making it challenging for novice users. To address this issue, we propose a sketch-based 3D modeling approach, Deep3DSketch-im, which uses a single freehand sketch for modeling. This is a challenging task due to the sparsity and ambiguity. Deep3DSketch-im uses a novel data representation called the signed distance field (SDF) to improve the sketch-to-3D model process by incorporating an implicit continuous field instead of voxel or points, and a specially designed neural network that can capture point and local features. Extensive experiments are conducted to demonstrate the effectiveness of the approach, achieving state-of-the-art (SOTA) performance on both synthetic and real datasets. Additionally, users show more satisfaction with results generated by Deep3DSketch-im, as reported in a user study. We believe that Deep3DSketch-im has the potential to revolutionize the process of 3D modeling by providing an intuitive and easy-to-use solution for novice users.
Magnetically driven microrobots hold great potential to perform specific tasks more locally and less invasively in the human body. To reach the lesion area in vivo, microrobots should usually be navigated in flowing blood, which is much more complex than static liquid. Therefore, it is more challenging to design a corresponding precise control scheme. A considerable amount of work has been done regarding control of magnetic microrobots in a flow and the corresponding theories. In this paper, we review and summarize the state-of-the-art research progress concerning magnetic microrobots in blood flow, including the establishment of flow systems, dynamics modeling of motion, and control methods. In addition, current challenges and limitations are discussed. We hope this work can shed light on the efficient control of microrobots in complex flow environments and accelerate the study of microrobots for clinical use.
Complex beams play important roles in wireless communications, radar, and satellites, and have attracted great interest in recent years. In light of this background, we present a fast and efficient approach to realize complex beams by using semidefinite relaxation (SDR) optimization and amplitude-phase digital coding metasurfaces. As the application examples of this approach, complex beam patterns with cosecant, flat-top, and double shapes are designed and verified using full-wave simulations and experimental measurements. The results show excellent main lobes and low-level side lobes and demonstrate the effectiveness of the approach. Compared with previous works, this approach can solve the complex beam-forming problem more rapidly and effectively. Therefore, the approach will be of great significance in the design of beam-forming systems in wireless applications.
Reconfigurable intelligent surfaces (RISs) have the capability to change the wireless environment smartly. Considering the subchannel attenuation and crowded users in a wideband system, we introduce RISs into the multi-user multi-input single-output (MU-MISO) system with orthogonal frequency division multiplexing (OFDM) for performance enhancement. Maximizing the minimum rate of dense users in an MU-MISO-OFDM system assisted by RISs with an approximate practical model is formulated as a joint optimization problem involving subcarrier allocation, transmit precoding (TPC) matrices at the base station, and RIS passive beamforming. A coalition-game subcarrier allocation (CSA) algorithm is proposed to solve space–frequency resource allocation on subcarriers, which reforms the interference topology among dense users. Fractional programming and convex optimization methods are used to optimize the TPC matrices and the RIS passive beamforming, improving the overall spectral efficiency across all subchannels in the wideband system. Simulation results indicate that the CSA algorithm provides a significant gain for dense users. In addition, the proposed joint optimization method shows the considerable advantage of RISs in the MU-MISO-OFDM system.
Heavy-duty diesel vehicles are important sources of urban nitrogen oxides (NOx) in actual applications for environmental compliance, emitting more than 80% of NOx and more than 90% of particulate matter (PM) in total vehicle emissions. The detection and control of heavy-duty diesel emissions are critical for protecting public health. Currently, vehicles on the road must be regularly tested, every six months or once a year, to filter out high-emission mobile sources at vehicle inspection stations. However, it is difficult to effectively screen high-emission vehicles in time with a long interval between annual inspections, and the fixed threshold cannot adapt to the dynamic changes of vehicle driving conditions. An on-board diagnostic device (OBD) is installed inside the vehicle and can record the vehicle’s emission data in real time. In this paper, we propose a temporal optimization long short-term memory (LSTM) and adaptive dynamic threshold approach to identify heavy-duty high-emitters by using OBD data, which can continuously track and record the emission status in real time. First, a temporal optimization LSTM emission prediction model is established to solve the attention bias discrepancy problem on time steps that is caused by the large number of OBD data streams in practice. Then, the concentration prediction error sequence is detected and distinguished from the anomalous emission contexts using flexible criteria, calculated by an adaptive dynamic threshold with changing driving conditions. Finally, a similarity metric strategy for the time series is introduced to correct some pseudo anomalous results. Experiments on three real OBD time-series emission datasets demonstrate that our method can achieve high accuracy anomalous emission identification.
Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) have been attracting significant attention in both academia and industry for their advantages of achieving 360° coverage and enhanced degrees-of-freedom. This article first identifies the fundamentals of STAR-RIS, by discussing the hardware models, channel models, and signal models. Then, three representative categorizing approaches for STAR-RISs are introduced from the phase-shift, directional, and energy consumption perspectives. Furthermore, the beamforming design of STAR-RISs is investigated for both independent and coupled phase-shift cases. As a recent advance, a general optimization framework, which has high compatibility and provable optimality regardless of the application scenarios, is proposed. As a further advance, several promising applications are discussed to demonstrate the potential benefits of applying STAR-RISs in sixth-generation wireless communication. Lastly, a few future directions and research opportunities are highlighted.
This paper investigates the state-tracking control problem in conversion mode of a tilt-rotor aircraft with a switching modeling method and a smooth interpolation technique. Based on the nonlinear model of the conversion mode, a switched linear model is developed by using the Jacobian linearization method and designing the switching signal based on the mast angle. Furthermore, an H∞ state-tracking control scheme is designed to deal with the conversion mode control issue. Moreover, instead of limiting the amplitude of control inputs, a smooth interpolation method is developed to create bumpless performance. Finally, the XV-15 tilt-rotor aircraft is chosen as a prototype to illustrate the effectiveness of this developed control method.
Physical layer key generation (PKG) technology leverages reciprocal channel randomness to generate shared secret keys. However, multipath fading at the receiver may degrade the correlation between legitimate uplink and downlink channels, resulting in a low key generation rate (KGR). In this paper, we propose a PKG scheme based on the pattern-reconfigurable antenna (PRA) to boost the secret key capacity. First, we propose a reconfigurable intelligent surface (RIS) based PRA architecture with the capability of flexible and reconfigurable antenna patterns. Then, we present the PRA-based PKG protocol to improve the KGR via mitigation of the effects of multipath fading. Specifically, a novel algorithm for estimation of the multipath channel parameters is proposed based on atomic norm minimization. Thereafter, a novel optimization method for the matching reception of multipath signals is formulated based on the improved binary particle swarm optimization (BPSO) algorithm. Finally, simulation results show that the proposed scheme can resist multipath fading and achieve a high KGR compared to existing schemes. Moreover, our findings indicate that the increased degree of freedom of the antenna patterns can significantly increase the secret key capacity.
Diffusion models are effective purification methods, in which noise or adversarial perturbations are removed using generative approaches before pre-existing classifiers perform classification tasks. However, the efficiency of diffusion models is still a concern, and existing solutions are based on knowledge distillation, which can jeopardize the generation quality because of the small number of generation steps. Hence, we propose TendiffPure as a tensorized and compressed diffusion model for purification. Unlike the knowledge distillation methods, we directly compress U-Nets as backbones of diffusion models using tensor-train decomposition, which reduces the number of parameters and captures more spatial information in multi-dimensional data such as images. The space complexity is reduced from O(N^2) to O(NR^2), with R ≤ 4 as the tensor-train rank and N as the number of channels. Experimental results show that TendiffPure can more efficiently obtain high-quality purification results and outperforms the baseline purification methods on CIFAR-10, Fashion-MNIST, and MNIST datasets for two noises and one adversarial attack.
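To give a feel for the parameter savings described above, the sketch below counts the parameters of a dense channel-mixing weight and of the same map stored in tensor-train (TT) matrix format. It is a generic illustration of TT parameter counting, assuming every internal rank equals a single value R; the factorization of 64 channels into 4 x 4 x 4 and the function names are hypothetical and do not reproduce the TendiffPure layer design.

```python
from math import prod

def dense_params(in_factors, out_factors):
    """Parameters of a dense weight mapping prod(in_factors) to prod(out_factors) channels."""
    return prod(in_factors) * prod(out_factors)

def tt_params(in_factors, out_factors, rank):
    """Parameters of the same map in TT-matrix format.

    Core k has shape (r_{k-1}, in_factors[k], out_factors[k], r_k), with boundary
    ranks r_0 = r_d = 1 and all internal ranks set to `rank` (an assumption).
    """
    d = len(in_factors)
    ranks = [1] + [rank] * (d - 1) + [1]
    return sum(
        ranks[k] * in_factors[k] * out_factors[k] * ranks[k + 1]
        for k in range(d)
    )

# 64 input and 64 output channels, each factorized as 4 * 4 * 4 (illustrative choice).
factors = [4, 4, 4]
n_dense = dense_params(factors, factors)
n_tt = tt_params(factors, factors, rank=4)
compression_ratio = n_dense / n_tt
```

In this toy setting the dense weight needs 4096 parameters and the TT format needs 384, which illustrates how a small rank keeps the parameter count far below quadratic growth in the channel number.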
Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often face challenges in producing images that are both flexible and controllable while considering causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while considering the rationality and interpretability of the generated images and also allows for the generation of counterfactual images. The key of our approach, CCIG, lies in the use of a causal structure learning module to learn the causal relationships between image attributes and joint optimization with the encoder, generator, and joint discriminator in the image generation module. By doing so, we can learn causal representations in image’s latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
Recently, various algorithms have been developed for generating appealing music. However, style control in the generation process has been somewhat overlooked. Music style refers to the representative and unique appearance presented by a musical work, and it is one of the most salient qualities of music. In this paper, we propose an innovative music generation algorithm capable of creating a complete musical composition from scratch based on a specified target style. A style-conditioned linear Transformer and a style-conditioned patch discriminator are introduced in the model. The style-conditioned linear Transformer models musical instrument digital interface (MIDI) event sequences and emphasizes the role of style information. Simultaneously, the style-conditioned patch discriminator applies an adversarial learning mechanism with two innovative loss functions to enhance the modeling of music sequences. Moreover, we establish a discriminative metric for the first time, enabling evaluation of the consistency of the generated music with the target music style. Both objective and subjective evaluations of our experimental results indicate that our method outperforms state-of-the-art music generation methods on available public datasets.
This research investigates the digital-to-analog converter (DAC) free architecture for the digital reconfigurable intelligent surface (RIS) system, where the transmission lines are implemented for reflection coefficient (RC) control to reduce power consumption. In the proposed architecture, the radio frequency (RF) switch based phase shifter is considered. By using a single-pole four-throw (SP4T) switch to simultaneously control the RCs of a group of elements, a 2-bit phase shifter is realized for passive beam steering. A novel modulation scheme is developed to explore the cost effectiveness, which approaches the performance of traditional quadrature amplitude modulation (QAM). Specifically, to overcome the limitation of the phase shift bits, joint frequency-shift and phase-rotation operations are applied to the constellation points. The simulation and experimental results demonstrate that the proposed architecture is capable of providing an ideal transmission performance. Moreover, 64- and 256-QAM modulation schemes could be implemented by expanding the elements and phase bits.
The wavefront control of spin or orbital angular momentum (OAM) is widely applied in the optical and radio fields. However, most passive metasurfaces provide limited manipulations, such as the spin-locked wavefront, a static OAM combination, or an uncontrollable OAM energy distribution. We propose a reflection-type multi-feed metasurface to independently generate multi-mode OAM beams with dynamically switchable OAM combinations and spin states, while simultaneously, the energy distribution of carrying OAM modes is controllable. Specifically, four elements are proposed to overcome the spin-locked phase limitation by combining propagation and geometric phases. The robustness of these elements is analyzed. By involving the amplitude term and multi-feed technology in the design process, the proposed metasurface can generate OAM beams with a controllable energy distribution over modes and switchable mode combinations. OAM-based radio communication with four independent channels is experimentally demonstrated at 14 GHz by employing a pair of the proposed metasurfaces. The powers of different channels are adjustable by the provided amplitude term, and the maximum crosstalk is −9 dB, proving the effectiveness and practicability of the proposed method.
Unsupervised domain adaptation enables neural networks to transfer from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Recent approaches achieve this by directly matching the marginal distributions of these two domains. Most of them, however, ignore exploration of the dynamic trade-off between domain alignment and semantic discrimination learning, thus rendering them susceptible to the problems of negative transfer and outlier samples. To address these issues, we introduce the dynamic parameterized learning framework. First, by exploring domain-level semantic knowledge, the dynamic alignment parameter is proposed, to adaptively adjust the optimization steps of domain alignment and semantic discrimination learning. Besides, for obtaining semantic-discriminative and domain-invariant representations, we propose to align training trajectories on both source and target domains. Comprehensive experiments are conducted to validate the effectiveness of the proposed methods, and extensive comparisons are conducted on seven datasets of three visual tasks to demonstrate their practicability.
The performance of existing maneuvering target tracking methods for highly maneuvering targets in cluttered environments is unsatisfactory. This paper proposes a hybrid-driven approach for tracking multiple highly maneuvering targets, leveraging the advantages of both data-driven and model-based algorithms. The time-varying constant velocity model is integrated into the Gaussian process (GP) of online learning to improve the performance of GP prediction. This integration is further combined with a generalized probabilistic data association algorithm to realize multi-target tracking. Through the simulations, it has been demonstrated that the hybrid-driven approach exhibits significant performance improvements in comparison with widely used algorithms such as the interactive multi-model method and the data-driven GP motion tracker.
Due to the openness of the wireless propagation environment, wireless networks are highly susceptible to malicious jamming, which significantly impacts their legitimate communication performance. This study investigates a reconfigurable intelligent surface (RIS) assisted anti-jamming communication system. Specifically, the objective is to enhance the system’s anti-jamming performance by optimizing the transmitting power of the base station and the passive beamforming of the RIS. Taking into account the dynamic and unpredictable nature of a smart jammer, the problem of joint optimization of transmitting power and RIS reflection coefficients is modeled as a Markov decision process (MDP). To tackle the complex and coupled decision problem, we propose a learning framework based on the double deep Q-network (DDQN) to improve the system achievable rate and energy efficiency. Unlike most power-domain jamming mitigation methods that require information on the jamming power, the proposed DDQN algorithm is better able to adapt to dynamic and unknown environments without relying on the prior information about jamming power. Finally, simulation results demonstrate that the proposed algorithm outperforms multi-armed bandit (MAB) and deep Q-network (DQN) schemes in terms of the anti-jamming performance and energy efficiency.
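The core idea that separates a double deep Q-network from a plain DQN can be sketched in a few lines. The example below assumes the standard double Q-learning target, in which the online network selects the next action and the target network evaluates it; the function names and the numeric Q-values are hypothetical, and the sketch does not model the RIS system, the jammer, or the reward used in the paper.

```python
def ddqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

def dqn_target(reward, gamma, q_target_next, done):
    """Plain DQN target: the target net both picks and scores the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

# Illustrative Q-values for three candidate actions in the next state.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [0.5, 0.2, 0.9]
y_double = ddqn_target(1.0, 0.9, q_online_next, q_target_next, done=False)
y_plain = dqn_target(1.0, 0.9, q_target_next, done=False)
```

Decoupling action selection from action evaluation yields a smaller target here than the plain DQN maximum, which is the mechanism commonly credited with reducing overestimation of action values.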