Diffusion models, a family of deep generative models, have become increasingly prominent in cutting-edge machine learning research. Owing to their strong performance in generating samples that resemble the observed data, diffusion models are now widely used in image, video, and text synthesis. In recent years, the concept of diffusion has been extended to time-series applications, and many powerful models have been developed. Given the lack of a systematic summary and discussion of these models, we provide this survey as an elementary resource for new researchers in this area and as inspiration for future research. For better understanding, we include an introduction to the basics of diffusion models. Beyond this, we focus primarily on diffusion-based methods for time-series forecasting, imputation, and generation, presenting them separately in three individual sections. We also compare different methods for the same application and highlight their connections where applicable. Finally, we conclude with the common limitations of diffusion-based methods and highlight potential future research directions.
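As background for the basics covered in the survey, the forward (noising) process of a standard denoising diffusion probabilistic model can be summarized by the following well-known transitions; the notation (β_t as the variance schedule over T steps) is generic and not tied to any particular surveyed model:

    q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\big),
    q(x_t \mid x_0) = \mathcal{N}\big(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I\big), \quad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s).

The reverse (denoising) process is approximated by a neural network trained to predict the injected noise; this generative component is what the surveyed time-series methods adapt.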
Prompt learning has attracted broad attention in computer vision since the explosion of large pre-trained vision-language models (VLMs). Building on the close relationship between visual and linguistic information established by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
Text generation is an essential research area in artificial intelligence (AI) technology and natural language processing, and it provides key technical support for the rapid development of AI-generated content (AIGC). It builds on technologies such as natural language processing, machine learning, and deep learning, through which trained models learn language rules and automatically generate text that meets grammatical and semantic requirements. In this paper, we systematically organize and summarize the main research progress in text generation and review recent text generation papers, focusing on presenting a detailed understanding of the technical models. In addition, several typical text generation application systems are presented. Finally, we address some challenges and future directions in AI text generation. We conclude that improving the quality, quantity, interactivity, and adaptability of generated text can help fundamentally advance AI text generation development.
The rise of artificial intelligence generated content (AIGC) has been remarkable in the language and image fields, but artificial intelligence (AI) generated three-dimensional (3D) models remain under-explored due to their complex nature and lack of training data. The conventional approach of creating 3D content through computer-aided design (CAD) is labor-intensive and requires expertise, making it challenging for novice users. To address this issue, we propose a sketch-based 3D modeling approach, Deep3DSketch-im, which uses a single freehand sketch for modeling. This is a challenging task due to the sparsity and ambiguity of sketches. Deep3DSketch-im uses a novel data representation, the signed distance field (SDF), to improve the sketch-to-3D-model process by incorporating an implicit continuous field instead of voxels or points, together with a specially designed neural network that can capture point and local features. Extensive experiments are conducted to demonstrate the effectiveness of the approach, which achieves state-of-the-art (SOTA) performance on both synthetic and real datasets. Additionally, users report greater satisfaction with results generated by Deep3DSketch-im, as shown in a user study. We believe that Deep3DSketch-im has the potential to revolutionize the 3D modeling process by providing an intuitive and easy-to-use solution for novice users.
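For readers unfamiliar with the signed distance field representation, a minimal illustrative sketch follows; the analytic sphere and function names here are hypothetical stand-ins for the learned, sketch-conditioned field used by Deep3DSketch-im:

    import numpy as np

    def sphere_sdf(points, center, radius):
        # Signed distance to a sphere: negative inside, zero on the surface, positive outside.
        return np.linalg.norm(points - center, axis=-1) - radius

    # Query a few 3D points; in Deep3DSketch-im a neural network conditioned on the
    # input sketch would play the role of this analytic function.
    pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
    print(sphere_sdf(pts, center=np.zeros(3), radius=1.0))  # [-1.  0.  1.]

Because the field is continuous, a surface can be extracted at any resolution as the zero level set, which is the main advantage over voxel or point representations.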
Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while accounting for causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while considering the rationality and interpretability of the generated images, and it also allows the generation of counterfactual images. The key to our approach, CCIG, lies in a causal structure learning module that learns the causal relationships between image attributes and is jointly optimized with the encoder, generator, and joint discriminator in the image generation module. By doing so, we can learn causal representations in the image latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.
Recently, various algorithms have been developed for generating appealing music. However, style control during the generation process has been somewhat overlooked. Music style refers to the representative and unique character presented by a musical work, and it is one of the most salient qualities of music. In this paper, we propose an innovative music generation algorithm capable of creating a complete musical composition from scratch based on a specified target style. A style-conditioned linear Transformer and a style-conditioned patch discriminator are introduced in the model. The style-conditioned linear Transformer models musical instrument digital interface (MIDI) event sequences and emphasizes the role of style information. Simultaneously, the style-conditioned patch discriminator applies an adversarial learning mechanism with two innovative loss functions to enhance the modeling of music sequences. Moreover, we establish, for the first time, a discriminative metric that enables evaluation of the generated music's consistency with respect to music style. Both objective and subjective evaluations indicate that our method outperforms state-of-the-art methods on publicly available datasets.
Diffusion models are effective purification methods, in which noise or adversarial attacks are removed using generative approaches before pre-existing classifiers conduct classification tasks. However, the efficiency of diffusion models remains a concern, and existing acceleration solutions rely on knowledge distillation, which can jeopardize generation quality because of the small number of generation steps. Hence, we propose TendiffPure, a tensorized and compressed diffusion model for purification. Unlike knowledge distillation methods, we directly compress the U-Nets used as backbones of diffusion models using tensor-train decomposition, which reduces the number of parameters and captures more spatial information in multi-dimensional data such as images. The space complexity is reduced from O(N²) to O(NR²) with R ≤ 4 as the tensor-train rank and N as the number of channels. Experimental results show that TendiffPure obtains high-quality purification results more efficiently and outperforms the baseline purification methods on the CIFAR-10, Fashion-MNIST, and MNIST datasets under two types of noise and one adversarial attack.
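To illustrate the tensor-train idea behind this compression (a generic TT-SVD sketch, not the paper's exact U-Net factorization; the tensor shape and rank cap below are hypothetical):

    import numpy as np

    def tt_decompose(tensor, max_rank):
        # Generic TT-SVD: factorize a d-way tensor into a train of 3-way cores.
        dims = tensor.shape
        cores, r_prev, mat = [], 1, tensor.copy()
        for k in range(len(dims) - 1):
            mat = mat.reshape(r_prev * dims[k], -1)
            U, S, Vt = np.linalg.svd(mat, full_matrices=False)
            r = min(max_rank, S.size)
            cores.append(U[:, :r].reshape(r_prev, dims[k], r))
            mat = np.diag(S[:r]) @ Vt[:r]
            r_prev = r
        cores.append(mat.reshape(r_prev, dims[-1], 1))
        return cores

    # A dense 8x8x8x8 weight tensor (4096 parameters) compressed with rank R = 4
    # keeps only 320 core parameters while approximating the original tensor.
    weight = np.random.randn(8, 8, 8, 8)
    cores = tt_decompose(weight, max_rank=4)
    print(sum(c.size for c in cores))  # 320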
Federated learning effectively addresses issues such as data privacy by collaborating across participating devices to train global models. However, factors such as network topology and device computing power can affect its training or communication process in complex network environments. Computing and network convergence (CNC) of sixth-generation (6G) networks, a new network architecture and paradigm with computing-measurable, perceptible, distributable, dispatchable, and manageable capabilities, can effectively support federated learning training and improve its communication efficiency. CNC can achieve this goal by guiding the participating devices' training in federated learning based on business requirements, resource load, network conditions, and the computing power of devices. In this paper, to improve the communication efficiency of federated learning in complex networks, we study communication efficiency optimization methods of federated learning for the CNC of 6G networks, which make decisions on the training process under different network conditions and computing power of participating devices. The simulations address two architectures that exist for devices in federated learning and arrange devices to participate in training based on their computing power, while optimizing communication efficiency in the process of transferring model parameters. The results show that the proposed methods can cope well with complex network situations, effectively balance the delay distribution of participating devices for local training, improve the communication efficiency during the transfer of model parameters, and improve resource utilization in the network.
Fueled by the explosive growth of ultra-low-latency and real-time applications with specific computing and network performance requirements, the computing force network (CFN) has become a hot research subject. The primary CFN challenge is to jointly leverage network resources and computing resources. Although recent advances in deep reinforcement learning (DRL) have brought significant improvement in network optimization, these methods still suffer from topology changes and fail to generalize to topologies not seen in training. This paper proposes a graph neural network (GNN) based DRL framework to accommodate network traffic and computing resources jointly and efficiently. By taking advantage of the generalization capability of GNNs, the proposed method can operate over variable topologies and obtain higher performance than other DRL methods.
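As a generic illustration of why GNNs transfer across topologies (not the specific architecture of the proposed framework), a typical message-passing layer updates each node embedding only from its neighbors, so the same learned weights apply to graphs of any size or shape:

    h_v^{(l+1)} = \sigma\Big( W^{(l)} \cdot \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} h_u^{(l)} \Big),

where \mathcal{N}(v) denotes the neighbors of node v and W^{(l)} is shared by all nodes.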
With the booming development of fifth-generation (5G) network technology and the Internet of Things, the number of end-user devices (EDs) and diverse applications is surging, resulting in massive data generated at the edge of networks. To process these data efficiently, the innovative mobile edge computing (MEC) framework has emerged to guarantee low latency and enable efficient computing close to the user traffic. Recently, federated learning (FL) has demonstrated its empirical success in edge computing due to its privacy-preserving advantages. It thus becomes a promising solution for analyzing and processing distributed data on EDs in various machine learning tasks, which are the major workloads in MEC. Unfortunately, EDs are typically powered by batteries with limited capacity, which brings challenges when performing energy-intensive FL tasks. To address these challenges, many strategies have been proposed to save energy in FL. Considering the absence of a survey that thoroughly summarizes and classifies these strategies, in this paper we provide a comprehensive survey of recent advances in energy-efficient strategies for FL in MEC. Specifically, we first introduce the system model and the energy consumption models in FL, in terms of computation and communication. Then we analyze the challenges of improving energy efficiency and summarize the energy-efficient strategies from three perspectives: learning-based, resource allocation, and client selection. We conduct a detailed analysis of these strategies, comparing their advantages and disadvantages. Additionally, we visually illustrate the impact of these strategies on FL performance by showcasing experimental results. Finally, several potential future research directions for energy-efficient FL are discussed.
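As a concrete example of the computation and communication energy models commonly used in this line of work (generic forms, not tied to any single surveyed paper), the per-round energy of client n is often written as

    E_n^{\mathrm{cmp}} = \kappa\, c_n d_n f_n^2, \qquad
    E_n^{\mathrm{com}} = p_n \cdot \frac{s}{B \log_2\!\big(1 + p_n h_n / (N_0 B)\big)},

where \kappa is the effective capacitance coefficient, c_n the CPU cycles per sample, d_n the local data size, f_n the CPU frequency, p_n the transmit power, s the model size in bits, B the bandwidth, h_n the channel gain, and N_0 the noise power spectral density. Energy-efficient strategies typically aim to reduce one or both terms without sacrificing model accuracy.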
Federated learning (FL), a cutting-edge distributed machine learning training paradigm, aims to generate a global model by collaboratively training client models without revealing local private data. The co-occurrence of non-independent and identically distributed (non-IID) data and long-tailed distributions in FL is a challenge that substantially degrades aggregate performance. In this paper, we present a corresponding solution called federated dual-decoupling via model and logit calibration (FedDDC) for non-IID and long-tailed distributions. The model is characterized by three aspects. First, we decouple the global model into the feature extractor and the classifier to fine-tune the components affected by the joint problem. For the biased feature extractor, we propose a client confidence re-weighting scheme to assist calibration, which assigns optimal weights to each client. For the biased classifier, we apply a classifier re-balancing method for fine-tuning. Then, we calibrate and integrate the client confidence re-weighted logits with the re-balanced logits to obtain unbiased logits. Finally, we use decoupled knowledge distillation for the first time in the joint problem to enhance the accuracy of the global model by extracting the knowledge of the unbiased model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on non-IID and long-tailed data in FL.
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
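A common way to combine such a reward with cross-entropy training (shown generically, with a hypothetical weighting coefficient λ and baseline b; this reflects the spirit of the approach rather than the authors' exact formulation) is

    \mathcal{L} = \lambda\, \mathcal{L}_{\mathrm{CE}} + (1-\lambda)\, \mathcal{L}_{\mathrm{RL}}, \qquad
    \mathcal{L}_{\mathrm{RL}} = -\big(r(\hat{y}) - b\big) \sum_t \log p_\theta(\hat{y}_t \mid \hat{y}_{<t}, x),

where \hat{y} is a sampled target-language summary and r(\hat{y}) is the reward computed from the word correlation and word-missing degree with respect to the source-language summary.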
The new generation of artificial intelligence (AI) research initiated by Chinese scholars conforms to the needs of a new information environment and strives to advance traditional artificial intelligence (AI 1.0) to the new stage of AI 2.0. As one of the important components of AI, collective intelligence (CI 1.0), i.e., swarm intelligence, is developing toward the stage of CI 2.0 (crowd intelligence). Through in-depth analysis and informative argumentation, it is found that an incompatibility exists between CI 1.0 and CI 2.0. Therefore, CI 1.5, which is based on bio-collaborative behavioral mimicry, is introduced to build a bridge between these two stages. CI 1.5 is the transition from CI 1.0 to CI 2.0 and contributes to the compatibility of the two stages. Then, a new interpretation of the meta-synthesis of wisdom proposed by Qian Xuesen is given. The meta-synthesis of wisdom, as an improvement of crowd intelligence, is an advanced stage of bionic intelligence, i.e., CI 3.0. It is pointed out that the dual-wheel drive of large language models and big data with deep uncertainty is an evolutionary path from CI 2.0 to CI 3.0, and some elaboration is provided. As a result, we propose four development stages (CI 1.0, CI 1.5, CI 2.0, and CI 3.0), which form a complete framework for the development of CI. These stages are progressively refined and mutually compatible. Given the dominant role of cooperation in the development stages of CI, three types of cooperation in CI are discussed: indirect regulatory cooperation in lower organisms, direct communicative cooperation in higher organisms, and shared-intention-based collaboration in humans. Labor division is the main form of achieving cooperation; for this reason, this paper investigates the relationship between the complexity of behavior and types of labor division. Finally, based on an overall understanding of the four development stages of CI, the future development directions and research issues of CI are explored.
While large language models (LLMs) have made significant strides in natural language processing (NLP), they continue to face challenges in adequately addressing the intricacies of the Chinese language in certain scenarios. We propose a framework called Six-Writings multimodal processing (SWMP) to enable direct integration of Chinese NLP (CNLP) with morphological and semantic elements. The first part of SWMP, known as Six-Writings pictophonetic coding (SWPC), is introduced with a suitable level of granularity for radicals and components, enabling effective representation of Chinese characters and words. We conduct several experimental scenarios, including the following: (1) We establish an experimental database consisting of images and SWPC for Chinese characters, enabling dual-mode processing and matrix generation for CNLP. (2) We characterize various generative modes of Chinese words, such as thousands of Chinese idioms, used as question-and-answer (Q&A) prompt functions, facilitating analogies by SWPC. The experiments achieve 100% accuracy in answering all questions in the Chinese morphological data set (CA8-Mor-10177). (3) A fine-tuning mechanism is proposed to refine word embedding results using SWPC, resulting in an average relative error of ≤25% for 39.37% of the questions in the Chinese wOrd Similarity data set (COS960). The results demonstrate that SWMP/SWPC methods effectively capture the distinctive features of Chinese and offer a promising mechanism to enhance CNLP with better efficiency.
Smart city situational awareness has recently emerged as a hot topic in research communities, industries, and governments because of its potential to integrate cutting-edge information technology and solve urgent challenges that modern cities face. For example, in the latest five-year plan, the Chinese government has highlighted the demand to empower smart city management with new technologies such as big data and the Internet of Things, for which situational awareness is normally the crucial first step. While traditional static surveillance data on cities have been available for decades, this review reports on a relatively new yet highly important urban data source, i.e., the big mobile data collected by devices with various levels of mobility, representing the movement and distribution of public and private agents in the city. We especially focus on smart city situational awareness enabled by synthesizing the localization of hundreds of thousands of mobile software applications (apps) using the Global Positioning System (GPS). This technique enjoys advantages such as a large penetration rate (~50% of the urban population covered), uniform spatiotemporal coverage, and high localization precision. We first discuss the pragmatic requirements for smart city situational awareness and the challenges faced. Then we introduce two suites of empowering technologies that help fulfill the requirements of (1) cybersecurity insurance for smart cities and (2) spatiotemporal modeling and visualization for situational awareness, both via big mobile data. The main contributions of this review lie in the description of a comprehensive technological framework for smart city situational awareness and the demonstration of its feasibility via real-world applications.
Mid-wavelength infrared (MWIR) detection and long-wavelength infrared (LWIR) detection constitute the key technologies for space-based Earth observation and astronomical detection. The advanced ability of infrared (IR) detection technology to penetrate the atmosphere and identify camouflaged targets makes it excellent for space-based remote sensing. Thus, such detectors play an essential role in detecting and tracking low-temperature and far-distance moving targets. However, because space-based IR detection systems are built for diverse scenarios, the key parameters of IR technologies are subject to unique demands. We review the developments and features of MWIR and LWIR detectors with a particular focus on their applications in space-based detection. We conduct a comprehensive analysis of key performance indicators for IR detection systems, including the ground sampling distance (GSD), operation range, and noise equivalent temperature difference (NETD), among others, and their interconnections with IR detector parameters. Additionally, the influences of pixel distance, focal plane array size, and operation temperature on space-based IR remote sensing are evaluated. The development requirements and technical challenges of MWIR and LWIR detection systems are also identified, with the aim of achieving high-quality space-based observation platforms.
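For reference, the ground sampling distance mentioned above is commonly related to the detector and platform geometry by the standard nadir-viewing relation (a generic expression, not a result of this review):

    \mathrm{GSD} = \frac{p \cdot H}{f},

where p is the detector pixel pitch, H the orbital altitude, and f the focal length of the optics; this is one example of how IR detector parameters couple to system-level performance indicators.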
This article addresses the secure finite-time tracking problem via event-triggered command-filtered control for nonlinear time-delay cyber-physical systems (CPSs) subject to cyber attacks. Under the attack circumstance, the output and state information of CPSs is unavailable for the feedback design, and the classical coordinate conversion of the iterative process is inadequate for the tracking task. To solve this, a new coordinate conversion is proposed by considering the attack gains and the reference signal simultaneously. With the transformed variables, a modified fractional-order command-filtered signal is incorporated to overcome the complexity explosion issue, and the Nussbaum function is used to tackle the varying attack gains. By systematically constructing the Lyapunov–Krasovskii functional, an adaptive event-triggered mechanism is presented in detail, with which communication resources are greatly saved and the finite-time tracking of CPSs under cyber attacks is guaranteed. Finally, an example demonstrates the effectiveness of the proposed method.
In this paper, the distributed optimization problem is investigated for a class of general nonlinear model-free multi-agent systems. The dynamical model of each agent is unknown and only the input/output data are available. A model-free adaptive control method is employed, by which the original unknown nonlinear system is equivalently converted into a dynamic linearized model. An event-triggered consensus scheme is developed to guarantee that the consensus error of the outputs of all agents is convergent. Then, by means of the distributed gradient descent method, a novel event-triggered model-free adaptive distributed optimization algorithm is put forward. Sufficient conditions are established to ensure the consensus and optimality of the addressed system. Finally, simulation results are provided to validate the effectiveness of the proposed approach.
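The dynamic linearized model referred to here typically takes the compact-form dynamic linearization used in model-free adaptive control (written generically for agent i; the exact form adopted in the paper may differ):

    \Delta y_i(k+1) = \phi_i(k)\, \Delta u_i(k),

where \Delta y_i(k+1) = y_i(k+1) - y_i(k), \Delta u_i(k) = u_i(k) - u_i(k-1), and \phi_i(k) is a pseudo-partial derivative estimated online from input/output data only.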
How to collaboratively offload tasks among user devices, edge networks (ENs), and cloud data centers is an interesting and challenging research topic. In this paper, we investigate the offloading decision, analytical modeling, and system parameter optimization problem in a collaborative cloud–edge–device environment, aiming to trade off different performance measures. According to the differentiated delay requirements of tasks, we classify the tasks into delay-sensitive and delay-tolerant tasks. To meet the delay requirements of delay-sensitive tasks and process as many delay-tolerant tasks as possible, we propose a cloud–edge–device collaborative task offloading scheme, in which delay-sensitive and delay-tolerant tasks follow an access threshold policy and a loss policy, respectively. We establish a four-dimensional continuous-time Markov chain as the system model. Using the Gauss–Seidel method, we derive the stationary probability distribution of the system model. Accordingly, we present the blocking rate of delay-sensitive tasks and the average delay of the two types of tasks. Numerical experiments and simulations are conducted to evaluate the system performance and validate the effectiveness of the proposed task offloading scheme. Finally, we optimize the access threshold in the EN buffer to obtain the minimum system cost under different proportions of delay-sensitive tasks.
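To make the Gauss–Seidel step concrete, a minimal sketch for computing the stationary distribution of a continuous-time Markov chain from its generator matrix is shown below; the two-state generator is a hypothetical toy example, whereas the paper's four-dimensional chain is far larger:

    import numpy as np

    def ctmc_stationary_gauss_seidel(Q, tol=1e-10, max_iter=10000):
        # Solve pi @ Q = 0 with sum(pi) = 1 by Gauss-Seidel sweeps.
        n = Q.shape[0]
        pi = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            pi_old = pi.copy()
            for i in range(n):
                # Balance equation for state i: pi_i * q_ii = -sum_{j != i} pi_j * q_ji
                s = sum(pi[j] * Q[j, i] for j in range(n) if j != i)
                pi[i] = -s / Q[i, i]
            pi /= pi.sum()
            if np.max(np.abs(pi - pi_old)) < tol:
                break
        return pi

    # Two-state toy chain: leaves state 0 at rate 1 and state 1 at rate 2,
    # so the stationary distribution is (2/3, 1/3).
    Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
    print(ctmc_stationary_gauss_seidel(Q))

Performance measures such as the blocking rate and average delay then follow by weighting the relevant states with this stationary distribution.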
The integration of the industrial Internet, cloud computing, and big data technology is changing the business and management mode of the industry chain. However, the industry chain is characterized by a wide range of fields, a complex environment, and many factors, which creates challenges for efficiently integrating and leveraging industrial big data. Aiming at the integration of the physical space and virtual space of the current industry chain, we propose an industry chain digital twin (DT) system framework for the industrial Internet. In addition, an industry chain information model based on a knowledge graph (KG) is proposed to integrate complex and heterogeneous industry chain data and extract industrial knowledge. First, the ontology of the industry chain is established, and an entity alignment method based on scientific and technological achievements is proposed. Second, a bidirectional encoder representations from Transformers (BERT) based multi-head selection model is proposed for joint entity–relation extraction of industry chain information. Third, a relation completion model based on a relational graph convolutional network (R-GCN) and a graph sample and aggregate network (GraphSAGE) is proposed, which considers both the semantic information and the graph structure information of the KG. Experimental results show that the performances of the proposed joint entity–relation extraction model and relation completion model are significantly better than those of the baselines. Finally, an industry chain information model is established based on the data of 18 industry chains in the field of basic machinery, which proves the feasibility of the proposed method.
This paper proposes a programmable logic element (PLE) based on Sense-Switch pFLASH technology. By programming the Sense-Switch pFLASH, all three-input look-up table (LUT3) functions, partial four-input look-up table (LUT4) functions, latch functions, and D flip-flop (DFF) functions with enable and reset can be realized. Because the PLE uses a choice of operational logic (COOL) approach for the operation of logic functions, it allows any logic circuit to be implemented at any ratio of combinatorial logic to registers. This intrinsic property makes it close to the basic application-specific integrated circuit (ASIC) cell in terms of fine granularity, thus allowing ASIC-like cell-based mappers to apply all their optimization potential. Measurements of the Sense-Switch pFLASH and PLE circuits show that the “on” state driving current of the Sense-Switch pFLASH is about 245.52 μA and the “off” state leakage current is about 0.1 pA. The programmable function of the PLE works normally. The delay of the typical combinatorial logic operation AND3 is 0.69 ns, and the delay of the sequential logic operation DFF is 0.65 ns, both of which meet the requirements of the design specifications.
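To illustrate what a three-input look-up table realizes (a generic illustration of the LUT3 concept, independent of the pFLASH circuit details), any Boolean function of three inputs can be stored as an 8-bit configuration word and evaluated by indexing:

    def lut3(truth_table, a, b, c):
        # Evaluate a 3-input LUT: truth_table is an 8-bit configuration word.
        index = (a << 2) | (b << 1) | c
        return (truth_table >> index) & 1

    # Configure the LUT as a 3-input AND gate: only input pattern 111 (index 7) outputs 1.
    AND3 = 0b10000000
    print(lut3(AND3, 1, 1, 1), lut3(AND3, 1, 0, 1))  # 1 0

In the PLE, programming the Sense-Switch pFLASH cells plays an analogous role to writing this configuration word in hardware.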
This paper is concerned with the scaled formation control problem for multi-agent systems (MASs) over fixed and switching topologies. First, a modified resilient dynamic event-triggered (DET) mechanism involving an auxiliary dynamic variable (ADV) based on sampled data is proposed. In the proposed DET mechanism, a random variable obeying the Bernoulli distribution is introduced to express the idle and busy situations of communication networks. Meanwhile, the operation of absolute value is introduced into the triggering condition to effectively reduce the formation error. Second, a scaled formation control protocol with the proposed resilient DET mechanism is designed over fixed and switching topologies. The scaled formation error system is modeled as a time-varying delay system. Then, several sufficient stability criteria are derived by constructing appropriate Lyapunov–Krasovskii functionals (LKFs). A co-design algorithm based on the sparrow search algorithm (SSA) is presented to design the control gains and triggering parameters jointly. Finally, numerical simulations of multiple unmanned aerial vehicles (UAVs) are presented to validate the designed control method.