
Most accessed

  • Review
    Jie CHEN, Dandan WU, Ruiyun XIE
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1117-1142. https://doi.org/10.1631/FITEE.2200314

    Three technical problems urgently need to be solved in cyberspace security: the timeliness and accuracy of network attack detection, the credibility assessment and prediction of the security situation, and the effectiveness of security defense strategy optimization. Artificial intelligence (AI) algorithms have become a core means of improving security and strengthening network attack and defense capabilities in cyberspace security applications. Recent breakthroughs in AI technology and its applications have provided a series of advanced approaches for further enhancing network defense capability. This work presents a comprehensive review of articles on AI technology for cyberspace security applications, mainly from 2017 to 2022. The papers are selected from a variety of journals and conferences: 52.68% are from Elsevier, Springer, and IEEE journals and 25% are from international conferences. With a specific focus on the latest approaches in machine learning (ML), deep learning (DL), and some popular optimization algorithms, the characteristics of the algorithmic models, performance results, datasets, potential benefits, and limitations are analyzed, and some of the existing challenges are highlighted. This work is intended to provide technical guidance for researchers who want to exploit the potential of AI methods for cyberspace security, to offer pointers for resolving specific cyberspace security issues, and to survey current development trends, applications, and hot issues in the field of network security. It also identifies existing challenges and gives directions for addressing them effectively.

  • Review
    Zhenxin MU, Jie PAN, Ziye ZHOU, Junzhi YU, Lu CAO
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1093-1116. https://doi.org/10.1631/FITEE.2200590

    For complex functions to emerge in artificial systems, it is important to understand the intrinsic mechanisms of the biological swarm behaviors found in nature. In this paper, we present a comprehensive survey of pursuit-evasion, a critical problem for biological groups. First, we review the pursuit-evasion problem from three perspectives: game theory, control theory and artificial intelligence, and bio-inspired approaches. Then we provide an overview of research on pursuit-evasion problems in biological and artificial systems, summarizing predator pursuit behavior and prey evasion behavior as predator-prey behavior. Next, we analyze the application of pursuit-evasion in artificial systems from three perspectives, i.e., a strong pursuer group vs. a weak evader group, a weak pursuer group vs. a strong evader group, and groups of equal ability. Finally, relevant prospects for future pursuit-evasion challenges are discussed. This survey provides new insights into the design of multi-agent and multi-robot systems for completing complex hunting tasks in uncertain dynamic scenarios.
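
    To make the pursuit-evasion setting concrete, here is a minimal sketch simulating a classical pure-pursuit strategy (a textbook baseline, not any specific algorithm surveyed above); the speeds, time step, and capture radius are illustrative assumptions.

      import numpy as np

      def simulate_pursuit(p0, e0, vp=1.2, ve=1.0, dt=0.1, capture_radius=0.5, max_steps=1000):
          """Pure pursuit: the pursuer heads straight at the evader; the evader flees along the line of sight."""
          p, e = np.asarray(p0, float), np.asarray(e0, float)
          for step in range(max_steps):
              los = e - p                        # line-of-sight vector
              dist = np.linalg.norm(los)
              if dist < capture_radius:
                  return step * dt               # capture time
              u = los / dist
              p += vp * dt * u                   # pursuer moves toward the evader
              e += ve * dt * u                   # evader flees directly away
          return None                            # escape (no capture within the horizon)

      print(simulate_pursuit([0.0, 0.0], [5.0, 3.0]))   # faster pursuer eventually captures

    Because the pursuer is faster here, the gap closes at a constant rate; swapping the speeds illustrates the weak-pursuer regime discussed above.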

  • Original Article
    Tao SHEN, Jie ZHANG, Xinkang JIA, Fengda ZHANG, Zheqi LV, Kun KUANG, Chao WU, Fei WU
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1390-1402. https://doi.org/10.1631/FITEE.2300098

    Federated learning (FL) is a novel technique in deep learning that enables clients to collaboratively train a shared model while retaining their decentralized data. However, researchers working on FL face several unique challenges, especially in the context of heterogeneity. Heterogeneity in data distributions, computational capabilities, and scenarios among clients necessitates the development of customized models and objectives in FL. Unfortunately, existing works such as FedAvg may not effectively accommodate the specific needs of each client. To address the challenges arising from heterogeneity in FL, we provide an overview of the heterogeneities in data, model, and objective (DMO). Furthermore, we propose a novel framework called federated mutual learning (FML), which enables each client to train a personalized model that accounts for the data heterogeneity (DH). A “meme model” serves as an intermediary between the personalized and global models to address model heterogeneity (MH). We introduce a knowledge distillation technique called deep mutual learning (DML) to transfer knowledge between these two models on local data. To overcome objective heterogeneity (OH), we design a shared global model that includes only certain parts, and the personalized model is task-specific and enhanced through mutual learning with the meme model. We evaluate the performance of FML in addressing DMO heterogeneities through experiments and compare it with other commonly used FL methods in similar scenarios. The results demonstrate that FML outperforms other methods and effectively addresses the DMO challenges encountered in the FL setting.
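
    The deep mutual learning (DML) step mentioned above can be sketched as a pair of losses in which each model fits the labels while mimicking the other's softened predictions. The sketch below is a generic DML objective in PyTorch, not the paper's implementation; the detach-based stabilization and the weight alpha are assumptions.

      import torch
      import torch.nn.functional as F

      def dml_losses(logits_personal, logits_meme, labels, alpha=0.5):
          """Deep mutual learning: each model fits the labels and mimics the other's predictions."""
          ce_p = F.cross_entropy(logits_personal, labels)
          ce_m = F.cross_entropy(logits_meme, labels)
          # KL(personal || meme) pulls the personal model toward the meme model, and vice versa
          kl_p = F.kl_div(F.log_softmax(logits_personal, dim=1),
                          F.softmax(logits_meme.detach(), dim=1), reduction='batchmean')
          kl_m = F.kl_div(F.log_softmax(logits_meme, dim=1),
                          F.softmax(logits_personal.detach(), dim=1), reduction='batchmean')
          return ce_p + alpha * kl_p, ce_m + alpha * kl_m

    Only the meme model's parameters ever leave the client, which is how this setup accommodates model heterogeneity: the personalized architecture stays private.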

  • Review
    Lequan LIN, Zhengkun LI, Ruikun LI, Xuliang LI, Junbin GAO
    Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 19-41. https://doi.org/10.1631/FITEE.2300310

    Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With distinguished performance in generating samples that resemble the observed data, diffusion models are now widely used in image, video, and text synthesis. In recent years, the concept of diffusion has been extended to time-series applications, and many powerful models have been developed. Given the lack of a methodical summary and discussion of these models, we provide this survey as an elementary resource for new researchers in this area and as inspiration for future research. For better understanding, we include an introduction to the basics of diffusion models. Beyond this, we primarily focus on diffusion-based methods for time-series forecasting, imputation, and generation, presenting each in its own section. We also compare different methods for the same application and highlight their connections where applicable. Finally, we conclude with the common limitations of diffusion-based methods and highlight potential future research directions.
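
    For readers new to the basics mentioned above, the closed-form forward (noising) process of a standard DDPM-style diffusion model can be sketched in a few lines; the linear beta schedule and shapes are illustrative assumptions.

      import numpy as np

      def forward_diffuse(x0, t, betas):
          """Sample x_t ~ q(x_t | x_0) in closed form: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps."""
          alphas = 1.0 - betas
          alpha_bar = np.cumprod(alphas)[t]
          eps = np.random.randn(*x0.shape)
          xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
          return xt, eps   # eps is the regression target for the denoising network

      betas = np.linspace(1e-4, 0.02, 1000)   # standard linear schedule
      x0 = np.random.randn(8)                 # stand-in for a data sample (e.g., a time-series window)
      xt, eps = forward_diffuse(x0, t=500, betas=betas)

    Time-series variants covered in the survey mostly change what the denoising network conditions on (history, covariates, observed entries), not this forward process.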

  • Perspective
    Yingbo LI, Zhao LI, Yucong DUAN, Anamaria-Beatrice SPULBER
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1231-1238. https://doi.org/10.1631/FITEE.2200675
  • Original Article
    Linna ZHOU, Zhigao LU, Weike YOU, Xiaofei FANG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1143-1155. https://doi.org/10.1631/FITEE.2300041

    In the field of reversible data hiding (RDH), designing a high-precision predictor to reduce the embedding distortion and developing an effective embedding strategy to minimize the distortion caused by embedding information are the two most critical aspects. In this paper, we propose a new RDH method comprising a transformer-based predictor and a novel embedding strategy with multiple embedding rules. In the predictor part, we first design a transformer-based predictor. Then, we propose an image division method that divides the image into four parts, which allows more pixels to be used as context. Compared with other predictors, the transformer-based predictor can extend the range of pixels used for prediction from neighboring pixels to global ones, making it more accurate in reducing the embedding distortion. In the embedding strategy part, we first propose a complexity measurement using pixels in the target blocks. Then, we develop an improved prediction error ordering rule. Finally, we are the first to provide an embedding strategy that includes multiple embedding rules. The proposed RDH method effectively reduces distortion and improves the visual quality of data-hidden images, and experimental results show that it outperforms state-of-the-art methods.

  • Original Article
    Luolin XIONG, Yang TANG, Chensheng LIU, Shuai MAO, Ke MENG, Zhaoyang DONG, Feng QIAN
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1261-1272. https://doi.org/10.1631/FITEE.2200667

    Considering the popularity of electric vehicles and the flexibility of household appliances, it is feasible to dispatch energy in home energy systems under dynamic electricity prices to optimize electricity cost and residents' comfort. In this paper, a novel home energy management (HEM) approach is proposed based on a data-driven deep reinforcement learning method. First, to reveal the multiple uncertain factors affecting the charging behavior of electric vehicles (EVs), an improved mathematical model integrating the driver's experience, unexpected events, and traffic conditions is introduced to describe the dynamic energy demand of EVs in home energy systems. Second, a decoupled advantage actor-critic (DA2C) algorithm is presented to enhance energy optimization performance by alleviating the overfitting problem caused by sharing the policy and value networks. Furthermore, separate networks for the policy and value functions ensure the generalization of the proposed method to unseen scenarios. Finally, comprehensive experiments are carried out to compare the proposed approach with existing methods, and the results show that the proposed method can optimize electricity cost while accounting for residential comfort in different scenarios.
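
    The decoupling the abstract credits for reduced overfitting, i.e., separate policy and value networks rather than shared layers, can be sketched generically as follows (a plain advantage actor-critic skeleton, not the authors' DA2C code; the EV-charging environment, network sizes, and discrete action space are assumptions).

      import torch
      import torch.nn as nn

      class Actor(nn.Module):                       # policy network, decoupled from the critic
          def __init__(self, obs_dim, n_actions, hidden=64):
              super().__init__()
              self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, n_actions))
          def forward(self, obs):
              return torch.distributions.Categorical(logits=self.net(obs))

      class Critic(nn.Module):                      # value network with its own parameters
          def __init__(self, obs_dim, hidden=64):
              super().__init__()
              self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, 1))
          def forward(self, obs):
              return self.net(obs).squeeze(-1)

      def a2c_losses(actor, critic, obs, actions, returns):
          """Advantage actor-critic terms; the critic serves only as a baseline for the actor."""
          values = critic(obs)
          adv = (returns - values).detach()         # no actor gradient flows through the critic
          policy_loss = -(actor(obs).log_prob(actions) * adv).mean()
          value_loss = (returns - values).pow(2).mean()
          return policy_loss, value_loss

    With two separate optimizers, a policy update cannot distort the value features and vice versa, which is the intuition behind the decoupled design.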

  • Original Article
    Qian XU, Chutian YU, Xiang YUAN, Mengli WEI, Hongzhe LIU
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1253-1260. https://doi.org/10.1631/FITEE.2200596

    In this paper, the optimization problem subject to N nonidentical closed convex set constraints is studied. The aim is to design a distributed optimization algorithm over a fixed unbalanced graph to solve the considered problem. To this end, we improve the push-sum framework, design a new distributed optimization algorithm, and give a rigorous convergence analysis under the assumption that the graph is strongly connected. Finally, simulation results support the good performance of the proposed algorithm.
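
    For reference, the consensus sub-step at the heart of the push-sum framework can be sketched as follows (the gradient steps and set-constraint projections of the actual algorithm are omitted); it assumes a fixed column-stochastic weight matrix on a strongly connected directed graph.

      import numpy as np

      def push_sum_average(x0, A, iters=200):
          """Push-sum consensus on a directed graph: A must be column-stochastic.
          The ratio z = x / w converges to the average of x0 even on unbalanced graphs."""
          x = np.array(x0, dtype=float)
          w = np.ones_like(x)
          for _ in range(iters):
              x = A @ x
              w = A @ w
          return x / w

      # Column-stochastic weights for a strongly connected 3-node directed ring (illustrative)
      A = np.array([[0.5, 0.0, 0.5],
                    [0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5]])
      print(push_sum_average([1.0, 2.0, 6.0], A))   # each entry tends to the average, 3.0

    The auxiliary weights w correct for the imbalance of the graph, which is why push-sum works where plain averaging (requiring doubly stochastic weights) does not.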

  • Original Article
    Kaili QI, Minqing ZHANG, Fuqiang DI, Yongjun KONG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1156-1168. https://doi.org/10.1631/FITEE.2200501

    To improve the embedding capacity of reversible data hiding in encrypted images (RDH-EI), a new RDH-EI scheme is proposed based on adaptive quadtree partitioning and most significant bit (MSB) prediction. First, according to the smoothness of the image, the image is partitioned into blocks by adaptive quadtree partitioning, and blocks of different sizes are then encrypted and scrambled at the block level to resist analysis of the encrypted images. In the data embedding stage, the adaptive MSB prediction method proposed by Wang and He (2022) is improved by taking the upper-left pixel in the block as the target pixel for predicting the other pixels, freeing up more embedding space. To the best of our knowledge, this is the first application of quadtree partitioning to RDH-EI. Simulation results show that the proposed method is reversible and separable, and that its average embedding capacity is improved. For gray images with a size of 512×512, the average embedding capacity is increased by 25565 bits. Across all smooth images with improved embedding capacity, the average embedding capacity is increased by about 35530 bits.
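
    The adaptive quadtree partitioning step can be sketched as a recursive split driven by a smoothness test; the pixel-range criterion, threshold, and minimum block size below are assumed stand-ins for the paper's actual smoothness measure.

      import numpy as np

      def quadtree_blocks(img, min_size=4, thresh=8.0):
          """Recursively split a square (power-of-two) image into quadrants until a region
          is smooth enough (pixel range below thresh) or reaches the minimum block size."""
          blocks = []
          def split(r, c, size):
              block = img[r:r+size, c:c+size]
              if size <= min_size or block.max() - block.min() <= thresh:
                  blocks.append((r, c, size))       # record block origin and size
                  return
              h = size // 2
              for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
                  split(r + dr, c + dc, h)
          split(0, 0, img.shape[0])
          return blocks

      img = np.random.randint(0, 256, (512, 512)).astype(np.uint8)   # noisy image splits finely
      print(len(quadtree_blocks(img)))

    Smooth regions yield large blocks with many predictable pixels, which is where the capacity gain over fixed-size blocking comes from.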

  • Original Article
    Xiuli CHAI, Xiuhui CHEN, Yakun MA, Fang ZUO, Zhihua GAN, Yushu ZHANG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1169-1180. https://doi.org/10.1631/FITEE.2200498

    With the substantial increase in image transmission, the demand for image security is increasing. Noise-like images can be obtained by conventional encryption schemes, and although the security of the images can be guaranteed, the noise-like images cannot be directly previewed and retrieved. Based on the rank-then-encipher method, some researchers have designed a three-pixel exact thumbnail preserving encryption (TPE2) scheme, which can be applied to balance the security and availability of images, but this scheme has low encryption efficiency. In this paper, we introduce an efficient exact thumbnail preserving encryption scheme. First, blocking and bit-plane decomposition operations are performed on the plaintext image. The zigzag scrambling model is used to change the bit positions in the lower four bit planes. Subsequently, an operation is devised to permute the higher four bit planes, which is an extended application of the hidden Markov model. Finally, according to the difference in bit weights in each bit plane, a bit-level weighted diffusion rule is established to generate an encrypted image and still maintain the same sum of pixels within the block. Simulation results show that the proposed scheme improves the encryption efficiency and can guarantee the availability of images while protecting their privacy.
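
    As a minimal sketch of the bit-plane machinery the scheme operates on (the zigzag scrambling, the hidden-Markov-style permutation, and the diffusion rule themselves are not reproduced), an 8-bit image can be decomposed into weighted bit planes and losslessly recomposed as follows.

      import numpy as np

      def bit_planes(img):
          """Decompose an 8-bit image into 8 binary bit planes (plane 0 = LSB, plane 7 = MSB)."""
          return [(img >> k) & 1 for k in range(8)]

      def recompose(planes):
          """Weighted recombination: plane k carries weight 2**k, the bit-weight notion
          behind the bit-level weighted diffusion rule described above."""
          return sum(p.astype(np.uint16) << k for k, p in enumerate(planes)).astype(np.uint8)

      img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
      assert np.array_equal(recompose(bit_planes(img)), img)   # decomposition is lossless

    Keeping the per-block pixel sum invariant under diffusion is what preserves the thumbnail, since the thumbnail is built from block averages.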

  • Review
    Yiming LEI, Jingqi LI, Zilong LI, Yuan CAO, Hongming SHAN
    Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 42-63. https://doi.org/10.1631/FITEE.2300389

    Prompt learning has attracted broad attention in computer vision since the explosion of large pre-trained vision-language models (VLMs). Based on the close relationship between vision and language information built by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.

  • Original Article
    Liang WANG, Shunjiu HUANG, Lina ZUO, Jun LI, Wenyuan LIU
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1194-1213. https://doi.org/10.1631/FITEE.2200659

    The problem of data right confirmation is a long-term bottleneck in data sharing. Existing methods for confirming data rights lack credibility owing to poor supervision, and work only with specific data types because of their technical limitations. The emergence of blockchain has been followed by new data-sharing models that may provide improved data security. However, few of these models perform well enough in confirming data rights because data access cannot be fully controlled by the blockchain facility. In view of this, we propose a right-confirmable data-sharing model named RCDS that features symbol mapping coding (SMC) and blockchain. With SMC, each party encodes its digital identity into the byte sequence of the shared data by generating a unique symbol mapping table, whereby declaration of data rights can be content-independent for any type and any volume of data. With blockchain, all data-sharing participants jointly supervise the delivery of and access to shared data, so that the granting of data rights can be openly verified. The evaluation results show that RCDS is effective and practical in data-sharing applications that are conscientious about data right confirmation.

  • Original Article
    Shihmin WANG, Binqi ZHAO, Zhengfeng ZHANG, Junping ZHANG, Jian PU
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(11): 1541-1556. https://doi.org/10.1631/FITEE.2300084

    As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods that sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach that improves sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering methods in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data, to evaluate actions from the clusters in the action candidate set in the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate the superior performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method reduces the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open-source for reproducibility and is available at https://github.com/Shihwin/SelectiveSampling.

  • Review
    Bing LI, Peng YANG, Yuankang SUN, Zhongjian HU, Meng YI
    Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 64-83. https://doi.org/10.1631/FITEE.2300410

    Text generation is an essential research area in artificial intelligence (AI) and natural language processing, and provides key technical support for the rapid development of AI-generated content (AIGC). It is based on technologies such as natural language processing, machine learning, and deep learning, which enable models to learn language rules from training data and automatically generate text that meets grammatical and semantic requirements. In this paper, we organize and systematically summarize the main research progress in text generation and review recent text generation papers, focusing on presenting a detailed understanding of the technical models. In addition, several typical text generation application systems are presented. Finally, we address some challenges and future directions in AI text generation. We conclude that improving the quality, quantity, interactivity, and adaptability of generated text can fundamentally advance the development of AI text generation.

  • Original Article
    Zhe JIN, Yin ZHANG, Jiaxu MIAO, Yi YANG, Yueting ZHUANG, Yunhe PAN
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1416-1429. https://doi.org/10.1631/FITEE.2200662

    Traditional Chinese medicine (TCM), with thousands of years of history in China, is an interesting research topic. With recent advances in artificial intelligence technology, some researchers have started to learn TCM prescriptions in a data-driven manner, which involves recommending an appropriate set of herbs based on a patient's symptoms. Most existing herb recommendation models disregard TCM domain knowledge, for example, the interactions between symptoms and herbs and TCM-informed observations (i.e., the TCM formulation of prescriptions). In this paper, we propose a knowledge-guided and TCM-informed approach for herb recommendation. The knowledge used includes path interactions and co-occurrence relationships among symptoms and herbs, drawn from a knowledge graph generated from TCM literature and prescriptions. This knowledge is used to obtain discriminative feature vectors of symptoms and herbs via a graph attention network. To increase the ability to predict herbs for the given symptoms, we introduce TCM-informed observations in the prediction layer. We apply our proposed model to a TCM prescription dataset, demonstrating significant improvements over state-of-the-art herb recommendation methods.
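
    The graph-attention aggregation used to obtain the feature vectors can be sketched in its generic single-head form (Velickovic et al. style); this is not the paper's network, and the dimensions, LeakyReLU slope, and the assumption of self-loops are illustrative.

      import numpy as np

      def gat_attention(h, W, a, adj):
          """Single-head graph attention: e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for edges (i, j),
          softmax-normalized over each node's neighbors, then used to aggregate features."""
          z = h @ W                                        # (N, d') projected node features
          N = z.shape[0]
          e = np.full((N, N), -np.inf)                     # -inf masks non-neighbors in the softmax
          for i in range(N):
              for j in range(N):
                  if adj[i, j]:
                      s = a @ np.concatenate([z[i], z[j]])
                      e[i, j] = s if s > 0 else 0.2 * s    # LeakyReLU
          alpha = np.exp(e - e.max(axis=1, keepdims=True))
          alpha /= alpha.sum(axis=1, keepdims=True)
          return alpha @ z                                 # attention-weighted neighborhood aggregation

      rng = np.random.default_rng(0)
      h = rng.normal(size=(4, 8)); W = rng.normal(size=(8, 8)); a = rng.normal(size=16)
      adj = np.eye(4, dtype=bool); adj[0, 1] = adj[1, 2] = adj[2, 3] = True   # self-loops + a path
      print(gat_attention(h, W, a, adj).shape)             # (4, 8)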

  • Original Article
    Xi SUN, Zhimin LV
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1273-1286. https://doi.org/10.1631/FITEE.2200304

    Next point-of-interest (POI) recommendation is an important personalized task in location-based social networks (LBSNs); it aims to recommend the next POI for a user in a specific situation from historical check-in data. State-of-the-art studies linearly discretize the user's spatiotemporal information and then use recurrent neural network (RNN) based models for modeling. However, these studies ignore the nonlinear effects of spatiotemporal information on user preferences and the spatiotemporal correlations between user trajectories and candidate POIs. To address these limitations, a spatiotemporal trajectory (STT) model is proposed in this paper. We use the long short-term memory (LSTM) model with an attention mechanism as the basic framework and introduce the user's spatiotemporal information into the model during encoding. In the encoding process, an exponential decay factor is applied to reflect the nonlinear drift of user interest over time and distance. In addition, we design a spatiotemporal matching module in the target recall process to select the most relevant POI by measuring the relevance between the user's current trajectory and the candidate set. We evaluate the performance of our STT model on four real-world datasets. Experimental results show that our model outperforms existing state-of-the-art methods.
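
    The exponential decay factor described above admits a one-line sketch: a check-in's influence shrinks nonlinearly with elapsed time and distance. The decay rates below are illustrative assumptions, not the paper's fitted values.

      import numpy as np

      def decay_weights(dt, dd, lam_t=0.1, lam_d=0.05):
          """Exponential decay of a check-in's influence with elapsed time dt (hours)
          and distance dd (km): w = exp(-lam_t*dt) * exp(-lam_d*dd)."""
          return np.exp(-lam_t * np.asarray(dt)) * np.exp(-lam_d * np.asarray(dd))

      # A check-in from 2 hours / 1 km away outweighs one from 24 hours / 10 km away
      print(decay_weights([2, 24], [1, 10]))

    Multiplying attention scores by such weights is one simple way to capture the nonlinear drift of interest that a linear discretization misses.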

  • Perspective
    Xin PENG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(11): 1513-1519. https://doi.org/10.1631/FITEE.2300537
  • Original Article
    Fengda ZHANG, Kun KUANG, Long CHEN, Zhaoyang YOU, Tao SHEN, Jun XIAO, Yin ZHANG, Chao WU, Fei WU, Yueting ZHUANG, Xiaolin LI
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1181-1193. https://doi.org/10.1631/FITEE.2200268

    To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning called federated unsupervised representation learning (FURL), aiming to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) data distribution shift (non-independent and identically distributed, non-IID data) among clients makes local models focus on different categories, leading to inconsistent representation spaces; (2) without unified information among clients, the representations across clients would be misaligned. To address these challenges, we propose the federated contrastive averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules: a dictionary module, which aggregates sample representations from each client and shares them with all clients for consistency of the representation space, and an alignment module, which aligns each client's representations with those of a base model trained on public data. We adopt the contrastive approach for local model training. Through extensive experiments with three evaluation protocols in IID and non-IID settings, we demonstrate that FedCA outperforms all baselines by significant margins.
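
    The contrastive approach used for local training can be sketched with a generic SimCLR-style NT-Xent loss (the dictionary and alignment modules are not shown, and the temperature value is an assumption).

      import torch
      import torch.nn.functional as F

      def nt_xent(z1, z2, tau=0.5):
          """NT-Xent contrastive loss over two augmented views: each sample's two views
          are positives; all other samples in the batch serve as negatives."""
          z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2N, d) unit vectors
          sim = z @ z.t() / tau                             # cosine similarities / temperature
          n = z1.size(0)
          sim.fill_diagonal_(float('-inf'))                 # a view is not its own positive
          targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
          return F.cross_entropy(sim, targets)

      z1, z2 = torch.randn(32, 128), torch.randn(32, 128)   # stand-ins for two encoded views
      print(nt_xent(z1, z2))

    A shared dictionary of representations effectively enlarges the negative pool beyond the local batch, which is how FedCA counters the non-IID inconsistency described above.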

  • Original Article
    Baoxiong XU, Jianxin YI, Feng CHENG, Ziping GONG, Xianrong WAN
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(8): 1214-1230. https://doi.org/10.1631/FITEE.2200260

    In radar systems, target tracking errors are mainly from motion models and nonlinear measurements. When we evaluate a tracking algorithm, its tracking accuracy is the main criterion. To improve the tracking accuracy, in this paper we formulate the tracking problem into a regression model from measurements to target states. A tracking algorithm based on a modified deep feedforward neural network (MDFNN) is then proposed. In MDFNN, a filter layer is introduced to describe the temporal sequence relationship of the input measurement sequence, and the optimal measurement sequence size is analyzed. Simulations and field experimental data of the passive radar show that the accuracy of the proposed algorithm is better than those of extended Kalman filter (EKF), unscented Kalman filter (UKF), and recurrent neural network (RNN) based tracking methods under the considered scenarios.

  • Original Article
    Han YAN, Chongquan ZHONG, Yuhu WU, Liyong ZHANG, Wei LU
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(11): 1557-1573. https://doi.org/10.1631/FITEE.2200515

    Convolutional neural networks (CNNs) have developed quickly in many real-world fields. However, CNN performance depends heavily on hyperparameters, and finding suitable hyperparameters for CNNs in application fields is challenging for three reasons: (1) the problem of mixed-variable encoding for different types of hyperparameters in CNNs, (2) the expensive computational cost of evaluating candidate hyperparameter configurations, and (3) the problem of ensuring convergence rates and model performance during hyperparameter search. To overcome these challenges, a hybrid-model optimization algorithm is proposed in this paper to automatically search for suitable hyperparameter configurations, based on the Gaussian process and particle swarm optimization (GPPSO) algorithm. First, a new encoding method is designed to efficiently deal with the CNN hyperparameter mixed-variable problem. Second, a hybrid-surrogate-assisted model is proposed to reduce the high cost of evaluating candidate hyperparameter configurations. Third, a novel activation function is suggested to improve model performance and ensure the convergence rate. Intensive experiments on image-classification benchmark datasets demonstrate the superior performance of GPPSO over state-of-the-art methods. Moreover, a case study on metal fracture diagnosis evaluates the GPPSO algorithm's performance in practical applications. Experimental results demonstrate the effectiveness and efficiency of GPPSO, which achieves accuracies of 95.26% and 76.36% with only 0.04 and 1.70 GPU days on the CIFAR-10 and CIFAR-100 datasets, respectively.
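
    The surrogate-assisted idea, using a cheap Gaussian-process model to screen particle-swarm candidates so that few expensive evaluations (full CNN trainings) are run, can be sketched as below. This toy omits the paper's mixed-variable encoding, hybrid surrogate, and activation function, and all PSO constants are assumptions.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor

      def surrogate_pso(objective, lo, hi, n_particles=10, iters=20, seed=0):
          """GP-screened PSO: the GP is fitted to all evaluated points, and only the most
          promising particle per iteration pays the expensive objective call."""
          rng = np.random.default_rng(seed)
          X = rng.uniform(lo, hi, size=(n_particles, len(lo)))
          V = np.zeros_like(X)
          y = np.array([objective(x) for x in X])          # initial expensive evaluations
          pbest, pbest_y = X.copy(), y.copy()
          gi = int(y.argmin()); gbest, gbest_y = X[gi].copy(), y[gi]
          hist_X, hist_y = list(X), list(y)
          gp = GaussianProcessRegressor()
          for _ in range(iters):
              r1, r2 = rng.random(X.shape), rng.random(X.shape)
              V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (gbest - X)
              X = np.clip(X + V, lo, hi)
              gp.fit(np.array(hist_X), np.array(hist_y))
              i = int(gp.predict(X).argmin())              # cheap surrogate screening
              yi = objective(X[i])                         # one expensive evaluation per iteration
              hist_X.append(X[i].copy()); hist_y.append(yi)
              if yi < pbest_y[i]:
                  pbest[i], pbest_y[i] = X[i].copy(), yi
              if yi < gbest_y:
                  gbest, gbest_y = X[i].copy(), yi
          return gbest

      # Cheap stand-in for "validation error" over two hyperparameters in [0, 1]^2
      print(surrogate_pso(lambda x: float(np.sum((x - 0.3) ** 2)), np.zeros(2), np.ones(2)))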

  • Original Article
    Ran TIAN, Xinmei LI, Zhongyu MA, Yanxing LIU, Jingxia WANG, Chu WANG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1287-1301. https://doi.org/10.1631/FITEE.2200540

    Accurate long-term power forecasting is important for grid operation decisions and customers' power consumption management, ensuring a reliable power supply and the economical operation of the grid. However, most time-series forecasting models do not perform well on long-time-series prediction tasks with large amounts of data. To address this challenge, we propose a parallel time-series prediction model called LDformer. First, we combine Informer with long short-term memory (LSTM) to obtain deep representations of the time series. Then, we propose a parallel encoder module to improve the robustness of the model and combine convolutional layers with an attention mechanism to avoid value redundancy in the attention mechanism. Finally, we propose a probabilistic sparse (ProbSparse) self-attention mechanism combined with UniDrop to reduce the computational overhead and mitigate the risk of losing key connections in the sequence. Experimental results on five datasets show that LDformer outperforms state-of-the-art methods in most cases across different long-time-series prediction tasks.
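
    The ProbSparse self-attention mentioned above (introduced by Informer) can be sketched as follows: only queries with a large sparsity measure attend fully, while the rest fall back to the mean of the values. For clarity this sketch computes full scores, whereas the real mechanism estimates the measure from sampled keys; shapes and u are illustrative.

      import numpy as np

      def probsparse_attention(Q, K, V, u):
          """Keep the u queries with the largest sparsity measure
          M(q) = max(qK^T/sqrt(d)) - mean(qK^T/sqrt(d)); lazy queries get mean(V)."""
          d = Q.shape[-1]
          scores = Q @ K.T / np.sqrt(d)                    # (Lq, Lk) full scores, for clarity only
          M = scores.max(axis=1) - scores.mean(axis=1)     # sparsity measure per query
          top = np.argsort(M)[-u:]                         # the u most "active" queries
          out = np.tile(V.mean(axis=0), (Q.shape[0], 1))   # lazy queries: near-uniform attention
          s = scores[top]
          w = np.exp(s - s.max(axis=1, keepdims=True))
          out[top] = (w / w.sum(axis=1, keepdims=True)) @ V
          return out

      rng = np.random.default_rng(0)
      Q = rng.normal(size=(16, 8)); K = rng.normal(size=(16, 8)); V = rng.normal(size=(16, 8))
      print(probsparse_attention(Q, K, V, u=4).shape)      # (16, 8)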

  • Correspondence
    Chun GENG, Jiwei LIAN, Dazhi DING
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1366-1374. https://doi.org/10.1631/FITEE.2200454
  • Original Article
    Jinyi GUO, Jieyu DING
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1403-1415. https://doi.org/10.1631/FITEE.2200514

    Cross-modal retrieval aims at mutual retrieval between modalities by establishing consistent alignment between data of different modalities. Many cross-modal retrieval methods have been proposed and have achieved excellent results; however, they are trained with clean cross-modal pairs, which are semantically matched but costly to obtain, unlike easily available noise-aligned data (i.e., paired but semantically mismatched). When these methods are trained with noise-aligned data, their performance degrades dramatically. Therefore, we propose robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first conducts multi-task learning to slow down overfitting to the noise and make the data separable. Then, RCAR uses a two-component beta-mixture model to divide the pairs into clean and noisy alignments and refurbishes labels according to the posterior probability of the noise-alignment component. In addition, we define partial and complete noise in the noise-alignment paradigm. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.
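
    The two-component beta-mixture step can be sketched with a small EM fit over per-pair losses (the multi-task warm-up and the refurbishment rule itself are not shown; the initialization and iteration count are assumptions).

      import numpy as np
      from scipy.stats import beta as beta_dist

      def bmm_clean_posterior(losses, iters=20):
          """EM fit of a two-component beta mixture to per-pair losses scaled to (0, 1);
          returns each pair's posterior probability of belonging to the low-loss
          ("clean") component, which can drive label refurbishment."""
          x = np.clip(losses, 1e-4, 1 - 1e-4)
          a, b, pi = np.array([2., 5.]), np.array([5., 2.]), np.array([.5, .5])
          for _ in range(iters):
              # E-step: responsibilities of the two components for every sample
              p = np.stack([pi[k] * beta_dist.pdf(x, a[k], b[k]) for k in range(2)])
              g = p / p.sum(axis=0, keepdims=True)
              # M-step: weighted method-of-moments update of each beta component
              for k in range(2):
                  w = g[k] / g[k].sum()
                  m = np.sum(w * x)
                  v = np.sum(w * (x - m) ** 2)
                  c = m * (1 - m) / v - 1
                  a[k], b[k] = m * c, (1 - m) * c
              pi = g.mean(axis=1)
          return g[int(np.argmin(a / (a + b)))]   # component with the smaller mean = "clean"

      losses = np.concatenate([np.random.beta(2, 8, 300), np.random.beta(8, 2, 100)])
      print(bmm_clean_posterior(losses)[:5])      # high posteriors for low-loss pairs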

  • Correspondence
    Lingsheng YANG, Bin WANG, Yajie LI
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1357-1365. https://doi.org/10.1631/FITEE.2200542
  • Original Article
    Zicong XIA, Yang LIU, Wenlian LU, Weihua GUI
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1239-1252. https://doi.org/10.1631/FITEE.2200381

    In this paper, we address matrix-valued distributed stochastic optimization with inequality and equality constraints, where the objective function is a sum of multiple matrix-valued functions with stochastic variables and the considered problems are solved in a distributed manner. A penalty method is derived to deal with the constraints, and a selection principle is proposed for choosing feasible penalty functions and penalty gains. A distributed optimization algorithm based on the gossip model is developed for solving the stochastic optimization problem, and its convergence to the optimal solution is analyzed rigorously. Two numerical examples are given to demonstrate the viability of the main results.
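
    As a generic sketch of the penalty construction described above (the paper's specific penalty functions and gain-selection principle are not reproduced), inequality constraints g_j(X) <= 0 and equality constraints h_k(X) = 0 are typically folded into the objective as

      \min_X \; \sum_{i=1}^{N} \mathbb{E}_{\xi_i}\!\big[ f_i(X,\xi_i) \big]
        + \rho \sum_{j} \big[\max\{0,\, g_j(X)\}\big]^2
        + \rho \sum_{k} h_k(X)^2,

    where the penalty gain \rho > 0 must be chosen large enough that minimizers of the penalized problem are feasible for the original constrained problem; the selection principle in the paper formalizes this choice.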

  • Original Article
    Chuyun SHEN, Wenhao LI, Qisen XU, Bin HU, Bo JIN, Haibin CAI, Fengping ZHU, Yuxin LI, Xiangfeng WANG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1332-1348. https://doi.org/10.1631/FITEE.2200299

    Interactive medical image segmentation based on human-in-the-loop machine learning is a novel paradigm that draws on human expert knowledge to assist medical image segmentation. However, existing methods often fall into what we call interactive misunderstanding, the essence of which is the dilemma in trading off short- and long-term interaction information. To better use the interaction information at various timescales, we propose an interactive segmentation framework, called interactive MEdical image segmentation with self-adaptive Confidence CAlibration (MECCA), which combines action-based confidence learning and multi-agent reinforcement learning. A novel confidence network is learned by predicting the alignment level of the action with short-term interaction information. A confidence-based reward-shaping mechanism is then proposed to explicitly incorporate confidence in the policy gradient calculation, thus directly correcting the model's interactive misunderstanding. MECCA also enables user-friendly interactions by reducing the interaction intensity and difficulty via label generation and interaction guidance, respectively. Numerical experiments on different segmentation tasks show that MECCA can significantly improve short- and long-term interaction information utilization efficiency with remarkably fewer labeled samples. The demo video is available at https://bit.ly/mecca-demo-video.
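
    The confidence-based reward shaping can be sketched with a simple linear form (an illustrative assumption, not MECCA's exact shaping rule): the predicted confidence modulates the environment reward so that actions the confidence network distrusts are discouraged during policy-gradient training.

      def shaped_reward(env_reward, confidence, beta=0.5):
          """Illustrative confidence-based shaping: confidence in [0, 1] is mapped to a
          bonus in [-beta, +beta] added to the environment reward."""
          return env_reward + beta * (2.0 * confidence - 1.0)

      print(shaped_reward(1.0, 0.9), shaped_reward(1.0, 0.1))   # trusted vs. distrusted action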

  • Original Article
    Xiaofei QIN, Wenkai HU, Chen XIAO, Changxiang HE, Songwen PEI, Xuedian ZHANG
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1430-1444. https://doi.org/10.1631/FITEE.2200502

    To balance the inference speed and detection accuracy of a grasp detection algorithm, both of which are important for robot grasping tasks, we propose an encoder–decoder structured pixel-level grasp detection neural network named the attention-based efficient robot grasp detection network (AE-GDN). Three spatial attention modules are introduced in the encoder stages to enhance detailed information, and three channel attention modules are introduced in the decoder stages to extract more semantic information. Several lightweight and efficient DenseBlocks are used to connect the encoder and decoder paths to improve the feature modeling capability of AE-GDN. A high intersection over union (IoU) value between the predicted grasp rectangle and the ground truth does not necessarily mean a high-quality grasp configuration, and might even cause a collision. This is because traditional IoU loss calculation methods treat the center part of the predicted rectangle as having the same importance as the area around the grippers. We design a new IoU loss calculation method based on an hourglass box matching mechanism, which creates a better correspondence between high IoU values and high-quality grasp configurations. AE-GDN achieves accuracies of 98.9% and 96.6% on the Cornell and Jacquard datasets, respectively. The inference speed reaches 43.5 frames per second with only about 1.2 × 106 parameters. The proposed AE-GDN has also been deployed on a practical robotic arm grasping system and performs grasping well. Codes are available at https://github.com/robvincen/robot_gradet.
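
    For reference, the plain rectangle IoU that the abstract argues is insufficient on its own can be computed as follows; the hourglass box matching refinement itself is not reproduced here.

      def rect_iou(a, b):
          """IoU of two axis-aligned rectangles given as (x1, y1, x2, y2)."""
          ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
          ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
          inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
          area_a = (a[2] - a[0]) * (a[3] - a[1])
          area_b = (b[2] - b[0]) * (b[3] - b[1])
          return inter / (area_a + area_b - inter)

      print(rect_iou((0, 0, 4, 4), (2, 2, 6, 6)))   # 4 / 28 ≈ 0.143

    Note that this measure weights every overlapping pixel equally, which is exactly the uniformity the hourglass-shaped matching is designed to correct.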

  • Original Article
    Weifang HUANG, Lijian YANG, Xuan ZHAN, Ziying FU, Ya JIA
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(10): 1458-1470. https://doi.org/10.1631/FITEE.2300008

    Time delay and coupling strength are important factors that affect the synchronization of neural networks. In this study, a modular neural network containing subnetworks of different scales was constructed using the Hodgkin–Huxley (HH) neural model; i.e., a small-scale random network was unidirectionally connected to a large-scale small-world network through chemical synapses. Time delays were found to induce multiple synchronization transitions in the network. An increase in coupling strength also promoted synchronization of the network when the time delay was an integer multiple of the firing period of a single neuron. Considering that time delays at different locations in a modular network may have different effects, we explored the influence of time delays within each subnetwork and between two subnetworks on the synchronization of modular networks. We found that when the subnetworks were well synchronized internally, an increase in the time delay within both subnetworks induced multiple synchronization transitions of their own. In addition, the synchronization state of the small-scale network affected the synchronization of the large-scale network. It was surprising to find that an increase in the time delay between the two subnetworks caused the synchronization factor of the modular network to vary periodically, but it had essentially no effect on the synchronization within the receiving subnetwork. By analyzing the phase difference between the two subnetworks, we found that the mechanism of the periodic variation of the synchronization factor of the modular network was the periodic variation of the phase difference. Finally, the generality of the results was demonstrated by investigating modular networks at different scales.
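
    The synchronization factor referred to above is conventionally computed as the variance of the population-averaged signal relative to the mean variance of the individual traces; here is a sketch with synthetic traces standing in for HH simulations.

      import numpy as np

      def synchronization_factor(V):
          """Synchronization factor R for N membrane-potential traces V (shape N x T):
          R -> 1 for perfect synchrony, R -> 0 for fully asynchronous activity."""
          F = V.mean(axis=0)               # population-averaged signal
          num = F.var()                    # variance of the mean field
          den = V.var(axis=1).mean()       # mean variance of individual neurons
          return num / den

      T = np.linspace(0, 10, 1000)
      V = np.sin(T) + 0.1 * np.random.randn(20, 1000)   # 20 nearly synchronous traces
      print(synchronization_factor(V))                  # close to 1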

  • Original Article
    Dawen XIA, Jian GENG, Ruixi HUANG, Bingqi SHEN, Yang HU, Yantao LI, Huaqing LI
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(9): 1316-1331. https://doi.org/10.1631/FITEE.2200621

    To address the imbalance between taxi supply and passenger demand, this paper proposes a distributed ensemble empirical mode decomposition with normalization of spatial attention mechanism based bi-directional gated recurrent unit (EEMDN-SABiGRU) model on Spark for accurate passenger hotspot prediction. It focuses on reducing blind cruising costs, improving carrying efficiency, and maximizing income. Specifically, the EEMDN method is put forward to process the passenger hotspot data in the grid, to solve the problems of non-smooth sequences and the degradation of prediction accuracy caused by excessive numerical differences when handling the intrinsic mode functions of EMD. Next, a spatial attention mechanism is constructed to capture the characteristics of passenger hotspots in each grid, taking passenger boarding and alighting hotspots as weights and emphasizing the spatial regularity of passengers in the grid. Furthermore, a bi-directional GRU is incorporated to address the problem that a standard GRU captures only forward information and ignores backward information, improving the accuracy of feature extraction. Finally, accurate prediction of passenger hotspots is achieved with the EEMDN-SABiGRU model using real-world taxi GPS trajectory data in the Spark parallel computing framework. The experimental results demonstrate that, on the four datasets in the 00-grid, the mean absolute percentage error, mean absolute error, root mean square error, and maximum error values of EEMDN-SABiGRU decrease by at least 43.18%, 44.91%, 55.04%, and 39.33%, respectively, compared with LSTM, EMD-LSTM, EEMD-LSTM, GRU, EMD-GRU, EEMD-GRU, EMDN-GRU, CNN, and BP.

  • Review
    Yajun ZHAO
    Frontiers of Information Technology & Electronic Engineering, 2023, 24(12): 1669-1688. https://doi.org/10.1631/FITEE.2200666

    Scholars are expected to continue enhancing the depth and breadth of theoretical research on reconfigurable intelligent surface (RIS) to provide higher theoretical limits for RIS engineering applications. Notably, significant advancements have been achieved through both academic research breakthroughs and the promotion of engineering applications and industrialization. We provide an overview of RIS engineering applications, focusing primarily on their typical features, classifications, and deployment scenarios. Furthermore, we systematically and comprehensively analyze the challenges faced by RIS and propose potential solutions, including addressing beamforming issues through cascade channel decoupling, tackling the effects of and resolutions to regulatory constraints on RIS, exploring the network-controlled mode for the RIS system architecture, examining integrated channel regulation and information modulation, and investigating the use of the true time delay (TTD) mechanism for RIS. In addition, two key technical points, RIS-assisted non-orthogonal multiple access (NOMA) and the RIS-based transmitter, are reviewed for completeness. Finally, we discuss future trends and challenges in this field.