Oct 2020, Volume 14 Issue 5

  • REVIEW ARTICLE
    Ling SHEN, Richang HONG, Yanbin HAO

    Emerging Internet services and applications attract increasing numbers of users to engage in diverse video-related activities, such as video searching, downloading, and sharing. These routine operations lead to an explosive growth of online video volume and inevitably give rise to massive near-duplicate content. Near-duplicate video retrieval (NDVR) has therefore long been a hot topic. The primary purpose of this paper is to present a comprehensive survey and an updated review of the advances in large-scale NDVR to supply guidance for researchers. Specifically, we summarize and compare the definitions of near-duplicate videos (NDVs) in the literature, analyze the relationship between NDVR and its related research topics theoretically, describe its generic framework in detail, and investigate existing state-of-the-art NDVR systems. Finally, we present the development trends and research directions of this topic.

  • RESEARCH ARTICLE
    Yaopeng LIU, Hao PENG, Jianxin LI, Yangqiu SONG, Xiong LI

    Real-life events are emerging and evolving in social and news streams. Recent methods have succeeded in capturing designed features of monolingual events, but lack interpretability and multilingual considerations. To this end, we propose a multilingual event mining model, namely MLEM, to automatically detect events and generate evolution graphs in multilingual hybrid-length text streams covering English, Chinese, French, German, Russian, and Japanese. Specifically, we merge identical entities and similar phrases, and present multiple similarity measures based on an incremental word2vec model. We propose an 8-tuple event description for correlation analysis and evolution graph generation. We evaluate the MLEM model using a massive human-generated dataset containing real-world events. Experimental results show that our new model MLEM outperforms the baseline method in both efficiency and effectiveness.

  • LETTER
    Shumin HAN, Derong SHEN, Tiezheng NIE, Yue KOU, Ge YU
  • RESEARCH ARTICLE
    Jintao GAO, Wenjie LIU, Zhanhuai LI

    Collecting statistics is a time- and resource-consuming operation in database systems. It is even more challenging to collect statistics efficiently without affecting system performance while keeping them correct in a distributed database. Traditional strategies usually consider only one dimension when collecting statistics, which lacks adaptiveness. In this paper, we propose an adaptive strategy for statistics collecting (ASC), which balances collecting efficiency, correctness of statistics, and the effect on system performance. We formally define the procedure of collecting statistics, abstract the relationships among collecting efficiency, correctness of statistics, and the effect on system performance, and introduce an elastic structure (ESI) that stores the necessary information generated while executing our strategy. ASC can pick an appropriate time to trigger a collecting action and filter out unnecessary tasks, while reasonably allocating collecting tasks to appropriate executing locations with the right executing models through the information stored in the ESI. We implement and evaluate our strategy in a distributed database. Experiments show that our solution generally improves the efficiency and correctness of collecting statistics and, moreover, reduces the negative effect on system performance compared with other strategies.

  • RESEARCH ARTICLE
    Minxi LI, Jiali MAO, Xiaodong QI, Cheqing JIN

    Rampant cloned-vehicle offenses have caused great damage to transportation management as well as public safety and even the economy. This necessitates an efficient detection mechanism to identify vehicles with fake license plates accurately, and to further explore the motives by discerning the behaviors of cloned vehicles. The ubiquitous inspection spots deployed in the city have been collecting the moving information of passing vehicles, which opens up a new opportunity for cloned-vehicle detection. Existing detection methods cannot detect cloned vehicles effectively because they use a fixed speed threshold. In this paper, we propose a two-phase framework, called CVDF, to detect cloned vehicles and discriminate the behavior patterns of vehicles that use the same plate number. In the detection phase, cloned vehicles are identified based on speed thresholds extracted from historical trajectories and on behavior-abnormality analysis within the local neighborhood. In the behavior analysis phase, considering that the traces of vehicles using the same license plate will be mixed together, we aim to differentiate the trajectories through matching-degree-based clustering and then extract frequent temporal behavior patterns. Experimental results on real-world data show that the CVDF framework has high detection precision and can reveal cloned vehicles’ behavior effectively. Our proposal provides a scientific basis for traffic management authorities to tackle cloned-vehicle crime.
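
The fixed-speed-threshold baseline that CVDF improves on can be sketched as follows; the record layout and the threshold value are illustrative assumptions, not details from the paper:

```python
from math import hypot

def flag_cloned(sightings, max_speed=33.0):
    """Flag a plate as potentially cloned when the speed implied by two
    consecutive inspection-spot sightings exceeds max_speed (in m/s).
    A minimal sketch of the fixed-threshold check only; CVDF instead
    derives thresholds from historical trajectories per road segment.
    """
    # each sighting is a hypothetical (time_s, x_m, y_m) tuple
    sightings = sorted(sightings, key=lambda s: s[0])
    for (t0, x0, y0), (t1, x1, y1) in zip(sightings, sightings[1:]):
        dt = t1 - t0
        if dt <= 0:
            # simultaneous sightings at different spots: physically impossible
            if (x0, y0) != (x1, y1):
                return True
            continue
        if hypot(x1 - x0, y1 - y0) / dt > max_speed:
            return True
    return False
```

For example, a plate seen 10 km apart within one minute implies roughly 167 m/s and is flagged, while 1 km in the same interval is not.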

  • RESEARCH ARTICLE
    Zeinab ASKARI, Avid AVOKH

    This paper deals with the problem of joint multicast routing, scheduling, and call admission control in multi-radio multi-channel wireless mesh networks. To heuristically solve this problem, we propose a cross-layer algorithm named “extended MIMCR with scheduling and call admission control phases (EMSC)”. Our model relies on on-demand quality-of-service (QoS) multicast sessions, where each admitted session creates a unique tree with a required bandwidth. The proposed scheme extends the MIMCR algorithm to fairly schedule multiple non-interfering transmissions in the same time slot. It also exploits a call admission control mechanism to protect the QoS requirements of the multicast traffic. EMSC reduces the number of occupied time slots by considering spatial reuse and both intra-flow and inter-flow interference, and by selecting minimum-interference minimum-cost paths. This subsequently leads to better radio resource utilization and increases the network throughput. Simulation results show that the proposed algorithm outperforms existing algorithms and improves the network performance.

  • RESEARCH ARTICLE
    Jian GAO, Jinyan WANG, Kuixian WU, Rong CHEN

    Solving a quantified constraint satisfaction problem (QCSP) is usually a hard task due to its computational complexity. Exact algorithms play an important role in solving this problem, among which backtrack algorithms are effective. In a backtrack algorithm, an important step is assigning a chosen value to a variable when exploring a branch, so a good value selection rule may greatly speed up the search. In this paper, we propose two value selection rules, for existentially and universally quantified variables respectively, to avoid unnecessary search. The rule for universally quantified variables gives priority to values that failed in previous branches, while the rule for existentially quantified variables selects the most promising values first. The two rules are integrated into the state-of-the-art QCSP solver QCSPSolve, an exact solver based on backtracking. We perform a number of experiments to evaluate the improvements brought by our rules. From the computational results, we conclude that the new value selection rules speed up the solver by 5 times on average and 30 times at most. We also show that both rules perform particularly well on instances where existentially and universally quantified variables occur alternately.

  • RESEARCH ARTICLE
    Neng HOU, Fazhi HE, Yi ZHOU, Yilin CHEN

    Hardware/software partitioning is an essential step in hardware/software co-design. For large problems, it is difficult to achieve both high solution quality and short running time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel for compacting the neighborhood is proposed to theoretically reduce the amount of GPU global memory accesses. A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS. To further minimize the transfer overhead of GPTS between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which handles the case in which all the candidates fail to satisfy the given constraint. Experiments show that GPTS outperforms state-of-the-art tabu search implementations and is competitive with other methods for HW/SW partitioning. The proposed parallelization is significant when considering an ordinary GPU platform.

  • RESEARCH ARTICLE
    Jin LI, Quan CHEN, Jingwen LENG, Weinan ZHANG, Minyi GUO

    Robust regression plays an important role in many machine learning problems. A primal approach relies on the use of the Huber loss and an iteratively reweighted l2 method. However, because the Huber loss is not smooth and its corresponding distribution cannot be represented as a Gaussian scale mixture, such an approach is extremely difficult to handle within a probabilistic framework. To address these limitations, this paper proposes two novel losses and the corresponding probability functions. One is called Soft Huber, which is well suited for modeling non-Gaussian noise. The other is Nonconvex Huber, which can help produce much sparser results when imposed as a prior on the regression vector. They can represent any lq loss (1 ≤ q < 2) with tuning parameters, which makes the regression model more robust. We also show that both distributions have an elegant form, namely a Gaussian scale mixture with a generalized inverse Gaussian mixing density. This enables us to devise an expectation maximization (EM) algorithm for solving the regression model. We can obtain an adaptive weight through EM, which is very useful for removing noisy data or irrelevant features in regression problems. We apply our model to the face recognition problem and show that it not only reduces the impact of noisy pixels but also removes more irrelevant face images. Our experiments demonstrate promising results on two datasets.
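
For reference, the classical Huber loss that the proposed Soft Huber and Nonconvex Huber generalize is the standard piecewise loss below; this is the textbook definition, not either of the paper's new losses:

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Classical Huber loss on residuals r: quadratic for |r| <= delta,
    linear beyond. It is robust to outliers but not smooth at the
    transition, which is the limitation the paper's smooth variants address.
    """
    r = np.asarray(r, dtype=float)
    small = np.abs(r) <= delta
    # 0.5 r^2 inside the threshold, delta(|r| - delta/2) outside
    return np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))
```

For a residual of 0.5 with delta = 1 this gives 0.125 (quadratic branch), while a residual of 2.0 gives 1.5 (linear branch), growing only linearly in the outlier region.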

  • RESEARCH ARTICLE
    Yuling MA, Chaoran CUI, Jun YU, Jie GUO, Gongping YANG, Yilong YIN

    In higher education, the initial study period of each course plays a crucial role for students and seriously influences their subsequent learning activities. However, given the large number of students enrolled in a university course, it has become impossible for teachers to keep track of the performance of individual students. In this circumstance, an academic early-warning system is desirable, which automatically detects students with learning difficulties (i.e., at-risk students) before a course starts. However, previous studies are not well suited to this purpose for two reasons: 1) they have mainly concentrated on e-learning platforms, e.g., massive open online courses (MOOCs), and relied on data about students’ online activities, which are hardly accessible in traditional teaching scenarios; and 2) they have only made performance predictions when a course is in progress or even close to its end. In this paper, for traditional classroom teaching scenarios, we investigate the task of pre-course student performance prediction, which refers to detecting at-risk students for each course before its commencement. To better represent a student sample and utilize the correlations among courses, we cast the problem as a multi-instance multi-label (MIML) problem. Besides, given the problem of data scarcity, we propose a novel multi-task learning method, i.e., MIML-Circle, to predict the performance of students from different specialties in a unified framework. Extensive experiments are conducted on five real-world datasets, and the results demonstrate the superiority of our approach over state-of-the-art methods.

  • RESEARCH ARTICLE
    Lei CHEN, Kai SHAO, Xianzhong LONG, Lingsheng WANG

    Survival analysis aims to predict the occurrence time of a particular event of interest, which is crucial for the prognosis analysis of diseases. Currently, due to the limited study period and potential loss to follow-up, the observed data inevitably involve some censored instances, which brings a unique challenge that distinguishes survival analysis from general regression problems. In addition, survival analysis also suffers from other inherent challenges such as the high-dimension and small-sample-size problems. To address these challenges, we propose a novel multi-task regression learning model, i.e., the prior information guided transductive matrix completion (PigTMC) model, to predict the survival status of new instances. Specifically, we use the multi-label transductive matrix completion framework to leverage the censored instances together with the uncensored instances as training samples, and simultaneously employ a multi-task transductive feature selection scheme to alleviate the overfitting issue caused by high-dimension and small-sample-size data. In addition, we employ the prior temporal stability of the survival statuses at adjacent time intervals to guide survival analysis. Furthermore, we design an optimization algorithm with guaranteed convergence to solve the proposed PigTMC model. Finally, extensive experiments performed on real microarray gene expression datasets demonstrate that our proposed model outperforms previously widely used competing methods.

  • RESEARCH ARTICLE
    Fangfang LIU, Yan SHEN, Tienan ZHANG, Honghao GAO

    Knowledge bases (KBs) are far from complete, creating a demand for KB completion. Among various methods, embedding has received increasing attention in recent years. PTransE, an important embedding-based approach to KB completion, considers multiple-step relation paths based on TransE, but ignores the association between an entity and its related entities that share the same direct relationship. In this paper, we propose an approach called EPTransE, which takes this kind of association into account. In fact, the dissimilarity of these related entities should be considered and should not exceed a certain threshold. EPTransE adjusts the embedding vector of an entity by comparing it with its related entities connected by the same direct relationship, and further makes the Euclidean distance between them less than a certain threshold. The embedding vectors of entities are thus able to contain rich semantic information, which is valuable for KB completion. In experiments, we evaluated our approach on two tasks, entity prediction and relation prediction. Experimental results show that our idea of considering the dissimilarity of related entities with the same direct relationship is effective.
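
The translation assumption that TransE, PTransE, and EPTransE all build on scores a triple (head, relation, tail) by how closely head + relation approximates tail in the embedding space; a minimal sketch of that standard score (EPTransE's related-entity adjustment is not reproduced here):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score ||h + r - t|| (Euclidean norm):
    lower means the triple (head, relation, tail) is more plausible.
    EPTransE additionally constrains entities sharing the same direct
    relation to lie within a Euclidean distance threshold of each other.
    """
    return np.linalg.norm(np.asarray(h, dtype=float)
                          + np.asarray(r, dtype=float)
                          - np.asarray(t, dtype=float))
```

A perfectly translated triple scores 0; ranking candidate tails by this score is the basis of the entity prediction task evaluated in the paper.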

  • RESEARCH ARTICLE
    Zhihan JIANG, Yan LIU, Xiaoliang FAN, Cheng WANG, Jonathan LI, Longbiao CHEN

    A comprehensive understanding of city structures and urban dynamics can greatly improve the efficiency and quality of urban planning and management, while traditional approaches, such as manual surveys, usually incur substantial labor and time costs. In this paper, we propose a data-driven framework to sense urban structures and dynamics from large-scale vehicle mobility data. First, we divide the city into fine-grained grids and cluster grids with similar mobility features into structured urban areas using a proposed distance-constrained clustering algorithm (DCCA). Second, we detect irregular mobility traffic patterns in each area leveraging an ARIMA-based anomaly detection algorithm (ADAM), and correlate them with urban social and emergency events. Finally, we build a visualization system to demonstrate the urban structures and crowd dynamics. We evaluate our framework using real-world datasets collected from Xiamen, China, and the results show that the proposed framework can sense urban structures and crowd dynamics comprehensively and effectively.

  • RESEARCH ARTICLE
    Hui ZHONG, Zaiyi CHEN, Chuan QIN, Zai HUANG, Vincent W. ZHENG, Tong XU, Enhong CHEN

    Adaptive learning rate methods have been successfully applied in many fields, especially in training deep neural networks. Recent results have shown that adaptive methods with exponentially increasing weights on squared past gradients (e.g., ADAM, RMSPROP) may fail to converge to the optimal solution. Though many algorithms, such as AMSGRAD and ADAMNC, have been proposed to fix these non-convergence issues, achieving a data-dependent regret bound similar to or better than that of ADAGRAD remains a challenge for these methods. In this paper, we propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues. Unlike AMSGRAD and ADAMNC, we adopt a milder weighting strategy on squared past gradients, in which the weights grow linearly. Based on this idea, we propose the weighted adaptive gradient method framework (WAGMF) and implement the WADA algorithm on this framework. Moreover, we prove that WADA can achieve a weighted data-dependent regret bound, which can be better than the original regret bound of ADAGRAD when the gradients decrease rapidly. This bound may partially explain the good performance of ADAM in practice. Finally, extensive experiments demonstrate the effectiveness of WADA and its variants in comparison with several variants of ADAM on training convex problems and deep neural networks.
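
The linearly growing weighting can be contrasted with ADAM's exponential moving average in a short sketch; this is an illustrative ADAGRAD-style step with weights 1, 2, ..., t on squared past gradients, not the exact WADA/WAGMF update from the paper:

```python
import numpy as np

def wada_like_step(x, grad, state, lr=0.1, eps=1e-8):
    """One gradient step where the second-moment estimate weights the
    squared gradient at step t by t (linear growth), instead of the
    exponentially increasing weights implicit in ADAM's moving average.
    """
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    # accumulate t * g_t^2: recent gradients matter more, but only linearly
    state["v"] = state.get("v", np.zeros_like(x)) + t * np.square(grad)
    # normalize by the total weight 1 + 2 + ... + t = t(t+1)/2
    v_hat = state["v"] / (t * (t + 1) / 2.0)
    return x - lr * grad / (np.sqrt(v_hat) + eps)
```

Applied to a simple convex objective such as f(x) = x^2, repeated steps steadily shrink x toward the minimum, since old squared gradients are discounted only polynomially rather than exponentially.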

  • LETTER
    Yonghao LONG, Zhihao ZANG, Xiangping CHEN, Fan ZHOU, Xiaonan LUO
  • RESEARCH ARTICLE
    Yao LU, Xinjun MAO, Tao WANG, Gang YIN, Zude LI

    College students majoring in computer science and software engineering need to master skills for high-quality programming. However, extensive research has shown that both the teaching and learning of high-quality programming are challenging and deficient in most college education systems. Recently, the continuous inspection paradigm has been widely used by developers on social coding sites (e.g., GitHub) as an important method to ensure the internal quality of massive code contributions. This paper presents a case in which continuous inspection is introduced into the classroom to improve students’ programming quality. In the study, we first designed a specific continuous inspection process for students’ collaborative projects and built an execution environment for the process. We then conducted a controlled experiment with 48 students from the same course over two school years to evaluate how the process affects their programming quality. Our results show that continuous inspection can help students identify their bad coding habits, master a set of good coding rules, and significantly reduce the density of code quality issues introduced in their code. Furthermore, we describe the lessons learned during the study and propose ideas to replicate and improve the process and its execution platform.