Dec 2010, Volume 4 Issue 4
    

  • Research articles
    Guoliang CHEN
  • Research articles
    Xianghui XIE, Xing FANG, Sutai HU, Dong WU
    Supercomputers are prevalent and vital to scientific research and industrial fields, and may be used to represent the level of national scientific development. A summary of the evolution of supercomputers will help direct the future development of supercomputers and supercomputing applications. In this paper, we summarize the accomplishments in supercomputing, predict the trend of future supercomputers, and present several breakthroughs in supercomputer architecture research.
  • Research articles
    Yunquan ZHANG, Jiachang SUN, Guoxing YUAN, Linbo ZHANG
    The China HPC TOP100 list, an annual report of the 100 most powerful high performance computing (HPC) systems installed in mainland China, has traced the rapid growth of HPC technology in China since its first publication in 2002. This paper introduces the China HPC TOP100 list and reviews the current status of HPC systems in China in terms of system features, manufacturers, and areas of application, using the data reported in the most recent list, published on November 1st, 2009. We provide further analysis, predictions of future trends, and directions for the development of HPC systems in China, drawing on historical data accumulated through archived TOP100 lists and other publicly available information. We predict that the aggregated Linpack performance of the top 100 HPC systems will reach 10 PFlops in 2011, a single system with 10 PFlops peak performance will appear between 2012 and 2013, the aggregated performance of the top 100 systems will reach 100 PFlops in 2014, and a single system with 100 PFlops peak performance will appear around 2015.
  • Research articles
    Xuejun YANG, Xiangke LIAO, Weixia XU, Junqiang SONG, Qingfeng HU, Jinshu SU, Liquan XIAO, Kai LU, Qiang DOU, Juping JIANG, Canqun YANG
    In recent years, heterogeneous systems and cooperative computing have become popular research directions in the field of high performance computing. With the fast scaling of high performance computer systems, problems such as power consumption and reliability have come to the forefront. The aim of high performance computing has thus shifted from merely seeking peak performance to comprehensively pursuing high efficiency, which takes into consideration many factors including performance, cost, power, and reliability. A heterogeneous computing system consisting of general-purpose CPU(s) and special-purpose accelerator(s) features high performance, low power consumption, and low cost. Hence, it has already become the mainstream in the field of high performance computing. However, such systems still face many challenges, for example, programmability and reliability. In this paper, we first analyze the main challenges facing heterogeneous computing systems. Then, we introduce the architecture of the first petaflop computing system in China, the Tianhe-1 (TH-1) heterogeneous system, including its hardware/software interface and interconnect network. During development of the TH-1 system, several challenges were encountered; research into solutions to these challenges is subsequently presented.
  • Research articles
    Fei CHEN, Zheng CAO, Kai WANG, Xuejun AN, Ninghui SUN
    This paper introduces the design of a hyper parallel processing (HPP) controller, which is a system controller used in heterogeneous high performance computing systems. It connects several heterogeneous processors via HyperTransport (HT) interfaces, a commercial InfiniBand HCA card with a PCI-express interface, and a customized global synchronization network with a self-defined high-speed interface. To accelerate intra-node communication and synchronization, global address space is supported and some dedicated hardware is integrated in the HPP controller to enable intra-node memory and shared I/O resources. Evaluation results on a prototype system with the HPP controller show that the proposed design achieves high communication efficiency and clearly accelerates synchronization operations.
  • Research articles
    Qiang LI, Bo LI, Zhigang HUO, Ninghui SUN
    An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such a design is beneficial for scalability and power, but heterogeneity also brings new challenges for communication systems, which must connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intra-node layer between heterogeneous components. To efficiently connect heterogeneous components, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; and, employing an InfiniBand network, it provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports unified parallel C (UPC), with a modified compiler based on the global address space. Also, a special collective network is implemented for collective operations. Results obtained from a prototype system show these features to be both feasible and efficient.
  • Research articles
    Mingfa ZHU, Limin XIAO, Li RUAN, Qinfen HAO
    Today, cluster-based computing is the mainstream architecture for high end computer systems. Balanced system design is critical for large scale cluster systems to achieve high efficiency. This paper describes the practice of balanced system design on DeepComp high end computer systems. Methodologies for designing balanced large scale cluster systems are given. A method for balancing the central processing unit (CPU) and the memory hierarchy is addressed. For balancing computing nodes and I/O systems, two approaches are given: a maximum bandwidth criterion, and the maximum number of computing nodes that can concurrently access I/O systems. Experience with Lenovo high end cluster systems shows that the above methods are effective. Lenovo strategies toward a balanced system design for both peta and 10 peta scale high productivity computing systems (HPCSs) are also discussed.
  • Research articles
    Zeyao MO, Aiqing ZHANG, Xiaolin CAO, Qingkai LIU, Xiaowen XU, Hengbin AN, Wenbing PEI, Shaoping ZHU
    The exponential growth of computer power in the last 10 years is now creating a great challenge for parallel programming toward achieving realistic performance in the field of scientific computing. To improve on the traditional programs for numerical simulations of laser fusion in inertial confinement fusion (ICF), the Institute of Applied Physics and Computational Mathematics (IAPCM) initiated a software infrastructure named J Adaptive Structured Meshes applications INfrastructure (JASMIN) in 2004. The main objective of JASMIN is to accelerate the development of parallel programs for large scale simulations of complex applications on parallel computers. JASMIN has now reached version 1.8 and has achieved its original objectives. Tens of parallel programs have been reconstructed or developed on thousands of processors. JASMIN promotes a new paradigm of parallel programming for scientific computing. In this paper, JASMIN is briefly introduced.
  • Research articles
    Jianxi FAN, Shukui ZHANG, Wujun ZHOU, Baolei CHENG, Kenli LI
    The dimensions of twisted cubes are limited to odd integers. In this paper, we first extend the dimensions of twisted cubes to all positive integers. Then, we introduce the concept of the restricted faulty set into twisted cubes. We further prove that under the condition that each node of the n-dimensional twisted cube TQn has at least one fault-free neighbor, its restricted connectivity is 2n − 2, which is almost twice that of TQn under the condition of arbitrary faulty nodes, and the same as that of the n-dimensional hypercube. Moreover, we provide an O(N log N) fault-free unicast algorithm and simulation results for the expected length of the fault-free path obtained by our algorithm, where N denotes the number of nodes of TQn. Finally, we propose a polynomial algorithm to check whether the faulty node set satisfies the condition that each node of the n-dimensional twisted cube TQn has at least one fault-free neighbor.
  • Research articles
    Ye TIAN, Bangchuan LIU, Zhenhua HE
    With the success of Internet video-on-demand (VoD) streaming services, the bandwidth required and the financial cost incurred by the host of the video server have become extremely large. Peer-to-peer (P2P) networks and proxies are two common ways of reducing the server workload. In this paper, we consider a peer-assisted Internet VoD system with proxies deployed at domain gateways. We formally present the video caching problem with the objectives of reducing the video server workload and avoiding inter-domain traffic, and we obtain its optimal solution. Inspired by the theoretical analysis, we develop a practical protocol named PopCap for Internet VoD services. Compared with previous work, PopCap does not require additional infrastructure support, is inexpensive, and is able to cope well with the characteristic workloads of Internet VoD services. From simulation-based experiments driven by real-world data sets from YouTube, we find that PopCap can effectively reduce the video server workload and therefore provides superior performance in terms of the video server’s traffic reduction.
  • Research articles
    Jiqiang LIU, Xun CHEN, Zhen HAN
    Deniable authentication is a type of authentication protocol with the special property of deniability. However, there have been several different definitions of deniability in authentication protocols. In this paper, we clarify this issue by defining two types of deniable authentication. In the first type, the receiver of the authenticated message cannot prove to a third party that the sender has authenticated any message to him. We call this type of deniability full deniability. In the second type, the receiver can prove to a third party that the sender has authenticated some message to him, but he cannot prove to a third party that the sender has authenticated any particular message to him. We call this type of deniability partial deniability. Note that partial deniability is not implied by full deniability, and that it has applications different from those of full deniability. We then present two identity-based authentication schemes and prove that one is fully deniable while the other is partially deniable. These two schemes can be useful in different scenarios.
  • Research articles
    Xinguang TIAN, Xueqi CHENG, Miyi DUAN, Rui LIAO, Hong CHEN, Xiaojuan CHEN
    Anomaly intrusion detection is currently an active research topic in the field of network security. This paper proposes a novel method for detecting anomalous program behavior, which is applicable to host-based intrusion detection systems that monitor system call activities. The method employs data mining techniques to model the normal behavior of a privileged program, and extracts normal system call sequences according to their supports and confidences in the training data. At the detection stage, a fixed-length sequence pattern matching algorithm is used to compare the current behavior with historic normal behavior, which is less computationally expensive than the variable-length pattern matching algorithm proposed by Hofmeyr et al. In addition, the temporal correlation of the audit data is taken into account at the detection stage, and two alternative schemes can be used to distinguish between normalities and intrusions. The method gives attention to both computational efficiency and detection accuracy, and is especially suitable for online detection. It has been applied to practical host-based intrusion detection systems, and has achieved high detection performance.
  • Research articles
    Zhixiong CHEN, Xiaoni DU
    A construction of a family of generalized polyphase cyclotomic sequences of length pq is presented in terms of the generalized cyclotomic classes modulo pq. Their linear complexity and corresponding minimal polynomials are deduced. Some upper bounds on periodic and aperiodic autocorrelation values of resulting sequences are also estimated by using certain exponential sums.
  • Research articles
    Ruochen LIU, Licheng JIAO, Yangyang LI, Jing LIU
    Inspired by the clonal selection theory together with the immune network model, we present a new artificial immune algorithm named the immune memory clonal algorithm (IMCA). The clonal operator, inspired by the immune system, is discussed first. The IMCA includes two versions based on different immune memory mechanisms: the adaptive immune memory clonal algorithm (AIMCA) and the immune memory clonal strategy (IMCS). In the AIMCA, the mutation rate and memory unit size of each antibody are adjusted dynamically. The IMCS realizes the evolution of both the antibody population and the memory unit at the same time. By using the clonal selection operator, global searching is effectively combined with local searching. According to the antibody-antibody (Ab-Ab) affinity and the antibody-antigen (Ab-Ag) affinity, the IMCA can adaptively allocate the scale of the memory units and the antibody population. In the experiments, 18 multimodal functions ranging in dimensionality from two to one thousand, and combinatorial optimization problems such as the traveling salesman and knapsack problems (KPs), are used to validate the performance of the IMCA. The computational cost per iteration is presented. Experimental results show that the IMCA has a high convergence speed and a strong ability to enhance the diversity of the population and, to some degree, avoid premature convergence. A theoretical proof is provided that the IMCA is convergent with probability 1.
  • Research articles
    Yong FENG, Zhongfu WU, Jiang ZHONG, Chunxiao YE, Kaigui WU
    The central problem in training a radial basis function neural network (RBFNN) is the selection of hidden layer neurons, which includes the selection of the center and width of those neurons. In this paper, we propose an enhanced swarm intelligence clustering (ESIC) method to select hidden layer neurons, and then train a cosine RBFNN based on the gradient descent learning process. We also apply this new method to the classification of deep Web sources. Experimental results show that the average Precision, Recall and F of our ESIC-based RBFNN classifier are higher than those of BP, Support Vector Machines (SVM) and OLS RBF for our deep Web sources classification problems.
  • Research articles
    Zhipin DENG, Kebin JIA, Yui-Lam CHAN, Chang-Hong FU, Wan-Chi SIU
    Multiview video involves a huge amount of data, and as such, efficiently encoding each view is a critical issue for its wider application. In this paper, a fast motion and disparity estimation algorithm is proposed, utilizing the close correlation between temporal and inter-view reference frames. First, a reliable predictor is found according to the correlation of motion and disparity vectors. Second, an iterative search process is carried out to find the optimal motion and disparity vectors. The proposed algorithm uses the prediction vector obtained in the previous motion estimation for the next disparity estimation, and obtains the optimal motion and disparity vectors jointly. Experimental results demonstrate that the proposed algorithm saves an average of 86% of computational time with a negligible quality drop when compared to the joint multiview video model (JMVM) full search algorithm. Furthermore, in comparison with conventional simulcast coding, the proposed algorithm enhances the video quality and also greatly increases coding speed.
  • Research articles
    Jiangtao WANG, Debao CHEN, Jingyu YANG
    Recognizing human action is a critical step in many computer vision applications. In this paper, the problem of human behavior classification is addressed from a periodic motion analysis viewpoint. Our approach uses human silhouettes as motion features, which can be obtained efficiently, and then projects them into a lower dimensional space where matching is performed. After a periodic analysis, each action unit is represented as a closed loop in this lower dimensional space, and matching is done by computing the distances among these loops. The main contributions are twofold: (1) an efficient method for constructing periodic action features is introduced; and (2) the difference between action units with different phases is computed adaptively with a novel distance proposed in this work. To demonstrate the effectiveness of this approach, human behavior classification experiments were performed on an open dataset. Classification results are highly accurate and show that this approach is promising and efficient.
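    The idea in the last abstract of comparing closed loops whose starting phases differ can be illustrated with a minimal sketch. This is not the distance proposed in the paper, which the abstract does not specify; it assumes, purely for illustration, that both loops are sampled at the same number of points and that the phase offset is handled by taking the minimum mean point-wise Euclidean distance over all cyclic shifts. The name `loop_distance` is hypothetical.

```python
import math


def loop_distance(loop_a, loop_b):
    """Phase-invariant distance between two closed loops (illustrative only).

    Each loop is a list of equal-dimension points sampled around one period.
    The result is the smallest mean point-wise Euclidean distance obtained
    over all cyclic shifts of loop_b, so a pure change of starting phase
    yields a distance of zero.
    """
    if not loop_a or len(loop_a) != len(loop_b):
        raise ValueError("loops must be non-empty and equally sampled")
    n = len(loop_a)
    best = math.inf
    for shift in range(n):
        # Compare loop_a with loop_b rotated by `shift` samples.
        total = sum(
            math.dist(loop_a[i], loop_b[(i + shift) % n]) for i in range(n)
        )
        best = min(best, total / n)
    return best


# The same loop started at a different phase should match itself exactly.
circle = [
    (math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
    for k in range(8)
]
shifted = circle[3:] + circle[:3]
print(loop_distance(circle, shifted))
```

    The exhaustive search over shifts costs O(n²) point comparisons per pair of loops, which is acceptable for the short loops assumed here; a nearest-neighbor classifier could then label an action unit by the class of its closest reference loop.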