Sep 2010, Volume 5 Issue 3
    

  • Research articles
Lei XU, Yanda LI
  • Research articles
Shun-ichi AMARI
The present article gives an introduction to information geometry and surveys its applications in the areas of machine learning, optimization, and statistical inference. Information geometry is explained intuitively by using divergence functions introduced in a manifold of probability distributions and other general manifolds. These give a Riemannian structure together with a pair of dual flatness criteria. Many manifolds are dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and a related projection theorem are introduced. They provide useful means for various approximation and optimization problems. We apply them to alternating minimization problems, Ying-Yang machines and the belief propagation algorithm in machine learning.
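As a concrete illustration of the divergence functions mentioned in the abstract, the sketch below computes the Kullback-Leibler divergence between two discrete distributions (the function name and the example distributions are illustrative choices, not taken from the article):

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete
    probability distributions -- the canonical divergence function
    on the manifold of probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d = kl_divergence(p, q)  # nonnegative, zero only when p == q
```

Note that the KL divergence is asymmetric, D(p‖q) ≠ D(q‖p) in general, which is one intuition for why information geometry works with a *pair* of dual structures rather than a single metric.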
Research articles
    Erkki OJA, Zhirong YANG

Nonnegativity has been shown to be a powerful principle in linear matrix decompositions, leading to sparse component matrices in feature analysis and data compression. The classical method is Lee and Seung's Nonnegative Matrix Factorization. A standard way to form learning rules is by multiplicative updates, which maintain nonnegativity. Here, a generic principle is presented for forming multiplicative update rules that integrate an orthonormality constraint into nonnegative learning. The principle, called Orthogonal Nonnegative Learning (ONL), is rigorously derived from the Lagrangian technique. As examples, the proposed method is applied to transform Nonnegative Matrix Factorization (NMF) and its variant, Projective Nonnegative Matrix Factorization (PNMF), into their orthogonal versions. In general, it is well known that orthogonal nonnegative learning can give very useful approximate solutions for problems involving non-vectorial data, for example, binary solutions. Combinatorial optimization is replaced by continuous-space gradient optimization, which is often computationally lighter. It is shown how the multiplicative update rules obtained by the proposed ONL principle can find a nonnegative and highly orthogonal matrix for an approximated graph partitioning problem. The empirical results on various graphs indicate that our nonnegative learning algorithms not only outperform those without the orthogonality condition, but also surpass other existing partitioning approaches.
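For context, the classical Lee-Seung multiplicative updates that the abstract builds on can be sketched as follows. This is a minimal Frobenius-norm version with illustrative parameter choices; the orthogonal ONL variant derived in the paper itself is not reproduced here:

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9, seed=0):
    """Classical Lee-Seung multiplicative updates for V ~= W @ H
    under the Frobenius norm. Every factor in each update is
    nonnegative, so W and H remain nonnegative throughout."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H, keep W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W, keep H fixed
    return W, H

# Factor a random nonnegative matrix into rank-5 nonnegative factors.
rng = np.random.default_rng(1)
V = rng.random((20, 15))
W, H = nmf_multiplicative(V, r=5)
```

The ONL principle described in the paper modifies such rules so that, in addition to nonnegativity, the learned factor is driven toward orthonormality, which is what makes the continuous relaxation of graph partitioning possible.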

  • Research articles
Jorma RISSANEN
This paper outlines a theory of estimation, where optimality is defined for all sizes of data, not only asymptotically. Also, a single principle is needed to cover estimation of both real-valued parameters and their number. To achieve this we have to abandon the traditional assumption that the observed data have been generated by a “true” distribution, and that the objective of estimation is to recover this from the data. Instead, the objective in this theory is to fit ‘models’ as distributions to the data in order to find the regular statistical features. The performance of the fitted models is measured by the probability they assign to the data: a large probability means a good fit and a small probability a bad fit. Equivalently, the negative logarithm of the probability should be minimized, which has the interpretation of code length. There are three equivalent characterizations of optimal estimators: the first defined by estimation capacity, the second by satisfying necessary conditions for optimality for all data, and the third by the complete Minimum Description Length (MDL) principle.
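The code-length interpretation above can be made concrete with a small sketch (the function name is an illustrative choice): the negative base-2 logarithm of the probability a model assigns to the data is the ideal description length in bits, so a better-fitting model yields a shorter description.

```python
import math

def code_length_bits(p):
    """Ideal Shannon code length, in bits, for data that a model
    assigns probability p: L = -log2(p)."""
    return -math.log2(p)

# A model that fits the data well (high probability) describes
# it more compactly than one that fits poorly.
good_fit = code_length_bits(0.8)   # roughly 0.32 bits
bad_fit = code_length_bits(0.05)   # roughly 4.32 bits
```

Minimizing this code length over a model class, together with the cost of describing the model itself, is the essence of the MDL principle the abstract refers to.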
  • Research articles
Alan YUILLE
This paper introduces computer vision from an information theory perspective. We discuss how vision can be thought of as a decoding problem where the goal is to find the most efficient encoding of the visual scene. This requires probabilistic models that are capable of capturing the complexity and ambiguities of natural images. We start by describing classic Markov Random Field (MRF) models of images. We stress the importance of having efficient inference and learning algorithms for these models and emphasize those approaches which use concepts from information theory. Next we introduce more powerful image models that have recently been developed and which are better able to deal with the complexities of natural images. These models use stochastic grammars and hierarchical representations. They are trained using images from increasingly large databases. Finally, we describe how techniques from information theory can be used to analyze vision models and measure the effectiveness of different visual cues.
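To ground the classic MRF image models mentioned above, here is a minimal sketch of an Ising-style smoothness energy, a standard building block of such models (the function name, the 4-neighbor structure, and the parameter `beta` are illustrative choices, not the paper's):

```python
import numpy as np

def ising_energy(x, beta=1.0):
    """Energy of a binary image x under a simple Ising-style MRF
    smoothness prior: each pair of disagreeing 4-neighbors
    contributes beta. Lower energy means a smoother image."""
    horiz = np.sum(x[:, 1:] != x[:, :-1])  # disagreements along rows
    vert = np.sum(x[1:, :] != x[:-1, :])   # disagreements along columns
    return beta * float(horiz + vert)

smooth = np.zeros((4, 4), dtype=int)           # constant image
checker = np.indices((4, 4)).sum(axis=0) % 2   # checkerboard: maximally rough
```

Inference in such a model means finding low-energy (high-probability) labelings given noisy observations, which is where the efficient algorithms the abstract emphasizes come in.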
  • Research articles
Jürgen SCHMIDHUBER
    Most traditional artificial intelligence (AI) systems of the past decades are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam’s razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse’s thesis of the computer-generated universe. We first briefly review the history of AI since Gödel’s 1931 paper, then discuss recent post-2000 approaches that are currently transforming general AI research into a formal science.
  • Research articles
Raymond W. YEUNG
For a long time, store-and-forward had been the transport mode in network communications. In other words, information had been regarded as a commodity that only needs to be routed through the network, possibly with replication at the intermediate nodes. In the late 1990s, a new concept called network coding fundamentally changed the way a network can be operated. Under the paradigm of network coding, information can be processed within the network for the purpose of transmission. It was demonstrated that compared with store-and-forward, the network throughput can generally be increased by employing network coding. Since then, network coding has made a significant impact on different branches of information science. The impact of network coding has gone as far as mathematics, physics, and biology. This expository work aims to be an introduction to this fast-growing subject with a detailed discussion of the basic theoretical results.
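The throughput gain of network coding is often illustrated with the butterfly network, sketched below (a standard textbook example, not reproduced from this paper): the bottleneck link carries the XOR of the two source bits, and each sink recovers the bit it did not receive directly, so both sinks get both bits in one use of the network.

```python
def bottleneck_code(a, b):
    """On the butterfly network's bottleneck link, forward the XOR
    of the two source bits instead of storing and forwarding one
    of them; each sink can then decode both bits."""
    return a ^ b

for a in (0, 1):
    for b in (0, 1):
        coded = bottleneck_code(a, b)
        # Sink 1 hears a directly plus the coded bit, recovering b:
        assert coded ^ a == b
        # Sink 2 hears b directly plus the coded bit, recovering a:
        assert coded ^ b == a
```

Under store-and-forward the bottleneck could carry only one of the two bits per use, so one sink would always be left waiting; coding within the network is what removes that limit.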
  • Research articles
Runsheng CHEN, Geir SKOGERBØ
Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved definite results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced, including a number of prokaryotic microorganisms and the eukaryotes yeast (Saccharomyces cerevisiae), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and thale cress (Arabidopsis thaliana), as well as the major part of the human genome. These achievements signified that a new era of data mining and analysis on the human genome had commenced. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation and evolution would progressively become known to mankind. Large amounts of data are already accumulating, but at present many of the rules that should guide the understanding of this information are as yet unknown. Bioinformatics research is thus not only becoming more important, but is also faced with severe challenges as well as great opportunities.
  • Research articles
Yanda LI
In essence, a computer virus is merely a small program; by analogy, an organism may likewise be regarded as an information system. This paper examines this idea from several angles: 1) a DNA sequence satisfies the basic requirements of an information system; 2) the control of a man and of a robot both obey the principles of cybernetics; 3) how a man can have ideas while a robot has no such capacity; 4) the advantages of understanding a living organism from the point of view of information systems.