On essential topics of BYY harmony learning: Current status, challenging issues, and gene analysis applications

Lei XU

Front. Electr. Electron. Eng., 2012, Vol. 7, Issue 1: 147-196. DOI: 10.1007/s11460-012-0190-2
RESEARCH ARTICLE

Abstract

As a supplement to [Xu L. Front. Electr. Electron. Eng. China, 2010, 5(3): 281-328], this paper outlines the current status of efforts made on Bayesian Ying-Yang (BYY) harmony learning, together with gene analysis applications. First, a bird's-eye view is provided via Gaussian mixtures, in comparison with typical learning algorithms and model selection criteria; in particular, semi-supervised learning is covered simply by choosing a scalar parameter. Then, essential topics and challenging issues in BYY system design and BYY harmony learning are systematically outlined: a modern perspective on the Yin-Yang viewpoint is discussed, another Yang factorization is addressed, and coordinations across and within Ying-Yang are summarized. The BYY system acts as a unified framework that accommodates unsupervised, supervised, and semi-supervised learning in one formulation, while best harmony learning brings novelty and strength to automatic model selection. The mathematical formulation of the harmony functional is also addressed, as a unified scheme for measuring the proximity to be considered in a BYY system, and is argued to be the best choice among alternatives. Moreover, efforts are made on a number of learning tasks: a mode-switching factor analysis is proposed as a semi-blind learning framework for several types of independent factor analysis; a hidden Markov model (HMM) gated temporal factor analysis is suggested for modeling stationary temporal dependence; a two-level hierarchical Gaussian mixture is extended to cover semi-supervised learning; and a manifold learning method is modified to facilitate automatic model selection. Finally, these studies are applied to gene analysis problems such as genome-wide association, exome sequencing analysis, and gene transcriptional regulation.
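
For orientation, the best-harmony principle underlying the paper can be stated compactly. Writing X for observed data and R for the inner representation (latent variables plus parameters), the Bayesian Ying-Yang pair consists of the Yang machine p(R|X)p(X) and the Ying machine q(X|R)q(R), and learning maximizes the harmony functional. In the form used in Refs. [1,4] (reproduced here only as an orientation sketch, not as this paper's exact formulation):

H(p \| q) = \int p(R \mid X)\, p(X)\, \ln\big[\, q(X \mid R)\, q(R) \,\big]\, dX\, dR

Maximizing H(p||q) drives the Ying machine toward a best fit of the data while also favoring a least-complexity inner representation, which is the source of the automatic model selection emphasized in the abstract.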
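
To make the Gaussian-mixture bird's-eye view concrete, the sketch below illustrates in plain NumPy the kind of automatic model selection behaviour the abstract attributes to BYY harmony learning: an over-sized mixture is fitted with winner-take-all (hard) assignments, and components starved of support are discarded along the way, in the spirit of the RPCL/BYY hard-cut rules of Refs. [7,24,25]. This is an illustrative caricature under our own assumptions (the function name harmony_gmm, the support threshold kill_frac, and the toy data are ours), not the algorithms studied in the paper:

```python
import numpy as np

def harmony_gmm(X, k_init=10, kill_frac=0.02, n_iter=100, seed=0):
    """Hard-assignment EM on a deliberately over-sized Gaussian mixture.
    Components starved of support are discarded along the way, mimicking
    (very roughly) the automatic model selection behaviour discussed in
    the paper. Illustrative sketch only, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k_init, replace=False)]             # initial means
    cov = np.tile(np.cov(X.T) + 1e-6 * np.eye(d), (k_init, 1, 1))
    w = np.full(k_init, 1.0 / k_init)                             # mixing weights
    for _ in range(n_iter):
        k = len(w)
        # Yang step (winner-take-all): score each point against each component
        # by log w_j + log N(x | mu_j, cov_j), up to an additive constant.
        score = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            quad = np.sum(diff @ np.linalg.inv(cov[j]) * diff, axis=1)
            score[:, j] = np.log(w[j]) - 0.5 * (quad + np.linalg.slogdet(cov[j])[1])
        assign = score.argmax(axis=1)
        # Ying step: refit each component from its hard assignments;
        # components with too little support are dropped (hard-cut).
        keep = []
        for j in range(k):
            pts = X[assign == j]
            if len(pts) > max(d + 1, kill_frac * n):
                mu[j] = pts.mean(axis=0)
                cov[j] = np.cov(pts.T) + 1e-6 * np.eye(d)
                w[j] = len(pts) / n
                keep.append(j)
        mu, cov, w = mu[keep], cov[keep], w[keep] / w[keep].sum()
    return w, mu, cov

# Toy demo: three well-separated clusters, over-sized start with k_init = 10.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(200, 2))
               for c in ([0.0, 0.0], [3.0, 3.0], [0.0, 3.0])])
w, mu, cov = harmony_gmm(X)
print(len(w), "components survive")
```

On such toy data the surviving count is usually close to the true number of clusters, though nothing here is guaranteed; the paper's point is that the BYY harmony objective achieves this selection in a principled way rather than by an ad hoc threshold.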

Keywords

Bayesian Ying-Yang (BYY) harmony learning / harmony functional / automatic model selection / Gaussian mixture / hidden Markov model (HMM) gated temporal factor analysis / hierarchical Gaussian mixture / manifold learning / semi-supervised learning / semi-blind learning / genome-wide association / exome sequencing analysis / gene transcriptional regulation

Cite this article

Lei XU. On essential topics of BYY harmony learning: Current status, challenging issues, and gene analysis applications. Front. Electr. Electron. Eng., 2012, 7(1): 147-196. https://doi.org/10.1007/s11460-012-0190-2

References

[1] Xu L. Bayesian Ying-Yang system, best harmony learning, and five action circling. A special issue on Emerging Themes on Information Theory and Bayesian Approach. Frontiers of Electrical and Electronic Engineering in China, 2010, 5(3): 281-328
[2] Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing. 1995, 977-988 (a further version in NIPS8. In: Touretzky D S, et al., eds. Cambridge: MIT Press, 444-450)
[3] Xu L. Codimensional matrix pairing perspective of BYY harmony learning: Hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 86-119
[4] Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor autodetermination. IEEE Transactions on Neural Networks, 2004, 15(4): 885-902
[5] Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295
[6] Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada J M, et al., eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science, 2008, 5050: 48-78
[7] Shi L, Tu S K, Xu L. Learning Gaussian mixture with automatic model selection: A comparative study on three Bayesian related approaches. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 215-244
[8] Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 230-237
[9] Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974
[10] Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630
[11] Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2): 461-464
[12] MacKay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448-472
[13] Attias H. A variational Bayesian framework for graphical models. Advances in Neural Information Processing Systems, 2000, 12: 209-215
[14] McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352-5367
[15] Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing Systems 8. Cambridge: MIT Press, 1996, 757-763
[16] Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129-1159
[17] Xu L. Independent subspaces. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 903-912
[18] Bahl L, Brown P, de Souza P, Mercer R. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: Proceedings of the 1986 IEEE International Conference on Acoustics, Speech, and Signal Processing. 1986, 11: 49-52
[19] Valtchev V, Odell J J, Woodland P C, Young S J. MMIE training of large vocabulary recognition systems. Speech Communication, 1997, 22(4): 303-314
[20] Liao J C, Boscolo R, Yang Y L, Tran L M, Sabatti C, Roychowdhury V P. Network component analysis: Reconstruction of regulatory signals in biological systems. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(26): 15522-15527
[21] Brynildsen M P, Tran L M, Liao J C. A Gibbs sampler for the identification of gene expression and network connectivity consistency. Bioinformatics, 2006, 22(24): 3040-3046
[22] Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review, 1984, 26(2): 195-239
[23] Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129-151
[24] Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Transactions on Neural Networks, 1993, 4(4): 636-649
[25] Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, three-layer nets and ME-RBF-SVM models. International Journal of Neural Systems, 2001, 11(1): 43-69
[26] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kasabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241-274
[27] Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings of the 17th International Conference on Pattern Recognition. 2004, 1: 276-279
[28] Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117-143
[29] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological, 1996, 58(1): 267-288
[30] Figueiredo M A F, Jain A K. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 381-396
[31] Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics. 2001, 27-34
[32] Wallace C S, Dowe D L. Minimum message length and Kolmogorov complexity. Computer Journal, 1999, 42(4): 270-283
[33] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (III): Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, et al., eds. Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. Springer-Verlag, 1997, 43-60
[34] Tu S K, Xu L. Parameterizations make different model selections: Empirical findings from factor analysis. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 256-274
[35] Xu L. BYY harmony learning, structural RPCL, and topological self-organizing on mixture models. Neural Networks, 2002, 15(8-9): 1125-1151
[36] Ghahramani Z, Beal M. Variational inference for Bayesian mixtures of factor analysers. In: Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000, 449-455
[37] Utsugi A, Kumagai T. Bayesian analysis of mixtures of factor analyzers. Neural Computation, 2001, 13(5): 993-1002
[38] Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, et al., eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey, PA: IGI Global, 2009, 60-94
[39] Xu L. BYY P-Q factor systems and harmony learning. Invited talk. In: Proceedings of the International Conference on Neural Information Processing (ICONIP’2000). 2000, 1: 548-558
[40] Xu L. BYY harmony learning, independent state space, and generalized APT financial analyses. IEEE Transactions on Neural Networks, 2001, 12(4): 822-849
[41] Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129-2153
[42] Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706
[43] Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760
[44] Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108-116
[45] Zhou Z H. When semi-supervised learning meets ensemble learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 6-16
[46] Xu L. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing, 1998, 19(1-3): 223-257
[47] Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing—Letters and Reviews, 2003, 1(1): 1-52
[48] Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301
[49] Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R, trans. New York: Dover Publications, 1978
[50] Povey D, Woodland P C. Minimum phone error and I-smoothing for improved discriminative training. In: Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. 2002, 1: 105-108
[51] Juang B H, Katagiri S. Discriminative learning for minimum error classification. IEEE Transactions on Signal Processing, 1992, 40(12): 3043-3054
[52] Juang B H, Chou W, Lee C H. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing, 1997, 5(3): 257-265
[53] Saul L K, Rahim M G. Maximum likelihood and minimum classification error factor analysis for automatic speech recognition. IEEE Transactions on Speech and Audio Processing, 2000, 8(2): 115-125
[54] Rissanen J. Modeling by shortest data description. Automatica, 1978, 14(5): 465-471
[55] Hinton G E, Dayan P, Frey B J, Neal R M. The “wake-sleep” algorithm for unsupervised neural networks. Science, 1995, 268(5214): 1158-1161
[56] Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457
[57] Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184
[58] Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449-455
[59] Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition. 1992, 1: 672-675
[60] Xu L. BYY data smoothing based learning on a small size of samples. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 546-551
[61] Xu L. Temporal BYY learning for state space approach, hidden Markov model, and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144
[62] Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization, 2010, 47(3): 369-401
[63] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning, and temporal modeling. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. 1997, 29-42
[64] Xu L. Bayesian-Kullback YING-YANG machines for supervised learning. In: Proceedings of the 1996 World Congress on Neural Networks. San Diego, CA, 1996, 193-200
[65] Xu L. Bayesian Kullback Ying-Yang dependence reduction theory. Neurocomputing, 1998, 22(1-3): 81-111
[66] Xu L. Bayesian Ying-Yang system and theory as a unified statistical learning approach: (V) Temporal modeling for temporal perception and control. In: Proceedings of the International Conference on Neural Information Processing. 1998, 2: 877-884
[67] Xu L. New advances on Bayesian Ying-Yang learning system with Kullback and non-Kullback separation functionals. In: Proceedings of the 1997 IEEE-INNS Conference on Neural Networks. 1997, 3: 1942-1947
[68] Xu L. Bayesian Ying-Yang machine, clustering and number of clusters. Pattern Recognition Letters, 1997, 18(11-13): 1167-1178
[69] Xu L. How many clusters?: A YING-YANG machine based theory for a classical open problem in pattern recognition. In: Proceedings of the 1996 IEEE International Conference on Neural Networks. 1996, 3: 1546-1551
[70] Xu L. Bayesian Ying-Yang theory for empirical learning, regularization, and model selection: General formulation. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 552-557
[71] Xu L. Temporal BYY learning and its applications to extended Kalman filtering, hidden Markov model, and sensor-motor integration. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 949-954
[72] Xu L. Temporal factor analysis: Stable-identifiable family, orthogonal flow learning, and automated model selection. In: Proceedings of the International Joint Conference on Neural Networks. 2002, 472-476
[73] Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl 1): 205-237
[74] Xu L. Temporal Bayesian Ying-Yang dependence reduction, blind source separation and principal independent components. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 1071-1076
[75] Pang Z H, Tu S K, Su D, Wu X H, Xu L. Discriminative training of GMM-HMM acoustic model by RPCL learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 283-290
[76] Amari S, Nagaoka H. Methods of Information Geometry. London: Oxford University Press, 2000
[77] Belouchrani A, Cardoso J. Maximum likelihood source separation by the expectation-maximization technique: Deterministic and stochastic implementation. In: Proceedings of NOLTA95. 1995, 49-53
[78] McLachlan G J, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley and Sons, 1997
[79] Shi L, Tu S K, Xu L. Gene clustering by structural prior based local factor analysis model under Bayesian Ying-Yang harmony learning. In: Proceedings of the 2010 International Conference on Bioinformatics and Biomedicine. 2010, 696-699
[80] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological, 1996, 58(1): 267-288
[81] Park M Y, Hastie T. Penalized logistic regression for detecting gene interactions. Biostatistics, 2008, 9(1): 30-50
[82] Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley and Sons, 1997
[83] Roweis S, Ghahramani Z. A unifying review of linear Gaussian models. Neural Computation, 1999, 11(2): 305-345
[84] Ghahramani Z, Hinton G E. Variational learning for switching state-space models. Neural Computation, 2000, 12(4): 831-864
[85] Shumway R H, Stoffer D S. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 1982, 3(4): 253-264
[86] Shumway R H, Stoffer D S. Dynamic linear models with switching. Journal of the American Statistical Association, 1991, 86(415): 763-769
[87] Digalakis V, Rohlicek J R, Ostendorf M. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition. IEEE Transactions on Speech and Audio Processing, 1993, 1(4): 431-442
[88] Wang P H, Shi L, Du L, Liu H W, Xu L, Bao Z. Radar HRRP statistical recognition with temporal factor analysis by automatic Bayesian Ying-Yang harmony learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 300-317
[89] Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304
[90] Cordell H J. Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics, 2009, 10(6): 392-404
[91] Phillips P C. Epistasis — The essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics, 2008, 9(11): 855-867
[92] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M A, Bender D, Maller J, Sklar P, de Bakker P I, Daly M J, Sham P C. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 2007, 81(3): 559-575
[93] Ritchie M D, Hahn L W, Moore J H. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genetic Epidemiology, 2003, 24(2): 150-157
[94] Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 318-326
[95] Tu S K, Chen R S, Xu L. A binary matrix factorization algorithm for protein complex prediction. Proteome Science, 2011, 9(Suppl 1): S18

RIGHTS & PERMISSIONS

© 2014 Higher Education Press and Springer-Verlag Berlin Heidelberg