On essential topics of BYY harmony learning: Current status, challenging issues, and gene analysis applications

Lei XU

Front. Electr. Electron. Eng., 2012, Vol. 7, Issue 1: 147-196. DOI: 10.1007/s11460-012-0190-2
RESEARCH ARTICLE

Abstract

As a supplement to [Xu L. Front. Electr. Electron. Eng. China, 2010, 5(3): 281-328], this paper outlines the current status of efforts made on Bayesian Ying-Yang (BYY) harmony learning, together with gene analysis applications. First, a bird's-eye view is provided via Gaussian mixtures, in comparison with typical learning algorithms and model selection criteria; in particular, semi-supervised learning is covered simply by choosing a scalar parameter. Then, essential topics and challenging issues in BYY system design and BYY harmony learning are systematically outlined: a modern perspective on the Yin-Yang viewpoint is discussed, another Yang factorization is addressed, and coordinations across and within Ying-Yang are summarized. The BYY system acts as a unified framework that accommodates unsupervised, supervised, and semi-supervised learning in one formulation, while best harmony learning brings novelty and strength to automatic model selection. The mathematical formulation of the harmony functional is also addressed, as a unified scheme for measuring the proximity to be considered in a BYY system, and is argued to be the best choice among alternatives. Moreover, efforts are made on a number of learning tasks: a mode-switching factor analysis is proposed as a semi-blind learning framework for several types of independent factor analysis; a hidden Markov model (HMM) gated temporal factor analysis is suggested for modeling stationary temporal dependence; a two-level hierarchical Gaussian mixture is extended to cover semi-supervised learning; and a manifold learning method is modified to facilitate automatic model selection. Finally, these studies are applied to gene analysis problems such as genome-wide association, exome sequencing analysis, and gene transcriptional regulation.
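
For orientation, the best-harmony principle underlying the paper can be stated compactly. Writing X for observed data and R for the inner representation (latent variables plus parameters), the Bayesian Ying-Yang pair consists of the Yang machine p(R|X)p(X) and the Ying machine q(X|R)q(R), and learning maximizes the harmony functional. In the form used in Refs. [1,4] (reproduced here only as an orientation sketch, not as this paper's exact formulation):

H(p \| q) = \int p(R \mid X)\, p(X)\, \ln\big[\, q(X \mid R)\, q(R) \,\big]\, dX\, dR

Maximizing H(p||q) drives the Ying machine toward a best fit of the data while also favoring a least-complexity inner representation, which is the source of the automatic model selection emphasized in the abstract.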
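
To make the Gaussian-mixture bird's-eye view concrete, the sketch below illustrates in plain NumPy the kind of automatic model selection behaviour the abstract attributes to BYY harmony learning: an over-sized mixture is fitted with winner-take-all (hard) assignments, and components starved of support are discarded along the way, in the spirit of the RPCL/BYY hard-cut rules of Refs. [7,24,25]. This is an illustrative caricature under our own assumptions (the function name harmony_gmm, the support threshold kill_frac, and the toy data are ours), not the algorithms studied in the paper:

```python
import numpy as np

def harmony_gmm(X, k_init=10, kill_frac=0.02, n_iter=100, seed=0):
    """Hard-assignment EM on a deliberately over-sized Gaussian mixture.
    Components starved of support are discarded along the way, mimicking
    (very roughly) the automatic model selection behaviour discussed in
    the paper. Illustrative sketch only, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k_init, replace=False)]             # initial means
    cov = np.tile(np.cov(X.T) + 1e-6 * np.eye(d), (k_init, 1, 1))
    w = np.full(k_init, 1.0 / k_init)                             # mixing weights
    for _ in range(n_iter):
        k = len(w)
        # Yang step (winner-take-all): score each point against each component
        # by log w_j + log N(x | mu_j, cov_j), up to an additive constant.
        score = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            quad = np.sum(diff @ np.linalg.inv(cov[j]) * diff, axis=1)
            score[:, j] = np.log(w[j]) - 0.5 * (quad + np.linalg.slogdet(cov[j])[1])
        assign = score.argmax(axis=1)
        # Ying step: refit each component from its hard assignments;
        # components with too little support are dropped (hard-cut).
        keep = []
        for j in range(k):
            pts = X[assign == j]
            if len(pts) > max(d + 1, kill_frac * n):
                mu[j] = pts.mean(axis=0)
                cov[j] = np.cov(pts.T) + 1e-6 * np.eye(d)
                w[j] = len(pts) / n
                keep.append(j)
        mu, cov, w = mu[keep], cov[keep], w[keep] / w[keep].sum()
    return w, mu, cov

# Toy demo: three well-separated clusters, over-sized start with k_init = 10.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(200, 2))
               for c in ([0.0, 0.0], [3.0, 3.0], [0.0, 3.0])])
w, mu, cov = harmony_gmm(X)
print(len(w), "components survive")
```

On such toy data the surviving count is usually close to the true number of clusters, though nothing here is guaranteed; the paper's point is that the BYY harmony objective achieves this selection in a principled way rather than by an ad hoc threshold.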

Keywords

Bayesian Ying-Yang (BYY) harmony learning / harmony functional / automatic model selection / Gaussian mixture / hidden Markov model (HMM) gated temporal factor analysis / hierarchical Gaussian mixture / manifold learning / semi-supervised learning / semi-blind learning / genome-wide association / exome sequencing analysis / gene transcriptional regulation

Cite this article

Lei XU. On essential topics of BYY harmony learning: Current status, challenging issues, and gene analysis applications. Front. Electr. Electron. Eng., 2012, 7(1): 147-196. https://doi.org/10.1007/s11460-012-0190-2

References

[1] Xu L. Bayesian Ying-Yang system, best harmony learning, and five action circling. A special issue on Emerging Themes on Information Theory and Bayesian Approach. Frontiers of Electrical and Electronic Engineering in China, 2010, 5(3): 281-328
[2] Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing. 1995, 977-988 (a further version in NIPS8. In: Touretzky D S, et al., eds. Cambridge: MIT Press, 444-450)
[3] Xu L. Codimensional matrix pairing perspective of BYY harmony learning: Hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 86-119
[4] Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor autodetermination. IEEE Transactions on Neural Networks, 2004, 15(4): 885-902
[5] Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295
[6] Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada J M, et al., eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science, 2008, 5050: 48-78
[7] Shi L, Tu S K, Xu L. Learning Gaussian mixture with automatic model selection: A comparative study on three Bayesian related approaches. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 215-244
[8] Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 230-237
[9] Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974
[10] Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630
[11] Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2): 461-464
[12] MacKay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448-472
[13] Attias H. A variational Bayesian framework for graphical models. Advances in Neural Information Processing Systems, 2000, 12: 209-215
[14] McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352-5367
[15] Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing Systems 8. Cambridge: MIT Press, 1996, 757-763
[16] Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129-1159
[17] Xu L. Independent subspaces. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 903-912
[18] Bahl L, Brown P, de Souza P, Mercer R. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: Proceedings of the 1986 IEEE International Conference on Acoustics, Speech, and Signal Processing. 1986, 11: 49-52
[19] Valtchev V, Odell J J, Woodland P C, Young S J. MMIE training of large vocabulary recognition systems. Speech Communication, 1997, 22(4): 303-314
[20] Liao J C, Boscolo R, Yang Y L, Tran L M, Sabatti C, Roychowdhury V P. Network component analysis: Reconstruction of regulatory signals in biological systems. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(26): 15522-15527
[21] Brynildsen M P, Tran L M, Liao J C. A Gibbs sampler for the identification of gene expression and network connectivity consistency. Bioinformatics, 2006, 22(24): 3040-3046
[22] Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review, 1984, 26(2): 195-239
[23] Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129-151
[24] Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Transactions on Neural Networks, 1993, 4(4): 636-649
[25] Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, three-layer nets and ME-RBF-SVM models. International Journal of Neural Systems, 2001, 11(1): 43-69
[26] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kasabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241-274
[27] Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings of the 17th International Conference on Pattern Recognition. 2004, 1: 276-279
[28] Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117-143
[29] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological, 1996, 58(1): 267-288
[30] Figueiredo M A F, Jain A K. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 381-396
[31] Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics. 2001, 27-34
[32] Wallace C S, Dowe D L. Minimum message length and Kolmogorov complexity. Computer Journal, 1999, 42(4): 270-283
[33] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (III): Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, et al., eds. Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. Springer-Verlag, 1997, 43-60
[34] Tu S K, Xu L. Parameterizations make different model selections: Empirical findings from factor analysis. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 256-274
[35] Xu L. BYY harmony learning, structural RPCL, and topological self-organizing on mixture models. Neural Networks, 2002, 15(8-9): 1125-1151
[36] Ghahramani Z, Beal M. Variational inference for Bayesian mixtures of factor analysers. In: Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000, 449-455
[37] Utsugi A, Kumagai T. Bayesian analysis of mixtures of factor analyzers. Neural Computation, 2001, 13(5): 993-1002
[38] Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, et al., eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey, PA: IGI Global, 2009, 60-94
[39] Xu L. BYY P-Q factor systems and harmony learning. Invited talk. In: Proceedings of the International Conference on Neural Information Processing (ICONIP’2000). 2000, 1: 548-558
[40] Xu L. BYY harmony learning, independent state space, and generalized APT financial analyses. IEEE Transactions on Neural Networks, 2001, 12(4): 822-849
[41] Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129-2153
[42] Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706
[43] Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760
[44] Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108-116
[45] Zhou Z H. When semi-supervised learning meets ensemble learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 6-16
[46] Xu L. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing, 1998, 19(1-3): 223-257
[47] Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing—Letters and Reviews, 2003, 1(1): 1-52
[48] Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301
[49] Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R, trans. New York: Dover Publications, 1978
[50] Povey D, Woodland P C. Minimum phone error and I-smoothing for improved discriminative training. In: Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. 2002, 1: 105-108
[51] Juang B H, Katagiri S. Discriminative learning for minimum error classification. IEEE Transactions on Signal Processing, 1992, 40(12): 3043-3054
[52] Juang B H, Chou W, Lee C H. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing, 1997, 5(3): 257-265
[53] Saul L K, Rahim M G. Maximum likelihood and minimum classification error factor analysis for automatic speech recognition. IEEE Transactions on Speech and Audio Processing, 2000, 8(2): 115-125
[54] Rissanen J. Modeling by shortest data description. Automatica, 1978, 14(5): 465-471
[55] Hinton G E, Dayan P, Frey B J, Neal R M. The “wake-sleep” algorithm for unsupervised neural networks. Science, 1995, 268(5214): 1158-1161
[56] Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457
[57] Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184
[58] Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449-455
[59] Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition. 1992, 1: 672-675
[60] Xu L. BYY data smoothing based learning on a small size of samples. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 546-551
[61] Xu L. Temporal BYY learning for state space approach, hidden Markov model, and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144
[62] Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization, 2010, 47(3): 369-401
[63] Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning, and temporal modeling. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. 1997, 29-42
[64] Xu L. Bayesian-Kullback YING-YANG machines for supervised learning. In: Proceedings of the 1996 World Congress on Neural Networks. San Diego, CA, 1996, 193-200
[65] Xu L. Bayesian Kullback Ying-Yang dependence reduction theory. Neurocomputing, 1998, 22(1-3): 81-111
[66] Xu L. Bayesian Ying-Yang system and theory as a unified statistical learning approach: (V) Temporal modeling for temporal perception and control. In: Proceedings of the International Conference on Neural Information Processing. 1998, 2: 877-884
[67] Xu L. New advances on Bayesian Ying-Yang learning system with Kullback and non-Kullback separation functionals. In: Proceedings of the 1997 IEEE-INNS Conference on Neural Networks. 1997, 3: 1942-1947
[68] Xu L. Bayesian Ying-Yang machine, clustering and number of clusters. Pattern Recognition Letters, 1997, 18(11-13): 1167-1178
[69] Xu L. How many clusters?: A YING-YANG machine based theory for a classical open problem in pattern recognition. In: Proceedings of the 1996 IEEE International Conference on Neural Networks. 1996, 3: 1546-1551
[70] Xu L. Bayesian Ying-Yang theory for empirical learning, regularization, and model selection: General formulation. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 552-557
[71] Xu L. Temporal BYY learning and its applications to extended Kalman filtering, hidden Markov model, and sensor-motor integration. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 949-954
[72] Xu L. Temporal factor analysis: Stable-identifiable family, orthogonal flow learning, and automated model selection. In: Proceedings of the International Joint Conference on Neural Networks. 2002, 472-476
[73] Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl 1): 205-237
[74] Xu L. Temporal Bayesian Ying-Yang dependence reduction, blind source separation and principal independent components. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 1071-1076
[75] Pang Z H, Tu S K, Su D, Wu X H, Xu L. Discriminative training of GMM-HMM acoustic model by RPCL learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 283-290
[76] Amari S, Nagaoka H. Methods of Information Geometry. London: Oxford University Press, 2000
[77] Belouchrani A, Cardoso J. Maximum likelihood source separation by the expectation-maximization technique: Deterministic and stochastic implementation. In: Proceedings of NOLTA95. 1995, 49-53
[78] McLachlan G J, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley and Sons, 1997
[79] Shi L, Tu S K, Xu L. Gene clustering by structural prior based local factor analysis model under Bayesian Ying-Yang harmony learning. In: Proceedings of the 2010 International Conference on Bioinformatics and Biomedicine. 2010, 696-699
[80] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological, 1996, 58(1): 267-288
[81] Park M Y, Hastie T. Penalized logistic regression for detecting gene interactions. Biostatistics, 2008, 9(1): 30-50
[82] Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley and Sons, 1997
[83] Roweis S, Ghahramani Z. A unifying review of linear Gaussian models. Neural Computation, 1999, 11(2): 305-345
[84] Ghahramani Z, Hinton G E. Variational learning for switching state-space models. Neural Computation, 2000, 12(4): 831-864
[85] Shumway R H, Stoffer D S. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 1982, 3(4): 253-264
[86] Shumway R H, Stoffer D S. Dynamic linear models with switching. Journal of the American Statistical Association, 1991, 86(415): 763-769
[87] Digalakis V, Rohlicek J R, Ostendorf M. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition. IEEE Transactions on Speech and Audio Processing, 1993, 1(4): 431-442
[88] Wang P H, Shi L, Du L, Liu H W, Xu L, Bao Z. Radar HRRP statistical recognition with temporal factor analysis by automatic Bayesian Ying-Yang harmony learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 300-317
[89] Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304
[90] Cordell H J. Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics, 2009, 10(6): 392-404
[91] Phillips P C. Epistasis — The essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics, 2008, 9(11): 855-867
[92] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M A, Bender D, Maller J, Sklar P, de Bakker P I, Daly M J, Sham P C. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 2007, 81(3): 559-575
[93] Ritchie M D, Hahn L W, Moore J H. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genetic Epidemiology, 2003, 24(2): 150-157
[94] Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 318-326
[95] Tu S K, Chen R S, Xu L. A binary matrix factorization algorithm for protein complex prediction. Proteome Science, 2011, 9(Suppl 1): S18

RIGHTS & PERMISSIONS

© 2014 Higher Education Press and Springer-Verlag Berlin Heidelberg