Department of Applied Mathematics, University of Washington, Seattle, WA 98195-3925, USA
yuchench@uw.edu
History: Received 2019-12-06; Revised 2020-01-13; Accepted 2020-01-15; Issue Date 2020-04-20; Published 2020-06-15
Abstract
This tutorial presents a mathematical theory that relates the probability of the sample frequencies of M phenotypes in an isogenic population of N cells to the probability distribution of the sample mean of a quantitative biomarker, when N is very large. An analogue to the statistical mechanics of the canonical ensemble is discussed.
Hong Qian, Yu-Chen Cheng. Counting single cells and computing their heterogeneity: from phenotypic frequencies to mean value of a quantitative biomarker. Quant. Biol., 2020, 8(2): 172–176. DOI: 10.1007/s40484-020-0196-3
Statistical analyses of data and stochastic models of mechanisms are two very different, but complementary, approaches in biological research. While the former obtains a quantitative representation of high-throughput measurements [1], the latter can provide “laws of nature” through limit theorems [2], widely called emergent phenomena. A case in point is the theory of phase transition [3], which shows that a nonlinear stochastic dynamical system with bistability and cusp catastrophe, in the limit of infinite time followed by infinite system's size, necessarily exhibits a discontinuous transition [4]. Another example is the recent work [5], which demonstrates that Gibbsian equilibrium chemical thermodynamics can be reformulated as a limit theorem for a mesoscopic chemical kinetic system, with N species and M reversible stochastic elementary reactions, as the system's size becomes macroscopic.
With the rise of single-cell biology, one is naturally interested in the limiting behavior of the phenotypic frequencies among a population of cells, usually based on one or several biomarkers. In this case, there is actually a very powerful mathematical result that is widely known to probabilists and statistical physicists. In this tutorial, we give an introduction to this theory and discuss its broader implications.
CHARACTERIZING HETEROGENEITY IN SINGLE CELLS
Asymptotic probability distribution for sample frequencies of cellular phenotypes
To study phenotypic heterogeneity, we regard a population of N isogenic cells as independent and identically distributed (i.i.d.) realizations of random events from a set $S = \{1, 2, \dots, M\}$: there are in total M possible phenotypes. Among the N cells, let $N_i$ denote the random number of cells in the $i$th state, $1 \le i \le M$. By phenotypic frequency, we mean $\nu_i = N_i/N$.
Let $p_i$ denote the probability of a cell being in the $i$th state. Then the probability distribution for the observed frequencies $\nu = (\nu_1, \dots, \nu_M)$ being $x = (x_1, \dots, x_M)$ follows a multinomial distribution
$$\Pr\{\nu = x\} = \frac{N!}{(Nx_1)!\,(Nx_2)!\cdots(Nx_M)!}\; p_1^{Nx_1} p_2^{Nx_2} \cdots p_M^{Nx_M}. \tag{1}$$
Since N is usually very large in a high-throughput single-cell experiment, one can safely approximate Eq. (1) using Stirling's formula and obtain:
$$\Pr\{\nu = x\} \simeq \exp\left(-N \sum_{i=1}^{M} x_i \ln \frac{x_i}{p_i}\right). \tag{2}$$
Therefore, one has the asymptotic limit
$$\lim_{N\to\infty} \left(-\frac{1}{N} \ln \Pr\{\nu = x\}\right) = \sum_{i=1}^{M} x_i \ln \frac{x_i}{p_i} \equiv I(x). \tag{3}$$
In the theory of large deviations of probability, this is known as Sanov's theorem [6]. Since $I(x) > 0$ except when $x = p$, in the limit of $N \to \infty$ the probability of $\nu \ne p$ is zero, and the probability of $\nu = p$ is one: the frequency yields the probability for an infinitely large number of i.i.d. samples. Furthermore, Eq. (2) shows that $x_i = p_i$ ($1 \le i \le M$) are the most probable sample frequencies for finite but large N.
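As a quick numerical illustration of Eqs. (1)–(3), the following sketch (with made-up values of M, p, and x) compares the exact multinomial log-probability with the large-deviations approximation $-N I(x)$; the two agree up to an $O(\ln N / N)$ correction as N grows.

```python
import math

def log_multinomial(N, counts, p):
    # Exact log Pr{nu = x} from the multinomial distribution, Eq. (1)
    logp = math.lgamma(N + 1)
    for n_i, p_i in zip(counts, p):
        logp += n_i * math.log(p_i) - math.lgamma(n_i + 1)
    return logp

def I(x, p):
    # Level-2 rate function I(x) = sum_i x_i ln(x_i / p_i), Eq. (3)
    return sum(x_i * math.log(x_i / p_i) for x_i, p_i in zip(x, p) if x_i > 0)

p = [0.5, 0.3, 0.2]   # hypothetical phenotype probabilities, M = 3
x = [0.4, 0.4, 0.2]   # a hypothetical observed frequency vector
for N in (100, 1000, 10000):
    counts = [round(N * x_i) for x_i in x]
    # Sanov's theorem: (-1/N) ln Pr{nu = x} -> I(x) as N -> infinity
    print(N, -log_multinomial(N, counts, p) / N, I(x, p))
```

The residual difference is the sub-exponential prefactor that Stirling's formula discards; it shrinks roughly like $\ln N / N$.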
Asymptotic distribution for the mean value of a biomarker
Eqs. (1) and (2) give the probability for the frequencies with which the N cells are distributed among the M phenotypic states. We now consider a specific biomarker $g$, which is assumed to be a well-defined real-valued function of the phenotype of a cell: $g = g_i$ when a cell is in the $i$th state.
It is clear that if one knows the frequencies $\nu_i$, then the mean value of $g$ over the entire population of the N cells is determined:
$$\bar{g} = \sum_{i=1}^{M} g_i \nu_i. \tag{4}$$
Since the frequencies $\nu_i$ are random, so is $\bar{g}$. When $N \to \infty$, one expects $\bar{g}$ to approach the expected value $\sum_i g_i p_i$. This is easy to show:
$$\lim_{N\to\infty} \bar{g} = \sum_{i=1}^{M} g_i \lim_{N\to\infty} \nu_i = \sum_{i=1}^{M} g_i p_i. \tag{5}$$
What is the probability distribution for $\bar{g}$ when N is very large but not infinite? One can calculate this:
$$\Pr\{\bar{g} = y\} = \sum_{x:\, \sum_i g_i x_i = y} \Pr\{\nu = x\} \simeq \exp\left(-N \min_{x:\, \sum_i g_i x_i = y} I(x)\right). \tag{6}$$
We obtain Eq. (6) because among the many sets of x that give the same value y, each has a probability of order $e^{-N I(x)}$; therefore, as $N \to \infty$, only the set with the smallest $I(x)$ matters. Eq. (6) indicates that for very large N, the probability distribution for the mean value of the biomarker has the form $e^{-N I_g(y)}$, in which
$$I_g(y) = \min\left\{ I(x) : \sum_{i=1}^{M} g_i x_i = y \right\}. \tag{7}$$
In the theory of large deviations of probability, this result is known as the contraction principle [6]. $I_g(y)$ and $I(x)$ are called level-1 and level-2 large deviations rate functions, respectively.
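The contraction in Eq. (7) can be checked by brute-force constrained minimization, before doing any calculus. A minimal sketch, assuming SciPy is available (the values of g, p, and y here are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

p = np.array([0.5, 0.3, 0.2])   # hypothetical prior phenotype probabilities
g = np.array([0.0, 1.0, 2.5])   # hypothetical biomarker values g_i
y = 1.5                         # the conditioned mean biomarker value

def I(x):
    # Level-2 rate function I(x) = sum_i x_i ln(x_i / p_i), Eq. (3)
    x = np.clip(x, 1e-12, None)
    return float(np.sum(x * np.log(x / p)))

constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - 1.0},   # sum_i x_i = 1
               {"type": "eq", "fun": lambda x: g @ x - y}]         # sum_i g_i x_i = y
res = minimize(I, x0=p, bounds=[(0.0, 1.0)] * len(p),
               constraints=constraints, method="SLSQP")
print(res.x)    # the minimizer x*: the most probable frequencies given y
print(res.fun)  # the level-1 rate I_g(y) of Eq. (7)
```

As the analysis below shows, the numerical minimizer is an exponentially tilted version of p.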
From phenotypic frequencies to biomarker mean values
The right-hand side of Eq. (7) can be carried out further; this is a problem of constrained minimization using multivariate calculus:
$$\min_{x} \sum_{i=1}^{M} x_i \ln \frac{x_i}{p_i} \quad \text{subject to} \quad \sum_{i=1}^{M} x_i = 1, \quad \sum_{i=1}^{M} g_i x_i = y. \tag{8}$$
Introducing Lagrange multipliers $\lambda$ and $\beta$ for Eq. (8),
$$L(x, \lambda, \beta) = \sum_{i=1}^{M} x_i \ln \frac{x_i}{p_i} + \lambda \left(1 - \sum_{i=1}^{M} x_i\right) + \beta \left(y - \sum_{i=1}^{M} g_i x_i\right). \tag{9}$$
Then we can find $x^*$, $\lambda$, and $\beta$ as the solution of
$$\frac{\partial L}{\partial x_i} = \ln \frac{x_i}{p_i} + 1 - \lambda - \beta g_i = 0 \ (1 \le i \le M), \quad \sum_{i=1}^{M} x_i = 1, \quad \sum_{i=1}^{M} g_i x_i = y. \tag{10}$$
That is,
$$x_i^* = \frac{p_i\, e^{\beta g_i}}{Z(\beta)} \quad (1 \le i \le M), \tag{11a}$$
$$y = \frac{\mathrm{d} \ln Z(\beta)}{\mathrm{d} \beta} = \frac{1}{Z(\beta)} \sum_{i=1}^{M} g_i\, p_i\, e^{\beta g_i}, \tag{11b}$$
in which $\beta$ is a function of y through Eq. (11b), which gives the function $\beta(y)$ implicitly. We therefore obtain
$$I_g(y) = \beta(y)\, y - \ln Z\big(\beta(y)\big), \tag{12}$$
where
$$Z(\beta) = \sum_{i=1}^{M} p_i\, e^{\beta g_i}, \tag{13}$$
and $\beta(y)$ solves $\mathrm{d} \ln Z(\beta)/\mathrm{d}\beta = y$.
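Eq. (11b) rarely inverts in closed form, but since $\mathrm{d}\ln Z/\mathrm{d}\beta$ is monotone increasing in $\beta$, $\beta(y)$ can be found by bisection. A minimal sketch with hypothetical values of g and p:

```python
import math

g = [0.0, 1.0, 2.5]   # hypothetical biomarker values g_i
p = [0.5, 0.3, 0.2]   # hypothetical prior probabilities p_i

def Z(beta):
    # Partition-function-like sum of Eq. (13)
    return sum(p_i * math.exp(beta * g_i) for p_i, g_i in zip(p, g))

def y_of_beta(beta):
    # y = d ln Z / d beta, Eq. (11b)
    return sum(g_i * p_i * math.exp(beta * g_i) for p_i, g_i in zip(p, g)) / Z(beta)

def beta_of_y(y, lo=-50.0, hi=50.0, tol=1e-12):
    # Invert Eq. (11b) by bisection; y_of_beta is monotone increasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if y_of_beta(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

def I_g(y):
    # Level-1 rate function of Eq. (12)
    b = beta_of_y(y)
    return b * y - math.log(Z(b))

print(I_g(y_of_beta(0.0)))   # -> ~0: no deviation at the a priori mean (beta = 0)
print(I_g(1.5))              # positive: a deviation from the mean is exponentially rare
b = beta_of_y(1.5)
print([p_i * math.exp(b * g_i) / Z(b) for p_i, g_i in zip(p, g)])  # tilted x*, Eq. (11a)
```

The monotonicity of $y(\beta)$ follows from $\mathrm{d}^2 \ln Z/\mathrm{d}\beta^2$ being the variance of $g$ under the tilted distribution, which is non-negative.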
The above computation tells us that if one knows the values of a biomarker for all the M states of a cell, $g_1, \dots, g_M$, together with a prior knowledge of $p_1, \dots, p_M$, one should construct the function $Z(\beta)$ given in Eq. (13) and calculate $I_g(y)$ from Eq. (12). Then the probability distribution for the mean value of the biomarker is going to be:
$$\Pr\{\bar{g} = y\} \simeq e^{-N I_g(y)} = e^{-N \left[\beta(y)\, y - \ln Z(\beta(y))\right]}. \tag{14}$$
It also tells us that if one observes the mean biomarker value being y, then the most probable phenotypic frequencies will have a posterior form that deviates from the prior $p_i$:
$$x_i^* = \frac{p_i\, e^{\beta(y)\, g_i}}{Z\big(\beta(y)\big)} \quad (1 \le i \le M). \tag{15}$$
Both Eqs. (14) and (15) suggest that the functional relationship $y = y(\beta)$ in Eq. (11b), between the mean value of the biomarker and the Lagrange multiplier $\beta$, or its inverse form $\beta = \beta(y)$, is very fundamental to the probabilistic problem in the limit of infinite sample size $N \to \infty$.
BEYOND AN i.i.d. POPULATION
We derived the expression in Eq. (3) based on the assumption of a population of N cells that are i.i.d. samples of a single M-state random individual with probabilities $p_i$. When there are cell-cell interactions among the individuals within a population, the mathematics immediately becomes much more involved.
Two types of research go beyond an i.i.d. population in stochastic modeling; they were originally motivated, respectively, by chemical kinetics in solution [7] and by the Ising model for ferromagnetism in solids [8,9]. In chemical kinetics, rapid spatial movement of all “individual molecules” in an aqueous solution leads to the assumption that every individual collides with every other individual, and certain “reactions” can occur randomly. The Gibbs function in chemical thermodynamics is precisely like the function $I(x)$ in Eq. (3), for complex chemical reaction systems in equilibrium [5]. Actually there is a general equation, first discovered in [10], whose solution $\varphi(u)$ can provide the rate function for non-i.i.d. populations. For reversible unimolecular reactions among M species,
$$X_j \underset{r_j}{\overset{q_j}{\rightleftharpoons}} X_M \quad (1 \le j \le M-1), \tag{16}$$
with concentrations $u = (u_1, \dots, u_M)$ and arbitrary non-negative functions $q_j(u)$ and $r_j(u)$ being the rates of the reaction between species j and the species M, the equation reads
$$\sum_{j=1}^{M-1} \left[ q_j(u) \left( e^{\frac{\partial \varphi}{\partial u_M} - \frac{\partial \varphi}{\partial u_j}} - 1 \right) + r_j(u) \left( e^{\frac{\partial \varphi}{\partial u_j} - \frac{\partial \varphi}{\partial u_M}} - 1 \right) \right] = 0. \tag{17}$$
If $q_j(u) = \hat{q}_j u_j$ and $r_j(u) = \hat{r}_j u_M$, then the solution to Eq. (17) recovers Eq. (3),
$$\varphi(u) = \sum_{i=1}^{M} u_i \ln \frac{u_i}{p_i}, \tag{18}$$
in which the p's are functions of the q's and r's,
$$p_j = \frac{\hat{r}_j / \hat{q}_j}{1 + \sum_{k=1}^{M-1} \hat{r}_k / \hat{q}_k} \ (1 \le j \le M-1), \qquad p_M = \frac{1}{1 + \sum_{k=1}^{M-1} \hat{r}_k / \hat{q}_k}.$$
The particular set of mass-action rates $q_j(u) = \hat{q}_j u_j$ and $r_j(u) = \hat{r}_j u_M$ represents chemical reactions in an ideal solution. A reader who has had a course on freshman chemistry might recognize Eq. (18) as
$$RT\, \varphi(u) = \sum_{i=1}^{M} u_i\, \mu_i, \quad \mu_i = \mu_i^{o} + RT \ln u_i, \tag{19}$$
where $\mu_i$ is the chemical potential of species i with mole fraction (not molarity) $u_i$ in an ideal solution, and $\mu_i^{o} = -RT \ln p_i$. Then $\mu_j^{o} - \mu_i^{o} = -RT \ln (p_j/p_i) = -RT \ln K_{ij}$, where $K_{ij}$ is the equilibrium constant between species i and j [11]. Apart from the factor RT, the Gibbs energy function is a consequence of statistical counting, which has very little to do with the energy of the atoms in the molecules [12].
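A concrete numeric instance of the mass-action case, with made-up rate constants: detailed balance $\hat{q}_j p_j = \hat{r}_j p_M$ fixes the equilibrium probabilities, each ratio $p_j / p_M$ being an equilibrium constant.

```python
q = [2.0, 1.0]   # hypothetical forward rate constants q_j: X_j -> X_M
r = [1.0, 3.0]   # hypothetical backward rate constants r_j: X_M -> X_j

# Detailed balance q_j p_j = r_j p_M gives p_j / p_M = r_j / q_j = K_j
K = [r_j / q_j for q_j, r_j in zip(q, r)]
p_M = 1.0 / (1.0 + sum(K))              # normalization: sum_i p_i = 1
p = [K_j * p_M for K_j in K] + [p_M]    # (p_1, p_2, p_M)
print(p)
```

Only the ratios of backward to forward rate constants enter p, consistent with the text's point that the Gibbs function here is statistical counting, not molecular energetics.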
In the second type, the Ising model and the like, “individual atoms” are located at fixed lattice points in a solid, and each one interacts only with its neighbours. The limit $N \to \infty$ of such an interacting particle system is known as the hydrodynamic limit of the stochastic model.
Cell-cell interactions in a tissue or in a culture medium can be of both types: when an interaction is mediated by rapidly diffusing small molecular factors, one can safely assume the interaction is between every two individual cells in a population. If an interaction between nearby cells is mediated by slowly diffusing molecules, or is due to direct contacts via mechanical interactions, gap junctions, or synapses, then a lattice model is more appropriate. Combining these two types of mathematical descriptions leads to the “reaction diffusion” paradigm [13], which serves as the foundation for describing living phenomena [14].
DISCUSSION
Statistical mechanics and Boltzmann’s law
A reader who has had a course on statistical mechanics [15] will certainly recognize $Z(\beta)$, $-\beta^{-1}\ln Z(\beta)$, and $\beta^{-1}$ in Eq. (13) as the partition function, the Helmholtz free energy, and the temperature, if one identifies $-g_i$ as the energy of the $i$th state of a mechanical system. Eq. (12) then shows that $-I_g(y) = \ln Z\big(\beta(y)\big) - \beta(y)\, y$, where $-I_g(y)$ should be identified as the “entropy” of the mechanical system with energy y; it is related to the free energy $-\ln Z(\beta)$ through a Legendre transform. Most textbooks on statistical mechanics do not tell their readers, however, the clear mathematical logic behind all these formulae. But actually Boltzmann's 1877 paper [16], by counting the molecules with different kinetic energies in an ideal gas, proceeded through exactly the steps we took and derived the celebrated Boltzmann's law, in the form of Eq. (11a).
Variational Bayesian method
The $Z(\beta)$ obtained in Eq. (13) has a very important property: for any arbitrary normalized distribution $x = (x_1, \dots, x_M)$,
$$\sum_{i=1}^{M} x_i \ln \frac{x_i}{p_i\, e^{\beta g_i}} \ge -\ln Z(\beta), \tag{20}$$
with equality if and only if $x = x^*$ in Eq. (11a).
In the variational Bayesian method for inference [17], one often knows a target posterior distribution, but computing its normalization factor is expensive. Eq. (20) shows that to obtain the target distribution, one can simply minimize the left-hand side of Eq. (20) among a set of possible x. The same idea was also used by Gibbs in his variational method [18]: the free energy of an equilibrium state is the minimum among all states reached through a virtual change of state.
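Eq. (20) is easy to probe numerically. In the sketch below (with hypothetical p, g, and a fixed $\beta$), random normalized trial distributions never beat the bound $-\ln Z(\beta)$, while the tilted distribution $x^*$ of Eq. (11a) attains it exactly:

```python
import math
import random

random.seed(0)
p = [0.5, 0.3, 0.2]   # hypothetical prior probabilities
g = [0.0, 1.0, 2.5]   # hypothetical biomarker values
beta = 0.7            # an arbitrary fixed multiplier

Z = sum(p_i * math.exp(beta * g_i) for p_i, g_i in zip(p, g))
x_star = [p_i * math.exp(beta * g_i) / Z for p_i, g_i in zip(p, g)]

def lhs(x):
    # Left-hand side of Eq. (20)
    return sum(x_i * math.log(x_i / (p_i * math.exp(beta * g_i)))
               for x_i, p_i, g_i in zip(x, p, g) if x_i > 0)

# Random trial distributions: the bound -ln Z is never violated
for _ in range(1000):
    w = [random.random() for _ in p]
    x = [w_i / sum(w) for w_i in w]
    assert lhs(x) >= -math.log(Z) - 1e-12

print(lhs(x_star) + math.log(Z))   # the gap vanishes at x = x*
```

This is the variational idea in miniature: the gap between the two sides of Eq. (20) is the Kullback-Leibler divergence from x to $x^*$.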
Maximum entropy principle
The constrained optimization in Eq. (8), leading to the distribution in Eq. (11a), has also become the foundation of the maximum entropy principle (MEP) championed by Jaynes [19], which has played a productive role in data science. The axiomatic nature of MEP [20] and the role of conditional probability [21] have been elucidated.
The fundamental premises behind the large deviations principle (LDP) and the MEP are very different. Entropy, as a large deviations rate function, is used in the former to find the rare event that is the most probable, which becomes the only possible event in the limit: for an arbitrary set of n real values $a_1, \dots, a_n$,
$$\lim_{N\to\infty} \frac{1}{N} \ln \left( \sum_{i=1}^{n} e^{N a_i} \right) = a_{\max},$$
where $a_{\max} = \max\{a_1, \dots, a_n\}$. This is the same idea as keeping only the term with the largest eigenvalue among the terms of a linear eigenvalue decomposition, in the limit of infinite time or system's size. In MEP, however, the entropy function is used as a measure of “unbias”. Actually, according to the LDP, the $x^*$ in Eq. (11a) is not a probability distribution; it is the most probable frequency among N i.i.d. samples. In MEP, it is interpreted as the least biased probability distribution with maximum uncertainty.
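The Laplace-principle limit above can be seen numerically with an arbitrary, made-up set of values $a_i$; for large N the sum is utterly dominated by its largest term:

```python
import math

a = [0.3, -1.2, 0.9, 0.05]   # arbitrary real values; max is 0.9
a_max = max(a)
for N in (10, 100, 500):
    val = math.log(sum(math.exp(N * a_i) for a_i in a)) / N
    print(N, val)   # converges to a_max = 0.9 as N grows
```

(N is capped at 500 here only to keep $e^{N a_i}$ within floating-point range; a log-sum-exp shift would remove that restriction.)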
REFERENCES
[1]
Pence, C. H. (2011) “Describing our whole experience”: the statistical philosophies of W. F. R. Weldon and Karl Pearson. Stud. Hist. Philos. Biol. Biomed. Sci., 42, 475–485
[2]
Chibbaro, S., Rondoni, L. and Vulpiani, A. (2014) Reductionism, Emergence and Levels of Reality. New York: Springer
[3]
Anderson, P. W. (1972) More is different. Science, 177, 393–396
[4]
Qian, H., Ao, P., Tu, Y. and Wang, J. (2016) A framework towards understanding mesoscopic phenomena: Emergent unpredictability, symmetry breaking and dynamics across scales. Chem. Phys. Lett., 665, 153–161
[5]
Ge, H. and Qian, H. (2016) Mesoscopic kinetic basis of macroscopic chemical thermodynamics: A mathematical theory. Phys. Rev. E, 94, 052150
[6]
Dembo, A. and Zeitouni, O. (1998) Large Deviations Techniques and Applications, 2nd ed. New York: Springer
[7]
Kurtz, T. G. (1972) The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys., 57, 2976–2978
[8]
Liggett, T. M. (1985) Interacting Particle Systems. New York: Springer-Verlag
[9]
Derrida, B. (1998) An exactly soluble nonequilibrium system: The asymmetric simple exclusion process. Phys. Rep., 301, 65–83
[10]
Gang, H. (1986) Lyapunov function and stationary probability distribution. Zeit. Physik B: Cond. Matt., 65, 103–106
[11]
Chang, R. and Goldsby, K. A. (2012) Chemistry, 11th ed. New York: McGraw-Hill
[12]
Qian, H. (2019) Stochastic population kinetics and its underlying mathematicothermodynamics. In: The Dynamics of Biological Systems, Bianchi, A., Hillen, T., Lewis, M., Yi, Y. eds., pp. 149–188. New York: Springer
[13]
Murray, J. D. (2011) Mathematical Biology II: Spatial Models and Biomedical Applications, 3rd ed. New York: Springer
[14]
von Bertalanffy, L. (1950) The theory of open systems in physics and biology. Science, 111, 23–29
[15]
Huang, K. (1963) Statistical Mechanics. New York: John Wiley & Sons
[16]
Sharp, K. and Matschinsky, F. (2015) Translation of Ludwig Boltzmann’s paper “On the relationship between the second fundamental theorem of the mechanical theory of heat and probability calculations regarding the conditions for thermal equilibrium”. Entropy (Basel), 17, 1971–2009
[17]
Ghahramani, Z. (2001) An introduction to hidden Markov models and Bayesian networks. Int. J. Pattern Recognit. Artif. Intell., 15, 9–42
[18]
Pauli, W. (1973) Pauli Lectures on Physics: Thermodynamics and the Kinetic Theory of Gases. Cambridge: The MIT Press
[19]
Jaynes, E. T. (2003) Probability Theory: The Logic of Science. London: Cambridge University Press
[20]
Shore, J. E. and Johnson, R. W. (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory, 26, 26–37
[21]
van Campenhout, J. M. and Cover, T. M. (1981) Maximum entropy and conditional probability. IEEE Trans. Inf. Theory, 27, 483–489
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature