Bayesian method for fitting the low-energy constants in chiral perturbation theory

Hao-Xiang Pan; De-Kai Kong; Qiao-Yi Wen; Shao-Zhou Jiang

doi:10.1007/s11467-024-1430-7

Front. Phys. ›› 2024, Vol. 19 ›› Issue (6) :64203 DOI: 10.1007/s11467-024-1430-7

RESEARCH ARTICLE


Abstract

The values of the low-energy constants (LECs) are very important in chiral perturbation theory. This paper adopts a Bayesian method with truncation errors to globally fit eight next-to-leading order (NLO) LECs $L_i^r$ and next-to-next-to-leading order (NNLO) LECs $C_i^r$. With the estimation of the truncation errors, the fitting results of $L_i^r$ at NLO and NNLO are very close. The posterior distributions of $C_i^r$ indicate the boundary-dependent relations of these $C_i^r$. Ten $C_i^r$ are weakly dependent on the boundaries and their values are reliable. The other $C_i^r$ require more experimental data to constrain their boundaries. Some linear combinations of $C_i^r$ are also fitted with more reliable posterior distributions. If more precise values of some $C_i^r$ become known, some other $C_i^r$ can be obtained from them. With these fitted LECs, most observables show good convergence, except for the $\pi K$ scattering lengths $a_0^{3/2}$ and $a_0^{1/2}$. An example is also introduced to test the improvement of the method. All the computations indicate that considering the truncation errors can greatly improve the global fit, and that more prior information leads to better fitting results. This fitting method can be extended to other effective field theories and to perturbation theory.


Keywords

chiral perturbation theory / low-energy constants / Bayesian statistics

Cite this article


Hao-Xiang Pan, De-Kai Kong, Qiao-Yi Wen, Shao-Zhou Jiang. Bayesian method for fitting the low-energy constants in chiral perturbation theory. Front. Phys., 2024, 19(6): 64203 DOI:10.1007/s11467-024-1430-7


1 Introduction

Effective field theory (EFT) is an important tool for describing interactions between particles at low energy scales. Chiral perturbation theory (ChPT) is one such EFT. It first focused on the low-energy strong interactions among the light pseudoscalar mesons and was then extended to baryons and other mesons. ChPT is based on the $SU(3)_L \times SU(3)_R$ flavor symmetry in the chiral limit, in which the three lightest quarks are considered massless. The only constraints on the chiral Lagrangian are symmetries, such as charge conjugation symmetry, parity symmetry, and chiral symmetry. However, infinitely many independent terms satisfy these symmetries. The Weinberg power-counting scheme orders these terms in powers of the mesonic momentum ($p$) [1]. The leading-order (LO, $O(p^2)$) terms give the largest contributions, and they are considered first. For higher precision, the terms at next-to-leading order (NLO, $O(p^4)$), next-to-next-to-leading order (NNLO, $O(p^6)$), etc., are included successively. Each term carries an unknown parameter, called a low-energy constant (LEC), which encodes information about the effective strong interactions. For three-flavor ChPT, there are 2, 10+2, 90+4, and 1233+21 LECs at LO, NLO, NNLO and next-to-next-to-next-to-leading order ($O(p^8)$) [2–5], respectively. If all these LECs were known, all theoretical calculations would yield numerical values. However, the number of these LECs is too large, especially at high orders. Besides, ChPT by itself cannot fix these LECs. They are usually determined by other approaches, such as global fits [6–8], lattice QCD [9–11], chiral quark models [12–15], resonance chiral theory [13–17], sum rules [18], holographic QCD [19], and dispersion relations [20–22]. Each method has its advantages and range of applicability. So far, no approach can determine the exact values of these LECs. This paper focuses only on the global fit method.

There has been a lot of research based on global fits, and some LECs up to NNLO have been fitted. Ref. [23] fits the $K_{\ell 4}$ form factors and the $\pi\pi$ scattering lengths to get the values of $L_1^r$, $L_2^r$ and $L_3^r$. Six years later, $L_i^r$ $(i = 1, 2, 3, 5, 7, 8)$ are determined by fitting the quark mass ratio $m_s/\hat{m}$, the decay constant ratio $F_K/F_\pi$ and the $K_{\ell 4}$ form factors [24]. Another eleven years later, a new global fit appears, adding the $\pi\pi$ scattering lengths ($a_0^0$ and $a_0^2$), the $\pi K$ scattering lengths ($a_0^{1/2}$ and $a_0^{3/2}$) and the scalar form factor threshold parameters ($\langle r^2 \rangle_S^\pi$ and $c_S^\pi$). $L_{1-8}^r$ and some $C_i^r$ are obtained [6]. Ref. [7] adds some two-flavor LECs and updates the values of the LECs fitted in Ref. [6]. The last two references not only fit the LECs $L_{1-8}^r$ at NLO, but also estimate a part of the NNLO LECs $C_i^r$. However, both of them ignore the higher-order truncated contributions. Ref. [8] proposes a geometric sequence model to introduce the higher-order truncated contributions. Its NLO fitting values of $L_i^r$ are very close to the NNLO fitting values in Ref. [7]. This is because a physical quantity contains not only the sum of the LO and NLO theoretical values, but also the sum of the higher-order contributions, which sometimes cannot be ignored when compared with the NLO contributions. If the NLO fit includes the higher-order contributions, $L_i^r$ will be closer to the true values. Hence, fitting $L_i^r$ at NLO and NNLO yields closer results. This shows that the higher-order contributions indeed cannot be simply ignored in the ChPT fit. All of the above references adopt classical statistical methods to fit the LECs. Theoretically, the precision of the fitting results depends on the amount and precision of the experimental data. In other words, more data and more precise data lead to more precise LECs. However, classical statistics has some problems, and some improvements are needed.

i) The geometric sequence model in Ref. [8] is too simple. The contributions at successive orders need not, in fact, form a geometric sequence. In addition, in order to estimate the NNLO contribution, the geometric sequence itself requires the LO and NLO contributions. However, the LO contribution is sometimes zero, so the NNLO contribution cannot be estimated. In some special cases, the NNLO contribution may be larger than the NLO one, such as for the $\pi K$ scattering lengths $a_0^{1/2}$ and $a_0^{3/2}$ [6, 7]. Hence, Ref. [8] adopts a special approach to deal with this problem. In most cases, the two two-flavor NLO LECs $\bar{l}_{2,3}$ converge badly. Fitting them takes a long time, about one day with 20 cores of an Intel Xeon Gold 6230 CPU. In addition, how to determine the sign of the NNLO contribution is also a problem. As a result, the model is not consistent across all physical quantities and is not a universal approach.

ii) The number of $C_i^r$ is much larger than the number of input experimental data, so there is an overfitting problem in the NNLO fit. Refs. [6, 7] adopt a random walking algorithm, but the result is boundary-dependent. A Monte Carlo method is used to fit the LECs in Ref. [8], but its efficiency is low. Moreover, the complicated errors of $C_i^r$ are hard to estimate, and they usually do not follow a normal distribution.

iii) Although the geometric sequence model gives a reasonable result in Ref. [8], it is hard to extend to other EFTs, for the reasons discussed above. Furthermore, it is also hard to compare different models in order to select the best one, because $\chi^2/\mathrm{d.o.f.}$ (degrees of freedom) is too small and an overfitting problem exists. It is hard to select the best model among several overfitted models by $\chi^2/\mathrm{d.o.f.}$ alone. A more universal method requires a credible quantitative index by which the best model can be selected.

iv) Refs. [6–8] treat the two-flavor NLO LECs $\bar{l}_i$ as independent input experimental data, but some $\bar{l}_i$ possibly depend on other experimental quantities. In fact, the $\pi\pi$ scattering lengths $a_0^0$ and $a_0^2$ depend on $\bar{l}_1$, $\bar{l}_2$ and $\bar{l}_4$ [25]. Hence, their covariance matrix needs to be considered.

v) Most importantly, before a global fit one already knows something about ChPT and the experimental data to be fitted, but this information is not explicitly reflected in the fit. For example, for the NLO fit, although the truncation errors are not known, other references have given some approximate values of the NNLO LECs. With these NNLO LECs, even in the NLO fit, one can roughly obtain the signs of the truncation errors. If these signs are introduced into the NLO fit, the results may be more precise. Furthermore, ChPT assumes that the orders of magnitude of the LECs at a given chiral order are nearly the same. If this information is considered in the global fit, the range of the unknown LECs can be estimated even through the NNLO contribution. Simply speaking, more information may lead to a more precise result.

In addition to classical statistics, Bayesian statistics, which has been successful in artificial intelligence, can play a better role in the global fit of EFTs. Bayesian statistics can make good use of the known information to give a more reasonable result. Even when the amount of data is small, Bayesian statistics can be better than classical statistics. Ref. [26] has applied Bayesian statistics to EFTs. It proposes two toy models and compares the results obtained by Bayesian and classical statistics. The advantages of Bayesian statistics in EFTs have been demonstrated. Later, Ref. [27] introduces Bayesian statistics into nuclear physics. A year later, a specific framework for using Bayesian statistics in EFTs appears [28]. Subsequently, Refs. [29–57] use Bayesian statistics to calculate the magnitudes of truncation errors in the different EFTs. This paper will improve the approach in Ref. [8]. The new approach contains the framework of Bayesian statistics and the application of Markov Chain Monte Carlo (MCMC). Some MCMC algorithms, such as the Metropolis-Hastings algorithm [58, 59], Hamiltonian Monte Carlo algorithm [60] and No-U-turn Sampler algorithm [61], will be used to fit the LECs with the help of the PyMC3 package [62]. The major improvements of the new approach and the motivations of this paper are as follows.

i) The geometric sequence is not required in the fit. It is replaced by a Bayesian method. Generally, the new method does not require the assumptions about how ChPT converges.

ii) The approach is more general. Some examples are carried out to check whether the approach works well. The parameters in the examples are completely random. Hence, this approach is not only used to fit the LECs in ChPT, but also can be applied to other EFTs and perturbation theory.

iii) The cost of time for this approach is greatly reduced with the help of MCMC. A better result will be obtained within ten minutes.

iv) The covariance matrix given in Ref. [25] is included in the fit, so the correlations among these inputs are retained.

v) The Bayesian method is applied fully in the fit. More information under some reasonable assumptions is considered if possible, such as the assumptions of the signs and the order of magnitude of the truncation errors.

vi) Although the number of input values is not large enough, some clearer distributions of $C_i^r$ and some more precise values of $L_i^r$ will be obtained. In addition, the boundary dependence of $C_i^r$ can be seen more clearly.

This paper is organized as follows. Section 2 gives a brief introduction to Bayesian statistics and MCMC. In Section 3, two Bayesian models and some evaluation criteria are introduced. One model contains truncation errors, while the other does not. Some details of the calculation are also discussed. One example is studied in Section 4 in order to evaluate the above models. The input physical observables from ChPT are given in Section 5. In Section 6, some NLO and NNLO LECs are fitted by the above models, and a new set of LECs is obtained. Section 7 gives a summary and some discussion.

2 Bayesian statistics and MCMC

This section provides a brief introduction to fitting data with Bayesian statistics and MCMC. More details can be found in Refs. [26, 27]. Some content is very basic and can also be found in textbooks on probability theory and Bayesian analysis. For convenience, some parameters are given their meanings in ChPT, but the approach has a much wider scope of application: they can be any parameters to be fitted in a problem.

Consider a general case in which some parameters need to be fitted from a set of data. $D = (D_1, D_2, D_3, \ldots, D_m)$ denotes a set of known input data. In physics, these are usually experimental data or physical constants. The $D_i$ are not assumed to be independent. $a = (a_1, a_2, a_3, \ldots, a_n)$ is a parameter vector. In physics, its components are usually the parameters to be fitted. In this paper, $a$ denotes the LECs. The rest of this section introduces an approach to fit $a$ by Bayesian statistics and MCMC. This approach is faster than Bayesian statistics without MCMC.

The core of Bayesian statistics is Bayes' formula

$$\mathrm{pr}(a|D) = \frac{\mathrm{pr}(D|a)\,\mathrm{pr}(a)}{\mathrm{pr}(D)}. \tag{1}$$

The meaning of Eq. (1) is as follows.

i) $\mathrm{pr}(a)$ is the prior probability distribution function (PDF). It reflects the knowledge of $a$ before $D$ is observed. If one does not know anything about $a$, $\mathrm{pr}(a)$ is usually set to a uniform distribution. Usually, experiment and/or theory can give an approximate value. At least the order of magnitude is known before fitting in most cases. Due to the introduction of $\mathrm{pr}(a)$, one could argue that Bayesian statistics is subjective. However, $\mathrm{pr}(a)$ is nothing more than a set of assumptions in the construction of a model. This is similar to a $\chi^2$ fit, which usually needs an initial value in a reasonable range.

ii) $\mathrm{pr}(D|a)$ is the likelihood function. It is related to $D$ and reflects the confidence in $D$ under the given $a$. It can be expressed as

$$\mathrm{pr}(D|a) = \exp\left\{-\frac{1}{2}\left[\mu_{\rm th}(a) - \bar{D}\right]^{T} (\Sigma_D)^{-1} \left[\mu_{\rm th}(a) - \bar{D}\right]\right\}, \tag{2}$$

where $\mu_{\rm th}(a)$ is the theoretical expected value of the data, which depends on $a$. $\bar{D}$ is the expected value of the data, i.e., the experimental central value. $\Sigma_D$ is the covariance matrix of $D$. The errors and the correlation information of $D$ are contained in $\Sigma_D$.

iii) $\mathrm{pr}(a|D)$ is the posterior PDF. It is the result of the Bayesian analysis. It also reflects the full knowledge of $D$ from a fitting model. $\mathrm{pr}(a|D)$ gives the PDFs of $a$, not just some expected values. $\mathrm{pr}(a|D)$ can be viewed as an update of $\mathrm{pr}(a)$ after $D$ has been observed. In addition, $\mathrm{pr}(a|D)$ from one fit can be used as $\mathrm{pr}(a)$ in another fit after some new $D$ are appended.

iv) $\mathrm{pr}(D)$ is called the Bayesian evidence. It is also known as the marginal likelihood. It is the average probability of $D$ in the fitting model. It can also be treated simply as a normalization coefficient. Because a fit is concerned with the relative PDFs of $a$ rather than their absolute PDFs, this normalization coefficient does not play an important role in the fit. Ignoring $\mathrm{pr}(D)$, Bayes' formula can be expressed in the proportional form

$$\mathrm{pr}(a|D) \propto \mathrm{pr}(D|a)\,\mathrm{pr}(a). \tag{3}$$

Hence, $\mathrm{pr}(D|a)\,\mathrm{pr}(a)$ is also called the core of the posterior PDF.
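As a minimal illustrative sketch (not the code used in this paper), the Gaussian likelihood of Eq. (2) and the proportional form of Eq. (3) can be evaluated on a grid for a single parameter. The toy theory `toy_theory`, the data values, the inverse covariance matrix and the standard normal prior are all hypothetical choices made only for this illustration.

```python
import math

def log_likelihood(mu_th, d_bar, sigma_inv):
    """Log of Eq. (2): -1/2 [mu_th - D]^T Sigma_D^{-1} [mu_th - D]."""
    r = [m - d for m, d in zip(mu_th, d_bar)]
    n = len(r)
    quad = sum(r[i] * sigma_inv[i][j] * r[j] for i in range(n) for j in range(n))
    return -0.5 * quad

def log_prior(a):
    """Assumed standard normal prior on one rescaled parameter (constant dropped)."""
    return -0.5 * a * a

def posterior_on_grid(theory, d_bar, sigma_inv, grid):
    """Normalize pr(D|a) pr(a) on a grid, so pr(D) is never computed explicitly."""
    logp = [log_likelihood(theory(a), d_bar, sigma_inv) + log_prior(a) for a in grid]
    shift = max(logp)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in logp]
    total = sum(weights)
    return [w / total for w in weights]

def toy_theory(a):
    """Hypothetical theory: two observables that are linear in the parameter a."""
    return [1.0 + 0.5 * a, 2.0 - 0.3 * a]

d_bar = [1.4, 1.8]                          # hypothetical central values
sigma_inv = [[100.0, 0.0], [0.0, 100.0]]    # inverse of diag(0.1^2, 0.1^2)
grid = [-4.0 + 0.001 * k for k in range(8001)]
post = posterior_on_grid(toy_theory, d_bar, sigma_inv, grid)
post_mean = sum(a * p for a, p in zip(grid, post))
```

For this linear toy case the posterior is Gaussian, so the grid mean can be compared with the analytic value $52/70$. A grid is practical only for very few parameters, which is one reason to sample the core of the posterior instead.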

There are several methods to determine $\mathrm{pr}(a|D)$ without $\mathrm{pr}(D)$, such as MCMC. We have tried three algorithms to generate the Markov chain, i.e., the Metropolis-Hastings algorithm [58, 59, 63], the Hamiltonian Monte Carlo algorithm [60] and the No-U-turn Sampler algorithm [61]. The last two algorithms are a bit more complicated, but they are computationally more efficient. The details can be found in the above references. We have checked that all these algorithms give almost the same distribution. The No-U-turn Sampler algorithm is the fastest one. It takes about half the time of the Metropolis-Hastings algorithm.
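To illustrate why $\mathrm{pr}(D)$ is not needed, the following is a minimal random-walk Metropolis-Hastings sketch in plain Python. It is not the PyMC3 implementation used for the fits; the one-dimensional target `toy_log_core`, the step size and the burn-in length are assumptions chosen for illustration. Only differences of the log core enter the acceptance test, so any normalization constant cancels.

```python
import math
import random

def metropolis_hastings(log_core, start, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings on the unnormalized log posterior."""
    rng = random.Random(seed)
    x, lp = start, log_core(start)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        lp_new = log_core(proposal)
        u = 1.0 - rng.random()                # uniform in (0, 1], avoids log(0)
        if math.log(u) < lp_new - lp:         # accept with prob. min(1, ratio)
            x, lp = proposal, lp_new
        chain.append(x)                       # a rejected step repeats the state
    return chain

def toy_log_core(a):
    """Hypothetical log core: Gaussian with mean 1.0 and standard deviation 0.5."""
    return -0.5 * ((a - 1.0) / 0.5) ** 2

chain = metropolis_hastings(toy_log_core, start=0.0, step=0.5, n_samples=20000)
kept = chain[2000:]                           # discard an assumed burn-in
mean = sum(kept) / len(kept)
sd = math.sqrt(sum((x - mean) ** 2 for x in kept) / len(kept))
```

The sample mean and standard deviation of `kept` should approach those of the target. Hamiltonian Monte Carlo and the No-U-turn Sampler replace the random-walk proposal with gradient-informed trajectories.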

3 Models and details

3.1 Preparation

The above section gives a general approach to fit the parameter $a$ in a known analytical relationship $\mu_{\rm th}(a)$ by Bayesian statistics and MCMC. However, in ChPT, this approach cannot be adopted directly, because the exact theoretical relationship $\mu_{\rm th}(a)$ is hard to obtain. It is usually calculated order by order,

$$\mu_{\rm th}(a) = \mu^{\rm LO}(a^{\rm LO}) + \mu^{\rm NLO}(a^{\rm LO}, a^{\rm NLO}) + \mu^{\rm NNLO}(a^{\rm LO}, a^{\rm NLO}, a^{\rm NNLO}) + \cdots, \tag{4}$$

where $\mu^{\rm LO}$, $\mu^{\rm NLO}$ and $\mu^{\rm NNLO}$ are the theoretical chiral expansions of $\mu_{\rm th}(a)$ at LO, NLO and NNLO, respectively. $a^{\rm LO}$, $a^{\rm NLO}$ and $a^{\rm NNLO}$ are the LO, NLO and NNLO LECs, respectively, such as $L_i^r$ and $C_i^r$. At present, the higher-order relationship $\mu^{\rm HO}(a)$ (i.e., the truncation error) is lacking, so this paper only considers the expansion up to NNLO. As discussed in the introduction, $\mu^{\rm HO}(a)$ may have a great impact on the results. Hence, it should be considered in the fit.

The introduction mentions that many references have discussed how to estimate the truncation errors, such as Ref. [29]. However, that approach cannot be used directly in the present case, because of several serious problems. Ref. [29] uses known $\mu^{\rm LO}$, $\mu^{\rm NLO}$ and $\mu^{\rm NNLO}$ without errors to estimate the distribution of $\mu_{\rm th}$. In the present case, however, $\mu_{\rm th}$ with systematic errors and the analytical expressions of $\mu^{\rm LO}(a^{\rm LO})$, $\mu^{\rm NLO}(a^{\rm LO}, a^{\rm NLO})$ and $\mu^{\rm NNLO}(a^{\rm LO}, a^{\rm NLO}, a^{\rm NNLO})$ are known, but $\mu^{\rm LO}$, $\mu^{\rm NLO}$, $\mu^{\rm NNLO}$, $a^{\rm NLO}$, $a^{\rm NNLO}$ and their distributions need to be fitted from $D$ and $\Sigma_D$. Ref. [29] computes the Bayesian evidence by a multidimensional integral (Eq. (8) in Ref. [29]). In several special cases, the Bayesian evidence can be integrated analytically, but it usually needs to be integrated numerically. A multidimensional numerical integral is usually hard to evaluate and may cost a lot of time. The MCMC approach, however, avoids determining the Bayesian evidence and is faster. In addition, Ref. [29] requires Eq. (4) to converge order by order, but Refs. [6, 8] have already indicated $\mu^{\rm NLO} < \mu^{\rm NNLO}$ for some physical quantities. Hence, a new approach is needed.

Generally, in an actual fit, some of $a^{\rm LO}$, $a^{\rm NLO}$ and $a^{\rm NNLO}$ may have dimensions, and their values may be very small or very large. For example, the NNLO LECs $C_i$ are about $10^{-3}\,\mathrm{GeV}^{-2}$. For convenience, their dimensions are removed first. For example, most of the literature provides the dimensionless $C_i^r$ (defined in Ref. [64]) rather than $C_i$. Moreover, very small or very large values may lead to numerical errors. Hence, all LECs are divided by their order of magnitude, in order to make them roughly 1. This can be done in an actual fit. For example, both experiment and theory indicate that $C_i^r$ is about $10^{-6}$. The order of magnitude of the LECs is regarded as a prior of the LECs in this paper. For convenience, all the quantities in this section are assumed to be dimensionless, and all $a^{\rm LO}$, $a^{\rm NLO}$ and $a^{\rm NNLO}$ are assumed to be roughly 1. In fact, the number 1 is not very strict. As long as the number is not very large, the fit also works well. For convenience, $a^{\rm LO}$ is assumed to be known, and it does not need to be fitted in this section. If one wants to fit $a^{\rm LO}$, there is no difference from fitting $a^{\rm NLO}$ and $a^{\rm NNLO}$.

In the actual ChPT fit in this paper, the number of $D_i$ is less than the total number of $a^{\rm LO}$, $a^{\rm NLO}$ and $a^{\rm NNLO}$, so overfitting exists. Hence, some constraint conditions are introduced to reduce the parameter space. In order to take the convergence of ChPT into account, some information about the higher orders needs to be introduced into Eq. (2). If one has no more information about the higher orders, in this paper, the quantities in Eq. (2) are modified to

$$\mu_{\rm th}(a) \to \begin{pmatrix} \mu_{\rm th}(a) \\ |\mu^{\rm NLO}/\mu^{\rm LO}| \\ |\mu^{\rm NNLO}/\mu^{\rm LO}| \end{pmatrix}, \tag{5}$$

$$\bar{D} \to \begin{pmatrix} \bar{D} \\ 0.2\,I \\ 0.05\,I \end{pmatrix}, \tag{6}$$

$$\Sigma_D \to \begin{pmatrix} \Sigma_D & & \\ & 0.2^2\,I & \\ & & 0.05^2\,I \end{pmatrix}, \tag{7}$$

where $I$ means an identity matrix with a suitable dimension. These changes assume that $|\mu_i^{\rm NLO}/\mu_i^{\rm LO}|$ follows a normal distribution $N(0.2, 0.2^2)$ ($\mu = 0.2$, $\sigma^2 = 0.2^2$) for any $i$ (ignoring the negative part), and $|\mu_i^{\rm NNLO}/\mu_i^{\rm LO}|$ has a similar meaning. The values come from the convergence hypothesis of ChPT. According to ChPT, $|\mu^{\rm NLO}/\mu^{\rm LO}|$ is about 0.1−0.3, and $|\mu^{\rm NNLO}/\mu^{\rm NLO}|$ is also about 0.1−0.3, so $|\mu^{\rm NNLO}/\mu^{\rm LO}|$ is about 0.01−0.09. Both 0.2 and 0.05 are near the central values of these ranges. The standard deviations are chosen to be the same as the expected values, in order to give a large enough probability over a wide range, because the estimate may not be very exact. In order to make the model universal, we choose the relative size rather than the absolute value. This is because EFT/ChPT can provide an approximate ratio between two orders, but not their absolute values. Of course, if one knows an approximate absolute value of a particular quantity at a given order, Eq. (5) can be replaced with this absolute value. Some similar constraints on the truncation errors will be discussed in Section 3.3. Of course, these constraints can be modified to different values if one has a better understanding of some physical quantities.
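The effect of Eqs. (5)−(7) on the likelihood can be sketched as follows. This is an illustrative simplification rather than the implementation used in this paper: the data are assumed uncorrelated (diagonal $\Sigma_D$), every LO value is assumed nonzero, and the function name and arguments are hypothetical. Each observable contributes its usual data term plus two Gaussian penalty terms on the ratios, centered at 0.2 and 0.05 with equal standard deviations.

```python
def augmented_log_likelihood(mu_lo, mu_nlo, mu_nnlo, d_bar, sigma_d,
                             r_nlo=0.2, r_nnlo=0.05):
    """Log of Eq. (2) after the replacements of Eqs. (5)-(7).

    Assumes a diagonal covariance matrix and nonzero LO values
    (both are simplifying assumptions of this sketch).
    """
    logp = 0.0
    for lo, nlo, nnlo, d, s in zip(mu_lo, mu_nlo, mu_nnlo, d_bar, sigma_d):
        total = lo + nlo + nnlo
        logp += -0.5 * ((total - d) / s) ** 2                      # data term
        logp += -0.5 * ((abs(nlo / lo) - r_nlo) / r_nlo) ** 2      # |NLO/LO| ~ N(0.2, 0.2^2)
        logp += -0.5 * ((abs(nnlo / lo) - r_nnlo) / r_nnlo) ** 2   # |NNLO/LO| ~ N(0.05, 0.05^2)
    return logp

# A well-converging point that also reproduces the datum is not penalized ...
good = augmented_log_likelihood([1.0], [0.2], [0.05], [1.25], [0.1])
# ... while a point with |NLO/LO| = 1 is penalized even if it reproduces the datum.
bad = augmented_log_likelihood([1.0], [1.0], [0.05], [2.05], [0.1])
```

In the second call the data term vanishes, but the ratio term contributes $-\frac{1}{2}\left[(1-0.2)/0.2\right]^2 = -8$, which is how the constraint disfavors poorly converging regions of the parameter space.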

For convenience, only one input datum $D$, or a component $D_i$, is discussed below. The discussion also applies when more than one datum $D$ is considered.

3.2 Model A

First, the truncation error is not considered in the fit; this is called Model A. Consider a physical quantity with an experimental value $D \pm \sigma_D$ and a theoretical value $\mu \pm \sigma$. The theoretical values up to NLO and NNLO without errors are

$$\mu_A^{(\rm NLO)} = \mu_A^{\rm LO}(a^{\rm LO}) + \mu_A^{\rm NLO}(a^{\rm LO}, a^{\rm NLO}), \tag{8}$$

$$\mu_A^{(\rm NNLO)} = \mu_A^{\rm LO}(a^{\rm LO}) + \mu_A^{\rm NLO}(a^{\rm LO}, a^{\rm NLO}) + \mu_A^{\rm NNLO}(a^{\rm LO}, a^{\rm NLO}, a^{\rm NNLO}), \tag{9}$$

respectively. A superscript without parentheses denotes the theoretical value at this order only. For example, $\mu_A^{\rm NLO}$ means the NLO theoretical value of $\mu$. Eqs. (8) and (9) are applied in the NLO and the NNLO fits, respectively.

This model assumes that $\mathrm{pr}(a_i^{\rm NLO})$ is the standard normal distribution, because the magnitudes of all $a^{\rm NLO}$ and $a^{\rm NNLO}$ are already normalized to roughly 1. In other words, one only introduces the information about the rough magnitudes of $a^{\rm NLO}$ and $a^{\rm NNLO}$, and no more information is considered at present. The advantage of this assumption is that more of the information in $\mathrm{pr}(a|D)$ is derived from the experimental data $D$ themselves, which reduces the subjectivity. In addition, Eq. (2) is adopted in the fit, not Eqs. (5)−(7). We have checked that $a^{\rm NLO}$ and $a^{\rm NNLO}$ need not be very close to 1. The results change only slightly, as long as their values are not very large. This is because the standard normal distribution has a non-negligible probability over a wide range. The same conclusion holds for the model below.

In order to improve Model A, more information is appended. It is called Model B.

3.3 Model B

Generally, the truncation error can be simply considered as a normal distribution, whose parameters are based on known information from the theory. However, in some special cases, the sign of the truncation error is known, or the probability of the sign is known. This information about the sign is considered separately. Eqs. (8) and (9) are improved to

$$\mu_B^{(\rm NLO)} = \mu_B^{\rm LO}(a^{\rm LO}) + \mu_B^{\rm NLO}(a^{\rm LO}, a^{\rm NLO}) + (2s-1)\,e\,\mu_B^{\rm LO}, \tag{10}$$

$$\mu_B^{(\rm NNLO)} = \mu_B^{\rm LO}(a^{\rm LO}) + \mu_B^{\rm NLO}(a^{\rm LO}, a^{\rm NLO}) + \mu_B^{\rm NNLO}(a^{\rm LO}, a^{\rm NLO}, a^{\rm NNLO}) + (2s-1)\,e\,\mu_B^{\rm LO}, \tag{11}$$

respectively. The last terms on the right-hand sides of Eqs. (10) and (11) represent the higher-order (HO) truncation error $\mu_B^{\rm HO} = (2s-1)\,e\,\mu_B^{\rm LO}$. For $\mu_B^{(\rm NLO)}$ and $\mu_B^{(\rm NNLO)}$, it means the contribution beyond NLO and NNLO, respectively. The parameter $s$ relates to the sign of the truncation error. It is assumed to be a Bernoulli random variable with parameter $p$,

$$\mathrm{pr}\{s = k\} = p^k (1-p)^{1-k}, \quad k = 0, 1. \tag{12}$$

$p$ is the probability for $s = 1$. If one has no information about the sign, $p = 0.5$. The parameter $s$ can give a correct sign of the truncation error. If the estimated truncation error gives a narrow range with a wrong sign, the theoretical values will be far from the true value and the fit will be bad. The parameter $s$ is introduced to solve this problem: it can change the wrong sign into a correct one. Conversely, if the estimated truncation error gives a correct sign, or the range is wide enough to cover the true truncation error, $s$ will have no impact. The parameter $e$ reflects the magnitude of the truncation error relative to $\mu_B^{\rm LO}$. One need not know the absolute magnitude of the truncation error. However, if the EFT holds, the relative magnitude at each order can be estimated. For example, the ratio between two adjacent orders is about $p/\Lambda$, where $p$ is the momentum of the low-energy particles and $\Lambda$ is the scale of the EFT. In ChPT, $|\mu_B^{\rm NLO}/\mu_B^{\rm LO}|$ is about 0.1−0.3, and $|\mu_B^{\rm NNLO}/\mu_B^{\rm NLO}|$ is also about 0.1−0.3, and so on. Therefore, it can be considered that $|\mu_B^{\rm HO}/\mu_B^{\rm LO}|$ is about 5% (2%) for $\mu_B^{(\rm NLO)}$ ($\mu_B^{(\rm NNLO)}$). Hence, the parameter $e$ is assumed to be a Gaussian random variable

$$\mathrm{pr}(e) = N(\mu_e, \sigma_e^2), \tag{13}$$

where $\mu_e$ is the expected magnitude of $\mu_B^{\rm HO}/\mu_B^{\rm LO}$, and $\sigma_e$ is its standard deviation. If one does not know more about the truncation error, a possible and reasonable choice is $\mu_e = \sigma_e = 0.05$ (0.02) for $\mu_B^{(\rm NLO)}$ ($\mu_B^{(\rm NNLO)}$).
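The truncation term of Eqs. (10)−(13) can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and drawing $s$ and $e$ from their priors here merely shows what the prior implies, whereas in the actual fit they are parameters sampled together with the LECs.

```python
import random

def truncation_term(s, e, mu_lo):
    """Higher-order term (2s - 1) * e * mu_LO from Eqs. (10) and (11)."""
    return (2 * s - 1) * e * mu_lo

def sample_truncation(mu_lo, p=0.5, mu_e=0.05, sigma_e=0.05, rng=None):
    """Draw one prior sample of the truncation term.

    s ~ Bernoulli(p) as in Eq. (12), e ~ N(mu_e, sigma_e^2) as in Eq. (13).
    The defaults are the NLO-fit values quoted in the text.
    """
    if rng is None:
        rng = random.Random(0)
    s = 1 if rng.random() < p else 0
    e = rng.gauss(mu_e, sigma_e)
    return truncation_term(s, e, mu_lo)

# s = 1 keeps the sign of e * mu_LO, s = 0 flips it.
plus = truncation_term(1, 0.05, 2.0)
minus = truncation_term(0, 0.05, 2.0)

# With a known positive sign (p = 1) and no spread, the term is fixed.
fixed = sample_truncation(2.0, p=1.0, mu_e=0.05, sigma_e=0.0)
```

The factor $(2s-1)$ maps $s \in \{0, 1\}$ onto $\{-1, +1\}$, so the sign and the relative magnitude of the truncation error can be given separate priors.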

The parameters $p$, $\mu_e$, $\sigma_e$, $a^{\rm NLO}$ and $a^{\rm NNLO}$ can sometimes be estimated from the information in the data. Hence, they can be set to other values, and their prior PDFs can even be set to other forms, as long as the information is accurate enough.

There are two extreme cases in Model B, which will be adopted only for model evaluation in Section 4. These two cases are called Model B₁ and Model B₂, respectively.

Model B₁. In this case, one knows nothing specific about $\mu_B^{\rm HO}$, such as its sign or its rough magnitude. Only the approximate order of magnitude of $\mu_B^{\rm HO}$ is known from ChPT, such as about 5% of LO in the NLO fit. As in the discussion above, for all quantities, we set $\mu_e = \sigma_e = 0.05$ (0.02) and $p = 0.5$ for the NLO (NNLO) fit. At present, we do not consider more information about $a^{\rm NLO}$ and $a^{\rm NNLO}$. Hence, the priors of $a^{\rm NLO}$ and $a^{\rm NNLO}$ are set to the standard normal distribution $N(0, 1)$. The convergence constraints are the same as Eqs. (5)−(7).

Model B₂. In this model, the magnitude of each $\mu_B^{\rm HO}$ is understood to some extent. Hence, one can set different prior PDFs for different $\mu_B^{\rm HO}$ separately. The parameters $\mu_e$, $\sigma_e$ and $p$ for different quantities can be set to different values. For example, if one knows the sign is positive, $p$ is set to 1. The priors for $\mu_e$ and $\sigma_e$ are set as

$$\mu_e^{\rm NLO} = |\mu_{\rm tr}^{\rm NLO}/\mu_{\rm tr}^{\rm LO}|, \quad \sigma_e^{\rm NLO} = \max(0.3\mu_e, 0.05), \quad \mu_e^{\rm NNLO} = |\mu_{\rm tr}^{\rm NNLO}/\mu_{\rm tr}^{\rm LO}|, \quad \sigma_e^{\rm NNLO} = \max(0.3\mu_e, 0.02), \tag{14}$$

where the superscripts NLO (NNLO) denote the NLO (NNLO) fit and the subscript "tr" means the true value. Because this model is adopted only for the example in Section 4 to evaluate the models, all the true values are known. Similarly, the true values of $a^{\rm NLO}$ and $a^{\rm NNLO}$ are generated from some given parameters, so their true ranges are also known. Therefore, their prior ranges are set equal to their true ranges. In addition, the constraints can be set to different values for different physical quantities.
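Eq. (14) amounts to a simple rule with a floor on the standard deviation, which can be written as a short helper. The function name and argument names are hypothetical; the numbers 0.3, 0.05 and 0.02 are those of Eq. (14).

```python
def model_b2_prior(mu_tr_order, mu_tr_lo, sigma_floor):
    """Prior parameters (mu_e, sigma_e) following Eq. (14).

    mu_tr_order : true value entering the ratio in Eq. (14)
    mu_tr_lo    : true LO value (assumed nonzero in this sketch)
    sigma_floor : 0.05 for the NLO fit, 0.02 for the NNLO fit
    """
    mu_e = abs(mu_tr_order / mu_tr_lo)
    sigma_e = max(0.3 * mu_e, sigma_floor)
    return mu_e, sigma_e

# Large ratio: the 30% relative width dominates.
mu_e_a, sigma_e_a = model_b2_prior(0.4, 1.0, 0.05)
# Small ratio: the floor keeps the prior from becoming too narrow.
mu_e_b, sigma_e_b = model_b2_prior(0.01, 1.0, 0.02)
```

The floor ensures that a very small true ratio does not produce an overly sharp prior on $e$.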

Models B₁ and B₂ adopt two extreme priors, and they are only used to fit the example in Section 4. Because this example is artificial and the true values are known, we can include none or all of the prior information in the fit. For actual experimental data, the known prior information lies between Model B₁ and Model B₂. For example, one may have some information about a part of $D$, and the signs and the approximate magnitudes of $\mu_B^{\rm HO}$ can be given as in Model B₂. However, for another part of $D$, one may have no information about their $\mu_B^{\rm HO}$, because current theory and/or experiment is lacking. For this part of $D$, one can only give the prior PDFs as in Model B₁. Between these two cases, it is more likely that one knows some partial information about $\mu_B^{\rm HO}$. For example, $\mu_B^{\rm HO}$ is more likely to be positive, or its value is possibly around 1−2. The prior PDF can be set according to this information. The fitting methods of Models B, B₁ and B₂ are the same, except that the prior PDFs are different. It can be expected that the general Model B is better than Model B₁, but worse than Model B₂. Therefore, in Section 6, we uniformly use Model B to represent the new model proposed in this paper.

3.4 Calculation details

This section discusses some special cases in the fit.

Sometimes, one needs to fit the derivative of $\mu(t)$ numerically, such as $f_s'$ and $g_p'$ in Section 5. The numerical difference $\Delta\mu = \mu(t + \Delta t) - \mu(t)$ requires the difference between the two quantities $\mu(t + \Delta t)$ and $\mu(t)$, but each quantity has an error. If one adopts Eq. (10) or (11) to determine $\mu(t + \Delta t)$ and $\mu(t)$, the estimated truncation error of $\mu'(t)$ will contain both errors and become large. Therefore, the truncation error of $\mu'(t)$ is estimated from $\mu^{\prime,\rm LO}$, $\mu^{\prime,\rm NLO}$ and $\mu^{\prime,\rm NNLO}$, rather than from the difference of Eq. (10) or (11). In other words, $\mu'(t)$ is treated as one quantity, not as a difference. However, for physical quantities involving derivatives such as $\langle r^2 \rangle_S^\pi$ and $c_S^\pi$, we place the HO terms in the denominator, which absorbs the effects of higher-order errors well.

Sometimes, in the NNLO fit, the number of $a^{\rm NNLO}$ is much larger than the number of $D$, but the total number of $a^{\rm LO}$ and $a^{\rm NLO}$ is less than the number of input $D$. The NNLO fit in ChPT is in this situation. All $a^{\rm LO}$, $a^{\rm NLO}$ and $a^{\rm NNLO}$ are fitted as follows.

i) All $a^{\rm NNLO}$ are first combined linearly into some linearly independent $\tilde{a}$. The number of $\tilde{a}$ is equal to the number of $D$, and each $\tilde{a}_i$ correlates with only one $D_i$. This is reasonable in ChPT, because the NNLO fit only contains linear combinations of $C_i^r$, which can be combined into linearly independent ones.

ii) $a^{\rm LO}$ and $a^{\rm NLO}$ are first fitted at NLO by Model B. The results are denoted by $\hat{a}^{\rm LO} \pm \hat{\sigma}^{\rm LO}$ and $\hat{a}^{\rm NLO} \pm \hat{\sigma}^{\rm NLO}$. This is called the NLO fit.

iii) In the NNLO fit, $a^{\rm LO}$, $a^{\rm NLO}$ and $\tilde{a}$ are fitted simultaneously. If no more information is known, the NNLO priors of $a^{\rm LO}$ and $a^{\rm NLO}$ are set to some suitable normal distributions $N(\mu^{\rm LO(NLO)}, (\sigma^{\rm LO(NLO)})^2)$, where

$$\mu^{\rm LO(NLO)} = \hat{a}^{\rm LO(NLO)}, \quad \sigma^{\rm LO(NLO)} = \max\left(\hat{a}^{\rm LO(NLO)}/2,\ \hat{\sigma}^{\rm LO(NLO)}\right). \tag{15}$$

The definition of $\sigma^{\rm LO(NLO)}$ chooses the maximum of the two parameters $\hat{a}^{\rm LO(NLO)}/2$ and $\hat{\sigma}^{\rm LO(NLO)}$. Because either of them may be very small, this definition enlarges the prior range of $N(\mu^{\rm LO(NLO)}, (\sigma^{\rm LO(NLO)})^2)$ in order to improve performance. The prior PDF of $\tilde{a}$ is set to the standard normal distribution if one knows nothing about $\tilde{a}$. Otherwise, some more reasonable prior PDFs can be set according to the known information.

The prior PDFs of $a^{\mathrm{LO}}$, $a^{\mathrm{NLO}}$ and $\tilde{a}$ not only make good use of the information from the NLO fit, but also leave some freedom for these parameters. Because the NLO-fitted $a^{\mathrm{LO}}$ and $a^{\mathrm{NLO}}$ give a reasonable order of magnitude in most cases, the NNLO fit uses the NLO posterior PDFs to calculate the NNLO prior PDFs. In addition, the new parameter $\tilde{a}$ is introduced in the NNLO fit. Hence, the NNLO fit is not a repeated fit to the data, even if some of the NLO posterior information is used. We have also tried the NNLO fit without the posterior PDFs from the NLO fit for the example in Section 4, setting the priors of $a^{\mathrm{LO}}$, $a^{\mathrm{NLO}}$ and $a^{\mathrm{NNLO}}$ uniformly to the standard normal distribution. However, this gives very poor results, which can deviate very far from the true values. Therefore, it is necessary to use some sensible information about the LECs as a prior in the NNLO fit.

Of course, if some information about $a^{\mathrm{LO}}$ and $a^{\mathrm{NLO}}$ is known, one can set other sensible prior PDFs.

iv) Finally, all $a^{\mathrm{NNLO}}$ are fitted with the posterior $\tilde{a}$ obtained above, with some appropriate uniform distributions as priors. The boundaries of the uniform distributions depend on the approximate orders of magnitude of the truncation errors. This is because the NLO has usually been studied widely, and more information is known. However, NNLO studies are usually lacking, and the values of $a^{\mathrm{NNLO}}$ are not well known. Hence, a uniform distribution can give a larger probability near the boundaries, in order to study the boundary-dependent property. After the fit, the posterior PDFs of the truncation errors are updated into better ones.
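The prior construction of step iii) can be sketched in a few lines. This is only an illustration of Eq. (15), not the authors' code: the function name `nnlo_prior_from_nlo` is hypothetical, the inputs are made-up numbers, and taking the absolute value of the NLO mean (so that the width stays positive for negative parameters) is an assumption that Eq. (15) does not state explicitly.

```python
def nnlo_prior_from_nlo(a_hat, sigma_hat):
    """Return (mu, sigma) of the normal NNLO prior built from an NLO posterior.

    a_hat, sigma_hat: NLO posterior mean and standard deviation.
    Assumption: abs(a_hat) is used so that sigma is positive when a_hat < 0.
    """
    mu = a_hat
    sigma = max(abs(a_hat) / 2.0, sigma_hat)
    return mu, sigma


# Hypothetical NLO posteriors (not values from the paper).
print(nnlo_prior_from_nlo(1.0, 0.1))    # half of |a_hat| is the larger width
print(nnlo_prior_from_nlo(-0.05, 0.3))  # the NLO error is the larger width
```

Whichever of the two widths is larger is kept, so a parameter that is accidentally close to zero, or one with an accidentally small NLO error, still receives a reasonably wide NNLO prior.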

Model B is very efficient. For the actual fit in ChPT, which will be discussed below, a personal computer with an Intel i3-10105 CPU takes only about ten minutes using 4 cores. This greatly reduces the time compared with the method in Ref. [8], which takes about one day on a 20-core Intel Xeon Gold 6230 CPU.

All the numerical results are represented by the highest posterior density (HPD) interval. The HPD interval is the shortest interval containing a given proportion of the probability. The most common choices are the 95% HPD or 98% HPD, but we have chosen the 68% HPD, because it is similar to the $1\sigma$ interval in classical statistics [28], such as in the minimum $\chi^2$ method. All the results in this paper have been compared. The difference between the 68% HPD and the $1\sigma$ interval is very small: for most results the last significant digits are identical or differ by 1 or 2, only very few differ by 3 or 4, and none differs by more than 4. Hence, we sometimes do not distinguish them in this paper.
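As an illustration of how an HPD interval can be obtained from posterior samples, the sketch below searches for the shortest window that contains the requested fraction of sorted draws. It is a generic sample-based estimator, not necessarily the routine used for the results in this paper; the function name `hpd_interval` and the Gaussian test draws are assumptions made for the example.

```python
import math
import random


def hpd_interval(samples, prob=0.68):
    """Shortest interval containing a fraction `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical posterior draws from a standard normal distribution.
random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(20000)]
low, high = hpd_interval(draws, 0.68)
print(low, high)
```

For a symmetric, single-peaked posterior such as this one, the 68% HPD boundaries land close to the mean minus and plus one standard deviation, which is why the two intervals are nearly interchangeable.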

3.5 Evaluation criteria

In order to evaluate which model is the best, an evaluation criterion is needed, and it is better for this criterion to be quantitative, so that different models can be compared by a single index. Bayesian evidence is one possible criterion, but it is too simple. The widely applicable information criterion (WAIC) and leave-one-out cross-validation (LOOCV) have been introduced in recent years. WAIC considers how well the model fits the data and also penalizes complex models. LOOCV splits the data into a training set and a validation set and repeats this many times to evaluate the model. The definitions of WAIC and LOOCV involve some related concepts and formulas, which need a long discussion. Their definitions and a more detailed explanation can be found in Refs. [65, 66]. Simply speaking, if Model B has larger values of WAIC and LOOCV than Model A, Model B is considered better than Model A. Of course, these values for a single model alone are meaningless, because one does not know how large is large enough. They are only meaningful for comparing different models.
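To make the "larger is better" convention concrete, the following sketch computes WAIC on the expected-log-predictive-density scale from a table of pointwise log-likelihoods. The formula is the standard one (log pointwise predictive density minus the variance penalty); the function name `elpd_waic`, the input layout and the numbers are illustrative assumptions, and the exact convention of the software used for the paper's tables is not confirmed here.

```python
import math
import statistics


def elpd_waic(log_lik):
    """WAIC on the elpd scale (larger is better).

    log_lik[s][i] is the log-likelihood of data point i under posterior draw s.
    """
    n_draws = len(log_lik)
    n_data = len(log_lik[0])
    total = 0.0
    for i in range(n_data):
        column = [log_lik[s][i] for s in range(n_draws)]
        shift = max(column)  # log-sum-exp shift for numerical stability
        lppd_i = shift + math.log(
            sum(math.exp(v - shift) for v in column) / n_draws
        )
        penalty_i = statistics.variance(column)  # effective-parameter penalty
        total += lppd_i - penalty_i
    return total


# Two hypothetical models evaluated on the same two data points.
stable_model = [[-1.0, -2.0], [-1.0, -2.0]]
noisy_model = [[-1.0, -2.0], [-1.5, -2.5]]
print(elpd_waic(stable_model), elpd_waic(noisy_model))
```

The second model is penalized twice: its average predictive density is lower, and the spread of its log-likelihood across draws adds to the complexity term.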

For the example in Section 4, the true values of the parameters $a_{i,\mathrm{tr}}$ are known. In addition to both WAIC and LOOCV, the fitting results $a_{i,\mathrm{model}}$ can be compared to the true values directly. For example, $a_{i,A}$ means the expected value of $a_i$ fitted by Model A. This makes it more intuitive to see how good the fit is. Hence, we define the following two quantities as criteria.

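(16)

$$\mathrm{Pct}_{\mathrm{model}} = \frac{a_{i,\mathrm{model}} - a_{i,\mathrm{tr}}}{a_{i,\mathrm{tr}}} \times 100\%,$$

(17)

$$\mathrm{Pct}^{\sigma}_{\mathrm{model}} = \frac{a_{i,\mathrm{model}} - a_{i,\mathrm{tr}}}{\sigma_{i,\mathrm{model}}}.$$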

$\mathrm{Pct}_{\mathrm{model}}$ is the relative error between the fitting value $a_{i,\mathrm{model}}$ and the true value $a_{i,\mathrm{tr}}$. It indicates how good the fitted expected value is. $\mathrm{Pct}^{\sigma}_{\mathrm{model}}$ is the ratio of the difference between the fitting value and the true value to the fitting standard error $\sigma_{i,\mathrm{model}}$. It indicates how good the fitted error is. The smaller the absolute values of these two quantities are, the better the model is. These two criteria are only used for the example in Section 4, because one does not know the true values in the actual fit.
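Eqs. (16) and (17) translate directly into code. The sketch below is a minimal illustration with hypothetical function names (`pct`, `pct_sigma`) and made-up numbers rather than entries from the tables.

```python
def pct(a_model, a_true):
    """Relative deviation of the fitted value from the true value, in percent."""
    return (a_model - a_true) / a_true * 100.0


def pct_sigma(a_model, a_true, sigma_model):
    """Deviation from the true value in units of the fitted standard error."""
    return (a_model - a_true) / sigma_model


# Hypothetical fitted parameter: 1.1 +/- 0.2 with a true value of 1.0.
print(pct(1.1, 1.0))
print(pct_sigma(1.1, 1.0, 0.2))
```

A small relative deviation combined with a deviation of several standard errors would signal an underestimated error, which is why the two criteria are quoted together.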

In order to clarify the convergence of $\mu$, the percentages at each order are defined as in Ref. [8],

(18)

$$\mathrm{Pct}_{\mathrm{order}} = \frac{\bar{\mu}^{\mathrm{order}}_{\mathrm{model}}}{\bar{\mu}_{\mathrm{model}}} \times 100\%,$$

where $\bar{\mu}^{\mathrm{order}}_{\mathrm{model}}$ is defined in Eqs. (8)–(11). $\bar{\mu}_{\mathrm{model}}$ means the fitting value obtained by a specific model, containing all orders. The bar denotes the expected value. For example, $\bar{\mu}^{\mathrm{NLO}}_{A}$ means the NLO expected contribution obtained by Model A, and $\bar{\mu}_{A}$ is the expected value containing all orders obtained by Model A.
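A short sketch of Eq. (18) is given below. It assumes that the all-order value is simply the sum of the listed order-by-order contributions; the function name `pct_by_order` and the numbers are hypothetical.

```python
def pct_by_order(contributions):
    """Percentage of the all-order value carried by each order.

    contributions: mapping from order label to its expected contribution.
    Assumption: the all-order value is the sum of the listed contributions.
    """
    total = sum(contributions.values())
    return {order: value / total * 100.0 for order, value in contributions.items()}


# Hypothetical order-by-order contributions to one observable.
shares = pct_by_order({"LO": 0.75, "NLO": 0.20, "HO": 0.05})
print(shares)
```

A well-converging observable shows percentages that shrink from order to order, as in this made-up case.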

For the NNLO fit, the differences in WAIC, LOOCV, $\mathrm{Pct}_{\mathrm{model}}$ and $\mathrm{Pct}^{\sigma}_{\mathrm{model}}$ among different models are small. It is more important to evaluate how well all $a_i^{\mathrm{NNLO}}$ are fitted, because the NNLO-fitted $a_i^{\mathrm{NLO}}$ are usually precise enough, but $a_i^{\mathrm{NNLO}}$ usually have large errors. For the example in Section 4, the true values of $a_i^{\mathrm{NNLO}}$ are known and $a_i^{\mathrm{NNLO}} = \tilde{a}_i$, so the fitting values can also be compared to the true values directly. Usually, the contributions of $a_i^{\mathrm{NNLO}}$ do not mix with those of $a_i^{\mathrm{LO}}$ and $a_i^{\mathrm{NLO}}$, as in ChPT. The contributions of $a_i^{\mathrm{NNLO}}$ can be separated, and are called $\mu_{a_i^{\mathrm{NNLO}}}$. In order to see how good the fitted $a_i^{\mathrm{NNLO}}$ are, we define

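(19)

$$\mathrm{PM}_{\mathrm{model}} = \sum_{i=1}^{n} \left( \frac{a_{i,\mathrm{tr}}^{\mathrm{NNLO}} - \bar{a}_{i,\mathrm{model}}^{\mathrm{NNLO}}}{a_{i,\mathrm{tr}}^{\mathrm{NNLO}}} \, \frac{\bar{\mu}_{a_{i,\mathrm{model}}^{\mathrm{NNLO}}}}{\mu_{i,\mathrm{tr}}} \right)^{2} \Big/ n .$$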

The subscript “tr” means the true values, the subscript “model” means the model that is adopted, and $\mu_{i,\mathrm{tr}}$ is the true value of the $i$-th physical quantity. The bar denotes the expected value. $n$ is the number of physical quantities. In this paper $n = 17$. For example, $a_{i,A}^{\mathrm{NNLO}}$ means $a_i^{\mathrm{NNLO}}$ fitted by Model A, and $\mu_{a_{i,A}^{\mathrm{NNLO}}}$ means only the contribution from $a_i^{\mathrm{NNLO}}$ obtained by Model A. The first fraction on the right side of Eq. (19) is the relative error of $a_i^{\mathrm{NNLO}}$, while the second fraction is treated as its weight. The weight represents the contribution of $\bar{\mu}_{a_{i,\mathrm{model}}^{\mathrm{NNLO}}}$ relative to $\mu_{i,\mathrm{tr}}$. The smaller the PM value is, the better the result is. A larger weight needs a more precise $\bar{a}_{i,\mathrm{model}}^{\mathrm{NNLO}}$ to reduce the PM value. The PM value is only used in the example in Section 4, because the true values of this example are known, whereas in the actual case the true values are not known.
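The PM value of Eq. (19) can be sketched as a weighted mean of squared relative errors. The function name `pm_value`, the argument layout and the two-quantity example are hypothetical; the actual example uses $n = 17$ quantities.

```python
def pm_value(a_true, a_model, mu_a_model, mu_true):
    """Weighted mean squared relative error of the NNLO parameters.

    a_true, a_model: true and fitted expected values of each NNLO parameter.
    mu_a_model: fitted contribution of that parameter to its observable.
    mu_true: true value of the observable.
    """
    n = len(a_true)
    total = 0.0
    for a_tr, a_fit, mu_a, mu_tr in zip(a_true, a_model, mu_a_model, mu_true):
        relative_error = (a_tr - a_fit) / a_tr
        weight = mu_a / mu_tr  # share of the observable carried by this term
        total += (relative_error * weight) ** 2
    return total / n


# Hypothetical two-quantity example.
print(pm_value([2.0, 4.0], [1.0, 4.0], [0.5, 0.3], [1.0, 1.0]))
```

A badly fitted parameter only raises the PM value noticeably when its NNLO term carries a sizeable share of the observable, which is the purpose of the weight.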

The next section will evaluate the above models by these evaluation criteria.

4 Model evaluation

In order to quantitatively demonstrate the advantage of Model B based on Bayesian statistics, this section gives an example of fitting parameters similar to the LECs. As in the actual fit of the LECs in Section 6, a group of functions is generated randomly, containing 17 different quantities $O_i$. They are shown in Eq. (A1) in Appendix A. The power of $t$ is similar to the chiral dimension in ChPT. Taylor expanding these functions about $t$, the analytical results at each order can be obtained. The $t$, $t^2$ and $t^3$ orders correspond to LO, NLO and NNLO in ChPT, respectively. After the expansion, $t = 1$ is taken. $a_i^{\mathrm{LO}}$, $a_i^{\mathrm{NLO}}$ and $a_i^{\mathrm{NNLO}}$ are similar to the LO, NLO ($L_i^r$) and NNLO ($C_i^r$) LECs in ChPT, respectively. $b_i$ are some known constants, which are introduced to adjust the convergence of these Taylor series. All parameters $a_i^{\mathrm{LO}}$, $a_i^{\mathrm{NLO}}$, $a_i^{\mathrm{NNLO}}$ and $b_i$ are generated randomly and independently. For convenience, the parameters in each function are different, although they have the same name. For example, $b_1$ in $O_1$ and $b_1$ in $O_2$ are different. The values of $b_i$ and $a_i^{\mathrm{LO}}$ in the example can be found in Appendix A. In fact, the LO LECs do not appear in the actual ChPT fit in this paper. Hence, we treat them as known constants and do not fit them. This section only discusses the impact of truncation errors and does not address overfitting. Hence, each $O_i$ only contains one $a_i^{\mathrm{NNLO}}$, i.e., $\tilde{a}_i^{\mathrm{NNLO}} = a_i^{\mathrm{NNLO}}$.

Since all the parameters $b_i$, $a_i^{\mathrm{LO}}$, $a_i^{\mathrm{NLO}}$ and $a_i^{\mathrm{NNLO}}$ in this example are known, all the analytical results $O_i$ can be calculated from these parameters directly. In this section, we define all the known values of these parameters as true values. The fitted values of these parameters, obtained by the models in Section 3, are called theoretical values. In order to distinguish these two types of values, all the true values are marked by a subscript “tr”, such as $a_{i,\mathrm{tr}}^{\mathrm{NLO}}$, and all the theoretical values are marked by the model name, such as $a_{i,A}^{\mathrm{NLO}}$.

In order to imitate a realistic experiment, the fitting data are not the true values themselves but include some experimental errors $\sigma_i$. The imitative experimental data are generated from the distribution $N(O_{i,\mathrm{tr}}, \sigma_i^2)$, with $\sigma_i / O_{i,\mathrm{tr}} = 0.02$ in the example [26]. For convenience, these imitative experimental data are also called experimental data for short. Their values are given in the third column of Tab.2 with a subscript “exp”.
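The generation of the imitative data can be sketched as follows. The function name `make_pseudo_data`, the seed and the true values are hypothetical, and the absolute value is an added assumption that keeps the width positive if a true value is negative.

```python
import random


def make_pseudo_data(true_values, rel_err=0.02, seed=0):
    """Draw one pseudo-measurement per observable from N(O_tr, sigma^2).

    sigma is rel_err times |O_tr|; returns a list of (value, sigma) pairs.
    """
    rng = random.Random(seed)
    data = []
    for o_tr in true_values:
        sigma = rel_err * abs(o_tr)
        data.append((rng.gauss(o_tr, sigma), sigma))
    return data


# Hypothetical true values of three observables.
for value, sigma in make_pseudo_data([1.5, -0.8, 12.0]):
    print(value, sigma)
```

Fixing the seed makes the pseudo-data reproducible, so that different models can be compared on exactly the same inputs.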

Because the above true values are known, the true values of $\mu^{\mathrm{LO}}$, $\mu^{\mathrm{NLO}}$ and $\mu^{\mathrm{NNLO}}$ can also be calculated analytically. The parameters of the truncation errors in Model B₂ are set according to Eq. (14) and the description above it. The values of $p$, $\mu_e$ and $\sigma_e$ are given in Appendix B. Similarly, the true values of the LECs are also known, so their prior distributions are set to the normal distribution $N(\mu_{a_i}, \sigma_{a_i}^2)$, where

(20)

$$\mu_{a_i^{\mathrm{NLO/NNLO}}} \sim N\left(a_{i,\mathrm{tr}}^{\mathrm{NLO/NNLO}},\ \left(0.1\, a_{i,\mathrm{tr}}^{\mathrm{NLO/NNLO}}\right)^2\right), \qquad \sigma_{a_i^{\mathrm{NLO/NNLO}}} = 0.5\, \mu_{a_i^{\mathrm{NLO/NNLO}}} .$$

We have deliberately given $\mu_{a_i}$ a deviation from the true value, in order to avoid a fit centered exactly at the true value. The distribution parameters $\mu_{a_i}$ and $\sigma_{a_i}$ at each order are given in Appendix B.
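The deliberately shifted prior of Eq. (20) can be sketched as below. The function name `shifted_prior` and the input value are hypothetical, and the absolute values are an added assumption so that both widths stay positive for negative parameters.

```python
import random


def shifted_prior(a_true, rng, rel_shift=0.1, rel_width=0.5):
    """Return (mu, sigma) of a normal prior whose center is offset from a_true.

    mu is drawn from N(a_true, (rel_shift * a_true)^2) and
    sigma is rel_width * |mu|.
    """
    mu = rng.gauss(a_true, rel_shift * abs(a_true))
    sigma = rel_width * abs(mu)
    return mu, sigma


# Hypothetical true parameter value.
rng = random.Random(1)
mu, sigma = shifted_prior(2.0, rng)
print(mu, sigma)
```

The prior center is typically about 10% away from the true value while the prior width is about 50% of the center, so the true value stays well inside the prior without being singled out by it.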

4.1 The NLO fit of the example

The input parameters in Model B₂ are given in Columns 2 to 6 of the corresponding table in Appendix B. After the NLO fit, we have checked that the obtained Markov chain satisfies the detailed balance condition, so the results are reliable. The same holds for all the other fits in this paper.

Fig.1 illustrates the distributions obtained by Models A and B₂. The shapes of the curves are similar to normal distributions, although the details differ a little. We have checked that the boundaries of the 68% HPD are almost the same as the $1\sigma$ boundaries of a normal distribution. Hence, we sometimes do not distinguish them. It can be seen that the central values of Model B₂ are closer to the true values. However, the errors of Model B₂ are larger than those of Model A. This is because Model B₂ considers the errors of the truncation errors, but Model A does not.

The numerical posterior information of $a_i^{\mathrm{NLO}}$ is listed in Tab.1. The WAIC and LOOCV of Model B₂ are the largest, and those of Model A are the smallest. The WAIC and LOOCV of Model B₁ are a bit smaller than those of Model B₂, but much larger than those of Model A. This means that Model B₂ gives the best results, and Model A the worst. Model B₁ clearly improves on the results of Model A, but is a bit weaker than Model B₂. This conclusion can also be seen from $\mathrm{Pct}_{A, B_1, B_2}$. However, most $|\mathrm{Pct}^{\sigma}_{B_2}|$ are still a bit larger than $|\mathrm{Pct}^{\sigma}_{B_1}|$. This is because the errors of $a_{i,B_2}^{\mathrm{NLO}}$ are about half those of $a_{i,B_1}^{\mathrm{NLO}}$. Overall, $a_{i,B_2}^{\mathrm{NLO}}$ is closer to the true value.

Fig.2(a) illustrates the proportions of $O_i$ at each order. The contributions at NLO and HO from Model B₂ are closer to the true values than those from Model B₁. This is because Model B₂ utilizes more information than Model B₁. Despite using less information, Model B₁ still satisfies convergence well in its results. However, there are noticeable differences between Models B₁ and B₂ at the HO, because some truncation errors are not accurately estimated. Nevertheless, these discrepancies have a minimal impact on the results of $a_i^{\mathrm{NLO}}$. Therefore, the results of both Model B₁ and Model B₂ closely approximate the true values. This indicates that even if one does not possess complete knowledge about the truncation errors of all physical quantities, Model B₁ still yields better results than Model A.

Tab.2 shows the comparison of the true values, the experimental values and the fitting results from Models B₁ and B₂. It can be seen that the theoretical values from both Model B₁ and Model B₂ are not obviously different from the experimental values and the true values. In particular, the theoretical values obtained by Model B₂ are closer to the true values than those obtained by Model B₁. The $1\sigma$ errors from Model B₁ and Model B₂ are roughly equal to those of the experimental data, but Model B₂ has smaller errors. Most true values fall within the $1\sigma$ intervals of the theoretical values. A few true values lie between the $1\sigma$ and $2\sigma$ intervals. No true values lie outside the $2\sigma$ intervals. Tab.2 also indicates that more information leads to a better result.

4.2 The NNLO fit of the example

In the NNLO fit, the priors of $a_i^{\mathrm{NLO}}$ and $a_i^{\mathrm{NNLO}}$ in Models A and B₁ are the same as those discussed in Sections 3.2 and 3.3. The priors in Model B₂ adopt Eq. (15), and the parameters are given in Columns 7 to 11 of the corresponding table in Appendix B.

The numerical NNLO fitting results of $a_i^{\mathrm{NLO}}$ obtained by Models A, B₁ and B₂ are shown in Rows 12 to 20 of Tab.1. The NNLO fitting results of $a_i^{\mathrm{NNLO}}$ obtained by Models A, B₁ and B₂ are shown in Tab.3. Besides WAIC and LOOCV, the last row also gives the PM value defined in Eq. (19).

Tab.1 shows that the best results of $a_i^{\mathrm{NLO}}$ are obtained by Model B₂. Comparing the NNLO values of $\mathrm{Pct}_{B_1}$ ($\mathrm{Pct}^{\sigma}_{B_1}$) and $\mathrm{Pct}_{B_2}$ ($\mathrm{Pct}^{\sigma}_{B_2}$) with their NLO values shows that most of the results are improved. There is a significant difference between the NLO and the NNLO results. This indicates that even though the NNLO prior PDFs are calculated from the NLO posterior PDFs, the NNLO-fitted $a_i^{\mathrm{NLO}}$ do not stay at the prior PDFs, but can move to other ranges. In other words, the NNLO fit is not a repeated NLO fit.

Tab.3 shows that there are significant differences between $\mathrm{Pct}_{A}$ ($\mathrm{Pct}^{\sigma}_{A}$) and $\mathrm{Pct}_{B_1}$ ($\mathrm{Pct}^{\sigma}_{B_1}$) for $a_i^{\mathrm{NNLO}}$. Although a few $|\mathrm{Pct}_{B_1}|$ have large values (the largest is 316.8%), and several $|\mathrm{Pct}^{\sigma}_{B_1}|$ also have large values, Model B₁ is still a significant improvement over Model A. This can also be seen from their PM values, which change significantly. Model B₂ shows an even more significant improvement. Most $\mathrm{Pct}_{B_2}$ and $\mathrm{Pct}^{\sigma}_{B_2}$ are smaller than those from Models A and B₁. It can be seen that for the NNLO fit, the more useful information is known, the better the fitting results are.

Fig.2(b) illustrates the distributions obtained by Models A and B₂ at each order. Tab.4 gives a comparison among the true values, the experimental values and the fitting results from both Models B₁ and B₂. Both of them lead to the same conclusion as the NLO fit: Model B₂ gives better predictions of the truncation errors and the theoretical values.

4.3 Discussion

In the NLO fit, we have also removed one $O_i$ and fitted the rest. The results are almost identical to those of the 17-input fit. Moreover, the 16-input fit can predict the 17th quantity well. This also shows that our model has a good predictive ability.

We have fitted other examples and obtained the same conclusion. If an example converges faster than the example in this paper, but the experimental errors and the NNLO contributions are at the same order, the experimental errors will have an impact on the HO values. The NNLO fitting results are a little worse. An example of this type can be downloaded from the source file in the arXiv version of this paper (arXiv: 2311.10423).

5 Observables and inputs

In order to fit the actual data in ChPT and compare the results of different methods, almost the same physical quantities are chosen as those in Refs. [7, 8], except that the covariance matrix of the $\pi\pi$ scattering lengths $a_0^0$, $a_0^2$ and the two-flavor LECs $\bar{l}_1$, $\bar{l}_2$ and $\bar{l}_4$ is also considered.

In Refs. [7, 8], 12 input values are used in the NLO fit, i.e., the quark mass ratio $m_s/\hat{m}$ [6, 24, 67, 68], the ratio of the decay constants of the $K$ meson and the $\pi$ meson $F_K/F_\pi$ [6, 7, 67, 68], the values at threshold and the slopes of the $K_{\ell 4}$ form factors $F$ and $G$, i.e., $f_s$, $g_p$, $f_s'$ and $g_p'$ [24], the $\pi\pi$ scattering lengths $a_0^0$ and $a_0^2$ [25], the $\pi K$ scattering lengths $a_0^{1/2}$ and $a_0^{3/2}$ [69], and the pion scalar radius $\langle r^2 \rangle_S^\pi$ in the form factor $F_S^\pi(t)$. In addition, 5 more input values are added for the NNLO fit, i.e., the curvature $c_S^\pi$ of the pion scalar form factor [69] and four two-flavor LECs $\bar{l}_i$ $(i = 1, \ldots, 4)$ [25, 70]. The values of these 17 physical quantities are listed below. In this paper, both 12 and 17 inputs are considered in the NLO fit for comparison.

The values of $m_s/\hat{m}$ and $F_K/F_\pi$ are

(21)

$$\frac{m_s}{\hat{m}} = 27.3^{+0.7}_{-1.3}, \qquad \frac{F_K}{F_\pi} = 1.199 \pm 0.003.$$

The values of $f_s$, $g_p$, $f_s'$ and $g_p'$ are

(22)

$$f_s = 5.712 \pm 0.032, \quad f_s' = 0.868 \pm 0.049, \quad g_p = 4.958 \pm 0.085, \quad g_p' = 0.508 \pm 0.122.$$

The values of the $\pi\pi$ scattering lengths $a_0^0$, $a_0^2$ and the three relevant two-flavor LECs are

(23)

$$a_0^0 = 0.220 \pm 0.005, \quad a_0^2 = -0.0444 \pm 0.0010, \quad \bar{l}_1 = -0.4 \pm 0.6, \quad \bar{l}_2 = 4.3 \pm 0.1, \quad \bar{l}_4 = 4.4 \pm 0.2.$$

The covariance matrix of $a_0^0$, $a_0^2$ and $\bar{l}_1$, $\bar{l}_2$, $\bar{l}_4$ is listed in Tab.5.

We have tested the fit with and without the covariance matrix. It has only a slight impact on the final fitting results, because the errors of $\bar{l}_i$ themselves are very large. Of course, in order to make the results more statistically meaningful, the covariance matrix is included in the global fit.
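The role of a covariance matrix in a Gaussian likelihood can be illustrated with a small sketch that evaluates the quadratic form $r^{T} \Sigma^{-1} r$ for correlated residuals. The helper names `solve` and `chi2_correlated` and the 2×2 covariance are hypothetical and are not the entries of Tab.5; a small Gaussian elimination routine is used only to keep the example free of external libraries.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def chi2_correlated(theory, data, cov):
    """Quadratic form r^T cov^{-1} r for the residuals r = theory - data."""
    resid = [t - d for t, d in zip(theory, data)]
    return sum(r * w for r, w in zip(resid, solve(cov, resid)))


# Hypothetical residuals of one unit each, with and without a 50% correlation.
print(chi2_correlated([1.0, 1.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
print(chi2_correlated([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With positively correlated inputs, residuals of the same sign are penalized less than they would be if the inputs were treated as independent.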

The experimental values of the $\pi K$ scattering lengths $a_0^{1/2} m_\pi$ and $a_0^{3/2} m_\pi$ are

(24)

$$a_0^{1/2} m_\pi = 0.224 \pm 0.022, \qquad a_0^{3/2} m_\pi = -0.0448 \pm 0.0077.$$

The experimental values of the scalar radius $\langle r^2 \rangle_S^\pi$ and the curvature $c_S^\pi$ of the pion scalar form factor are

(25)

$$\langle r^2 \rangle_S^\pi = 0.61 \pm 0.04\ \mathrm{fm}^2, \qquad c_S^\pi = 11 \pm 1\ \mathrm{GeV}^{-4}.$$

For $\bar{l}_3$, the following result is adopted [70]:

(26)

$$\bar{l}_3 = 3.2 \pm 0.7.$$

6 Fitting the LECs in ChPT

This section adopts the Bayesian Model B described in Section 3.3 to perform a global fit, in order to obtain a new set of NLO and NNLO LECs. The truncation errors are considered in the fit. Most references in this paper indicate that all $L_i^r$ ($C_i^r$) are of order $10^{-3}$ ($10^{-6}$). Following the preparation in Section 3.1, they first need to be normalized by multiplying by a factor $10^{3}$ ($10^{6}$), respectively.

6.1 The NLO fitting $L_i^r$ by Model A

Although this paper does not adopt the minimum $\chi^2$ method [6–8] to fit $L_i^r$, similar results can still be obtained from the NLO fit by Model A. The fit does not include the covariance matrix and does not consider the truncation errors, in order to compare with the results in Ref. [7]. The fitting results with the first 12 inputs in Section 5 are shown in Tab.6. For comparison, the results in Ref. [7] are also given. Free fit means that no assumptions are made in the fit. Otherwise, $L_4^r$ is assumed to take some fixed values. It can be seen that these two approaches indeed give very close results. Classical statistics is very similar to Bayesian statistics here. The slight differences come from the prior of $L_i^r$. This indirectly shows that the two approaches are equivalent. However, it is easier to introduce extra information in Bayesian statistics. The minimum $\chi^2$ method can also add some constraints in the definition of $\chi^2$ [6–8], but this information is restricted. For example, the prior PDFs of the LECs cannot be incorporated. In addition, the modified $\chi^2$ destroys the original definition of $\chi^2$. In other words, the new $\chi^2$ may not in fact follow a $\chi^2$ distribution.

6.2 The NLO fitting $L_i^r$

In order to fit $L_i^r$, a similar approach to that of the example in Section 4 is adopted, but the parameters in HO are slightly different from the example. $m_s/\hat{m}|_1$, $m_s/\hat{m}|_2$, $F_K/F_\pi$, $f_s$, $g_p$, $a_0^0$, $a_0^2$, $a_0^{1/2} m_\pi$, $a_0^{3/2} m_\pi$, $\bar{l}_1$, $\bar{l}_2$, $\bar{l}_3$ and $\bar{l}_4$ are expanded in the same way as in Eq. (10). $f_s'$, $g_p'$, $\langle r^2 \rangle_S^\pi$ and $c_S^\pi$ involve a numerical differentiation. They are estimated with the method in Section 3.4. We have obtained some information about the higher-order experimental data and the range of the LECs, so the parameters are set in a way that is between Model B₁ and Model B₂. Therefore, from here on, all data are fitted using Model B. In this subsection, besides fitting all 17 inputs (Model B¹⁷), we also fit the first 12 inputs in Section 5 (Model B¹²) for comparison with Refs. [7, 8].

The setting parameters can be found in Columns 2 to 7 of the corresponding table in Appendix B. The parameters for $a_0^{1/2} m_\pi$ and $a_0^{3/2} m_\pi$ are taken from Ref. [7], which indicates that their convergence is broken. The values for $f_s$ and $a_0^0$ are given by their NNLO distributions, which are statistically obtained from the ranges of $L_i^r$ and $C_i^r$ collected in Refs. [7, 8, 71] and the references therein. The other parameters are set in the same way as in Model B₁. The prior of $L_i^r$ is given in Columns 2 and 5 of the corresponding table in Appendix B. It refers to the $L_i^r$ ranges given in Refs. [7, 8, 71] and the references therein. Because the values in the different references are not very close, the prior ranges are wide enough to cover all possible ranges.

The numerical results of both fits can be found in Tab.7. It can be seen that the results obtained by both Models B¹² and B¹⁷ are close to the NNLO results in Refs. [7, 8]. Moreover, both of them also satisfy the large-$N_c$ limit, i.e., $2L_1^r - L_2^r$, $L_4^r$ and $L_6^r$ are close to zero, although no strong prior is imposed on $L_4^r$. This shows that the contributions from truncation errors have a great impact on the NLO fit. It is also very possible that the truncation errors cannot be ignored in the NNLO fit. In addition, all theoretical errors from Model B are slightly larger than those in Refs. [7, 8]. This is because Ref. [8] does not consider the errors of the truncation errors, and Ref. [7] does not consider the truncation errors at all. Model B not only estimates these truncation errors, but also considers their PDFs. These PDFs make the fitting errors slightly larger than those in Refs. [7, 8]. However, the difference is not very large, because the truncation errors are not very large. Tab.7 also shows that the change between 12 and 17 inputs is not very large. The relative difference does not exceed 20%. However, since more inputs are added, all theoretical errors become smaller. In addition, since Model B¹² and Model B¹⁷ do not adopt the same inputs, WAIC and LOOCV cannot be used as model evaluation criteria. Hence, we do not give these two values. The following discussion is based on the results of Model B¹⁷, because the fit becomes more accurate as the number of inputs increases. The red part in Fig.3 is the corner plot of $L_i^r$ with 17 inputs, from which one can see both the distributions and the potential correlations between $L_i^r$.

Tab.8 lists the 17-input theoretical contributions at each order. $\bar{l}_i^r$ is replaced by $l_i^r$ [2], because $l_i^r$ theoretically has a better convergence. It can be seen that most expansions at each order conform to the convergence hypothesis very well. Most LO values contribute more than 70%, most NLO values contribute between 10% and 23%, and most HO values contribute less than 10%. All these percentages are neither too large nor too small. All theoretical results agree well with the experimental data. The ratios of adjacent orders are about 0.2, except for $a_0^{1/2}$ and $a_0^{3/2}$, whose HO contributions are larger than the NLO ones. This situation also exists in Refs. [7, 8]. There are two reasons. One is that the experimental values of both $a_0^{1/2}$ and $a_0^{3/2}$ are not very precise. Compared to $a_0^0$ and $a_0^2$, their errors are too large and the estimated truncation errors are not so precise. This may lead to a poor convergence. The second reason is that there are indeed convergence-breaking problems in the expansions of $a_0^{1/2}$ and $a_0^{3/2}$. Resolving these two issues requires more precise experiments and theoretical calculations, and we do not discuss them further in this paper. Although the NLO fitting results of $a_0^{1/2}$ and $a_0^{3/2}$ in Ref. [8] converge, that fit assumes a geometric-sequence model. The same problem exists in Ref. [7]. In contrast, Model B introduces the priors and has a wider scope of application. A better prior can predict the theoretical value within a more reasonable range. In addition, the total contribution of $l_3^r$ is almost entirely given by the NLO, and its HO value tends towards 0. This is because the error of $l_3^r$ itself is very large, about 3.7 times its expected value. Therefore, the contribution of $l_3^r$ in the fit becomes very small, and the fitted expected value can be far away from the experimental expected value. Hence, adopting the experimental value of $l_3^r$ as a constraint on the LECs, as in Ref. [8], seems not particularly good. Model B adopts both the convergence assumption and the prior PDFs. It can handle most precise data, so most results also conform with the convergence assumption very well. Only a few results show poor convergence, because of the problem itself or the large experimental errors.

Refs. [7, 8] fit the NLO LECs only with the first 12 inputs in Section 5, because the remaining five physical quantities have zero value in the LO. Therefore, if the truncation errors are not considered, the NLO fit does not contain the NNLO contribution. The results would exhibit a large deviation, because the HO contributions may lead to large influences. Although Ref. [8] can estimate the truncation errors, it requires at least two-order values because of a geometric-sequence model. Hence, this model cannot work for these five physical quantities. At present, the Bayesian method only requires at least one-order values to estimate the truncation errors. In other words, with Model B, even physical quantities with zero LO can be used as part of data fitting in NLO. Therefore, we also perform a full fit of all 17 physical quantities at the NLO.

6.3 The NNLO fitting $L_i^r$ and $\tilde{C}_i$

The $C_i^r$ to be fitted at the NNLO in this paper are the same as those in Ref. [8]. There are 38 $C_i^r$, while the number of observables is 17. Hence, these 38 $C_i^r$ are combined into 17 linearly independent $\tilde{C}_i$ before the fit. The definitions of $\tilde{C}_i$ are given in Appendix A of Ref. [8]. In the NNLO fit, $L_i^r$ and $\tilde{C}_i$ are fitted simultaneously using the approach described in Section 3.4.

The setting parameters are given in Columns 8 to 10 of the corresponding table in Appendix B. All the parameters are set as in Model B₁, because nothing is known about the truncation errors. The prior of $\tilde{C}_i$ can be found in Columns 6 and 7 of the corresponding table in Appendix B. It refers to the $C_i^r$ ranges given in Tab.9 in Ref. [8]. The blue part in Fig.3 shows the NNLO fitting corner plot of $L_i^r$. Column 4 in Tab.7 lists the NNLO fitting results of $L_i^r$. Both Fig.3 and Tab.7 indicate that there is no significant change of the theoretical expected values between the NLO and the NNLO fit. In addition, the NNLO fit gives $L_i^r$ and their correlations with smaller theoretical errors, owing to the introduction of the NNLO contributions. Tab.7 also indicates that the differences between the 17-input NNLO fit and the 12- or 17-input NLO fits are not very large, all within 20%. This indicates that this method is stable and does not cause an obvious change of $L_i^r$ as the order increases. This is exactly one of the motivations of this paper.

The corresponding figure in Appendix B gives the posterior distributions of $\tilde{C}_i$. The introduction of the constraints in Eqs. (5)−(7) causes some $\tilde{C}_i$ to deviate from normal distributions, but not very seriously. Tab.9 shows the numerical results of $\tilde{C}_i$. Compared with the results in Ref. [8], all standard deviations are slightly larger. The reason is that Eq. (11) considers the errors of the truncation errors and enlarges the theoretical errors.

Tab.10 gives the theoretical contributions at each order with the NNLO fit. It can be seen that most physical quantities satisfy the chiral convergence very well, except for $a_0^{1/2}$, $10 a_0^{3/2}$, $l_2^r$ and $l_3^r$. This situation also exists in the NLO fit and in Ref. [8]. The reason has been discussed in Section 6.2. It also leads to a large theoretical error of $l_3^r$. If a set of more precise experimental data is introduced, this problem may no longer exist.

6.4 The NNLO fitting $C_i^r$

This section discusses the fit of $C_i^r$. $\tilde{C}_i$, which have been determined in Tab.9, are linear combinations of $C_i^r$. Although the number of $\tilde{C}_i$ is less than the number of $C_i^r$, three $C_i^r$ can be determined by solving the linear equations [8]. However, some of these values are one to two orders of magnitude larger than those in the other references, so some constraints need to be introduced. For the other unsolvable $C_i^r$, their distributions are obtained by the Monte Carlo method [8]. Although the approach in Ref. [8] can solve this problem, its efficiency is very low and it takes a lot of time. Therefore, this paper adopts the MCMC algorithms in Section 2. We have repeated the computation many times and obtained similar results each time, so randomness does not affect the results obtained by this method. The difference is that the prior PDFs of the parameters $C_i^r$ are set to different uniform distributions. The boundaries of these prior uniform distributions are the same as in Eq. (38) in Ref. [8]. The reason for using prior uniform distributions instead of prior normal distributions is that we want to explore the boundary dependence of each $C_i^r$ in this overfitting problem. Normal distributions would generate fewer samples near the boundaries, and the efficiency would be low.

Figure 5 in Appendix B illustrates the posterior distributions of $C_i^r$. It can be seen that different $C_i^r$ have different shapes. $C_i^r$ $(i = 3, 7, 8, 10, 16, 17, 18, 20, 22, 23, 28, 30, 32, 33, 36, 63, 66, 69, 83, 88, 90)$ have a large probability near both boundaries. Their posterior PDFs depend on both sides. In addition, $C_i^r$ $(i = 2, 6, 26, 29, 34)$ only depend on one side. This can also be seen from their posterior distributions: one side has a shape similar to a half-Gaussian distribution. The constraint on these $C_i^r$ at this side is reliable, but the other side gives no constraint on these $C_i^r$. Finally, the twelve $C_i^r$ $(i = 1, 4, 5, 11, 12, 13, 14, 15, 19, 21, 25, 31)$ give Gaussian-like posterior PDFs, so these twelve results have higher credibility. Of course, 17 data points are far from adequate to fit 38 $C_i^r$. There is an overfitting problem. Hence, some $C_i^r$ are boundary-dependent. This property is similar to that in Ref. [8].
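One simple way to quantify the boundary dependence described above is to count how much posterior mass sits close to each boundary of the uniform prior. The sketch below is a hypothetical diagnostic, not the criterion used in this paper: the function name `edge_fractions`, the 10% edge width and the sample values are all assumptions.

```python
def edge_fractions(samples, lower, upper, edge=0.1):
    """Fractions of posterior samples within `edge` of each prior boundary.

    edge is expressed as a fraction of the prior width (upper - lower).
    """
    width = upper - lower
    near_lower = sum(1 for x in samples if x < lower + edge * width)
    near_upper = sum(1 for x in samples if x > upper - edge * width)
    n = len(samples)
    return near_lower / n, near_upper / n


# Hypothetical posterior samples for a parameter with prior range [0, 1].
samples = [0.01, 0.02, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.99]
print(edge_fractions(samples, 0.0, 1.0))
```

Under a uniform prior each 10% edge holds 10% of the prior mass, so posterior fractions well below that value on both sides would correspond to the Gaussian-like, weakly boundary-dependent case, while a fraction comparable to or above it flags dependence on that boundary.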

Tab.11 gives the fitting values of $C_i^r$ and compares them with the results in Refs. [7, 8, 71]. The brackets “[” and “]” denote that the results are strongly dependent on the lower and the upper boundaries, respectively. The parentheses “(” and “)” denote that the results are weakly dependent on the lower and the upper boundaries, respectively. We have tried doubling the boundaries: the strongly dependent results deviate considerably from the original values, while the weakly dependent results change only slightly. Of course, the boundaries chosen in Ref. [8] are wide enough to cover almost all results in the other references [6, 7, 19, 71–80]. Hence, the true values have a large probability of lying in the intervals in Tab.11.
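The effect of doubling the boundaries can be checked on a small example that needs no sampling. The sketch below is an illustration under stated assumptions and not the computation behind Tab.11: one hypothetical parameter `c1` enters only through the datum `y = c1 + c2`, with `c2` integrated out analytically over its uniform prior, and a second hypothetical parameter is measured directly. Both marginals are evaluated on a grid.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def marginal_c1(c1, bound, y=0.5, sigma=0.05):
    """Unnormalised marginal posterior of c1 for the datum y = c1 + c2 with
    uniform priors on [-bound, bound]; c2 is integrated out analytically."""
    if abs(c1) > bound:
        return 0.0
    return (norm_cdf((bound + c1 - y) / sigma)
            - norm_cdf((-bound + c1 - y) / sigma))

def marginal_direct(c, bound, mu=0.3, sigma=0.1):
    """Unnormalised posterior of a parameter measured directly as mu +/- sigma."""
    if abs(c) > bound:
        return 0.0
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2)

def central_interval(pdf, bound, level=0.68, n=4000):
    """Central credible interval from a midpoint-rule grid on [-bound, bound]."""
    h = 2.0 * bound / n
    xs = [-bound + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x, bound) for x in xs]
    total = sum(ws)
    lo_q, hi_q = 0.5 * (1.0 - level), 0.5 * (1.0 + level)
    cum, lo, hi = 0.0, None, None
    for x, w in zip(xs, ws):
        cum += w / total
        if lo is None and cum >= lo_q:
            lo = x
        if hi is None and cum >= hi_q:
            hi = x
    return lo, hi

for name, pdf in [("c1 (strongly dependent)", marginal_c1),
                  ("direct (weakly dependent)", marginal_direct)]:
    print(name, central_interval(pdf, 1.0), central_interval(pdf, 2.0))
```

The interval of `c1` grows by more than a factor of two when the prior box is doubled, whereas the interval of the directly measured parameter is essentially unchanged. This mirrors the distinction between strongly and weakly boundary-dependent results.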

7 Discussion and summary

This paper proposes a more general Bayesian model (Model B) with the truncation errors. This model is based on the idea of a simple truncation-error model [8] and the Bayesian model framework [28]. Compared with Refs. [7, 8], Model B has several advantages.

i) This model can transform the understanding of ChPT into prior knowledge in the fitting process, including the information on the LECs, the convergence of ChPT and the truncation errors. The prior information can be conveniently introduced by Eqs. (5)−(7). No other assumptions are needed, such as the geometric-sequence assumption in Ref. [8]. The model can also give a set of more precise NLO fitting LECs. A similar result is obtained in the NNLO fit of Ref. [7]; see Tab.7. Hence, there are good reasons to believe that the NNLO fitting LECs are also more precise, although no higher-order fitting result is available for comparison.

ii) With the help of the MCMC method, the distributions of the LECs can be obtained, and the computation is faster; the computational time of Model B is the shortest. The Bayesian method has another inherent advantage: clear distribution figures of the LECs can be obtained, because Bayesian statistics can give more points in a given time. Therefore, one can obtain not only the expected values and errors of the LECs, but also their distributions. Refs. [7, 8] cannot give the distributions of the LECs, although they can give the errors.

iii) Model B gives a general fitting method that can also be applied to other problems. The two extremes of this model (Models B₁ and B₂) have been evaluated with a toy example in Section 4, which confirms that more prior information indeed gives more precise results. With the quantified evaluation criteria in Section 3.5, one can see the improvement from the prior information more clearly. The actual ChPT fit lies between the two extremes and is better than Model A, whereas Model A gives a result similar to the minimum $\chi^2$ method in Ref. [7].

iv) For the NNLO LECs $C_i^r$, smoother PDFs are given than in Ref. [8] (Ref. [7] does not give PDFs). With these PDFs, one can see more intuitively how the $C_i^r$ depend on the boundaries.

v) This paper also contains some slight improvements. The covariance matrix given in Ref. [25] is considered. In contrast to Ref. [7], the results are insensitive to the initial conditions.

To test the effectiveness of the model, one example is randomly generated to imitate the actual ChPT. Some parameters $a_i$ and some quantities $O_i$ are introduced, which imitate the LECs and the experimental data, respectively. The exact values of $a_i$ and $O_i$ are known, and they are treated as the true values. Model A, which does not consider the truncation errors, is also introduced for comparison with two ideal cases of Model B. One case knows nothing about the truncation errors except their orders of magnitude; the other knows the distributions of the truncation errors. The fitting results indicate that the prior information on the truncation errors can improve the fit greatly, even when this information is not very precise. Hence, Model B is adopted to fit the actual ChPT data.
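Why a rough truncation-error estimate already helps can be seen in a much simpler setting than the example of this paper. The sketch below is a simplified stand-in and not Model A or Model B themselves: a single parameter `a` is fitted with a leading-order model `a*t` to hypothetical exact data `a*t + b_k*t**2`, and the truncation error is treated as an independent Gaussian of size `c_bar * t**2` added in quadrature to the experimental error. With a flat prior on `a`, the posterior is Gaussian and follows from weighted least squares.

```python
import math

# Hypothetical toy data: the "true" observable is y = a*t + b_k*t**2, but the
# fitted model keeps only the leading term a*t.
A_TRUE = 1.0
T = [0.1, 0.2, 0.3, 0.4]
B_HIGHER = [0.8, -0.5, 1.2, 0.9]   # unknown higher-order coefficients, O(1)
SIGMA_EXP = 0.002                  # experimental error of every datum
Y = [A_TRUE * t + b * t**2 for t, b in zip(T, B_HIGHER)]

def fit_leading_order(c_bar):
    """Posterior mean and standard deviation of `a` for a flat prior.

    Each datum gets the variance sigma_exp**2 + (c_bar * t**2)**2, i.e. the
    truncation error is modelled as an independent Gaussian of size
    c_bar * t**2. Setting c_bar = 0 switches the truncation error off.
    """
    weights = [1.0 / (SIGMA_EXP**2 + (c_bar * t**2) ** 2) for t in T]
    norm = sum(w * t * t for w, t in zip(weights, T))
    mean = sum(w * t * y for w, t, y in zip(weights, T, Y)) / norm
    return mean, 1.0 / math.sqrt(norm)

a_no_trunc, err_no_trunc = fit_leading_order(c_bar=0.0)
a_trunc, err_trunc = fit_leading_order(c_bar=1.0)

print("without truncation error:", a_no_trunc, "+/-", err_no_trunc)
print("with truncation error   :", a_trunc, "+/-", err_trunc)
```

Without the truncation term the fitted value is biased and its quoted error is far too small to cover the true value `a = 1`. With the truncation term, which only encodes the order of magnitude of the neglected contribution, the data at small `t` are weighted more strongly and the true value lies within the quoted error.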

The actual ChPT fit indicates that the Bayesian method without the truncation errors is similar to classical statistics. In other words, classical statistics can be treated as a special case of Bayesian statistics, but Bayesian statistics can be applied more widely. With the help of Model B, some $L_i^r$ and $C_i^r$ (defined in Ref. [64]) are fitted at the NLO and the NNLO. The fitted $L_i^r$ are almost unchanged between the NLO and the NNLO fits. The changes between 12 and 17 input data are also small, but all the theoretical errors decrease for the 17 inputs because of the more precise estimation of the truncation errors. Model B also solves a problem of the free fit, in which $L_4^r$ and $L_6^r$ become very large although they are zero in the large-$N_C$ limit. Because the number of $C_i^r$ to be fitted is larger than the number of experimental data, some independent $\tilde{C}_i^r$, which are linear combinations of the $C_i^r$ to be fitted, are fitted first. From the posterior PDFs of $C_i^r$, reliable intervals are obtained for twelve $C_i^r$, five $C_i^r$ are constrained on only one side, at either the upper or the lower end of their intervals, and the other 21 $C_i^r$ are strongly dependent on both boundaries. More experimental data are needed to pin down these uncertain $C_i^r$. Because the $\tilde{C}_i^r$ do not suffer from overfitting, they are more precise than the $C_i^r$. If one knows more values of these $C_i^r$, some other $C_i^r$ can be constrained through these $\tilde{C}_i^r$. For the physical quantities to be fitted, most theoretical contributions converge well, except for $a_0^{1/2}$ and $a_0^{3/2}$. This possibly comes from the large experimental errors, or some of these quantities may indeed not be convergent. More precise experimental data and theoretical calculations are needed in the future. It can be seen that Model B can estimate the truncation errors very well.
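How a well-determined linear combination passes an external value on to other constants can be made concrete with a small worked example. All numbers and the two combinations below are hypothetical and are not taken from the fit in this paper; the errors are assumed to be uncorrelated and are combined in quadrature.

```python
import math

# Hypothetical fitted combinations as (value, 1-sigma error):
#   ct1 = C_1 + C_2,   ct2 = C_2 - C_3
ct1 = (0.50, 0.05)
ct2 = (-0.20, 0.04)

def constrain_from_c1(c1, ct1, ct2):
    """Given an external value of C_1, solve the two combinations for C_2 and
    C_3 and propagate the (assumed uncorrelated) errors in quadrature."""
    c2_val = ct1[0] - c1[0]
    c2_err = math.hypot(ct1[1], c1[1])
    c3_val = c2_val - ct2[0]
    c3_err = math.hypot(c2_err, ct2[1])
    return (c2_val, c2_err), (c3_val, c3_err)

c1_external = (0.30, 0.03)  # hypothetical value known from another source
c2, c3 = constrain_from_c1(c1_external, ct1, ct2)
print("C_2 =", c2)
print("C_3 =", c3)
```

Each additional externally known constant removes one unconstrained direction, so the individually boundary-dependent constants inherit the precision of the combinations.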

Some inputs are very rough, such as Eqs. (6) and (7). A more precise estimation beyond the simple convergence assumption will be studied in future work. In addition, if more analytical and experimental results are introduced, the results should be more precise. However, the NNLO theoretical calculation is complicated and needs to be studied in the future. This approach can also be used to fit other LECs, such as the pion-nucleon and meson-baryon chiral LECs. However, both the experimental data and the theoretical results for them are currently scarcer than those for the mesonic LECs.

In conclusion, truncation errors usually cannot be ignored in the global fit, and some prior information can improve the fit greatly, even if this information is sometimes not very exact. Model B provides a feasible implementation scheme. A new set of more reliable $L_i^r$ and $C_i^r$ is fitted by Model B. This model can fit not only the LECs in ChPT, but also other parameters in other EFTs and in perturbation theory.

8 Appendix A: One testing example

Eq. (A1) gives the functions of the example in Section 3. For convenience, parameters with the same name in different functions denote different parameters. The values of $b_i$ and $a_i^{\mathrm{LO}}$ can be found in . The values of $a_i^{\mathrm{NLO}}$ and $a_i^{\mathrm{NNLO}}$ are given in the second row in Tab.1 and the second column III, respectively, which are marked by a subscript “tr”.

$$
\begin{aligned}
O_1 &= b_1 \exp(a_1^{\mathrm{NNLO}} t^3 + a_4^{\mathrm{NLO}} b_2 t^2 + a_7^{\mathrm{NLO}} b_3 t^2 - a_1^{\mathrm{LO}} t) - b_1, \\
O_2 &= b_1 \sin(b_2 \exp(b_5)) - b_1 \sin(b_2 \exp(-a_2^{\mathrm{NNLO}} b_3 t^3 - a_8^{\mathrm{NLO}} b_4 t^2 + b_5 \exp(-a_1^{\mathrm{NLO}} b_6 t^2) - a_2^{\mathrm{LO}} t)), \\
O_3 &= b_1 \ln(-a_3^{\mathrm{NNLO}} b_3 t^3 - a_1^{\mathrm{NLO}} b_2 t^2 - a_6^{\mathrm{NLO}} b_4 t^2 - a_3^{\mathrm{LO}} t + 1), \\
O_4 &= b_1 \exp(1 - b_2) - b_1 \exp(-a_4^{\mathrm{NNLO}} b_7 t^3 + a_4^{\mathrm{NNLO}} t^3 - a_1^{\mathrm{NLO}} b_6 t^2 + a_3^{\mathrm{NLO}} b_5 t^2 - a_4^{\mathrm{NLO}} b_4 t^2 - b_2 \cos(a_3^{\mathrm{NLO}} b_3 t^2) + a_4^{\mathrm{LO}} t + 1), \\
O_5 &= -b_1 \ln(a_5^{\mathrm{NNLO}} b_4 t^3 - a_5^{\mathrm{NNLO}} b_6 t^3 - a_6^{\mathrm{NLO}} b_2 b_3 t^2 - a_8^{\mathrm{NLO}} b_5 t^2 - a_5^{\mathrm{LO}} t + 1), \\
O_6 &= b_1 \ln(b_2 \ln(-a_6^{\mathrm{NNLO}} b_4 t^3 + a_2^{\mathrm{NLO}} b_3 t^2 + a_6^{\mathrm{NLO}} b_5 t^2 + a_6^{\mathrm{LO}} t + 1) + 1), \\
O_7 &= -b_1 \exp(b_2) + b_1 \exp(a_1^{\mathrm{NLO}} b_6 t^2 + b_2 \exp(a_7^{\mathrm{NNLO}} t^3 + a_5^{\mathrm{NLO}} b_3 t^2 + a_5^{\mathrm{NLO}} t^2 + a_8^{\mathrm{NLO}} t^2) + b_4 \sin(a_4^{\mathrm{NLO}} b_5 t^2) + a_7^{\mathrm{LO}} t), \\
O_8 &= b_1 \exp(-b_2 \exp(-a_8^{\mathrm{NNLO}} b_3 t^3 + a_3^{\mathrm{NLO}} b_4 t^2 + a_7^{\mathrm{NLO}} t^2 - b_5 a_8^{\mathrm{LO}} t)) - b_1 \exp(-b_2), \\
O_9 &= b_1 \ln(b_2 + 1) - b_1 \ln(a_9^{\mathrm{NNLO}} b_4 t^3 + a_9^{\mathrm{NNLO}} b_5 t^3 + a_9^{\mathrm{NNLO}} b_8 t^3 + b_2 \exp(a_5^{\mathrm{NLO}} b_3 t^2) - b_6 \sin(a_9^{\mathrm{NNLO}} b_7 t^3) + a_9^{\mathrm{LO}} t + 1), \\
O_{10} &= -b_1 \sin(b_2 + b_4 \sin(b_5)) + b_1 \sin(-a_{10}^{\mathrm{NNLO}} t^3 - a_4^{\mathrm{NLO}} t^2 + b_2 \exp(-a_5^{\mathrm{NLO}} b_3 t^2) + b_4 \sin(b_5 \exp(a_2^{\mathrm{NLO}} b_6 t^2)) + a_{10}^{\mathrm{LO}} t), \\
O_{11} &= b_1 \ln(-b_2 \sin(a_{11}^{\mathrm{NNLO}} b_6 t^3 + a_2^{\mathrm{NLO}} b_3 t^2 - a_3^{\mathrm{NLO}} b_4 t^2 + a_7^{\mathrm{NLO}} b_5 t^2 - a_{11}^{\mathrm{LO}} t) + 1), \\
O_{12} &= b_1 \ln(b_2 \sin(-a_{12}^{\mathrm{NNLO}} b_6 t^3 + a_{12}^{\mathrm{NNLO}} t^3 + a_2^{\mathrm{NLO}} b_3 t^2 + a_4^{\mathrm{NLO}} t^2 + a_5^{\mathrm{NLO}} t^2 + a_6^{\mathrm{NLO}} b_4 b_5 t^2 + a_{12}^{\mathrm{LO}} t) + 1), \\
O_{13} &= b_1 \exp(-a_8^{\mathrm{NLO}} b_8 t^2 - b_2 \sin(a_4^{\mathrm{NLO}} b_3 t^2) - b_4 \exp(a_{13}^{\mathrm{NNLO}} b_6 t^3 - a_5^{\mathrm{NLO}} b_5 t^2 + a_6^{\mathrm{NLO}} b_7 t^2 + a_{13}^{\mathrm{LO}} t) + a_{13}^{\mathrm{LO}} t) - b_1 \exp(-b_4), \\
O_{14} &= -b_1 \ln(-a_{14}^{\mathrm{NNLO}} b_4 t^3 - b_2 \ln(a_{14}^{\mathrm{NNLO}} b_3 t^3 + 1) - b_5 \sin(a_2^{\mathrm{NLO}} b_6 t^2) + a_{14}^{\mathrm{LO}} t + 1), \\
O_{15} &= -b_1 \exp(b_3) + b_1 \exp(-a_{15}^{\mathrm{NNLO}} b_2 t^3 - a_7^{\mathrm{NLO}} b_5 t^2 - a_7^{\mathrm{NLO}} b_7 t^2 - a_8^{\mathrm{NLO}} b_6 t^2 + b_3 \cos(a_7^{\mathrm{NLO}} b_4 t^2) - a_{15}^{\mathrm{LO}} t), \\
O_{16} &= -b_1 \sin(\ln(b_6 + 1) + 1) - b_1 \sin(a_{16}^{\mathrm{NNLO}} b_5 t^3 + a_3^{\mathrm{NLO}} b_2 t^2 + a_5^{\mathrm{NLO}} b_4 t^2 + a_{16}^{\mathrm{LO}} t - \ln(b_6 \exp(a_1^{\mathrm{NLO}} b_7 t^2) + 1) - \exp(-a_3^{\mathrm{NLO}} b_3 t^2)), \\
O_{17} &= b_1 \ln(-b_2 \exp(b_3) - b_5 \sin(b_6) + 1) - b_1 \ln(-b_2 \exp(b_3 \exp(-a_8^{\mathrm{NLO}} b_4 t^2)) - b_5 \sin(b_6 \exp(-C_{17} b_7 t^3)) + a_{17}^{\mathrm{LO}} t + 1).
\end{aligned}
\tag{A1}
$$
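The way such a function imitates an order-by-order expansion can be checked numerically for $O_1$. The sketch below assumes that $t$ plays the role of the expansion parameter, so that terms of order $t$, $t^2$ and $t^3$ stand for the LO, NLO and NNLO contributions. The parameter values are hypothetical placeholders and are not the values used in this paper.

```python
import math

# Hypothetical parameter values, indexed as in Eq. (A1).
B = {1: 1.0, 2: 0.5, 3: 0.8}
A_LO = {1: 1.0}
A_NLO = {4: 0.7, 7: -0.4}
A_NNLO = {1: 0.3}

def o1_full(t):
    """O_1 exactly as written in Eq. (A1)."""
    x = (A_NNLO[1] * t**3 + A_NLO[4] * B[2] * t**2
         + A_NLO[7] * B[3] * t**2 - A_LO[1] * t)
    return B[1] * math.exp(x) - B[1]

def o1_series(t, order):
    """Taylor expansion of O_1 in t, kept up to t**order (order = 1, 2, 3)."""
    x1 = -A_LO[1]                                # coefficient of t in the exponent
    x2 = A_NLO[4] * B[2] + A_NLO[7] * B[3]       # coefficient of t**2
    x3 = A_NNLO[1]                               # coefficient of t**3
    coeffs = [x1, x2 + x1**2 / 2, x3 + x1 * x2 + x1**3 / 6]
    return B[1] * sum(c * t ** (n + 1) for n, c in enumerate(coeffs[:order]))

def truncation_ratio(order, t=0.1):
    """Ratio of the truncation errors at t and t/2."""
    err_t = abs(o1_full(t) - o1_series(t, order))
    err_half = abs(o1_full(t / 2) - o1_series(t / 2, order))
    return err_t / err_half

for order, label in [(1, "LO"), (2, "NLO"), (3, "NNLO")]:
    print(label, "truncation-error ratio:", truncation_ratio(order))
```

If the neglected terms start at $t^{n+1}$, halving $t$ should reduce the truncation error by a factor close to $2^{n+1}$, which gives a quick consistency check of the expansion coefficients.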

9 Appendix B: Some tables and figures for the fits

References


[1]	S. Weinberg, Phenomenological Lagrangians, Physica A 96(1−2), 327 (1979)

[2]	J. Gasser and H. Leutwyler, Chiral perturbation theory to one loop, Ann. Phys. 158(1), 142 (1984)

[3]	J. Gasser and H. Leutwyler, Chiral perturbation theory: Expansions in the mass of the strange quark, Nucl. Phys. B 250(1−4), 465 (1985)

[4]	J. Bijnens, G. Colangelo, and G. Ecker, The mesonic chiral Lagrangian of order p⁶, J. High Energy Phys. 02, 020 (1999)

[5]	J. Bijnens, N. Hermansson-Truedsson, and S. Wang, The order p⁸ mesonic chiral Lagrangian, J. High Energy Phys. 01(1), 102 (2019)

[6]	J. Bijnens and I. Jemos, A new global fit of the L^r at next-to-next-to-leading order in chiral perturbation theory, Nucl. Phys. B 854(3), 631 (2012)

[7]	J. Bijnens and G. Ecker, Mesonic low-energy constants, Annu. Rev. Nucl. Part. Sci. 64(1), 149 (2014)

[8]	Q. H. Yang, W. Guo, F. J. Ge, B. Huang, H. Liu, and S. Z. Jiang, New method for fitting the low-energy constants in chiral perturbation theory, Phys. Rev. D 102(9), 094009 (2020)

[9]	K. U. Can, G. Erkol, M. Oka, and T. T. Takahashi, Look inside charmed-strange baryons from lattice QCD, Phys. Rev. D 92(11), 114515 (2015)

[10]	K. U. Can, G. Erkol, B. Isildak, M. Oka, and T. T. Takahashi, Electromagnetic structure of charmed baryons in lattice QCD, J. High Energy Phys. 05(5), 125 (2014)

[11]	H. Bahtiyar, K. U. Can, G. Erkol, M. Oka, and T. T. Takahashi, Ξ_cγ → Ξ_c^′ transition in lattice QCD, Phys. Lett. B 772, 121 (2017)

[12]	T. M. Yan, H. Y. Cheng, C. Y. Cheung, G. L. Lin, Y. C. Lin, and H. L. Yu, Heavy quark symmetry and chiral dynamics, Phys. Rev. D 46(3), 1148 (1992) [Erratum: Phys. Rev. D 55, 5851 (1997)]

[13]	R. J. Dowdall, C. T. H. Davies, G. P. Lepage, and C. McNeile, V_us from π and K decay constants in full lattice QCD with physical u, d, s and c quarks, Phys. Rev. D 88, 074504 (2013), arXiv:

[14]	A. Bazavov et al. (MILC), Results for light pseudoscalar mesons, PoS LATTICE 2010, 074 (2010)

[15]	V. Bernard and E. Passemar, Chiral extrapolation of the strangeness changing Kπ form factor, J. High Energy Phys. 04, 001 (2010)

[16]	A. Bazavov et al. (MILC), MILC results for light pseudoscalars, in: Proceedings of 6th International Workshop on Chiral dynamics: Bern, Switzerland, July 6–10, 2009, PoS CD09, 007 (2009), arXiv:

[17]	A. Bazavov, D. Toussaint, C. Bernard, J. Laiho, C. DeTar, L. Levkova, M. B. Oktay, S. Gottlieb, U. M. Heller, J. E. Hetrick, P. B. Mackenzie, R. Sugar, and R. S. Van de Water, Nonperturbative QCD simulations with 2+1 flavors of improved staggered quarks, Rev. Mod. Phys. 82(2), 1349 (2010)

[18]	M. Golterman, K. Maltman, and S. Peris, NNLO low-energy constants from flavor-breaking chiral sum rules based on hadronic τ-decay data, Phys. Rev. D 89(5), 054036 (2014)

[19]	P. Colangelo, J. J. Sanz-Cillero, and F. Zuo, Holography, chiral Lagrangian and form factor relations, J. High Energy Phys. 11, 012 (2012)

[20]	Z. H. Guo, J. J. Sanz Cillero, and H. Q. Zheng, Partial waves and large N_C resonance sum rules, J. High Energy Phys. 06, 030 (2007)

[21]	Z. H. Guo, J. J. Sanz-Cillero, and H. Q. Zheng, O(p⁶) extension of the large-N_C partial wave dispersion relations, Phys. Lett. B 661, 342 (2008), arXiv:

[22]	Z. H. Guo and J. J. Sanz-Cillero, ππ-scattering lengths at O(p⁶) revisited, Phys. Rev. D 79, 096006 (2009)

[23]	J. Bijnens, G. Colangelo, and J. Gasser, K_l4 decays beyond one loop, Nucl. Phys. B 427(3), 427 (1994)

[24]	G. Amorós, J. Bijnens, and P. Talavera, K_ℓ4 form-factors and π‒π scattering, Nucl. Phys. B 585, 293 (2000) [Erratum: Nucl. Phys. B 598, 665 (2001)], arXiv:

[25]	G. Colangelo, J. Gasser, and H. Leutwyler, ππ scattering, Nucl. Phys. B 603(1–2), 125 (2001)

[26]	M. R. Schindler and D. R. Phillips, Bayesian methods for parameter estimation in effective field theories, Ann. Phys. 324, 682 (2009) [Erratum: Ann. Phys. 324, 2051 (2009)], arXiv:

[27]	R. J. Furnstahl, D. R. Phillips, and S. Wesolowski, A recipe for EFT uncertainty quantification in nuclear physics, J. Phys. G 42(3), 034028 (2015)

[28]	S. Wesolowski, N. Klco, R. J. Furnstahl, D. R. Phillips, and A. Thapaliya, Bayesian parameter estimation for effective field theories, J. Phys. G 43(7), 074001 (2016)

[29]	J. A. Melendez, S. Wesolowski, and R. J. Furnstahl, Bayesian truncation errors in chiral effective field theory: Nucleon‒nucleon observables, Phys. Rev. C 96(2), 024003 (2017)

[30]	I. Svensson, A. Ekström, and C. Forssén, Bayesian parameter estimation in chiral effective field theory using the Hamiltonian Monte Carlo method, Phys. Rev. C 105(1), 014004 (2022)

[31]	A. Ekström, C. Forssén, C. Dimitrakakis, D. Dubhashi, H. T. Johansson, A. S. Muhammad, H. Salomonsson, and A. Schliep, Bayesian optimization in ab initio nuclear physics, J. Phys. G 46(9), 095101 (2019)

[32]	S. Wesolowski, R. J. Furnstahl, J. A. Melendez, and D. R. Phillips, Exploring Bayesian parameter estimation for chiral effective field theory using nucleon–nucleon phase shifts, J. Phys. G 46(4), 045102 (2019)

[33]	I. K. Alnamlah, E. A. C. Pérez, and D. R. Phillips, Effective field theory approach to rotational bands in odd-mass nuclei, Phys. Rev. C 104(6), 064311 (2021)

[34]	C. J. Yang, A. Ekström, C. Forssén, and G. Hagen, Power counting in chiral effective field theory and nuclear binding, Phys. Rev. C 103(5), 054304 (2021)

[35]	A. E. Lovell, F. M. Nunes, M. Catacora-Rios, and G. B. King, Recent advances in the quantification of uncertainties in reaction theory, J. Phys. G 48(1), 014001 (2020)

[36]	D. R. Phillips, R. J. Furnstahl, U. Heinz, T. Maiti, W. Nazarewicz, F. M. Nunes, M. Plumlee, M. T. Pratola, S. Pratt, F. G. Viens, and S. M. Wild, Get on the BAND Wagon: A Bayesian framework for quantifying model uncertainties in nuclear dynamics, J. Phys. G 48(7), 072001 (2021)

[37]	P. Bedaque, A. Boehnlein, M. Cromaz, M. Diefenthaler, L. Elouadrhiri, T. Horn, M. Kuchera, D. Lawrence, D. Lee, S. Lidia, R. McKeown, W. Melnitchouk, W. Nazarewicz, K. Orginos, Y. Roblin, M. Scott Smith, M. Schram, and X. N. Wang, A. I. for nuclear physics, Eur. Phys. J. A 57(3), 100 (2021)

[38]	S. Wesolowski, I. Svensson, A. Ekström, C. Forssén, R. J. Furnstahl, J. A. Melendez, and D. R. Phillips, Rigorous constraints on three-nucleon forces in chiral effective field theory from fast and accurate calculations of few-body observables, Phys. Rev. C 104(6), 064001 (2021)

[39]	M. A. Connell, I. Billig, and D. R. Phillips, Does Bayesian model averaging improve polynomial extrapolations? Two toy problems as tests, J. Phys. G 48(10), 104001 (2021)

[40]	Y. H. Lin, H. W. Hammer, and U. G. Meißner, Dispersion-theoretical analysis of the electromagnetic form factors of the nucleon: Past, present and future, Eur. Phys. J. A 57(8), 255 (2021)

[41]	T. Djärv, A. Ekström, C. Forssén, and H. T. Johansson, Bayesian predictions for A = 6 nuclei using eigenvector continuation emulators, Phys. Rev. C 105(1), 014005 (2022)

[42]	B. Acharya and S. Bacca, Gaussian process error modeling for chiral effective-field-theory calculations of np↔dγ at low energies, Phys. Lett. B 827, 137011 (2022)

[43]	D. Odell, C. R. Brune, D. R. Phillips, R. J. deBoer, and S. N. Paneru, Performing Bayesian analyses with AZURE2 using BRICK: An application to the ⁷Be system, Front. Phys. (Lausanne) 10, 888476 (2022)

[44]	A. E. Lovell, A. T. Mohan, T. M. Sprouse, and M. R. Mumpower, Nuclear masses learned from a probabilistic neural network, Phys. Rev. C 106(1), 014305 (2022)

[45]	G. Hagen, S. J. Novario, Z. H. Sun, T. Papenbrock, G. R. Jansen, J. G. Lietz, T. Duguet, and A. Tichai, Angular-momentum projection in coupled-cluster theory: Structure of ³⁴Mg, Phys. Rev. C 105(6), 064311 (2022)

[46]	T. Papenbrock, Effective field theory of pairing rotations, Phys. Rev. C 105(4), 044322 (2022)

[47]	S. S. Li Muli, B. Acharya, O. J. Hernandez, and S. Bacca, Bayesian analysis of nuclear polarizability corrections to the Lamb shift of muonic H-atoms and He-ions, J. Phys. G 49(10), 105101 (2022)

[48]	Q. Y. Zhai, M. Z. Liu, J. X. Lu, and L. S. Geng, Z_cs(3985) in next-to-leading-order chiral effective field theory: The first truncation uncertainty analysis, Phys. Rev. D 106(3), 034026 (2022)

[49]	K. Fraboulet and J. P. Ebran, Addressing energy density functionals in the language of path-integrals I: Comparative study of diagrammatic techniques applied to the (0+0)D O(N)-symmetric φ⁴-theory, Eur. Phys. J. A 59(4), 91 (2023)

[50]	W. Jiang and C. Forssén, Bayesian probability updates using sampling/importance resampling: Applications in nuclear theory, Front. Phys. (Lausanne) 10, 1058809 (2022)

[51]	A. Ekström, C. Forssén, G. Hagen, G. R. Jansen, W. Jiang, and T. Papenbrock, What is ab initio in nuclear theory, Front. Phys. (Lausanne) 11, 1129094 (2023)

[52]	W. I. Jay and E. T. Neil, Bayesian model averaging for analysis of lattice field theory results, Phys. Rev. D 103(11), 114502 (2021)

[53]	M. Catacora-Rios, G. B. King, A. E. Lovell, and F. M. Nunes, Exploring experimental conditions to reduce uncertainties in the optical potential, Phys. Rev. C 100(6), 064615 (2019)

[54]	A. Ekström and G. Hagen, Global sensitivity analysis of bulk properties of an atomic nucleus, Phys. Rev. Lett. 123(25), 252501 (2019)

[55]	X. Zhang, K. M. Nollett, and D. R. Phillips, S-factor and scattering-parameter extractions from ³He + ⁴He → ⁷Be + γ, J. Phys. G 47, 054002 (2020)

[56]	B. K. Luna and T. Papenbrock, Low-energy bound states, resonances, and scattering of light ions, Phys. Rev. C 100(5), 054307 (2019)

[57]	E. Epelbaum, J. Golak, K. Hebeler, H. Kamada, H. Krebs, U. G. Meißner, A. Nogga, P. Reinert, R. Skibiński, K. Topolnicki, Y. Volkotrub, and H. Witała, Towards high-order calculations of three-nucleon scattering in chiral effective field theory, Eur. Phys. J. A 56(3), 92 (2020)

[58]	N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21(6), 1087 (1953)

[59]	W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57(1), 97 (1970)

[60]	S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth, Hybrid Monte Carlo, Phys. Lett. B 195(2), 216 (1987)

[61]	M. D. Homan and A. Gelman, The No-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo, J. Mach. Learn. Res. 15, 1593 (2014)

[62]	J. Salvatier, T. V. Wiecki, and C. Fonnesbeck, Probabilistic programming in python using PyMC3, PeerJ Comput. Sci. 2, e55 (2016)

[63]	P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge: Cambridge University Press, 2005

[64]	J. Bijnens, G. Colangelo, and G. Ecker, Renormalization of chiral perturbation theory to order p⁶, Ann. Phys. 280(1), 100 (2000)

[65]	A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd Ed., Boca Raton: CRC Press, 2013

[66]	A. Vehtari, A. Gelman, and J. Gabry, Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC, Stat. Comput. 27(5), 1413 (2016)

[67]	G. Amorós, J. Bijnens, and P. Talavera, Two-point functions at two loops in three flavor chiral perturbation theory, Nucl. Phys. B 568(1−2), 319 (2000)

[68]	J. Bijnens, Chiral perturbation theory, URL: home.thep.lu.se/~bijnens/chpt/ (2019)

[69]	J. Bijnens and P. Dhonte, Scalar form-factors in SU(3) chiral perturbation theory, J. High Energy Phys. 10, 061 (2003)

[70]	J. Gasser, C. Haefeli, M. A. Ivanov, and M. Schmid, Integrating out strange quarks in ChPT, Phys. Lett. B 652(1), 21 (2007)

[71]	S. Z. Jiang, Z. L. Wei, Q. S. Chen, and Q. Wang, Computation of the O(p⁶) order low-energy constants: An update, Phys. Rev. D 92(2), 025014 (2015)

[72]	S. Z. Jiang, Y. Zhang, C. Li, and Q. Wang, Computation of the p⁶ order chiral Lagrangian coefficients, Phys. Rev. D 81(1), 014001 (2010)

[73]	K. Kampf and B. Moussallam, Tests of the naturalness of the coupling constants in ChPT at order p⁶, Eur. Phys. J. C 47(3), 723 (2006)

[74]	M. Jamin, J. A. Oller, and A. Pich, Order p⁶ chiral couplings from the scalar Kπ form-factor, J. High Energy Phys. 02, 047 (2004)

[75]	J. Bijnens and P. Talavera, K_ℓ3 decays in chiral perturbation theory, Nucl. Phys. B 669(1–2), 341 (2003)

[76]	V. Cirigliano, G. Ecker, M. Eidemüller, R. Kaiser, A. Pich, and J. Portolés, The ⟨ SPP⟩ Green function and SU(3) breaking in K_ℓ3 decays, J. High Energy Phys. 04, 006 (2005)

[77]	R. Unterdorfer and H. Pichl, On the radiative pion decay, Eur. Phys. J. C 55(2), 273 (2008)

[78]	V. Cirigliano, G. Ecker, M. Eidemüller, R. Kaiser, A. Pich, and J. Portolés, Towards a consistent estimate of the chiral low-energy constants, Nucl. Phys. B 753(1-2), 139 (2006)

[79]	V. Bernard and E. Passemar, Matching chiral perturbation theory and the dispersive representation of the scalar Kπ form-factor, Phys. Lett. B 661(2−3), 95 (2008)