A comparative study on segregation analysis and QTL mapping of quantitative traits in plants—with a case in soybean

Junyi GAI , Yongjun WANG , Xiaolei WU , Shouyi CHEN

Front. Agric. China ›› 2007, Vol. 1 ›› Issue (1) : 1 -7.

DOI: 10.1007/s11703-007-0001-3
Research article

Abstract

Two approaches to the genetic analysis of quantitative traits were compared in a case study on soybean. One approach was the segregation analysis developed by Gai et al. (2003), which uses information from individuals of one or multiple segregating populations, as well as from their parents, and is based on the major-gene plus polygene inheritance model, mixture distributions, a joint maximum-likelihood function, the IECM (Iterated Expectation and Conditional Maximization) algorithm, Akaike’s information criterion, and goodness-of-fit tests. The other approach was quantitative trait locus (QTL) mapping with molecular markers. A recombinant inbred line (RIL) population of 201 families derived from (Kefeng No.1 × 1138-2) F2:7:10, along with its parents, was tested in a randomized block design experiment. The 201 families were genotyped with 171 RFLP, 60 SSR, and 79 AFLP molecular markers. Data on nine traits (number of days to flowering, number of days to maturity, plant height, number of nodes on main stem, number of pods per node, 100-seed weight, protein content, oil content, and plot yield) were analyzed with the segregation analysis procedure for a RIL population with parents (Gai et al., 2003; Zhang and Gai, 2000; Zhang et al., 2001) to detect their genetic systems, and the same data together with the molecular marker data were analyzed with WinQTL Cartographer (Basten et al., 1999; Zeng, 1993, 1994) to detect their QTL systems. The results showed that both procedures could detect the main major genes or QTLs and could therefore be used as a mutual check and supplement. Because most of the traits were mainly controlled by three or four QTLs, the results suggest that a segregation analysis procedure for a four major-gene plus polygene mixed inheritance model should be developed. The results also showed that the QTLs of the traits studied were concentrated on several linkage groups, such as C2, B1, F1, M, and N. Finally, equality, symmetry, and representation tests showed that the experimental sample did not necessarily coincide with the theoretical population; the sample should therefore be checked, tested, and adjusted to fit the theoretical requirements by deleting the extra-biased families and markers.

Keywords

inheritance of quantitative trait / segregation analysis / QTL mapping / soybean

1 Introduction

Fisher (1918) established the earliest genetic model, p = g + e, g = a + d, based on Nilsson-Ehle’s multiple-factor hypothesis of quantitative traits. Under the polygene hypothesis, various approaches for detecting the genetic system (genetic model) of quantitative traits were developed, including generation mean analysis (Mather, 1949; Mather and Jinks, 1989) and genetic variance and covariance analysis (Comstock and Robinson, 1948; Cockerham, 1954; Kempthorne, 1957).

A great number of genetic studies of quantitative traits, especially QTL marker analyses, have indicated that both major genes and minor genes exist in the genetic system of a quantitative trait; the genes are not necessarily all minor genes, let alone minor genes with equal effects. Gai and his group proposed that, for a QTL system, the major-gene plus minor-gene model is the general model, while the pure major-gene model and the pure minor-gene model are special cases of it (Gai et al., 2003; Gai, 2006). Accordingly, Gai and Wang (1998), Wang and Gai (1998), Gai et al. (2000), Zhang and Gai (2000a), and Gai et al. (2003) established procedures for segregation analysis of quantitative traits to detect the genetic system. Segregation analysis was first established by Mendel for the genetic analysis of qualitative traits, and it can use the information from individuals of segregating generations. The procedure includes the following major steps (Gai et al., 2003). First, under the supposition that the segregating population is composed of component distributions controlled by major gene(s) and modified by both polygenes and environment, seven groups comprising 32 types of genetic models are set up: one major-gene, two major-gene, three major-gene, polygene, mixed one major-gene plus polygene, mixed two major-gene plus polygene, and mixed three major-gene plus polygene models. Second, the joint maximum-likelihood function is constructed from the tested generations, whether a single generation or multiple generations, and the parameters of the component distributions are estimated through an IECM (Iterated Expectation and Conditional Maximization) algorithm. Third, the best-fitting genetic model is chosen according to Akaike’s information criterion, a likelihood-ratio test, and goodness-of-fit tests. Fourth, the related genetic parameters, including gene effects, the genetic variances of major genes and polygenes, and the corresponding heritability values, are calculated from the estimates of the component distributions. The theoretical bases of the procedure are detailed in the cited literature (Hasselblad, 1966; Akaike, 1977; Choi, 1969; Elkind and Cahaner, 1986, 1990; Dick and Bowden, 1973; Dempster et al., 1977; Jiang et al., 1994; Liu and Rubin, 1994; Wang and Gai, 1997a, 1997b; Zhang and Gai, 2000b; Zhang et al., 2001a). Segregation analysis of quantitative traits is, in fact, a procedure of genetic experimentation and data analysis based on the Mendelian method; in other words, it fits candidate models to segregating data and then selects the best-fitting one through a series of criteria and tests.
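The model-fitting and model-selection steps can be illustrated with a deliberately small sketch. It is not the authors' IECM procedure: it uses a plain EM algorithm, considers only two of the 32 models (a polygene-only model as a single normal distribution, and a one major-gene model for a RIL population as a 1:1 mixture of two normal distributions with a common variance), and compares them by AIC = -2 log L + 2k. The function names, the simulated trait values, and the choice of starting values are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_polygene_only(data):
    """Single normal distribution (no major gene); k = 2 parameters."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    loglik = sum(math.log(normal_pdf(x, mean, var)) for x in data)
    return {"mean": mean, "var": var, "loglik": loglik, "aic": -2.0 * loglik + 2 * 2}


def fit_one_major_gene(data, n_iter=200):
    """1:1 mixture of two normals with a common variance; k = 3 parameters.

    The mixing proportions are fixed at 0.5, the expected ratio of the two
    homozygous genotypes in an undistorted RIL population.
    """
    n = len(data)
    ordered = sorted(data)
    half = n // 2
    # Starting values (an assumption): means of the lower and upper halves.
    mu1 = sum(ordered[:half]) / half
    mu2 = sum(ordered[half:]) / (n - half)
    grand = sum(data) / n
    var = sum((x - grand) ** 2 for x in data) / n
    for _ in range(n_iter):
        # E-step: posterior probability that each line carries genotype 1.
        w = []
        for x in data:
            p1 = normal_pdf(x, mu1, var)
            p2 = normal_pdf(x, mu2, var)
            w.append(p1 / (p1 + p2))
        # M-step: weighted means and pooled within-component variance.
        s1 = sum(w)
        s2 = n - s1
        mu1 = sum(wi * x for wi, x in zip(w, data)) / s1
        mu2 = sum((1.0 - wi) * x for wi, x in zip(w, data)) / s2
        var = sum(
            wi * (x - mu1) ** 2 + (1.0 - wi) * (x - mu2) ** 2
            for wi, x in zip(w, data)
        ) / n
    loglik = sum(
        math.log(0.5 * normal_pdf(x, mu1, var) + 0.5 * normal_pdf(x, mu2, var))
        for x in data
    )
    return {"mu1": mu1, "mu2": mu2, "var": var,
            "loglik": loglik, "aic": -2.0 * loglik + 2 * 3}


if __name__ == "__main__":
    random.seed(1)
    # Simulated trait values (hypothetical): two genotype classes, 100 lines each.
    trait = [random.gauss(40.0, 2.0) for _ in range(100)]
    trait += [random.gauss(50.0, 2.0) for _ in range(100)]
    print("polygene only  AIC:", round(fit_polygene_only(trait)["aic"], 1))
    print("one major gene AIC:", round(fit_one_major_gene(trait)["aic"], 1))
```

The model with the smaller AIC would be preferred at this step; the procedure described above additionally applies a likelihood-ratio test and goodness-of-fit tests before accepting a model.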

Quantitative trait locus mapping is a procedure that uses segregation data in linkage analysis to locate QTLs relative to the nearest markers on a molecular genetic map. Segregation analysis and QTL mapping are based on a similar set of genetic assumptions and can use the same set of segregation data. The former can detect major-gene and polygene effects and does not need a genetic linkage map, and therefore provides no information about the location of QTLs; the latter can detect and locate the possible QTLs if a linkage map is available. The precision of both approaches depends on the precision of the experimental data, and that of the latter depends further on the precision of the genetic linkage map. Conventional breeders who have already obtained segregation data can use the former, whereas the latter can be used only when both segregation data and molecular marker data have been obtained.

From a breeder’s point of view, the main purpose of revealing the inheritance of a quantitative trait is to recognize the major genes or QTLs with large effects, so that the breeder can design breeding procedures that pyramid most of these genes into a single individual. The objective of the present paper is to compare segregation analysis and QTL mapping to see whether, with the same set of data, both provide similar results in recognizing major QTLs.

2 Materials and methods

A recombinant inbred line (RIL) population of 201 families derived from (Kefeng No.1 × 1138-2) F2:7:10 (each family in F10 derived from an F7 plant taken at random from an F2:7 line), along with its parents, was tested in a randomized block design experiment with 0.7 m × 0.7 m hill plots and eight replications at Jiangpu Station, Nanjing Agricultural University, Nanjing, China. The number of days to flowering, number of days to maturity, plant height, number of nodes on main stem, number of pods per node, 100-seed weight, protein content, oil content, and plot yield were measured.

The molecular markers, comprising 171 RFLP, 60 SSR, and 79 AFLP markers plus two morphological markers (312 in total), were used to genotype the 201 families. The methods and procedures of the molecular marker analysis are omitted here; see Wu et al. (2001) and Wang (2001).

A set of χ² tests, comprising an equality test, a symmetry test, and a representation test, was designed to examine the coincidence of the practical RIL sample with the theoretical RIL population, under the supposition that distortion of the population was mainly due to the shifting environments during the seasons of generation advance rather than to viability differences among gametes and zygotes. The detailed procedure is given in Section 3.4. As a result of the tests, 17 of the 201 families and five of the 171 RFLP markers were identified as extra-biased segregates, or outliers. After these outliers were removed, 184 families and 166 RFLP markers remained for the next round of coincidence tests, which showed a good fit to the theoretical population.

The 184 families and 307 markers (166 RFLP, 60 SSR, 79 AFLP, and two morphological markers) were used to construct a genetic linkage map with Mapmaker/Exp 3.0b (Lander and Botstein, 1989; Lander et al., 1987). Of the 307 markers, 305 were linked into 25 linkage groups with a total length of 3017.9 cM; 22 of these groups corresponded to those of Cregan’s integrated map (Cregan et al., 1999; Wang, 2001).

The segregation analysis procedure for RIL population with parents was applied to the obtained data according to Zhang et al. (2001b) and Gai et al. (2003). The same set of data was analyzed for QTL mapping by using WinQTL Cartographer (Basten et al., 1999; Zeng, 1993, 1994).

3 Results

3.1 Segregation analysis

The results of the segregation analysis are shown in Tables 1 and 2. The segregation analysis procedure for a RIL population with parents can detect up to three major genes plus polygenes as a whole. Model A denotes one major gene only; B, two major genes; C, polygenes only; D, one major gene plus polygenes; E, two major genes plus polygenes; F, three major genes; and G, three major genes plus polygenes. For Models B and E, the number after the first dash indicates linkage between the two major genes (“1”, no linkage; “2”, linkage), and the number after the second dash indicates the type of major-gene effect: “1”, additive and additive × additive epistasis effects; “2”, additive effects; “3”, equal additive effects; “4”, dominance epistasis; “5”, recessive epistasis; and “6”, duplicate epistasis. For the other models, the number after the first dash has the same meaning as the number after the second dash in Models B and E. At present, linkage can be detected only for the models with two major genes, not for those with more, because deriving the formulae for the latter is very complicated.
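The coding scheme just described can be written down as a small lookup, which may help when reading the model codes in Tables 1 and 2. This is only a sketch of the naming rule as stated in the text: the code format (a letter followed by dash-separated numbers, such as "E-1-2"), the function name `decode_model`, and the wording of the labels are assumptions, and the function does not check whether a given combination is actually one of the 32 models.

```python
# Model groups, as listed in the text.
GROUPS = {
    "A": "one major gene",
    "B": "two major genes",
    "C": "polygenes only",
    "D": "one major gene plus polygenes",
    "E": "two major genes plus polygenes",
    "F": "three major genes",
    "G": "three major genes plus polygenes",
}

# Linkage field, used only for the two-major-gene groups B and E.
LINKAGE = {1: "no linkage", 2: "linkage"}

# Effect-type field.
EFFECTS = {
    1: "additive and additive x additive epistasis",
    2: "additive",
    3: "equal additive",
    4: "dominance epistasis",
    5: "recessive epistasis",
    6: "duplicate epistasis",
}


def decode_model(code):
    """Translate a code such as 'E-1-2' or 'G-1' into plain descriptions."""
    parts = code.strip().split("-")
    letter = parts[0].upper()
    if letter not in GROUPS:
        raise ValueError("unknown model group: %r" % letter)
    try:
        fields = [int(p) for p in parts[1:]]
    except ValueError:
        raise ValueError("non-numeric field in model code: %r" % code)

    result = {"group": GROUPS[letter], "linkage": None, "effects": None}
    if letter in ("B", "E"):
        # Two fields: linkage first, then effect type.
        if len(fields) != 2:
            raise ValueError("models B and E need two numeric fields")
        if fields[0] not in LINKAGE or fields[1] not in EFFECTS:
            raise ValueError("unknown field value in %r" % code)
        result["linkage"] = LINKAGE[fields[0]]
        result["effects"] = EFFECTS[fields[1]]
    elif fields:
        # One field with the same meaning as the second field of B and E.
        if fields[0] not in EFFECTS:
            raise ValueError("unknown field value in %r" % code)
        result["effects"] = EFFECTS[fields[0]]
    return result


if __name__ == "__main__":
    for example in ("E-1-2", "G-1", "C"):
        print(example, "->", decode_model(example))
```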

For days to flowering, three major genes were detected with additive and additive × additive epistasis effects, fairly high major-gene heritability, and fairly low polygene heritability. The situation was similar for days to maturity and number of nodes, except that polygene heritability was larger. For plant height, only two major genes were detected, without additive × additive effects but with high major-gene heritability and low polygene heritability. A similar case occurred for 100-seed weight, except that additive × additive effects were present. For plot yield, pods per node, and oil content, only two major genes were detected, with additive × additive effects, medium major-gene heritability, and varying amounts of polygene heritability. A large part of the variation in yield and number of pods per node was due to environment. No major gene was detected for protein content; polygenes accounted for the major part of its genetic variation.
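For orientation, the two heritability values discussed here can be read against the conventional partition of phenotypic variance. The notation below is an assumption introduced for illustration (subscripts mg, pg, and e for major genes, polygenes, and environment); the exact estimators used in the analysis are those of Gai et al. (2003) and are not reproduced in this paper.

```latex
\sigma_p^2 = \sigma_{mg}^2 + \sigma_{pg}^2 + \sigma_e^2, \qquad
h_{mg}^2 = \frac{\sigma_{mg}^2}{\sigma_p^2}, \qquad
h_{pg}^2 = \frac{\sigma_{pg}^2}{\sigma_p^2}
```

Under this reading, a trait with medium major-gene heritability and low polygene heritability, such as plot yield, leaves a large share of the phenotypic variance to the environmental term.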

3.2 Quantitative trait locus mapping

The results of QTL mapping with QTL Cartographer are shown in Table 3. For days to flowering, seven QTLs were identified. According to their LOD, r2, and additive effect values, the most important ones were fd3 and fd4, located on linkage group C2, followed by fd7 and fd6, located on linkage group F1. These four QTLs accounted for most of the total genetic variation. Thirteen QTLs were identified for days to maturity. The most important ones were md1, md2, and md3, located on linkage group B1, and md9, located on linkage group G. They accounted for most of the genetic variation. It should be noted that the major QTLs of the two growth-period traits were not on the same linkage groups, even though days to flowering is a component of days to maturity.

Twelve QTLs were detected for plant height. The most important ones were ht6, ht4, ht5, and ht7, all located on C2, and they accounted for most of the genetic variation. Thirteen QTLs were identified for number of nodes on the main stem. The most important ones were sn5, sn4, sn3, and sn6, also located on C2. It seems that the QTLs for plant height and related traits mainly involve linkage group C2.

Three QTLs were detected for 100-seed weight, sw2, sw1, and sw3, located on linkage groups D2a, B1, and K, respectively. They accounted for only a small part of the total genetic variation, which means that most of the genetic variation might be due to polygenes. Nine QTLs were identified for plot yield. The most important ones were yd8 on linkage group N and yd4, yd3, and yd2 on linkage group C2, which together accounted for most of the total genetic variation. Seven QTLs were detected for number of pods per node. The most important ones were pn4, pn3, pn5, pn2, and pn6, all located on C2. It seems that C2 is also a major linkage group for yield and related traits.

Three QTLs were identified for protein content, pt1 and pt2 on B1 and pt3 on D1b+W. Four QTLs were detected for oil content, ol2, ol3, and ol4 on M and ol1 on B2. The detected QTLs of both traits accounted for only a relatively small part of their total genetic variation. That means most of the genetic variation for both traits might be accounted for by polygenes.

From the above results, linkage groups C2, B1, F1, M, and N appear to be the ones most involved in the nine agronomic and quality traits.

3.3 Comparisons between the results from segregation analysis and QTL mapping

The comparisons are summarized in Table 4. Segregation analysis detected two to three major genes per trait. The true number might exceed three, but the developed models accommodate at most three major genes. Assuming that each QTL from the mapping analysis is equivalent to a major gene, the number of main major genes for each of the nine traits was about four. Segregation analysis can therefore detect most of the main major genes or QTLs, leaving the remaining minor-effect QTLs to be treated as polygenes.

Segregation analysis provided an overall picture of the genetic system of a trait, including the major-gene plus polygene model, the genetic effects of individual major genes (additive, dominance, epistasis), the genetic effects of the polygenes as a whole, and the heritability values of the individual major genes and of the polygenes as a whole. QTL mapping could locate the QTLs on linkage groups but could not estimate epistasis effects with the present Cartographer version of composite interval mapping (CIM).

The accuracy and precision of the results of segregation analysis depend on those of the experiment, whereas those of QTL mapping depend on both the experiment and the linkage map.

Segregation analysis is simple, needs only precise data and a corresponding computer program, and can provide plant breeders with information about the genetic system of the major breeding target traits at little cost, while QTL mapping requires additional molecular technology, equipment, and financial resources. Therefore, segregation analysis can be used independently or as a preliminary analysis of the data set before QTL mapping. Segregation analysis and QTL mapping can be used as a mutual supplement and check.

At present, the segregation analysis procedure has been developed for mixed inheritance models with up to three major genes plus polygenes. As Table 4 indicates, an analytical procedure for four major-gene plus polygene mixed inheritance models is needed. Developing it, however, is very complicated and tedious, since even the analysis of four Mendelian genes alone is complicated enough.

3.4 Test and correction for coincidence of experimental sample with theoretical population

As indicated above, the experimental sample was adjusted from 201 to 184 families. The results of segregation analysis, map construction, and QTL mapping from the adjusted and unadjusted data sets were quite different, which indicates that testing and correcting the coincidence of the practical sample with the theoretical population was necessary.

The coincidence test comprised three sets of χ² tests: (1) the equality test, which tests whether the germplasm (markers) contributed by the two parents is equal, that is, p(P1) : p(P2) = 1 : 1; (2) the symmetry test, which tests whether the number of families with p(P1) : p(P2) > 1 : 1 equals the number of families with p(P1) : p(P2) < 1 : 1 (a symmetric distribution); and (3) the representation test, which tests whether each family, as well as the whole experimental sample, is a random sample from the corresponding theoretical population. The representation test took two steps. The first step was to test each family to see whether it was an extra-biased family (χ² > χ²0.05). The second step was to check whether the rate of extra-biased families was less than the expected rate obtained by sampling a simulated population, called the Simulated Population Sampling Criteria (SPSC). If the rate was larger than the SPSC rate, the extra-biased families were checked and deleted one by one until the adjusted sample fitted the SPSC rate. The simulation procedure followed Tanksley and Nelson (1996). The software developed for SPSC was named GenoSim. The markers were checked, tested, and adjusted in a similar way until the rate of extra-biased markers fitted the SPSC rate.

Table 5 shows the results of the coincidence tests of the NJRIKY population before and after adjustment. Before adjustment, the equality test gave a very large χ²C value (30.59), indicating unequal genetic contributions from the two parents; after adjustment it gave a very small χ²C value (0.10), indicating a good fit. The symmetry test of the unadjusted NJRIKY basically fitted a 1:1 ratio (χ²C = 3.16, less than 3.86), and after adjustment the χ² test showed a better fit to the 1:1 ratio (χ²C = 0.92).

The representation test of the unadjusted NJRIKY did not fit the SPSC requirements. According to the SPSC, the rate of extra-biased markers should be less than the critical value of 20.36% and that of extra-biased families less than 24.47%. The rates for the unadjusted NJRIKY were 21.05% and 29.35%, respectively. After the five most extra-biased markers and the 17 most extra-biased families (according to their χ²C values) were deleted, the two rates for the adjusted NJRIKY became 19.28% and 24.45%, respectively, both less than the critical values, and the adjusted sample therefore fitted the SPSC requirements.
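The one-by-one deletion described for the representation test can be sketched as a simple loop. This is a simplification made under stated assumptions: each family carries a fixed χ² value, whereas in the actual procedure the statistics are re-tested after markers are removed as well; the critical rate is supplied by the caller (in the paper it comes from the SPSC simulation); and the function name and example values are hypothetical.

```python
def trim_extra_biased(chi2_by_family, critical_rate, chi2_crit=3.841):
    """Delete the most biased families one by one until the rate of
    extra-biased families among those kept falls below critical_rate.

    chi2_by_family: dict mapping a family name to its chi-square value.
    Returns (kept, removed): the retained dict and the names deleted, in order.
    """
    kept = dict(chi2_by_family)
    removed = []

    def biased_rate():
        n_biased = sum(1 for value in kept.values() if value > chi2_crit)
        return n_biased / len(kept)

    while kept and biased_rate() >= critical_rate:
        worst = max(kept, key=kept.get)
        if kept[worst] <= chi2_crit:
            break  # no extra-biased family left to delete
        removed.append(worst)
        del kept[worst]
    return kept, removed


if __name__ == "__main__":
    # Hypothetical chi-square values: four extra-biased families out of ten.
    values = {"f1": 10.0, "f2": 8.0, "f3": 6.0, "f4": 5.0}
    values.update({"f%d" % i: 1.0 for i in range(5, 11)})
    kept, removed = trim_extra_biased(values, critical_rate=0.30)
    print("removed:", removed)
    print("families kept:", len(kept))
```

Deleting the family with the largest χ² first mirrors the text's description of removing the most extra-biased families according to their χ²C values.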

4 Discussion

Genetic information about breeding target traits, especially quantitative traits, is extremely important to plant breeders in designing breeding procedures, choosing parents for crossing, selecting progeny, pyramiding genes, and so on. Segregation analysis can provide information on the number of major genes, their genetic effects, and their heritability values, as well as on the genetic effects and heritability value of the polygenes as a whole, with no extra requirements on laboratory conditions beyond a precise experiment. It is therefore a simple and useful tool in the plant breeder’s hands.

Quantitative trait locus mapping is an advanced tool for plant breeders who have a molecular biology laboratory or a molecular geneticist cooperating with them. Based on QTL mapping, marker-assisted selection can be used for effective and efficient selection of quantitative traits. It is suggested that segregation analysis be conducted before QTL mapping so that plant breeders can gain a first impression of the genetic system of the trait concerned. Segregation analysis and QTL mapping can be used as a check on each other.

Both the segregation analysis and QTL mapping procedures need to be further improved and completed. Since accuracy and precision are related to the experimental design, replicated tests of lines or families are preferred for both procedures. For segregation analysis, the analytical procedure for the four major-gene plus polygene mixed inheritance model needs to be developed, linkage among more than two genes should be considered, and procedures using more segregating generations to resolve more estimates of genetic parameters should be studied. For QTL mapping, the problems of ghost QTLs and noise among QTLs need to be further resolved, and a procedure for estimating epistasis effects should be considered.

References

[1]

Akaike H (1977). On entropy maximum principle. In: Krishnaiah P R, ed. Applications of Statistics. Amsterdam: North-Holland Publishing Company, 27-41

[2]

Basten C J, Weir B S, Zeng Z B (1999). QTL Cartographer. Version 1.13. Raleigh (NC): Department of Statistics, North Carolina State University

[3]

Choi K (1969). Estimators for the parameters of distributions. Ann Inst Statist Math, 21: 107-116

[4]

Cockerham C C (1954). An extension of the concept of partitioning hereditary variance for analysis of covariance among relatives when epistasis is present. Genetics, 39: 859-882

[5]

Comstock R C, Robinson H F (1948). The component of quantitative variance in populations. Biometrics, 4: 254-266

[6]

Cregan P B, Jarvik T, Bush A L, Shoemaker R C, Lark K G, Kahler A L, Kaya N, Van Toai T T, Lohnes D G, Chung J, Specht A L (1999). An integrated genetic linkage map of the soybean genome. Crop Sci, 39: 1464-1490

[7]

Dempster A P, Laird N M, Rubin D B (1977). Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B, 39: 1-38

[8]

Dick N P, Bowden D C (1973). Maximum likelihood estimation for mixtures of two normal distributions. Biometrics, 29: 781-790

[9]

Elkind Y, Cahaner A (1986). A mixed model for the effects of single gene, polygenes and their interaction on quantitative traits. 1. The model and experimental design. Theor Appl Genet, 72: 377-383

[10]

Elkind Y, Cahaner A (1990). A mixed model for the effects of single gene, polygenes and their interaction on quantitative traits. 2. The effects of the major gene and polygenes on tomato fruit softness. Heredity, 64: 205-213

[11]

Fisher R A (1918). The correlations between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edin, 52: 399-433

[12]

Gai J Y, Wang J (1998). Identification and estimation of a QTL model and its effects. Theor Appl Genet, 97(7): 1162-1168

[13]

Gai J Y, Zhang Y, Wang J (2000). A joint analysis of multiple generations for QTL models extended to mixed two major genes plus polygene. Acta Agronomica Sinica, 26(4): 385-391 (in Chinese)

[14]

Gai J Y, Zhang Y, Wang J (2003). Genetic System of Quantitative Traits in Plants. Beijing: Academic Press (in Chinese)

[15]

Gai J Y (2006). Segregation analysis on genetic system of quantitative traits in plants. Front Biol China, 1: 85-92

[16]

Hasselblad V (1966). Estimation of parameters for a mixture of normal distributions. Technometrics, 8: 431-444

[17]

Jiang C, Pan X, Gu M (1994). The use of mixture models to detect effects of major genes on quantitative characters in plant breeding experiment. Genetics, 136: 383-394

[18]

Kempthorne O (1957). An Introduction to Genetic Statistics. New York: Wiley

[19]

Lander E S, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121: 185-189

[20]

Lander E S, Green P, Abrahamson J, Barlow A, Daly M J, Lincoln S E, Newburg L (1987). Mapmaker: An interactive computer package for constructing genetic linkage maps of experimental and natural populations. Genomics, 1: 174-181

[21]

Liu C, Rubin D B (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81(4): 633-648

[22]

Mather K (1949). Biometrical Genetics. London: Methuen

[23]

Mather K, Jinks J L (1989). Biometrical Genetics. 3rd ed. London: Chapman and Hall

Tanksley S D, Nelson J C (1996). Advanced backcross QTL analysis: A method for the simultaneous discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding lines. Theor Appl Genet, 92: 191-203

[24]

Wang J, Gai J Y (1997a). Identification of major-polygene mixed inheritance model of quantitative traits from F2 population. Acta Genetica Sinica, 24(3): 181-190 (in Chinese)

[25]

Wang J, Gai J Y (1997b). EM algorithm in the analysis of major gene and polygene mixed inheritance. Journal of Biomathematics, 12(5): 540-548 (in Chinese)

[26]

Wang J, Gai J Y (1998). Identification of major gene and polygene mixed inheritance model of quantitative traits by using joint analysis of P1, F1, P2, F2 and F2:3. Acta Agronomica Sinica, 24(6): 651-659 (in Chinese)

[27]

Wang Y (2001). Establishment and adjustment of RIL population and its application to map construction, mapping genes resistant to SMV, and QTL analysis of agronomic and quality traits in soybeans. Dissertation for the Doctoral Degree. Nanjing: Nanjing Agricultural University (in Chinese)

[28]

Zeng Z B (1993). Theoretical basis of separation of multiple linked gene effects on mapping quantitative trait loci. Proc Natl Acad Sci USA, 90: 10972-10976

[29]

Zeng Z B (1994). Precision mapping of quantitative trait loci. Genetics, 136: 1457-1468

[30]

Wu X L, He C Y, Wang Y J, Chen S Y, Gai J Y, Wang S C (2001). Construction and analysis of a genetic linkage map of soybean. Acta Genetica Sinica, 28(11): 1051-1061 (in Chinese)

[31]

Zhang Y, Gai J Y (2000a). Identification of mixed major-gene and polygene inheritance model of quantitative traits by using DH or RIL population. Acta Genetica Sinica, 27(7): 634-640 (in Chinese)

[32]

Zhang Y, Gai J Y (2000b). The IECM algorithm for estimation of component distribution parameters in segregating analysis of quantitative traits. Acta Agronomica Sinica, 26(6): 699-706 (in Chinese)

[33]

Zhang Y, Gai J Y, Qi C (2001a). The precision of segregating analysis of quantitative trait and its improving methods. Acta Agronomica Sinica, 27(6): 787-793 (in Chinese)

[34]

Zhang Y, Gai J Y, Wang Y (2001b). An expansion of joint segregation analysis of quantitative trait for using P1, P2 and DH or RIL populations. Hereditas (Beijing), 23(5): 467-470 (in Chinese)

RIGHTS & PERMISSIONS

Higher Education Press
