1. Center for Applied Statistics, Renmin University of China, Beijing 100872, China
2. School of Mathematics and Statistics, Tianshui Normal University, Tianshui 741001, China
3. Department of Mathematics, Southeast University, Nanjing 210096, China
pole1999@163.com
History+
Received
Accepted
Published
2023-06-15
Issue Date
Revised Date
2023-11-16
Abstract
The generalized exponential distribution is an important class of distributions in lifetime data analysis, especially for skewed lifetime data. We consider the parameter estimation problem for the generalized exponential distribution model with grouped and right-censored data. The maximum likelihood estimators are obtained using the EM algorithm. Simulations are carried out to illustrate that the proposed algorithm is effective for the model. Finally, a set of medical data is analyzed with the generalized exponential distribution.
Gupta and Kundu (1999) [2] proposed the two-parameter generalized exponential distribution as an alternative to the Gamma, Weibull and log-normal distributions, and compared some of their properties. The two-parameter generalized exponential distribution has important applications in survival analysis, product life analysis and reliability engineering, especially for skewed failure life data. Many results have been obtained for generalized exponential distributions with general sample data [1, 3−5, 10−12]. However, in survival analysis and lifetime data studies, the more complex situation of grouped and right-censored sample data [7−9] often arises. In this paper, we use the EM algorithm to estimate the parameters of the two-parameter generalized exponential distribution from such sample data.
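The probability density function, survival function and hazard rate function of the two-parameter generalized exponential distribution model are
$$f(x;\alpha,\lambda)=\alpha\lambda\left(1-e^{-\lambda x}\right)^{\alpha-1}e^{-\lambda x},\qquad S(x;\alpha,\lambda)=1-\left(1-e^{-\lambda x}\right)^{\alpha},\qquad h(x;\alpha,\lambda)=\frac{f(x;\alpha,\lambda)}{S(x;\alpha,\lambda)},\qquad x>0,\tag{1}$$
where $\alpha>0$ and $\lambda>0$ are the shape and scale parameters of the model, respectively. When the shape parameter $\alpha=1$, the model reduces to the ordinary exponential distribution model. The monotonicity of the hazard function $h(x)$ of model (1) does not depend on $\lambda$, but only on $\alpha$: when $\alpha>1$, $h(x)$ is increasing; when $\alpha<1$, $h(x)$ is decreasing; when $\alpha=1$, $h(x)$ is constant [3].
Section 2 of this paper briefly describes the maximum likelihood estimation of the parameters. Section 3 considers the estimation of the parameters of the model using the EM algorithm. Section 4 presents the numerical simulations. Section 5 analyzes a set of real data using the generalized exponential distribution model.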
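The three functions of model (1) are straightforward to evaluate numerically. The following is a minimal sketch, assuming the Gupta and Kundu parametrization $F(x)=(1-e^{-\lambda x})^{\alpha}$; the function names are illustrative and do not come from the paper. It also checks the stated dependence of the hazard shape on $\alpha$.

```python
import numpy as np

def ge_cdf(x, alpha, lam):
    """Assumed GE distribution function F(x) = (1 - exp(-lam * x))**alpha, x > 0."""
    x = np.asarray(x, dtype=float)
    return (-np.expm1(-lam * x)) ** alpha

def ge_pdf(x, alpha, lam):
    """Density: alpha * lam * (1 - exp(-lam * x))**(alpha - 1) * exp(-lam * x)."""
    x = np.asarray(x, dtype=float)
    return alpha * lam * (-np.expm1(-lam * x)) ** (alpha - 1.0) * np.exp(-lam * x)

def ge_survival(x, alpha, lam):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - ge_cdf(x, alpha, lam)

def ge_hazard(x, alpha, lam):
    """Hazard rate h(x) = f(x) / S(x)."""
    return ge_pdf(x, alpha, lam) / ge_survival(x, alpha, lam)

# Hazard shape on a grid for three illustrative shape parameters.
x = np.linspace(0.5, 5.0, 10)
for alpha in (0.5, 1.0, 2.0):
    h = ge_hazard(x, alpha, lam=1.0)
    if np.allclose(h, h[0]):
        trend = "constant"
    else:
        trend = "increasing" if h[-1] > h[0] else "decreasing"
    print(alpha, trend)
```

With $\lambda=1$ the loop reports a decreasing hazard for $\alpha=0.5$, a constant hazard for $\alpha=1$ and an increasing hazard for $\alpha=2$.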
2 The log-likelihood function of model (1) for grouped and right-censored data
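Suppose the lifetime $X$ of a product follows the two-parameter generalized exponential distribution (1); its distribution function is
$$F(x;\alpha,\lambda)=\left(1-e^{-\lambda x}\right)^{\alpha},\qquad x>0.$$
Now put $n$ products on a lifetime test and record the data as follows: the time axis is divided into $k$ intervals, and the $i$th interval is denoted as $I_i=(t_{i-1},t_i]$, where $0=t_0<t_1<\cdots<t_k$. Let $n_i$ denote the number of failed products falling into $I_i$, and let $c_i$ denote the number of products censored at $t_i$. Then we have $n=\sum_{i=1}^{k}(n_i+c_i)$.
The likelihood function is
$$L(\alpha,\lambda)\propto\prod_{i=1}^{k}\left[F(t_i;\alpha,\lambda)-F(t_{i-1};\alpha,\lambda)\right]^{n_i}\left[1-F(t_i;\alpha,\lambda)\right]^{c_i}.$$
The log-likelihood function is, up to an additive constant,
$$\ell(\alpha,\lambda)=\sum_{i=1}^{k}\left\{n_i\ln\left[F(t_i;\alpha,\lambda)-F(t_{i-1};\alpha,\lambda)\right]+c_i\ln\left[1-F(t_i;\alpha,\lambda)\right]\right\}.$$
Taking the partial derivatives of the log-likelihood function with respect to the parameters and setting them to zero gives
$$\frac{\partial\ell(\alpha,\lambda)}{\partial\alpha}=0,\qquad\frac{\partial\ell(\alpha,\lambda)}{\partial\lambda}=0.$$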
Solving the above system of equations yields the maximum likelihood estimates of the parameters $\alpha$ and $\lambda$. However, because the system of equations is non-linear, explicit expressions for the parameter estimates cannot be obtained, and even finding the maximum likelihood estimates numerically (e.g., by Newton’s method) is quite complicated. In what follows, we use the EM algorithm to obtain the maximum likelihood estimates of the parameters more efficiently.
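To make the structure of the grouped and right-censored log-likelihood concrete, the following sketch evaluates it for given counts. It assumes the distribution function $F(x)=(1-e^{-\lambda x})^{\alpha}$ and a layout in which `t` holds the interval end points $t_0=0<t_1<\cdots<t_k$, `n_fail[i-1]` is the number of failures in $(t_{i-1},t_i]$ and `n_cens[i-1]` is the number censored at $t_i$; these array names are assumptions made for illustration.

```python
import numpy as np

def ge_cdf(x, alpha, lam):
    """Assumed GE distribution function F(x) = (1 - exp(-lam * x))**alpha."""
    return (-np.expm1(-lam * np.asarray(x, dtype=float))) ** alpha

def grouped_censored_loglik(alpha, lam, t, n_fail, n_cens):
    """Log-likelihood (up to a constant) of grouped, right-censored counts.

    Each failure in (t[i-1], t[i]] contributes log(F(t[i]) - F(t[i-1]));
    each unit censored at t[i] contributes log(1 - F(t[i])).
    """
    n_fail = np.asarray(n_fail, dtype=float)
    n_cens = np.asarray(n_cens, dtype=float)
    F = ge_cdf(t, alpha, lam)
    cell_prob = np.diff(F)        # P(t[i-1] < X <= t[i])
    surv_prob = 1.0 - F[1:]       # P(X > t[i])
    loglik = np.sum(n_fail * np.log(cell_prob))
    has_cens = n_cens > 0         # skip empty cells, e.g. when t[k] is infinite
    loglik += np.sum(n_cens[has_cens] * np.log(surv_prob[has_cens]))
    return loglik

# Illustrative counts (not from the paper): two intervals, (0, 1] and (1, 2].
print(grouped_censored_loglik(1.0, 1.0, [0.0, 1.0, 2.0], [3, 2], [1, 4]))
```

The score equations of Section 2 are the partial derivatives of this function with respect to `alpha` and `lam`; the function can also be passed to a general-purpose optimizer as a check on the EM estimates.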
3 Parameter estimation methods
3.1 Introduction to the EM algorithm
The EM algorithm, proposed by Dempster et al. in 1977, is an iterative algorithm for computing the MLE of a model with missing data from the observed data. Assume that the complete data $Z=(X,Y)$ consist of the observed data $Y$ and the missing data $X$. Let $f(x,y\mid\theta)$ be the joint probability density of the complete data $Z$, and let $k(x\mid y,\theta)$ be the conditional density of the missing data $X$ given the observed data $Y$, where $\theta$ is the parameter to be estimated. The MLE of $\theta$ is obtained by maximizing the log-likelihood $\ell(\theta\mid y)$ of the observed data. To maximize $\ell(\theta\mid y)$, consider the log-likelihood $\ln f(x,y\mid\theta)$ given by the complete data. The algorithm consists of two steps: the E-step and the M-step.
E-step: Given an initial value $\theta^{(0)}$ of $\theta$, assume that the estimate of $\theta$ obtained after the $m$th iteration of the algorithm is $\theta^{(m)}$, and define the expectation of the log-likelihood of the complete data, the so-called $Q$ function, as
$$Q(\theta\mid\theta^{(m)})=E\left[\ln f(X,Y\mid\theta)\mid Y,\theta^{(m)}\right].$$
M-step: Maximize $Q(\theta\mid\theta^{(m)})$ with respect to $\theta$ to give $\theta^{(m+1)}$ as an update to $\theta^{(m)}$.
Repeat the E-step and the M-step until the estimates stabilize, i.e., until $\|\theta^{(m+1)}-\theta^{(m)}\|$ is less than some small value $\varepsilon$. It can be proved that an increase in the $Q$ function never decreases $\ell(\theta\mid y)$, so $\ell(\theta\mid y)$ can be maximized by repeatedly maximizing the $Q$ function. In practice, several different initial values $\theta^{(0)}$ should be tried and compared to prevent the algorithm from stopping at a local maximum.
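The link between the $Q$ function and the observed-data log-likelihood can be written out explicitly. What follows is the standard argument from the EM literature, given here as a sketch in assumed notation: $y$ is the observed data, $X$ the missing data, $k(x\mid y,\theta)$ the conditional density of the missing data, and $H$ an auxiliary function introduced only for this derivation.

```latex
\begin{align*}
\ell(\theta \mid y)
  &= Q(\theta \mid \theta^{(m)}) - H(\theta \mid \theta^{(m)}),
  \qquad
  H(\theta \mid \theta^{(m)})
  = E\left[\ln k(X \mid y, \theta) \,\middle|\, y, \theta^{(m)}\right], \\
H(\theta \mid \theta^{(m)})
  &\le H(\theta^{(m)} \mid \theta^{(m)})
  \quad \text{for all } \theta \quad \text{(Jensen's inequality)}, \\
\ell(\theta^{(m+1)} \mid y) - \ell(\theta^{(m)} \mid y)
  &\ge Q(\theta^{(m+1)} \mid \theta^{(m)}) - Q(\theta^{(m)} \mid \theta^{(m)})
  \ge 0 .
\end{align*}
```

The last line holds because the M-step chooses $\theta^{(m+1)}$ to maximize $Q(\cdot\mid\theta^{(m)})$, so each iteration cannot decrease the observed-data log-likelihood. The argument guarantees monotonicity only, which is why several starting values are still advisable.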
3.2 The steps of parameter estimation
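Suppose that the lifetimes $X_1,\dots,X_n$ of $n$ products are independent and identically distributed according to the two-parameter generalized exponential distribution (1). The products are put on a lifetime test, and each product either fails in an interval $I_i=(t_{i-1},t_i]$ or is censored at $t_i$. We can only observe $n_i$, the number of failures in the interval $I_i$, and $c_i$, the number of products censored at $t_i$, where $i=1,\dots,k$ and $\sum_{i=1}^{k}(n_i+c_i)=n$. The lifetimes $X=(X_1,\dots,X_n)$ are unobservable and are called the missing data in the EM algorithm; the observable data are $Y=(n_1,\dots,n_k,c_1,\dots,c_k)$, and together they form the complete data $Z=(X,Y)$. To apply the EM algorithm, we introduce random variables $\xi_i$ and $\eta_i$, which denote the lifetime of a product that fails in the interval $I_i$ and the lifetime of a product censored at $t_i$, respectively. In the following, we obtain the maximum likelihood estimates of the parameters from the E-step and the M-step of the EM algorithm.
Since $X$ contains all the information in the observation $Y$, we have $f(x,y\mid\alpha,\lambda)=f(x\mid\alpha,\lambda)$. From the probability density function of the generalized exponential distribution (1), the log-likelihood of the complete data is
$$\ell_c(\alpha,\lambda)=n\ln\alpha+n\ln\lambda+(\alpha-1)\sum_{j=1}^{n}\ln\left(1-e^{-\lambda x_j}\right)-\lambda\sum_{j=1}^{n}x_j.$$
Given the initial values $\alpha^{(0)}$ and $\lambda^{(0)}$ of the parameters, the steps of the EM algorithm are:
E-step: Given the estimates $\theta^{(m)}=(\alpha^{(m)},\lambda^{(m)})$ of the parameters at step $m$, the $Q$ function at step $m+1$ is
$$Q(\alpha,\lambda\mid\theta^{(m)})=n\ln\alpha+n\ln\lambda+\sum_{i=1}^{k}n_i\,E_m\left[(\alpha-1)\ln\left(1-e^{-\lambda\xi_i}\right)-\lambda\xi_i\right]+\sum_{i=1}^{k}c_i\,E_m\left[(\alpha-1)\ln\left(1-e^{-\lambda\eta_i}\right)-\lambda\eta_i\right],$$
where $E_m$ denotes expectation under the parameter value $\theta^{(m)}$.
In the above function, the conditional probability density functions of $\xi_i$ and $\eta_i$ are
$$f_{\xi_i}(x\mid\theta^{(m)})=\frac{f(x;\theta^{(m)})}{F(t_i;\theta^{(m)})-F(t_{i-1};\theta^{(m)})},\qquad t_{i-1}<x\le t_i,$$
and
$$f_{\eta_i}(x\mid\theta^{(m)})=\frac{f(x;\theta^{(m)})}{1-F(t_i;\theta^{(m)})},\qquad x>t_i.$$
Thus we get
$$Q(\alpha,\lambda\mid\theta^{(m)})=n\ln\alpha+n\ln\lambda+\sum_{i=1}^{k}n_i\int_{t_{i-1}}^{t_i}\left[(\alpha-1)\ln\left(1-e^{-\lambda x}\right)-\lambda x\right]f_{\xi_i}(x\mid\theta^{(m)})\,dx+\sum_{i=1}^{k}c_i\int_{t_i}^{\infty}\left[(\alpha-1)\ln\left(1-e^{-\lambda x}\right)-\lambda x\right]f_{\eta_i}(x\mid\theta^{(m)})\,dx.$$
M-step: Maximize the $Q$ function to obtain the estimates $\theta^{(m+1)}=(\alpha^{(m+1)},\lambda^{(m+1)})$ of the parameters at step $m+1$, i.e., find the extreme point of $Q$ by differentiating $Q$ with respect to $\alpha$ and $\lambda$.
Differentiating $Q$ gives, respectively,
$$\frac{\partial Q}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{k}\left\{n_i\,E_m\left[\ln\left(1-e^{-\lambda\xi_i}\right)\right]+c_i\,E_m\left[\ln\left(1-e^{-\lambda\eta_i}\right)\right]\right\},$$
$$\frac{\partial Q}{\partial\lambda}=\frac{n}{\lambda}+(\alpha-1)\sum_{i=1}^{k}\left\{n_i\,E_m\left[\frac{\xi_i e^{-\lambda\xi_i}}{1-e^{-\lambda\xi_i}}\right]+c_i\,E_m\left[\frac{\eta_i e^{-\lambda\eta_i}}{1-e^{-\lambda\eta_i}}\right]\right\}-\sum_{i=1}^{k}\left\{n_i\,E_m[\xi_i]+c_i\,E_m[\eta_i]\right\}.$$
Let $\partial Q/\partial\alpha=0$ and $\partial Q/\partial\lambda=0$. We obtain
$$\alpha=-\,n\left/\sum_{i=1}^{k}\left\{n_i\,E_m\left[\ln\left(1-e^{-\lambda\xi_i}\right)\right]+c_i\,E_m\left[\ln\left(1-e^{-\lambda\eta_i}\right)\right]\right\}\right.,\tag{5}$$
$$\lambda=n\left/\left(\sum_{i=1}^{k}\left\{n_i\,E_m[\xi_i]+c_i\,E_m[\eta_i]\right\}-(\alpha-1)\sum_{i=1}^{k}\left\{n_i\,E_m\left[\frac{\xi_i e^{-\lambda\xi_i}}{1-e^{-\lambda\xi_i}}\right]+c_i\,E_m\left[\frac{\eta_i e^{-\lambda\eta_i}}{1-e^{-\lambda\eta_i}}\right]\right\}\right)\right..\tag{6}$$
The solution $(\alpha,\lambda)$ of the above equations is the desired $\theta^{(m+1)}=(\alpha^{(m+1)},\lambda^{(m+1)})$, which completes one iteration. Equations (5) and (6) are applied repeatedly until the estimates converge.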
4 Simulation study
Suppose $X_1,\dots,X_n$ are independent and identically distributed samples from the generalized exponential distribution model (1). We consider the simulation example used in [1]: the true parameters are assumed to be , and the sample data are divided into $k=10$ groups, taking , , , with an error tolerance of 0.001. For $i=1,\dots,k-1$, we assume that the probability of a product being censored at $t_i$ is , and that all products that have not failed by $t_k$ are censored. We examine the estimation performance when each trial is replicated $N$ times for sample sizes , respectively. If the estimate obtained on the $j$th trial is $\hat\theta^{(j)}$, then the final estimate and the estimated mean square error are
$$\hat\theta=\frac{1}{N}\sum_{j=1}^{N}\hat\theta^{(j)},\qquad \mathrm{MSE}(\hat\theta_i)=\frac{1}{N}\sum_{j=1}^{N}\left(\hat\theta_i^{(j)}-\theta_i\right)^2,$$
where $\hat\theta_i^{(j)}$ denotes the $i$th component of $\hat\theta^{(j)}$. The corresponding results are reported in Tab.1 and Tab.2. All the calculations in this paper were done using Matlab 2009b.
From Tab.1 and Tab.2, we can see that the EM algorithm estimates the parameters of the generalized exponential distribution well for grouped and right-censored data. Overall, the estimates improve as the sample size and the number of simulation repetitions increase.
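The data-generating scheme and the summary statistics of the simulation can be sketched as follows. This is a minimal illustration under assumptions: lifetimes are drawn by inverting $F(x)=(1-e^{-\lambda x})^{\alpha}$, a surviving product is censored at each interior end point with probability `p`, and every parameter value in the example call is hypothetical because the paper's settings are not reproduced here.

```python
import numpy as np

def simulate_grouped_censored(n, alpha, lam, t, p, rng):
    """Simulate failure and censoring counts for end points t[0] = 0 < ... < t[k].

    A product alive at t[i] (i < k) is censored there with probability p;
    every product still alive at t[k] is censored at t[k].
    """
    k = len(t) - 1
    u = rng.uniform(size=n)
    lifetimes = -np.log1p(-u ** (1.0 / alpha)) / lam   # inverse-CDF sampling
    n_fail = np.zeros(k, dtype=int)
    n_cens = np.zeros(k, dtype=int)
    for life in lifetimes:
        for i in range(1, k + 1):
            if life <= t[i]:                    # failure observed in (t[i-1], t[i]]
                n_fail[i - 1] += 1
                break
            if i == k or rng.uniform() < p:     # alive at t[i]: censored here
                n_cens[i - 1] += 1
                break
    return n_fail, n_cens

def mean_and_mse(estimates, truth):
    """Average estimate and mean square error over replicates (one row per trial)."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return estimates.mean(axis=0), ((estimates - truth) ** 2).mean(axis=0)

rng = np.random.default_rng(0)
n_fail, n_cens = simulate_grouped_censored(
    200, alpha=1.5, lam=1.0, t=np.linspace(0.0, 5.0, 11), p=0.05, rng=rng)
print(n_fail, n_cens)
```

In a full study, each simulated pair of count vectors would be passed to the EM routine, and the resulting rows of estimates would be summarized with `mean_and_mse`.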
5 Analysis of a set of clinical data
This section analyzes a set of real data to illustrate the practical value of the methodology of this paper. Angina pectoris is a clinical syndrome caused by acute, transient ischemia and hypoxia of the myocardium due to inadequate coronary blood supply, and it occurs mostly in men. The following data on 2418 male patients with angina pectoris are taken from the work of Parker et al. [6]. The survival time was calculated in years from the time of diagnosis, with 16 intervals, the first 15 intervals being one year long, i.e., $t_i-t_{i-1}=1$ for $i=1,\dots,15$. The number of deaths and the number of cases lost to follow-up in each interval are shown in Tab.3.
In [6], the survival and hazard rate functions for the above data were estimated using a non-parametric product-limit estimation method. It was concluded that mortality was highest in the first year after diagnosis and remained essentially constant from the end of the first year to the beginning of the 10th year, fluctuating between 0.09 and 0.12, with the hazard rate generally higher after 10 years. Thus, regardless of age, sex or race, patients who survive beyond one year have a better prognosis than those who are newly diagnosed, with a 5-year survival rate of 0.5193. In this paper, we use the generalized exponential distribution model (1) to analyze this dataset, and use our EM algorithm to estimate the shape parameter $\alpha$ and scale parameter $\lambda$ as $\hat\alpha$ and $\hat\lambda$, respectively. The fitted survival function and hazard rate function are then
$$\hat S(t)=1-\left(1-e^{-\hat\lambda t}\right)^{\hat\alpha},\tag{7}$$
$$\hat h(t)=\frac{\hat\alpha\hat\lambda\left(1-e^{-\hat\lambda t}\right)^{\hat\alpha-1}e^{-\hat\lambda t}}{1-\left(1-e^{-\hat\lambda t}\right)^{\hat\alpha}}.\tag{8}$$
Since the estimated shape parameter is less than 1, the hazard rate function is decreasing. Fig.1 shows that the hazard rate function is monotonically decreasing: it takes relatively large values in the first two years (0.1501 in the 1st year and 0.1341 in the 2nd year), decreases slowly from the 3rd year to the 10th year, and decreases very slowly from the 10th year to the 30th year, remaining at about 0.106−0.115. According to the fitted models (7) and (8), the average life expectancy is 7.9264 years and the 5-year survival rate is 0.4953, which is close to the analysis in [6]. Another important life indicator is the average remaining life, which at time $t$ is given by
$$m(t)=E(X-t\mid X>t)=\frac{1}{S(t)}\int_{t}^{\infty}S(u)\,du.$$
This gives an average remaining life expectancy of 8.4710 years at 1 year, 8.9964 years at 5 years, and 9.2205 years at 10 years. Again, it can be concluded that patients who have already survived several years have a longer average remaining life expectancy than those who have just been diagnosed, regardless of age, sex, or race. Of course, with continuing advances in modern medical technology and in the effectiveness of the drugs used to treat angina, the survival rate and the average remaining life expectancy of patients with angina have greatly improved.
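The average remaining life can be evaluated by numerical integration of the survival function. The sketch below assumes $S(t)=1-(1-e^{-\lambda t})^{\alpha}$ and computes $m(t)=\int_t^{\infty}S(u)\,du/S(t)$; the parameter values in the example call are hypothetical placeholders, not the fitted values of this section.

```python
import numpy as np
from scipy.integrate import quad

def ge_survival(t, alpha, lam):
    """Assumed GE survival function S(t) = 1 - (1 - exp(-lam * t))**alpha."""
    return 1.0 - (-np.expm1(-lam * t)) ** alpha

def mean_residual_life(t, alpha, lam):
    """m(t) = (integral of S(u) over u from t to infinity) / S(t)."""
    area, _ = quad(ge_survival, t, np.inf, args=(alpha, lam))
    return area / ge_survival(t, alpha, lam)

# Hypothetical parameters with alpha < 1, i.e. a decreasing hazard rate.
alpha_demo, lam_demo = 0.8, 0.1
for year in (0, 1, 5, 10):
    print(year, ge_survival(year, alpha_demo, lam_demo),
          mean_residual_life(year, alpha_demo, lam_demo))
```

Setting `t = 0` gives the average life expectancy. With a shape parameter below 1 the hazard rate is decreasing, so the average remaining life grows with `t`, which matches the qualitative conclusion drawn from the fitted model.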
References
[1] Chen D G, Lio Y L. Parameter estimations for generalized exponential distribution under progressive type-I interval censoring. Comput Stat Data Anal 2010; 54(6): 1581–1591
[2] Gupta R D, Kundu D. Generalized exponential distributions. Austr New Zealand J Statist 1999; 41(2): 173–188
[3] Gupta R D, Kundu D. Generalized exponential distribution: existing results and some recent developments. J Statist Plann Inference 2007; 137(11): 3537–3547
[4] Gupta R D, Kundu D. Generalized exponential distribution: Bayesian estimations. Comput Statist Data Anal 2008; 52(4): 1873–1883
[5] Kundu D, Pradhan B. Estimating the parameters of the generalized exponential distribution in presence of hybrid censoring. Commun Stat Theory Methods 2009; 38(12): 2030–2041
[6] Lee E T, Wang J W. Statistical Methods for Survival Data Analysis, 3rd ed. New York: John Wiley & Sons, 2003
[7] Liu L P. Estimation of MLE for Weibull distribution with grouped and censored data. Chinese Journal of Applied Probability and Statistics 2001; 17(2): 133–138
[8] Liu X, Chen H, Fei H L. Estimation of the parameters in the lognormal distribution with grouped and right-censored data. Chinese Journal of Applied Probability and Statistics 2008; 24(4): 371–380
[9] Pettitt A N. Re-weighted least squares estimation with censored and grouped data: an application of the EM algorithm. J Roy Statist Soc Ser B 1985; 47(2): 253–260
[10] Raqab M Z. Inferences for generalized exponential distribution based on record statistics. J Statist Plann Inference 2002; 104(2): 339–350
[11] Raqab M Z, Madi M T. Bayesian inference for the generalized exponential distribution. J Statist Comput Simul 2005; 75(10): 841–852
[12] Sarhan A M. Analysis of incomplete, censored data in competing risks models with generalized exponential distribution. IEEE Trans Reliability 2007; 56(1): 132–138
RIGHTS & PERMISSIONS
Higher Education Press 2023