Heterogeneous length-of-stay modeling of post-acute care residents in the nursing home with competing discharge dispositions

Nazmus SAKIB , Xuxue SUN , Nan KONG , Chris MASTERSON , Hongdao MENG , Kelly SMITH , Mingyang LI

Front. Eng ›› 2022, Vol. 9 ›› Issue (4) : 577 -591.

DOI: 10.1007/s42524-022-0203-7
RESEARCH ARTICLE

Abstract

Post-acute care (PAC) residents in nursing homes (NHs) are recently hospitalized patients with medically complex diagnoses, ranging from severe orthopedic injuries to cardiovascular diseases. A major role of NHs is to maximize the restoration of PAC residents during their NH stays and achieve desirable discharge outcomes, such as a higher likelihood of community discharge and a lower re/hospitalization risk. Accurate prediction of PAC residents' length-of-stay (LOS) with multiple discharge dispositions (e.g., community discharge and re/hospitalization) will allow NH management groups to stratify residents by their individualized risk and thereby deliver personalized, resident-centered NH care. Because PAC residents have highly heterogeneous health conditions and multiple types of correlated discharge dispositions, developing an accurate prediction model is challenging. Existing predictive analytics methods, such as distribution-/regression-based methods and machine learning methods, either fail to incorporate varied individual characteristics comprehensively or ignore multiple discharge dispositions. In this work, a data-driven predictive analytics approach is proposed to jointly predict the individualized re/hospitalization risk and community discharge likelihood over time in the presence of varied resident characteristics. A sampling algorithm is further developed to generate accurate predictive samples for a heterogeneous population of PAC residents in an NH and to facilitate facility-level performance evaluation. A real case study using large-scale NH data demonstrates the superior prediction performance of the proposed work at the individual and facility levels through a comprehensive comparison with a large number of existing prediction methods as benchmarks.
The developed analytics tools will allow NH management groups to identify the most at-risk residents and provide them with more proactive and focused care to improve resident outcomes.


Keywords

nursing home / predictive analytics / individualized prediction / competing risks / health outcomes

Cite this article

Nazmus SAKIB, Xuxue SUN, Nan KONG, Chris MASTERSON, Hongdao MENG, Kelly SMITH, Mingyang LI. Heterogeneous length-of-stay modeling of post-acute care residents in the nursing home with competing discharge dispositions. Front. Eng, 2022, 9(4): 577-591 DOI:10.1007/s42524-022-0203-7


1 Introduction

Nursing homes (NHs), or skilled nursing facilities, are mainly responsible for caring for the frail and vulnerable population of older adults with 24/7 personal care and assistance. Historically, NHs have been considered a major healthcare setting for providing custodial care to long-term care (LTC) residents. In recent decades, while maintaining their conventional role as LTC providers, modern NHs have increasingly become responsible for caring for post-acute care (PAC) residents, i.e., recently hospitalized patients who require extended rehabilitation and recovery after an acute care hospital stay. Recent studies reported that the percentage of NH residents admitted from the hospital increased from 67% in 2000 to 85% in 2015 (Fashaw et al., 2020). Several policy and market changes contributed to this shift in the NH population composition. First, the trend toward rapid hospital discharge with reduced hospital length-of-stay (LOS) generated a growing population of "quicker-and-sicker" patients and drove many NHs to expand their sub-acute care and rehabilitation services (Murad, 2011). Second, Medicare covers qualified PAC residents in NHs at a higher reimbursement rate than Medicaid, which is the primary payer for qualified LTC residents who mainly require custodial care. In 2017, the US average Medicaid reimbursement rate was $206 per resident day, less than half the Medicare rate of $503 per resident day (NIC, 2018). This gives NH providers a strong financial incentive to accept more PAC Medicare beneficiaries. Finally, owing to a series of laws and initiatives in recent decades (Moon et al., 1997; Ginsburg and Supreme Court of the US, 1998; Eiken et al., 2014), home and community-based services for the overall LTC population have expanded considerably to support aging in place and to divert or delay expensive NH placement.

With the growing number of PAC residents in NHs, the major goal of NHs in caring for PAC residents is to return them to the community successfully and efficiently with a lower re/hospitalization rate. Thus, predicting how long each individual PAC resident will stay in the NH and what his/her discharge disposition will be is of great importance to both NH administrators and individual residents (and their families). For NH administrators, predicting the LOS and discharge disposition of each resident at the individual level, with individual risk factors identified, will help identify the most at-risk residents (e.g., residents with a shorter LOS and higher re/hospitalization risk, and residents with a longer LOS and lower community discharge likelihood), so that more targeted care can be provided to improve the care quality of the overall facility. At the facility level, an accurate predictive model of LOS that incorporates varied individual characteristics will further allow accurate evaluation of NH utilization (measured as average LOS) for a heterogeneous population of PAC residents. For an individual resident and his/her family, accurate prediction of how long he or she will stay in an NH will improve communication between caregivers and care recipients. It will further help the family prepare informal care resources to accommodate the needs of the resident after discharge.

Accurate prediction of the LOS and discharge disposition of PAC residents is challenging. First, PAC residents admitted to NHs are often medically complex, with a high level of functional dependence and a variety of clinical diagnoses, ranging from severe orthopedic injuries (e.g., hip and pelvic fractures) to cardiovascular diseases (e.g., stroke and myocardial infarction). It is unclear which individual characteristics affect LOS and discharge disposition. Many existing LOS models in the literature are distribution-based and use various distributions, such as the Exponential, Phase-type, Log-normal, and Gamma distributions (Xie et al., 2005; Faddy et al., 2009), to characterize patient LOS. These models do not account for or quantify the influence of individual characteristics that could improve LOS prediction. Second, PAC residents have multiple possible discharge dispositions. They may be discharged to the residential community for further recovery or transferred to a hospital due to critical events (e.g., infections and falls). Community discharge and re/hospitalization are mutually exclusive events, and whichever comes first terminates the resident's NH stay. Existing LOS modeling approaches, such as regression-based methods (Carey, 2002; Kelly et al., 2010; Kramer and Zimmerman, 2010) and machine learning methods (Hachesu et al., 2013; Pendharkar and Khurana, 2014; Turgeman et al., 2017), mainly focus on predicting time-to-discharge without differentiating between dispositions, and they overlook the complexity arising from the competing risks between dispositions. Thus, there is a need for an advanced LOS modeling approach for PAC residents that both incorporates individual characteristics and considers multiple competing discharge dispositions.

After achieving superior individual-level prediction of LOS and discharge disposition with the relevant factors identified, there is still a need to evaluate facility-level LOS and discharge outcomes for a population of residents with varied individual characteristics. This will better inform NH resource preparedness and the evaluation of facility-level quality outcomes. Computer simulations, such as discrete event simulation and agent-based simulation, are often used in healthcare systems engineering to model and simulate each individual patient as a discrete event or agent and to evaluate system-level performance (e.g., the average LOS of a population of individuals at the facility level) (Hoot et al., 2008; Taboada et al., 2011; Wang et al., 2012). A key step in these simulation approaches is a sampling algorithm that simulates LOS observations for a population of individuals. Existing sampling algorithms apply only to LOS observations characterized either by distribution-based models, which ignore the resident characteristics that influence the LOS (Cappanera et al., 2014; Zhang et al., 2019), or by regression models (Austin et al., 2002), which do not account for multiple discharge dispositions. There is a need for a sampling algorithm that simulates LOS observations for a population of individuals with both multiple competing discharge dispositions and varied individual characteristics.

To fill the aforementioned research gaps, we propose a heterogeneous LOS modeling approach for PAC residents that takes multiple discharge dispositions into account and incorporates the varied individual characteristics that may influence each discharge disposition. At the individual level, a semi-parametric hazard regression model is used to characterize the heterogeneous LOS observations of PAC residents, improving the prediction accuracy of individual re/hospitalization risk and community discharge likelihood. Various factors affecting re/hospitalization risk and/or community discharge likelihood are also identified, and their influence is quantified. At the facility level, a sampling algorithm is further developed to simulate both LOS observations and discharge dispositions for a heterogeneous population of PAC residents, yielding accurate predictive samples. A real case study using large-scale NH data demonstrates the superior prediction performance of the proposed work at both the individual and facility levels.

The remainder of the paper is organized as follows. The next section presents the proposed LOS modeling framework: an individual-level LOS model that quantifies various influencing factors and considers competing discharge dispositions, followed by the proposed sampling algorithm for facility-level performance evaluation. Then, a real-data case study demonstrates the efficacy of the proposed work. Concluding remarks are provided at the end.

2 Methodology

2.1 Model formulation

Consider a cohort of $n$ PAC residents in an NH facility. Each PAC resident may be discharged to the residential community for further recovery or transferred to a hospital due to critical events (e.g., infections and falls). Community discharge and re/hospitalization are mutually exclusive events, and whichever comes first terminates the resident's NH stay. Unlike many existing LOS modeling works (Faddy et al., 2009; Kelly et al., 2010), which model a single discharge disposition, the proposed model formulation takes into account the multiple competing discharge dispositions of PAC residents, namely, re/hospitalization and community discharge. Moreover, unlike many existing discharge outcome prediction models, such as hospital re/admission models, which predict the risk of critical outcomes within a fixed time period, e.g., 30-day or 90-day re/hospitalization risk (Incalzi et al., 1992), the proposed model captures the re/hospitalization risk as well as the community discharge likelihood of each individual PAC resident over time. The instantaneous discharge rate of the $i$th PAC resident with discharge disposition type $s$ (community ($C$) or hospital ($H$)) can be characterized as

$$d_{i,s}(t \mid \mathbf{x}_i) = \lim_{\Delta t \to 0} \frac{\Pr\left(t \le T_{i,s} < t + \Delta t \mid T_{i,\min} \ge t, \mathbf{x}_i\right)}{\Delta t}, \quad i = 1, \ldots, n;\ s \in \{C, H\}, \tag{1}$$

where $T_{i,\min} = \min\{T_{i,C}, T_{i,H}\}$ is the LOS of resident $i$, and $T_{i,C}$ and $T_{i,H}$ are the latent times-to-discharge with the community and hospital dispositions, respectively. $\mathbf{x}_i$ is a $p_s$-dimensional observed vector containing individual covariates that may influence $d_{i,s}(t)$, such as demographics, clinical diagnoses, cognitive deficits, and physical functional performance. To associate the individual characteristics explicitly with $d_{i,s}(t)$, the following hazard regression is considered:

$$d_{i,s}(t) = d_s^{b}(t) \exp\left(\boldsymbol{\beta}_s^{\top} \mathbf{x}_i\right), \quad i = 1, \ldots, n;\ s \in \{C, H\}, \tag{2}$$

where $d_s^{b}(t)$ is the population-average (baseline) instantaneous discharge rate with disposition $s$ in the absence of the influence of $\mathbf{x}_i$, and $\boldsymbol{\beta}_s$ is a $p_s$-dimensional disposition-specific coefficient vector that quantifies the influence of $\mathbf{x}_i$ on $d_{i,s}(t)$.

A benefit of the model in Eq. (2) is that, for any time $t$, the individual re/hospitalization risk by time $t$, i.e., the probability $\Pr(T_{i,\min} \le t,\ T_{i,\min} = T_{i,H})$ that resident $i$ has been transferred to hospital (rather than discharged to the community) by time $t$, and the community discharge likelihood by time $t$, i.e., $\Pr(T_{i,\min} \le t,\ T_{i,\min} = T_{i,C})$, can be written as

$$\Pr(T_{i,\min} \le t,\ T_{i,\min} = T_{i,H}) = \int_0^t d_{i,H}(\tau \mid \mathbf{x}_i) \exp\left[-\int_0^{\tau} \sum_{s \in \{C,H\}} d_{i,s}(y \mid \mathbf{x}_i)\, dy\right] d\tau, \tag{3a}$$

$$\Pr(T_{i,\min} \le t,\ T_{i,\min} = T_{i,C}) = \int_0^t d_{i,C}(\tau \mid \mathbf{x}_i) \exp\left[-\int_0^{\tau} \sum_{s \in \{C,H\}} d_{i,s}(y \mid \mathbf{x}_i)\, dy\right] d\tau. \tag{3b}$$

In other words, for any specified time horizon $t$, the proposed model can be used to evaluate both the re/hospitalization risk and the community discharge likelihood of resident $i$ over that fixed period based on Eqs. (3a) and (3b). This bypasses the conventional discharge outcome modeling approach, which requires discretizing discharge outcome data in advance based on a pre-specified time period and then formulating a classification model for outcome prediction. It also allows the re/hospitalization risk and community discharge likelihood to be compared over time among individuals with different characteristics $\mathbf{x}_i$.
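As a numerical illustration of Eqs. (3a) and (3b), the Python sketch below evaluates both cumulative incidences for one resident by trapezoidal integration. The function name `cumulative_incidence` and the constant discharge rates `d_C = 0.04` and `d_H = 0.01` per day are hypothetical choices, made only so that the result can be checked against the closed form $\frac{d_s}{d_C + d_H}\left(1 - e^{-(d_C + d_H)t}\right)$; in the fitted model the rates would instead be $d_s^{b}(t)\exp(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i)$.

```python
import math


def cumulative_incidence(hazards, cause, t, steps=20_000):
    """Evaluate Eq. (3a)/(3b) with the trapezoidal rule on a uniform grid over [0, t].

    hazards: dict mapping each disposition ("C", "H") to a function tau -> d_{i,s}(tau)
    cause:   the disposition whose cumulative incidence is wanted
    """
    h = t / steps
    all_cause_cum_hazard = 0.0               # sum over s of integral_0^tau d_{i,s}(y) dy
    prev_tau = 0.0
    prev_integrand = hazards[cause](0.0)     # the survival factor equals 1 at tau = 0
    total = 0.0
    for k in range(1, steps + 1):
        tau = k * h
        # accumulate the all-cause cumulative hazard up to tau (trapezoidal rule)
        all_cause_cum_hazard += 0.5 * h * sum(d(prev_tau) + d(tau) for d in hazards.values())
        integrand = hazards[cause](tau) * math.exp(-all_cause_cum_hazard)
        total += 0.5 * h * (prev_integrand + integrand)
        prev_tau, prev_integrand = tau, integrand
    return total


# Hypothetical constant discharge rates (per day) for one resident.
d_C, d_H = 0.04, 0.01
hazards = {"C": lambda tau: d_C, "H": lambda tau: d_H}
cif_C = cumulative_incidence(hazards, "C", 30.0)   # community discharge likelihood by day 30
cif_H = cumulative_incidence(hazards, "H", 30.0)   # re/hospitalization risk by day 30
print(f"P(community by day 30) = {cif_C:.4f}, P(hospital by day 30) = {cif_H:.4f}")
```

Because both cumulative incidences share the same all-cause survival factor, they sum to $1 - \exp\left[-\int_0^t \sum_{s} d_{i,s}(y)\,dy\right]$, the probability that the resident has left the NH by time $t$ for either reason.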

2.2 Model estimation

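Given observed data $D = \{t_i, z_{i,s}, \mathbf{x}_i\}_{i=1}^{n}$, where $z_{i,s} = 1$ if PAC resident $i$ is discharged with disposition $s$ and $0$ otherwise, $s \in \{C, H\}$, and unknown parameters/functions $\theta = \bigcup_{s \in \{C,H\}} \theta_s$ with $\theta_s = \{d_s^{b}(t), \boldsymbol{\beta}_s\}$, the joint likelihood function $L(\theta \mid D)$ can be written as

$$L(\theta \mid D) = \prod_{i=1}^{n} \prod_{s \in \{C,H\}} \left\{ d_s^{b}(t_i) \exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i\right) \exp\left[-\sum_{s' \in \{C,H\}} \int_0^{t_i} d_{s'}^{b}(\tau) \exp\left(\boldsymbol{\beta}_{s'}^{\top}\mathbf{x}_i\right) d\tau\right] \right\}^{z_{i,s}}.$$

Let the index set $A_s = \{i : z_{i,s} = 1\}$, $s \in \{C, H\}$. Because each resident in the data has exactly one observed disposition (i.e., $\sum_{s} z_{i,s} = 1$), the joint likelihood function can be simplified as

$$L(\theta \mid D) = \prod_{i \in A_C} d_C^{b}(t_i) \exp\left(\boldsymbol{\beta}_C^{\top}\mathbf{x}_i\right) \prod_{i \in A_H} d_H^{b}(t_i) \exp\left(\boldsymbol{\beta}_H^{\top}\mathbf{x}_i\right) \prod_{i=1}^{n} \exp\left[-\sum_{s \in \{C,H\}} \int_0^{t_i} d_s^{b}(\tau) \exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i\right) d\tau\right].$$

When $\theta_C$ and $\theta_H$ share no common parameters (i.e., they are functionally independent), $L(\theta \mid D)$ can be multiplicatively decomposed into $L(\theta \mid D) = \prod_{s \in \{C,H\}} L_s(\theta_s \mid D)$, in which $L_s(\theta_s \mid D)$ can be expressed as

$$L_s(\theta_s \mid D) = \prod_{i \in A_s} d_s^{b}(t_i) \exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i\right) \prod_{i=1}^{n} \exp\left[-\int_0^{t_i} d_s^{b}(\tau) \exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i\right) d\tau\right], \quad s \in \{C, H\}.$$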

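Thus, maximizing $L(\theta \mid D)$ is equivalent to maximizing each $L_s(\theta_s \mid D)$ separately. To maximize $L_s(\theta_s \mid D)$ while treating $d_s^{b}(t)$ as an unknown function, we first maximize the partial likelihood $L_s(\boldsymbol{\beta}_s \mid D)$ (Cox, 1972):

$$L_s(\boldsymbol{\beta}_s \mid D) = \prod_{i \in A_s} \frac{\exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i\right)}{\sum_{j \in B(t_i)} \exp\left(\boldsymbol{\beta}_s^{\top}\mathbf{x}_j\right)}, \quad s \in \{C, H\},$$

where $B(t_i) = \{j : t_j \ge t_i\}$ is the risk set, i.e., the residents still in the NH just before $t_i$. The maximum likelihood estimate is obtained by solving $\max_{\boldsymbol{\beta}_s} \log L_s(\boldsymbol{\beta}_s \mid D)$, which can be done with a numerical optimization algorithm such as the Newton-Raphson method (Gonzàlez et al., 2008).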
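The maximization of $\log L_s(\boldsymbol{\beta}_s \mid D)$ can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes no tied discharge times (no Breslow or Efron tie correction), adds step halving to the Newton-Raphson update as a safeguard, and runs on a small hypothetical dataset. Residents who left with the competing disposition enter the disposition-$s$ model with $z_{i,s} = 0$ but remain in the risk sets $B(t_i)$ up to their own $t_j$.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def cox_partial_loglik(beta, times, events, X):
    """log L_s(beta_s | D) = sum over i in A_s of [beta'x_i - log sum_{j in B(t_i)} exp(beta'x_j)]."""
    eta = [sum(b * v for b, v in zip(beta, x)) for x in X]
    ll = 0.0
    for i, t_i in enumerate(times):
        if events[i]:
            # B(t_i): residents still in the NH just before t_i (t_j >= t_i)
            ll += eta[i] - math.log(sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i))
    return ll


def cox_newton_raphson(times, events, X, max_iter=50, tol=1e-10):
    """Maximize log L_s(beta_s | D) by Newton-Raphson with step halving (no tie correction)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    ll = cox_partial_loglik(beta, times, events, X)
    for _ in range(max_iter):
        eta = [sum(b * v for b, v in zip(beta, x)) for x in X]
        score = [0.0] * p                        # gradient of the log partial likelihood
        info = [[0.0] * p for _ in range(p)]     # observed information (negative Hessian)
        for i in range(n):
            if not events[i]:
                continue
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(eta[j]) for j in risk]
            s0 = sum(w)
            mean = [sum(wj * X[j][a] for wj, j in zip(w, risk)) / s0 for a in range(p)]
            for a in range(p):
                score[a] += X[i][a] - mean[a]
                for c in range(p):
                    second = sum(wj * X[j][a] * X[j][c] for wj, j in zip(w, risk)) / s0
                    info[a][c] += second - mean[a] * mean[c]
        step = solve(info, score)                # Newton direction: info @ step = score
        scale = 1.0
        while True:                              # halve the step until the log-likelihood does not drop
            candidate = [b + scale * d for b, d in zip(beta, step)]
            cand_ll = cox_partial_loglik(candidate, times, events, X)
            if cand_ll >= ll or scale < 1e-8:
                break
            scale /= 2.0
        beta, ll = candidate, cand_ll
        if max(abs(scale * d) for d in step) < tol:
            break
    return beta


# Hypothetical toy data for one disposition s: events[i] = z_{i,s}.
times = [5, 8, 12, 3, 20, 15, 7, 10]
events = [1, 1, 0, 1, 1, 0, 1, 1]
X = [[1.0, 0.2], [0.0, 1.0], [1.0, 0.5], [1.0, 0.9],
     [0.0, 0.1], [0.0, 0.7], [1.0, 0.3], [0.0, 0.4]]
beta_hat = cox_newton_raphson(times, events, X)
print("beta_hat =", [round(b, 4) for b in beta_hat])
```

In practice, established survival-analysis software (for example, `coxph` in R's survival package or `CoxPHFitter` in the Python lifelines package) would normally be used instead, fitted once per disposition with stays ending in the competing disposition treated as censored.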

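Based on the estimated $\hat{\boldsymbol{\beta}}_s$, we estimate $d_s^{b}(t)$ by maximizing the profile likelihood $L_s(d_s^{b} \mid D)$:

$$L_s(d_s^{b} \mid D) \propto \prod_{i \in A_s} d_{s,i}^{b} \exp\left[-d_{s,i}^{b} \sum_{j \in B(t_i)} \exp\left(\hat{\boldsymbol{\beta}}_s^{\top}\mathbf{x}_j\right)\right], \quad s \in \{C, H\},$$

where $d_{s,i}^{b} = d_s^{b}(t_i)$ for $i \in A_s$, and $\hat{d}_s^{b}(t) = 0$ for $t \notin \{t_i\}_{i \in A_s}$. The profile maximum likelihood estimator is (Cole et al., 2014)

$$\hat{d}_s^{b}(t_i) = \frac{1}{\sum_{j \in B(t_i)} \exp\left(\hat{\boldsymbol{\beta}}_s^{\top}\mathbf{x}_j\right)}, \quad i \in A_s;\ s \in \{C, H\}.$$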
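The profile estimator can be computed directly once $\hat{\boldsymbol{\beta}}_s$ is available. The Python sketch below (hypothetical function names, no handling of tied discharge times) returns the jumps $\hat{d}_s^{b}(t_i)$ and their running sum, a Breslow-type cumulative baseline discharge rate. With $\hat{\boldsymbol{\beta}}_s = \mathbf{0}$ each jump is one over the number of residents at risk, i.e., the estimator reduces to the Nelson-Aalen estimator.

```python
import math


def baseline_jumps(times, events, X, beta_hat):
    """Profile MLE of the baseline discharge rate for one disposition s:
    d_s^b(t_i) = 1 / sum_{j in B(t_i)} exp(beta_hat' x_j) at each t_i with z_{i,s} = 1,
    and 0 elsewhere. Assumes no tied discharge times."""
    eta = [sum(b * v for b, v in zip(beta_hat, x)) for x in X]
    jumps = []
    for i, t_i in enumerate(times):
        if events[i]:
            denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            jumps.append((t_i, 1.0 / denom))
    return sorted(jumps)


def cumulative_baseline(jumps, t):
    """Cumulative baseline discharge rate: the sum of the jumps at discharge times <= t."""
    return sum(d for t_k, d in jumps if t_k <= t)


# Hypothetical example: three residents, all discharged with disposition s.
jumps = baseline_jumps([1, 2, 3], [1, 1, 1], [[1.0], [0.0], [2.0]], [math.log(2.0)])
print(jumps)                            # approximately [(1, 1/7), (2, 1/5), (3, 1/4)]
print(cumulative_baseline(jumps, 2.5))  # approximately 1/7 + 1/5
```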

2.3 Sampling algorithm

Based on the proposed model formulation and estimation algorithms in Sections 2.1 and 2.2, the re/hospitalization risk and community discharge likelihood over time can be predicted for any individual $i$ with characteristics $\mathbf{x}_i$. To further investigate the service utilization of PAC residents in an NH facility and evaluate facility-level performance for a heterogeneous population of PAC residents with varied individual characteristics, it becomes important to use computer simulation to mimic the flow of individual PAC residents and evaluate system performance at the aggregate level. An essential building block of such a simulation is the simulated realization of the LOS of each PAC resident. Existing simulation algorithms for LOS models mainly focus on simulating realizations from distribution-based models (El-Darzi et al., 1998; McGuire, 2007; New et al., 2015), such as the Weibull and Log-normal distributions. For the semi-parametric regression model with multiple competing discharge dispositions developed in Section 2.1, existing sampling algorithms are not applicable, so a corresponding sampling algorithm is needed to generate predictive samples of LOS realizations for a heterogeneous population of PAC residents with varied individual characteristics $\mathbf{x}_i$. The latent time $T_{i,s}$ of resident $i$ with disposition $s$ is simulated based on the developed sampling algorithm summarized as follows.
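A minimal Python sketch of one way to draw such samples from the fitted model is given below; it is an illustration under stated assumptions, not necessarily the authors' algorithm. For each resident and disposition $s$, it draws a latent time $T_{i,s}$ by inverse-transform sampling from the step-function cumulative baseline rate (for example, the running sum of the profile estimates $\hat{d}_s^{b}(t_i)$) scaled by $\exp(\boldsymbol{\beta}_s^{\top}\mathbf{x}_i)$, and then takes the earliest latent time as the LOS together with its disposition. All function names and the "fitted" quantities in the example are hypothetical; draws beyond the last observed discharge time are returned as infinity, and exact ties between dispositions are broken by dictionary order.

```python
import bisect
import math
import random


def sample_latent_time(jump_times, cum_baseline, risk_score, rng):
    """Inverse-transform draw of a latent time T_{i,s} whose survival function is
    exp(-Lambda_s^b(t) * risk_score), where Lambda_s^b is the right-continuous step
    function equal to cum_baseline[k] on [jump_times[k], jump_times[k + 1]).
    Returns math.inf when the draw lies beyond the last observed discharge time."""
    e = rng.expovariate(1.0) / risk_score          # T <= t  <=>  Lambda_s^b(t) >= e
    k = bisect.bisect_left(cum_baseline, e)        # first k with cum_baseline[k] >= e
    return jump_times[k] if k < len(jump_times) else math.inf


def sample_resident(models, x, rng):
    """Draw (LOS, disposition) for one resident with covariate vector x.

    models maps each disposition s to (beta_s, jump_times_s, cum_baseline_s).
    The latent times are drawn independently given x, and the earliest one ends
    the stay, mirroring T_{i,min} = min{T_{i,C}, T_{i,H}}.
    """
    draws = {}
    for s, (beta, jump_times, cum_baseline) in models.items():
        risk_score = math.exp(sum(b * v for b, v in zip(beta, x)))
        draws[s] = sample_latent_time(jump_times, cum_baseline, risk_score, rng)
    first = min(draws, key=draws.get)
    if math.isinf(draws[first]):
        return math.inf, None                      # no discharge within the observed time range
    return draws[first], first


def simulate_cohort(models, cohort, seed=0):
    """Generate one predictive sample of (LOS, disposition) pairs for a cohort."""
    rng = random.Random(seed)
    return [sample_resident(models, x, rng) for x in cohort]


# Hypothetical fitted quantities (not estimates from the case study).
models = {
    "C": ([-0.05], [5, 10, 20, 30], [0.10, 0.35, 0.90, 1.60]),
    "H": ([0.04], [4, 12, 25], [0.05, 0.15, 0.30]),
}
cohort = [[10.0], [20.0], [30.0]]                  # e.g., baseline ADL scores
print(simulate_cohort(models, cohort, seed=42))
```

Drawing independent latent times from the cause-specific rates reproduces the cumulative incidences of Eqs. (3a) and (3b) under the discretized baseline, which is why the minimum of the draws can serve as a simulated LOS with its disposition.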

3 Case study

3.1 Data description

To demonstrate the performance of the proposed model and sampling algorithm, the Minimum Data Set (MDS) 3.0 of a certified NH in the Tampa Bay Area, Florida, is used. The MDS 3.0 is a rich dataset containing comprehensive assessments of the clinical and functional status of all residents in a Medicare/Medicaid-certified NH during their stays. The dataset is federally mandated by the Centers for Medicare and Medicaid Services (CMS) (CMS, 2017). Each resident is assessed upon admission, periodically during the stay, and upon discharge or any event causing a significant change in functional status. Each assessment contains over 680 data elements covering identification, admission and discharge dates, socio-demographics, financial details, various functional performance metrics, diseases and chronic conditions, medication and therapy information, discharge outcomes/dispositions, and facility administrative details.

The collected data include the stays of all residents admitted to the NH in a one-year period. For this case study, a sub-cohort of PAC residents is selected according to the "short-stay" criteria defined by the CMS (CMS, 2013). The CMS differentiates between "short-stay" and "long-stay" residents by examining the resident's episodes of care. An episode consists of one or more consecutive stays separated by breaks of no more than 30 days. If the cumulative LOS of the episode is 100 days or less, the resident is labeled a "short-stayer"; otherwise, the resident is labeled a "long-stayer". The most recent episode coinciding with the end date of the considered time period is used for this categorization. In the selected data, each data instance is the LOS observation of a short-stay resident together with his/her individual characteristics.
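The episode rule above can be sketched as follows. This is a simplified illustration, not the CMS specification: it assumes each stay is an (admission day, discharge day) pair of integers, measures a stay's LOS as discharge day minus admission day, and takes the last episode in time as the "most recent" one. The function names and the example residents are hypothetical.

```python
def split_episodes(stays, max_gap=30):
    """Group (admission_day, discharge_day) stays into episodes: a stay joins the
    current episode if it starts at most `max_gap` days after the previous discharge."""
    episodes = []
    for admission, discharge in sorted(stays):
        if episodes and admission - episodes[-1][-1][1] <= max_gap:
            episodes[-1].append((admission, discharge))
        else:
            episodes.append([(admission, discharge)])
    return episodes


def stay_category(stays, max_gap=30, short_stay_limit=100):
    """Label a resident by the cumulative LOS of his/her most recent episode."""
    latest = split_episodes(stays, max_gap)[-1]
    cumulative_los = sum(discharge - admission for admission, discharge in latest)
    return "short-stay" if cumulative_los <= short_stay_limit else "long-stay"


# Hypothetical residents (days counted from the start of the one-year period).
print(stay_category([(0, 40), (60, 120)]))    # one episode of 100 days -> short-stay
print(stay_category([(0, 40), (60, 121)]))    # one episode of 101 days -> long-stay
print(stay_category([(0, 90), (200, 230)]))   # 110-day break: latest episode lasts 30 days
```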

A total of 710 LOS observations with complete information from 611 individual residents are selected for analysis. Of these LOS observations, 98.02% may be considered post-acute, meaning that the resident was admitted directly from the hospital to the NH and/or covered under a Medicare Part A insurance plan (Holup et al., 2017). LOS observations ending in community discharge or re/hospitalization are included. LOS observations with other discharge dispositions, such as death and transfer to another facility, are excluded because they form a very small portion of the dataset and have negligible influence on overall model building. Left-truncated and/or right-censored observations are also excluded because they form a negligible portion of the data.

Tab.1 provides descriptive statistics of the selected cohort and stays, including socio-demographics (e.g., age, gender, ethnicity, and marital status), care utilization details of the stay (e.g., LOS, admission origin, discharge disposition, and payment source), and health characteristics (e.g., body measures, various functional performance scores, and diseases and chronic conditions). The mean LOS of the short-stay residents was 20.33 days; 97.3% of the residents were admitted from the hospital, 79.9% were discharged to the community, and the remaining 20.1% were readmitted to the hospital. The discharge disposition, socio-demographics, and health characteristics are possible factors influencing the resident's LOS and, consequently, his/her care service utilization.

3.2 Feature selection

Because the MDS dataset contains numerous data elements, only information directly relevant to the LOS is considered. Guided by domain knowledge in NH care, the subset of data most related to care utilization, i.e., socio-demographics, functional performance scores, and disease diagnoses and chronic conditions observed on admission, is considered. Although the MDS monitors each stay over time, only the assessments upon admission, also known as baseline observations, are relevant to predicting the LOS of a new cohort of residents in the facility.

After summarizing the data (i.e., calculating the LOS, deriving various functional performance scores, and converting categorical covariates into dummy variables) and preprocessing (i.e., removing low-frequency covariates, removing one covariate from each highly correlated pair, and checking for multicollinearity), 68 covariates relevant to LOS are selected. This is still a large number of covariates, and incorporating all of them in a predictive model may lead to overfitting. To reduce the dimensionality of the input, a collection of 10 popular feature selection methods is applied to the dataset.
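The step of removing one covariate from each highly correlated pair can be sketched as below. The greedy, order-dependent rule and the threshold of 0.9 are hypothetical choices for illustration (the section does not state the threshold used), and the covariate names and values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Keep covariates in the given order, skipping any covariate whose absolute
    correlation with an already-kept covariate exceeds `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariate columns (one value per LOS observation).
columns = {
    "adl_score": [1, 2, 3, 4, 5],
    "adl_score_rescaled": [2, 4, 6, 8, 10],   # perfectly correlated with adl_score
    "mood_score": [3, 1, 4, 1, 5],
}
print(drop_correlated(columns))               # ['adl_score', 'mood_score']
```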

The feature selection methods include four linear methods: stepwise regression (e.g., stepwise Akaike Information Criterion (AIC)), recursive feature elimination (RFE), simulated annealing (SA), and regularized linear regression (e.g., LASSO). Each method uses a different algorithm to determine the best subset of covariates. For example, stepwise AIC trains linear regression models by progressively adding covariates and evaluating model performance with the AIC. By contrast, RFE ranks all covariates and progressively removes unimportant ones, retraining and re-evaluating a linear model at each step. SA performs a randomized heuristic search for the best combination in the covariate space. LASSO regression uses L1-norm regularization to shrink the coefficients of unimportant covariates to exactly zero.
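How the L1 penalty produces exact zeros can be seen in a minimal cyclic coordinate-descent sketch of LASSO: the soft-thresholding update sets a coefficient to zero whenever the covariate's correlation with the partial residual is at most $\lambda$. This is a textbook formulation with a hypothetical penalty $\lambda = 0.5$ and toy data; the section does not specify the LASSO implementation or tuning used, and in practice $\lambda$ would typically be chosen by cross-validation.

```python
def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0); it returns exactly 0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1 / 2n) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.
    No intercept is fitted, so y and the columns of X are assumed to be centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                                    # residual y - X beta at beta = 0
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # correlation of covariate j with the partial residual that excludes covariate j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            for i in range(n):
                resid[i] -= X[i][j] * (new - beta[j])
            beta[j] = new
    return beta


# Hypothetical centered data: y depends on the first covariate only.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
print(lasso_coordinate_descent(X, y, lam=0.5))         # [1.5, 0.0]: covariate 2 is dropped
```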

Moreover, six nonlinear feature selection methods are applied: filtering with Random Forest, RFE with Bagged Trees, RFE with Random Forest, a genetic algorithm with Random Forest, SA with Random Forest, and Boruta. Filtering uses a preprocessing step that tests the strength of the individual relationship between each covariate and the response variable before a predictive model is trained. RFE, the genetic algorithm, and SA use subset selection heuristics similar to those applied with linear models, except that tree-based models are trained instead of linear models at each step. In each case, the covariate subset yielding the most accurate tree-based model is taken as the best subset.

Each of the above feature selection methods identifies the covariates that most significantly influence LOS. To retain as much information as possible without missing any relevant covariate, the union of all feature selection results, 35 covariates in total, is used for further predictive modeling. Tab.2 displays the final selected covariates, and Tab.3 shows the significant covariates identified by each feature selection algorithm.

3.3 Prediction performance comparison

To compare the prediction performance of the proposed model with alternative prediction methods in the literature, the observations are split randomly into 90% training and 10% test datasets, with stratified sampling to preserve the proportion of discharge dispositions. The feature selection in the previous section is conducted on the training dataset only, without touching the independent test dataset. The models are compared by their C-index values on the training and test datasets for each discharge disposition, i.e., community and hospital (Harrell et al., 1996; D'Agostino and Nam, 2003). A C-index of 0.5 corresponds to random prediction, so a value above 0.5 indicates that the model ranks residents' discharge risks better than chance, and a higher C-index indicates better predictive capability. The proposed model takes about 0.726 seconds to fit all the LOS observations and make the predictions, which is efficient for real applications. Several semi-parametric and parametric survival models are compared under the competing risk framework, which incorporates the competing discharge dispositions, i.e., community discharge and re/hospitalization. The semi-parametric models are Cox regression with LASSO or Elastic Net regularization, in which the baseline hazard is non-parametric and the regularization aims to avoid overfitting. The parametric models are Weibull, Logistic, Log-normal, Log-logistic, and Exponential hazard regression, in which the baseline hazards are parameterized by the named distributions. Furthermore, several regularized/unregularized linear and non-linear machine learning methods that do not account for competing risks are considered: linear regression, LASSO regression, Ridge regression, Tobit regression, Decision Tree, Boosting Tree, and Random Forest.
Thus, a total of 15 different models are evaluated for each discharge disposition based on the same 35 covariates identified in the previous section. Tab.4 lists the models considered with their abbreviations and corresponding training and test C-index values. Fig.1 further visualizes the results.
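For reference, one common cause-specific adaptation of Harrell's C-index is sketched below in Python; it is an assumption that this matches the variant used here, since other competing-risk variants exist. A pair is comparable when resident $i$ left with the disposition of interest and resident $j$ was still in the NH at that time, and it is concordant when $i$ received the higher risk score. For models that predict an LOS value rather than a risk score, the negative predicted LOS could serve as the score. The data are hypothetical, and pairs with tied LOS are simply not counted as comparable.

```python
def cause_specific_c_index(times, dispositions, risk, cause):
    """Harrell-type C-index for one discharge disposition.

    A pair (i, j) is comparable when resident i left with `cause` at t_i and resident j
    was still in the NH after t_i (t_j > t_i, whatever j's eventual disposition).
    It is concordant when the model gives i the higher risk score; ties in risk count 1/2.
    """
    concordant, comparable = 0.0, 0
    for i, (t_i, d_i) in enumerate(zip(times, dispositions)):
        if d_i != cause:
            continue
        for j, t_j in enumerate(times):
            if t_j > t_i:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical test-set LOS (days), dispositions, and risk scores for community discharge.
times = [2, 4, 6, 8]
dispositions = ["C", "H", "C", "C"]
risk_C = [0.9, 0.5, 0.4, 0.1]
print(cause_specific_c_index(times, dispositions, risk_C, "C"))
```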

As observed in Fig.1(a), for predicting LOS before community discharge, the proposed model outperforms the other models, with training and test C-index values of 0.75 and 0.76, respectively. The regularized Cox regression models, i.e., LASSO and Elastic Net, yield lower C-index values. Both values are still above 0.5, indicating that the Cox baseline hazard is flexible enough to represent the LOS data. The reduced performance with regularization suggests that penalizing the covariate coefficients in the Cox model is unnecessary, probably because an optimal set of covariates has already been chosen in the preceding feature selection step. Parametric survival models perform worse than the Cox model family, with C-index values ranging from 0.21 to 0.25, much lower than 0.5, indicating that these models perform consistently worse than a random guess. The Cox model family outperforms the parametric survival models because its baseline hazard is non-parametric and can capture the LOS data more flexibly. Regularized/unregularized linear models perform slightly better than the parametric survival models, with C-index values ranging from 0.22 to 0.28. Tree-based methods, in turn, perform better than the linear models, with Decision Tree and Random Forest producing C-index values around 0.3. The improvement achieved by tree-based methods suggests a non-linear relationship between LOS and the covariates.

Similarly, Fig.1(b) shows that the proposed model outperforms the other models for predicting LOS before transfer to hospital. The performance patterns are similar to those for community discharge likelihood, with a few differences. LASSO-regularized Cox regression performs poorly for predicting re/hospitalization risk, because no penalty value was found that improved prediction beyond a random guess. Among the linear models, LASSO regularization produces improved results, but they are still inadequate for accurate prediction. Among the nonlinear models, Decision Tree produces the best result. The test C-index values are generally lower than the training C-index values, because the test dataset is held out as an independent dataset, untouched during model development, to evaluate the model's future prediction performance.

3.4 Identification of risk/protective factors

Apart from producing superior prediction performance, the proposed competing risk Cox regression model identifies important risk/protective factors that influence a resident’s LOS. Tab.5 shows the significant covariates identified by the proposed model for predicting community discharge likelihood and re/hospitalization risk. The significance level α is set at 0.05.

The magnitude and sign of the coefficients quantify the influence of the covariates on the probability of being discharged/transferred. A higher probability of being discharged/transferred implies a shorter LOS, and vice versa. For a resident ultimately discharged to the community, a higher ADL score, a higher Mood score, or any of the listed disease diagnoses decreases his/her discharge likelihood because of the negative coefficients, and thus increases the LOS. Conversely, for a resident ultimately transferred to hospital, a higher ADL score or a diagnosis of anemia, uropathy, or diabetes increases his/her hospitalization risk because of the positive coefficients, implying a shorter LOS before transfer. For both community and hospital dispositions, ADL is the most significant factor influencing LOS, confirming the domain knowledge that residents with high dependency in activities of daily living (e.g., eating, bathing, toileting, and dressing) require more NH care. ADL also has opposite effects on the two dispositions, indicating the importance of incorporating multiple discharge dispositions. Such risk/protective factors are valuable for healthcare providers to better identify and target the most "at-risk" NH residents with more focused care and resources.
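Because Eq. (2) is multiplicative, each coefficient translates into a hazard ratio: increasing covariate $k$ by $\delta$ with the others fixed multiplies $d_{i,s}(t)$ by $\exp(\beta_{s,k}\delta)$ at every $t$, where $\beta_{s,k}$ denotes the $k$th element of $\boldsymbol{\beta}_s$. The short Python sketch below makes this concrete with hypothetical ADL coefficients chosen only to match the signs described above (negative for community discharge, positive for re/hospitalization); they are not the estimates reported in Tab.5.

```python
import math


def hazard_ratio(beta_k, delta=1.0):
    """Multiplicative change in d_{i,s}(t) when covariate k rises by `delta`
    with all other covariates fixed: exp(beta_k * delta), the same at every t."""
    return math.exp(beta_k * delta)


# Hypothetical ADL coefficients (illustrative signs only; not the Tab.5 estimates).
beta_adl = {"C": -0.05, "H": 0.04}
for s, b in beta_adl.items():
    hr = hazard_ratio(b, delta=10)
    effect = "slower" if hr < 1 else "faster"
    print(f"{s}: HR for +10 ADL points = {hr:.3f} ({effect} discharge with disposition {s})")
```

A hazard ratio below 1 slows discharge with that disposition and therefore lengthens the expected stay before it, which is the sense in which a negative coefficient increases the LOS.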

3.5 Marginal effects of covariates on community discharge likelihood and re/hospitalization risk

Unlike many existing predictive models, which provide a single predicted LOS value, the proposed competing risk model provides the disposition-specific probability of being discharged/transferred over time. This information can be visualized and compared by plotting the survival probability (i.e., one minus the probability of being discharged/transferred) over time. Furthermore, because the proposed model is a proportional hazards model, the marginal effects of covariates can be visualized as survival curves under different covariate values. Such survival curves show the influence of each individual covariate on the probability of being discharged/transferred, and they allow this probability to be compared over time among individuals with different characteristics. Fig.2 and Fig.3 provide examples of the marginal effects of various baseline ADL values on the LOS of an example resident over time for specific discharge dispositions. All variables other than the ADL score are fixed at the mean level of the observed sample.
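Marginal-effect curves of this kind can be computed from the fitted quantities as sketched below, assuming the baseline jumps and coefficients of one disposition are available. The curve computed here is the disposition-specific survival $\exp[-\Lambda_s^{b}(t)\exp(\boldsymbol{\beta}_s^{\top}\mathbf{x})]$, where $\Lambda_s^{b}(t)$ denotes the cumulative baseline discharge rate; that this is exactly the quantity plotted in Fig.2 and Fig.3 is an assumption. Everything in the example (the covariate layout [ADL, Mood], the jump sizes, and the coefficients) is hypothetical.

```python
import math


def cause_specific_survival(t, jumps, beta, x):
    """S_s(t | x) = exp(-Lambda_s^b(t) * exp(beta_s' x)), with Lambda_s^b(t) the sum of
    baseline jumps at discharge times <= t (a proportional hazards survival curve)."""
    cum_baseline = sum(d for t_k, d in jumps if t_k <= t)
    return math.exp(-cum_baseline * math.exp(sum(b * v for b, v in zip(beta, x))))


def vary_one_covariate(x_mean, index, values):
    """Covariate profiles equal to the sample mean except at position `index`."""
    profiles = []
    for v in values:
        x = list(x_mean)
        x[index] = v
        profiles.append(x)
    return profiles


# Hypothetical inputs: covariates are [ADL score, Mood score]; x_mean is the mean profile.
jumps_C = [(5, 0.05), (10, 0.08), (20, 0.15), (30, 0.20)]
beta_C = [-0.05, -0.10]
x_mean = [15.0, 2.0]
for x in vary_one_covariate(x_mean, 0, [8.0, 15.0, 22.0]):   # low, mean, high ADL
    curve = [round(cause_specific_survival(t, jumps_C, beta_C, x), 3) for t in range(0, 41, 10)]
    print(f"ADL = {x[0]:>4}: {curve}")
```

With a negative ADL coefficient, a higher ADL score lowers the discharge rate, so its curve lies above the mean-profile curve at every time point, matching the pattern described for community discharge.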

As observed in Fig.2, for community discharge, a resident with a higher baseline ADL score (more physical functional dependency) has a survival curve (red) above that of the average resident (blue). In this case, the probability of remaining in the facility is higher than average at every point in time, which increases the LOS. By contrast, a resident with a lower baseline ADL score (more functionally independent) has a survival curve (green) below the average and tends to have a shorter LOS. Fig.3 shows the survival curves for re/hospitalization. The ADL score has the opposite effect on these curves, consistent with the competing risk formulation: a higher baseline ADL score results in a shorter stay before transfer, while a lower one lengthens the stay. Because the curves evolve differently over time, the probability of being discharged/transferred can be assessed at any time point during the resident's stay.

Similarly, a resident’s disposition outcome may also be examined over time for various combinations of individual characteristics. For instance, in Fig.4, a hypothetical resident with better health conditions (e.g., lower ADL and Mood/Depression scores, and fewer diseases diagnosed at baseline) tends to have a shorter LOS with community discharge (red curve) but a longer LOS with re/hospitalization (blue). By contrast, in Fig.5, a hypothetical resident with worse health conditions (e.g., higher ADL and Mood/Depression scores, and more diseases diagnosed at baseline) tends to remain in the facility longer for recovery before being discharged to the community (red), and to have a relatively short stay if transferred to hospital (blue).
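Under proportional hazards, comparing two covariate profiles reduces to one time-constant hazard ratio per disposition. The derivation below uses our own notation (cause-specific baseline hazard h_{0k}, its cumulative form H_{0k}, coefficients β_k, and profiles x^(1), x^(2)), which may differ from the notation defined earlier in the paper:

```latex
h_k(t \mid \mathbf{x}) = h_{0k}(t)\,\exp\!\big(\boldsymbol{\beta}_k^{\top}\mathbf{x}\big),
\qquad
S_k(t \mid \mathbf{x}) = \exp\!\big(-H_{0k}(t)\,e^{\boldsymbol{\beta}_k^{\top}\mathbf{x}}\big)

\mathrm{HR}_k = \frac{h_k(t \mid \mathbf{x}^{(1)})}{h_k(t \mid \mathbf{x}^{(2)})}
  = \exp\!\big(\boldsymbol{\beta}_k^{\top}(\mathbf{x}^{(1)} - \mathbf{x}^{(2)})\big)
\quad\Longrightarrow\quad
S_k(t \mid \mathbf{x}^{(1)}) = \big[S_k(t \mid \mathbf{x}^{(2)})\big]^{\mathrm{HR}_k}
```

If x^(1) denotes the healthier profile, HR_k > 1 for community discharge pulls its survival curve down (shorter LOS), while HR_k < 1 for re/hospitalization pushes that curve up (longer LOS), which is the pattern described for Fig.4 and Fig.5.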

3.6 Performance of the proposed sampling algorithm for generating LOS predictive samples

Survival models differ from conventional machine learning models in the form of their predictions. The former characterize the predictive distribution through a probabilistic prediction, e.g., the predicted probability of being discharged/transferred over time for each resident, whereas the latter often provide a single point prediction, e.g., the predicted LOS value. To simulate resident flow in a typical NH facility by computer simulation, an important step is to generate accurate LOS predictive samples. The proposed sampling algorithm generates accurate predictive LOS samples and simultaneously provides the corresponding discharge dispositions. Sampling performance can be evaluated by comparing the survival curves of the observed and simulated LOS samples. These survival curves are computed with the Kaplan-Meier estimator, which yields the disposition-specific observed and simulated survival curves of a sample. In Fig.6, survival curves are compared for each disposition and for the full dataset. The sampling algorithm is very effective at generating predictive LOS samples: the simulated (light-colored) curves are very close to the corresponding observed ones (dark-colored). The green curves (light and dark) lie slightly below those of the full dataset (black and grey), indicating that residents transferred to hospital tend to have shorter LOSs than average, whereas the blue curves indicate that residents discharged to the community have slightly longer LOSs than average. Fig.7 shows that the sampling algorithm predicts discharge dispositions with 100% classification accuracy.
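The sampling algorithm itself is not spelled out in this section, so the Python sketch below is not the authors' method. It shows one standard way to draw a (LOS, disposition) pair from a resident's cause-specific cumulative hazards on a common grid of event times: inverse-transform sampling on the overall survival curve, then choosing the disposition in proportion to each cause's hazard jump at the sampled time. It also includes a Kaplan-Meier estimator for comparing observed and simulated curves. All inputs and the names `sample_los` and `kaplan_meier` are hypothetical.

```python
import math
import random
from bisect import bisect_left


def sample_los(times, cumhaz_by_disp, rng):
    """Draw one (LOS, disposition) pair by inverse-transform sampling.

    times: sorted event times in days, shared by all dispositions.
    cumhaz_by_disp: {disposition: [H_k(t_j | x) for t_j in times]}, this
        resident's cause-specific cumulative hazards (baseline already
        multiplied by exp(beta_k' x)); each time point is assumed to carry
        a positive total hazard jump.
    Returns (times[-1], None) if the resident is still in the facility then.
    """
    disps = list(cumhaz_by_disp)
    total = [sum(cumhaz_by_disp[d][j] for d in disps) for j in range(len(times))]
    u = 1.0 - rng.random()  # U in (0, 1], so log(U) is finite
    # Overall survival is S(t_j) = exp(-total[j]); the sampled LOS is the first
    # t_j with S(t_j) <= U, i.e. the first j with total[j] >= -log(U).
    j = bisect_left(total, -math.log(u))
    if j == len(times):
        return times[-1], None  # censored at the end of the horizon
    # Pick the disposition in proportion to each cause's hazard jump at t_j.
    jumps = [cumhaz_by_disp[d][j] - (cumhaz_by_disp[d][j - 1] if j else 0.0)
             for d in disps]
    return times[j], rng.choices(disps, weights=jumps)[0]


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    For a disposition-specific curve, residents with any other outcome are
    passed in as censored (observed=False).
    """
    pairs = sorted(zip(durations, observed))
    at_risk, s, out, i = len(pairs), 1.0, [], 0
    while i < len(pairs):
        t, events, n_t = pairs[i][0], 0, 0
        while i < len(pairs) and pairs[i][0] == t:
            events += int(pairs[i][1])
            n_t += 1
            i += 1
        if events:
            s *= 1.0 - events / at_risk
            out.append((t, s))
        at_risk -= n_t
    return out


rng = random.Random(42)
times = [5, 10, 20, 30, 60]
cumhaz = {"community": [0.10, 0.30, 0.60, 0.90, 1.50],  # hypothetical values
          "hospital": [0.05, 0.10, 0.20, 0.25, 0.35]}
sims = [sample_los(times, cumhaz, rng) for _ in range(20000)]
los = [t for t, _ in sims]
km_all = dict(kaplan_meier(los, [d is not None for _, d in sims]))
# Simulated all-cause survival at day 30 vs. the model value exp(-(0.90 + 0.25)).
print(round(km_all[30], 3), round(math.exp(-(0.90 + 0.25)), 3))
```

Because each draw returns a disposition alongside the LOS, the same simulated samples can feed both the disposition-specific Kaplan-Meier comparison and a check of the predicted dispositions.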

The accuracy of the proposed algorithm is further compared with simulation results based on several alternative models, including survival models and machine learning models. Simulation is performed using Cox Weibull regression and Cox Log-normal regression, in which the baseline hazard functions are fitted with Weibull and log-normal distributions, respectively, under the competing risk framework. As shown by the survival curves in Fig.8–Fig.9, the Cox Log-normal regression generates samples that match the observed data poorly, whereas Cox Weibull performs better owing to its greater flexibility in fitting the LOS data. The prediction performance of the proposed work is also compared with popular linear and non-linear machine learning models, such as linear regression, L1-regularized linear regression (LASSO), Decision Tree, and Random Forest. Overall, the proposed work generates the most accurate LOS samples among the compared methods, owing to its non-parametric baseline hazard and the proposed simulation algorithm.
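For the Cox Weibull benchmark, a predictive sample can be drawn in closed form because the Weibull cumulative hazard is invertible. The Python sketch below assumes a Weibull proportional hazards parameterization H(t | x) = (t/scale)^shape · exp(β′x) for each disposition and simulates competing dispositions by drawing one latent time per cause and keeping the earliest. The parameter values and the names `weibull_ph_time` and `sample_competing_weibull` are hypothetical, and the benchmark in the study may have been simulated differently.

```python
import math
import random


def weibull_ph_time(shape, scale, lp, rng):
    """Inverse-transform draw from a Weibull proportional hazards model.

    Cumulative hazard H(t | x) = (t / scale) ** shape * exp(lp), where lp is
    the linear predictor beta'x, so T = scale * (-log(U) / exp(lp)) ** (1 / shape).
    """
    u = 1.0 - rng.random()  # U in (0, 1], avoids log(0)
    return scale * (-math.log(u) / math.exp(lp)) ** (1.0 / shape)


def sample_competing_weibull(params, rng):
    """params: {disposition: (shape, scale, lp)} -> (LOS, disposition).

    One latent time is drawn per cause-specific hazard and the earliest wins;
    the minimum of such independent draws has exactly these cause-specific
    hazards.
    """
    draws = {d: weibull_ph_time(k, lam, lp, rng) for d, (k, lam, lp) in params.items()}
    disp = min(draws, key=draws.get)
    return draws[disp], disp


rng = random.Random(7)
# Hypothetical parameters; shape > 1 gives a hazard that rises with time.
params = {"community": (1.3, 35.0, 0.2), "hospital": (0.9, 90.0, -0.1)}
print([sample_competing_weibull(params, rng) for _ in range(3)])
```

The closed-form inverse is what makes this benchmark cheap to simulate; its rigidity (a single monotone hazard shape per disposition) is also what limits how closely it can track an empirical LOS distribution.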

3.7 Simulation-based facility-level performance evaluation

Beyond predicting the probability that a specific individual resident is discharged/transferred to a specific disposition, the proposed sampling algorithm can also generate predictive LOS samples for a heterogeneous population of NH residents with varied individual characteristics. This allows users to evaluate the system-level performance of an NH facility under a given census composition scenario of a heterogeneous population of NH residents.

To illustrate the functionality of the proposed work, eight cohorts of residents are defined in increasing order of acuity. Simulated data are generated for each cohort using different segmentations and distributions of the significant covariates identified by the Cox regression in Tab.5. The setting for each cohort is provided in Tab.6. For each acuity scenario, 1000 random admission observations are generated as follows. First, the ADL score is drawn from a truncated normal distribution whose mean is fixed at the level corresponding to the desired acuity and whose lower and upper limits are set to the range of the observed data. Second, the Mood score is drawn from a truncated normal distribution in the same way. Lastly, for each observation, a binary value (0 or 1) is randomly generated for the 13 diseases through a Bernoulli distribution whose success rate is sampled from a Beta prior distribution with shape parameters set according to the desired acuity level. The sampling process is repeated 20 times for each acuity scenario to account for stochastic simulation uncertainty. Mean LOS values and disposition-specific discharge rates are estimated, and their standard errors are reported. The results are summarized in Tab.7 and Fig.11. Generating 1000 predictive LOS samples takes around 0.7 seconds under the various acuity scenarios, which is efficient enough for real applications.
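The covariate-generation steps above can be sketched in Python as follows. Assumptions: the standard deviations, the score ranges (0–28 for ADL, 0–27 for Mood), and the Beta shape parameters are placeholders rather than the settings in Tab.6, and a single Beta-distributed disease rate is drawn per admission and shared by its 13 disease indicators, which is one reading of the description. The names `truncated_normal` and `generate_cohort` are hypothetical.

```python
import random

N_DISEASES = 13  # number of baseline disease indicators named in the text


def truncated_normal(mean, sd, lower, upper, rng):
    """Normal(mean, sd) restricted to [lower, upper] via simple rejection.

    Fine when the mean lies inside the range; slow if it is far outside.
    """
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


def generate_cohort(n, adl_mean, mood_mean, beta_a, beta_b, rng,
                    adl_sd=4.0, adl_range=(0.0, 28.0),
                    mood_sd=3.0, mood_range=(0.0, 27.0)):
    """Hypothetical admission covariates for one acuity scenario.

    SDs and ranges are placeholders; the text sets the limits to the range
    of the observed data. A larger beta_a / (beta_a + beta_b) means more
    diseases on average, i.e. higher acuity.
    """
    cohort = []
    for _ in range(n):
        adl = truncated_normal(adl_mean, adl_sd, *adl_range, rng)
        mood = truncated_normal(mood_mean, mood_sd, *mood_range, rng)
        p = rng.betavariate(beta_a, beta_b)  # disease rate drawn from the Beta prior
        diseases = [1 if rng.random() < p else 0 for _ in range(N_DISEASES)]
        cohort.append({"adl": adl, "mood": mood, "diseases": diseases})
    return cohort


# 20 replications of 1000 admissions for one (hypothetical) acuity setting.
rng = random.Random(2022)
replications = [generate_cohort(1000, adl_mean=18.0, mood_mean=6.0,
                                beta_a=2.0, beta_b=8.0, rng=rng)
                for _ in range(20)]
```

Each simulated cohort would then be passed through the fitted model and the sampling algorithm to obtain LOS and disposition samples.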

The proposed sampling algorithm generates the LOS and predicts the discharge disposition for each simulated resident in each cohort. As seen in Fig.11(a), the mean LOS of the samples increases with acuity, and more residents are transferred to hospital. Residents discharged to the community have increasingly longer stays, which further increases the mean LOS of the samples across acuity scenarios. The whiskers represent the dispersion among samples within each acuity scenario and are mostly non-overlapping, indicating significant differences in LOS between acuity scenarios. As shown in Fig.11(b), increasing acuity reduces community discharge rates more sharply over time, while hospital discharge rates increase more gradually. This occurs because more of the diseases influence the community discharge likelihood than the re/hospitalization risk. The results further emphasize the competing nature of the two dispositions: as resident acuity increases, LOS tends to increase for residents discharged to the community but to decrease for residents transferred to hospital. The mean LOS therefore varies with the proportions of residents ultimately discharged to the community or transferred to hospital. The sampling algorithm successfully mimics this competing phenomenon of the two dispositions. Furthermore, the algorithm can provide the disposition-specific probability of being discharged on a continuous time scale, both for the collective cohort and for individual residents, allowing a better understanding of facility utilization and resident outcome (i.e., re/hospitalization risk) over the course of the stay. Fig.11(b) shows several discharge rate curves at discrete times of 30, 45, and 60 days.
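The facility-level summaries reported for Tab.7 and Fig.11 can be computed from the replicated samples roughly as follows. This Python sketch assumes each replication is a list of (LOS, disposition) pairs, takes the standard error as the sample standard deviation across replications divided by the square root of the number of replications (the exact definition is not stated here), and counts residents still in the facility at the horizon with their truncated LOS. The names `discharge_rate`, `mean_se`, and `summarize` are hypothetical.

```python
import math
from statistics import mean, stdev


def discharge_rate(samples, disposition, day):
    """Share of simulated residents discharged to `disposition` by `day`.

    samples: list of (los, disposition) pairs; disposition None means the
    resident was still in the facility at the end of the horizon.
    """
    hits = sum(1 for los, d in samples if d == disposition and los <= day)
    return hits / len(samples)


def mean_se(values):
    """Mean across replications and its standard error (sd / sqrt(R))."""
    se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
    return mean(values), se


def summarize(replications, dispositions=("community", "hospital"), days=(30, 45, 60)):
    """Mean LOS plus disposition-specific discharge rates at fixed days."""
    # Residents still in the facility contribute their truncated LOS.
    summary = {"mean_los": mean_se([mean(los for los, _ in rep) for rep in replications])}
    for d in dispositions:
        for day in days:
            summary[(d, day)] = mean_se([discharge_rate(rep, d, day) for rep in replications])
    return summary


# Two tiny hypothetical replications of (LOS in days, disposition).
reps = [[(10, "community"), (40, "hospital"), (70, None)],
        [(20, "community"), (50, "community"), (60, "hospital")]]
print(summarize(reps)["mean_los"])
```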

4 Conclusions

In this paper, a heterogeneous LOS modeling approach was proposed that considers multiple discharge dispositions and incorporates varied individual characteristics of NH PAC residents. At the individual level, several popular predictive models, such as machine learning and survival models, were used to predict LOS, and their performance was compared with that of the proposed model. The proposed model outperforms the other models by jointly predicting the re/hospitalization risk and community discharge likelihood over time. It can also identify disposition-specific risk/protective factors that influence the disposition-specific probability of being discharged/transferred over time. Furthermore, to enable facility-level performance evaluation of the NH, a novel simulation algorithm was proposed for generating LOS predictive samples of residents that incorporates varied individual characteristics and competing discharge dispositions. The proposed algorithm accurately generates samples for a heterogeneous population of NH residents with varied individual characteristics, which allows the evaluation of facility performance measures such as the facility-level re/hospitalization rate and mean LOS. A real case study based on large-scale de-identified data from an NH in the Tampa Bay area was used to illustrate the proposed work and demonstrate its superior prediction performance. The proposed approach would allow NH administrators and health practitioners to identify the most at-risk residents and design more targeted care delivery, facilitate optimal resource allocation strategies at the facility level to achieve better quality outcomes at reduced cost, and improve the communication of prognostic information among everyone involved in the care delivery process.

References

[1] Austin P C, Rothwell D M, Tu J V (2002). A comparison of statistical modeling strategies for analyzing length of stay after CABG surgery. Health Services and Outcomes Research Methodology, 3(2): 107–133

[2] Cappanera P, Visintin F, Banditori C (2014). Comparing resource balancing criteria in master surgical scheduling: A combined optimisation-simulation approach. International Journal of Production Economics, 158: 179–196

[3] Carey K (2002). Hospital length of stay and cost: A multilevel modeling analysis. Health Services and Outcomes Research Methodology, 3(1): 41–56

[4] Centers for Medicare and Medicaid Services (CMS) (2013). MDS 3.0 Quality Measures: User’s Manual. Research Triangle Park, NC: RTI International

[5] Centers for Medicare and Medicaid Services (CMS) (2017). Long-Term Care Facility Resident Assessment Instrument 3.0 User’s Manual

[6] Cole S R, Chu H, Greenland S (2014). Maximum likelihood, profile likelihood, and penalized likelihood: A primer. American Journal of Epidemiology, 179(2): 252–260

[7] Cox D R (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, Methodological, 34(2): 187–202

[8] D’Agostino R B, Nam B H (2003). Evaluation of the performance of survival analysis models: Discrimination and calibration measures. Handbook of Statistics, 23: 1–25

[9] Eiken S, Sredl K, Gold L, Kasten J, Burwell B, Saucier P (2014). Medicaid Expenditures for Long-Term Services and Supports in FFY 2012. Bethesda, MD: Truven Health Analytics

[10] El-Darzi E, Vasilakis C, Chaussalet T, Millard P H (1998). A simulation modelling approach to evaluating length of stay, occupancy, emptiness and bed blocking in a hospital geriatric department. Health Care Management Science, 1(2): 143–149

[11] Faddy M, Graves N, Pettitt A (2009). Modeling length of stay in hospital and other right skewed data: Comparison of phase-type, gamma and log-normal distributions. Value in Health, 12(2): 309–314

[12] Fashaw S A, Thomas K S, McCreedy E, Mor V (2020). Thirty-year trends in nursing home composition and quality since the passage of the Omnibus Reconciliation Act. Journal of the American Medical Directors Association, 21(2): 233–239

[13] Ginsburg R B, Supreme Court of the US (1998). US Reports: Olmstead v. L.C., 527 US 581

[14] Gonzàlez D, Piña M, Torres L (2008). Estimation of parameters in Cox’s proportional hazard model: Comparisons between Evolutionary Algorithms and the Newton-Raphson Approach. In: Mexican International Conference on Artificial Intelligence. Berlin, Heidelberg: Springer, 513–523

[15] Hachesu P R, Ahmadi M, Alizadeh S, Sadoughi F (2013). Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthcare Informatics Research, 19(2): 121–129

[16] Harrell Jr F E, Lee K L, Mark D B (1996). Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15(4): 361–387

[17] Holup A A, Hyer K, Meng H, Volicer L (2017). Profile of nursing home residents admitted directly from home. Journal of the American Medical Directors Association, 18(2): 131–137

[18] Hoot N R, LeBlanc L J, Jones I, Levin S R, Zhou C, Gadd C S, Aronsky D (2008). Forecasting emergency department crowding: A discrete event simulation. Annals of Emergency Medicine, 52(2): 116–125

[19] Incalzi R A, Gemma A, Capparella O, Terranova L, Porcedda P, Tresalti E, Carbonin P (1992). Predicting mortality and length of stay of geriatric patients in an acute care general hospital. Journal of Gerontology, 47(2): M35–M39

[20] Kelly A, Conell-Price J, Covinsky K, Cenzer I S, Chang A, Boscardin W J, Smith A K (2010). Length of stay for older adults residing in nursing homes at the end of life. Journal of the American Geriatrics Society, 58(9): 1701–1706

[21] Kramer A A, Zimmerman J E (2010). A predictive model for the early identification of patients at risk for a prolonged intensive care unit length of stay. BMC Medical Informatics and Decision Making, 10(1): 27

[22] McGuire L C, Ford E S, Okoro C A (2007). Natural disasters and older US adults with disabilities: Implications for evacuation. Disasters, 31(1): 49–56

[23] Moon M, Gage B, Evans A (1997). An examination of key Medicare provisions in the Balanced Budget Act of 1997. New York: The Commonwealth Fund

[24] Murad Y (2011). Skilled nursing facilities and post-acute care. Journal of Gerontology & Geriatric Research, 1(101): 1–4

[25] National Investment Center for Seniors Housing & Care (NIC) (2018). Skilled Nursing Data Report: Key Occupancy & Revenue Trends. 4Q2017

[26] New P W, Stockman K, Cameron P A, Olver J H, Stoelwinder J U (2015). Computer simulation of improvements in hospital length of stay for rehabilitation patients. Journal of Rehabilitation Medicine, 47(5): 403–411

[27] Pendharkar P C, Khurana H (2014). Machine learning techniques for predicting hospital length of stay in Pennsylvania federal and specialty hospitals. International Journal of Computer Science & Applications, 11(3): 45–56

[28] Taboada M, Cabrera E, Iglesias M L, Epelde F, Luque E (2011). An agent-based decision support system for hospitals emergency departments. Procedia Computer Science, 4: 1870–1879

[29] Turgeman L, May J H, Sciulli R (2017). Insights from a machine learning model for predicting the hospital Length of Stay (LOS) at the time of admission. Expert Systems with Applications, 78: 376–385

[30] Wang J, Li J, Tussey K, Ross K (2012). Reducing length of stay in emergency department: A simulation study at a community hospital. IEEE Transactions on Systems, Man, and Cybernetics: Part A, Systems and Humans, 42(6): 1314–1322

[31] Xie H, Chaussalet T J, Millard P H (2005). A continuous time Markov model for the length of stay of elderly people in institutional long-term care. Journal of the Royal Statistical Society, Series A (Statistics in Society), 168(1): 51–61

[32] Zhang X, Barnes S, Golden B, Myers M, Smith P (2019). Lognormal-based mixture models for robust fitting of hospital length of stay distributions. Operations Research for Health Care, 22: 100184

RIGHTS & PERMISSIONS

Higher Education Press
