1. Ingram School of Engineering, Texas State University, San Marcos, TX 78666, USA
2. School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, China; Key Laboratory of Industrial Engineering and Intelligent Manufacturing (Ministry of Industry and Information Technology), Xi’an 710072, China
wenjin.zhu@nwpu.edu.cn
History
Received: 2023-12-22
Revised: 2024-02-26
Accepted: 2024-03-13
Issue Date: 2024-04-28
Published: 2024-09-15
Abstract
Reliability-redundancy allocation, preventive maintenance, and spare parts logistics are crucial for achieving system reliability and availability goals. Existing methods often concentrate on specific scopes of the system’s lifetime. This paper proposes a joint redundancy-maintenance-inventory allocation model that simultaneously optimizes redundant components, replacement time, spares stocking, and repair capacity. Under reliability and availability criteria, our objective is to minimize the system’s lifetime cost, including design, manufacturing, and operational phases. We develop a unified system availability model based on ten performance drivers, serving as the foundation for the establishment of the lifetime-based resource allocation model. Superimposed renewal theory is employed to estimate spare part demand from proactive and corrective replacements. A bisection algorithm, enhanced by neighborhood exploration, solves the complex mixed-integer, nonlinear optimization problem. The numerical experiments show that component redundancy is preferred and necessary if one of the following situations occurs: extremely high system availability is required, the fleet size is small, the system reliability is immature, the inventory holding is too costly, or the hands-on replacement time is prolonged. The joint allocation model also reveals that there exists no monotonic relation between spares stocking level and system availability.
Tongdan JIN, Shubin SI, Wenjin ZHU.
Allocating redundancy, maintenance and spare parts for minimizing system cost under decentralized repairs.
Front. Eng, 2024, 11(3): 377-395 DOI:10.1007/s42524-024-0145-3
1 Introduction
In the integrated product-service paradigm, many original equipment manufacturers (OEMs) strive to deliver high-reliability products along with responsive repair and maintenance services. However, achieving these goals at the same time can be challenging due to resource, time, and cost constraints. Therefore, it is important to develop a holistic framework that can effectively coordinate reliability design, maintenance policy, repair capacity, and spares provisioning throughout the entire product lifetime.
Various models have been proposed to achieve high system reliability at a low cost, including reliability-redundancy allocation (RRA), preventive (or predictive) maintenance (PM), and spare parts logistics (SPL). These models often focus on specific phases of the product lifetime. RRA primarily addresses product design and manufacturing, while PM and SPL are concerned with the aftermarket period. However, because these models are usually implemented independently, they often lead to suboptimal solutions. For background on RRA, see Coit and Zio (2019) and Si et al. (2020). For comprehensive reviews on PM, including condition-based maintenance (CBM), refer to Alaswad and Xiang (2017) and Hu et al. (2022). Basten and van Houtum (2014) and Zhang et al. (2021) provide insights into SPL models. Recently, there has been a growing research stream on the coordination of RRA and PM, RRA and SPL, and PM and SPL, which will be discussed in Section 2. Despite the aforementioned studies, there is a lack of a holistic framework in which RRA, PM (or CBM), and SPL are jointly optimized over the product lifetime (Jin, 2023). A holistic approach can guide firms in maintaining market competitiveness and achieving a win-win result between the OEM and customers. With emerging technologies such as digital twin and Internet of Things, the integration of all product phases, including design, manufacturing, and aftermarket, is the basis for minimizing product lifetime cost without compromising reliability and availability performance (Wang, 2021).
This paper aims to fill this gap by proposing a joint RRA, PM, and SPL optimization model to minimize costs across system design, manufacturing, and aftermarket. To that end, we present a mixed-integer, redundancy-maintenance-inventory allocation model that optimizes redundancy level, replacement time, spares inventory, and repair and renewing capacity. The goal is to minimize annualized system cost while satisfying reliability and availability criteria. The proposed model is applied in the semiconductor equipment industry, where zero system downtime is desirable for high production throughput. Our study shows that the OEM opts to adopt a redundancy strategy if: 1) extraordinary system availability, such as 0.999, is required; 2) the system fleet size is small; 3) parts holding costs are extremely high; 4) system reliability is immature; or 5) a prolonged replacement time occurs. The joint allocation model also reveals that the correlation between spares inventory and system availability is not necessarily monotonic.
The remainder of the article is organized as follows: Section 2 reviews the related literature. Section 3 characterizes Erlang-C repair and renewal queues under superimposed renewal processes. Section 4 presents a unified system availability model incorporating redundancy, maintenance, spares, and repair capacity. In Section 5, a joint redundancy-maintenance-inventory allocation model is formulated, and the bisection search algorithm is also elaborated. In Sections 6 and 7, the proposed model is demonstrated on semiconductor test equipment comprised of single and multiple redundant subsystems, respectively. Section 8 concludes the paper.
2 Literature review
This section reviews the works pertaining to three research streams: 1) joint allocation of RRA and SPL; 2) joint decision on RRA and PM; and 3) joint optimization of PM and SPL.
2.1 Joint allocation of reliability-redundancy and spares inventory
Much effort has been dedicated to managing spare parts inventory through the consideration of component reliability and installed base data (Louit et al., 2011; Dekker et al., 2013; Selviaridis and Wynstra, 2015). For example, Jin and Tian (2012) treat component reliability as an endogenous variable and combine it with an adaptive inventory policy to minimize the overall cost of the growing installed base throughout its lifecycle. This model has been further expanded by Jin et al. (2017) to integrate redundancy, along with reliability and spares stocking, in order to minimize system lifetime cost. Selçuk and Agrali (2013) study the trade-off between reliability investment and parts base-stock level to minimize the cost of a multi-item system fleet. Öner et al. (2013) propose an on-site, cold-standby redundancy strategy to mitigate equipment downtime, utilizing performance measures such as parts availability, expected backorders, and inventory cost. In our paper, we aim to optimize maintenance time and repair capacity, along with component redundancy and spares inventory for attaining the system availability goal.
Xie et al. (2014) present a continuous-time Markov chain model to maximize the system availability by jointly optimizing active redundancy and the base-stock level. Sleptchenko and van der Heijden (2016) jointly allocate redundancy and spare parts for a k-out-of-n system with different standby modes and part types. They find that high redundancy levels are only beneficial when components are relatively inexpensive and part replacement times are long. The latter also echoes our finding. Zhao et al. (2019) concurrently allocate repairmen, cold standby redundancy, and spares inventory to maximize system availability. A common assumption in these RRA-SPL models is that component lifetimes follow an exponential distribution with a constant failure rate. In our paper, we relax the constant failure rate assumption, and consider time-varying failure rates to generalize component lifetime distribution.
2.2 Joint decision on reliability-redundancy and maintenance
Some researchers argue that it is necessary to combine RRA and PM decisions because these decisions influence each other and collectively impact the total cost of a system’s lifetime. For instance, Levitin and Lisnianski (1999) jointly optimize component redundancy and replacement schedules for multi-state systems to achieve the desired reliability objectives. They employ genetic algorithms to minimize system costs, which include capital, maintenance, and random failures. Nourelfath et al. (2012) and Liu et al. (2013) address the redundancy-maintenance optimization problem for multi-state systems under imperfect repair. The focus of both studies is to achieve the desired system availability while minimizing investments in redundant units and maintenance activities.
Moghaddass et al. (2012) conduct a study comparing the trade-off between component redundancy and its maintenance frequency to maximize the profitability in a multi-state system, rather than solely focusing on cost reduction. They use a continuous-time Markov process model to estimate system availability and determine maintenance initiation criteria. Bei et al. (2017) formulate a two-stage stochastic optimization method assuming constant stress and perfect repair to determine component choice, redundancy level, and maintenance time for a series-parallel system. Later, Zhu et al. (2018) extend the redundancy-maintenance optimization model by incorporating time-varying usage stress and minimal repair. Bei et al. (2019) solve a similar problem by considering worst-case scenarios for future system usage. They minimize the conditional value-at-risk of the cost rate to obtain the risk-averse decision.
One common assumption in existing RRA-PM allocation models is that the availability of spare parts is guaranteed. However, our paper acknowledges the backorder situation when spares inventory runs out. We aim to mitigate parts supply uncertainty and make a robust redundancy-maintenance decision by optimizing redundant components and replacement time.
2.3 Joint optimization of maintenance and spares inventory
This research stream is also known as maintenance service logistics (Vaughan, 2005; Van Horenbeek et al., 2013). The objective is to achieve high system availability by coordinating part replacement time with spares provisioning. For instance, de Smidt-Destombes et al. (2009) conduct a joint optimization of maintenance initiation, spares quantity, and repair capacity to minimize the ownership cost in a k-out-of-n system. Bjarnason and Taghipour (2016) coordinate inspection time, periodic reorders, and emergency order-up-to level using an replenishment policy to minimize the system cost rate. Zhu et al. (2020) utilize maintenance schedules and advance demand information to forecast intermittent spares demand and develop a dynamic inventory control mechanism to minimize costs. Wang and Zhu (2021) jointly coordinate condition-based replacement and spares stocking policies for a multi-state k-out-of-n system. Zhang et al. (2022) address a condition-based maintenance service logistics problem for a series-parallel system with both hard and soft failures. These studies assume a pre-defined component redundancy level. However, in our model, redundancy is treated as an endogenous decision variable that is optimized alongside replacement time and spares stocking level.
Jin et al. (2015) present a principal-agent game model to minimize the annualized cost of repairable systems through the coordination of maintenance, spares inventory, and repair and renewing times in the aftermarket. Our study expands their model in two aspects. First, in addition to PM and SPL, we adopt component redundancy as an alternative approach to enhancing system reliability and availability. Secondly, we consider the limited capacity of repair and renewing shops, which are operated in a decentralized mode to accommodate different levels of skills and resources.
For further research on PM-SPL, we refer readers to the works of Wang et al. (2009), Chen et al. (2013), Bjarnason et al. (2014), Olde Keizer et al. (2017), Basten and Ryan (2019), and Zhu et al. (2022). It is common for maintenance service logistics models to assume unlimited repair capacity. However, our paper distinguishes itself from existing PM-SPL works by considering a repairable inventory with limited repair capacity. Díaz and Fu (1997) and Sleptchenko et al. (2002) demonstrate that capacitated repair is more realistic due to constraints in facilities and manning hours.
2.4 Summary of the research gap
The literature review reveals the lack of a joint optimization framework for RRA, PM, and SPL. Our paper contributes to the literature in three key ways. First, our proposed redundancy-maintenance-inventory allocation model is the first of its kind to drive system reliability and availability performance throughout the design, manufacturing, and field use stages. Secondly, we introduce two parallel Erlang-C queues to handle parts repair and renewing tasks, respectively. Both queues can effectively accommodate the distinctions in processing time, manning skills, and reasons for return. Thirdly, we derive a unified system availability model that captures ten performance drivers, including redundancy level, maintenance time, spares stocking, and repair and renewing capacity.
3 An integrated product-service supply chain
3.1 The network setting
As depicted in Fig.1, the system consists of multiple k-out-of-n active redundant subsystems (for ) connected in series. The components within each subsystem are identical, but they differ among subsystems. Therefore, the system is made of different part types. For the subsystem, represents the minimum required working units, with . As components are removable, they are also referred to as line replaceable units (LRUs). In this study, we use the terms component, part, and item interchangeably to refer to a repairable LRU.
The OEM implements an integrated product-service offering program to support systems at the customer site shown in Fig.2. Since the demand for spare parts is intermittent, a spares inventory is placed in proximity to these systems to facilitate replacement (Hekimoğlu et al., 2018). In the industry, age-based replacement is widely used due to its technical maturity and scheduling flexibility (El-Ferik, 2008; Huynh et al., 2012). For a part type , where , it is inspected at a predefined time interval . If the item survives through , it is proactively replaced with a spare item. If the item fails prior to , a corrective replacement is performed immediately. As a result, two types of spares demands are generated from the fleet: one for proactive replacement and the other for failure replacement. Upon renewal or repair, the part is put back into the inventory for future use.
Since repairing a failed part requires more time, resources, and skills than renewing an aging item, the OEM decides to decentralize the renewal and repair shops. The Poisson process is commonly used to estimate spare part demands in repairable inventory literature (Lee, 1987; Kim et al., 2007; Öner et al., 2013). We adopt a similar approach to model the renewal and repair shops, respectively. Particularly, the M/M/p model represents the renewal process, and the M/M/q model represents the repair process, where p and q are the numbers of servers, respectively.
Tab.1 lists the decision variables that the OEM attempts to optimize, including component redundancy, base stock level, replacement age, and renewing and repair servers. Table A. in Appendix A summarizes the notation of the model parameters of this paper. The objective is to minimize the annualized system cost subject to system reliability and availability criteria which will be elaborated in Section 5.
3.2 Superimposed parts renewal process
We begin the analysis of the parts renewal process with a single-item system that contains only one LRU. Reliability of a single-item system under age-based maintenance is often characterized by the mean-time-between-replacements (MTBR). Let R(t) be the component reliability, and F(t) be the cumulative distribution function. Its MTBR can be estimated as:
In age-based maintenance, the spare parts demand process can be treated as the superposition of two renewal processes: a proactive replacement stream and a failure (i.e., corrective) replacement stream (Jin et al., 2015). For a single-item system with one LRU, let and be the spare parts demand rate for proactive replacement and failure replacement, respectively. Based on Eq. (1), we have:
Here is the probability of a proactive replacement, and is the probability of a failure replacement. Given a fleet with single-item systems, each system independently generates proactive replacement and failure replacement streams, respectively. Hence the aggregate spare part demand rate of a single-item system fleet, denoted as , can be estimated as:
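The MTBR and the two demand streams described above can be illustrated numerically. The following is a minimal sketch, assuming a Weibull lifetime and the standard age-replacement results: the MTBR is the integral of the reliability function up to the replacement age, and the proactive and failure demand rates are the survival and failure probabilities at that age divided by the MTBR. The function names, the symbol `tau` for the replacement age, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape) for a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


def mtbr(tau, shape, scale, steps=2000):
    """Integral of R(t) over [0, tau], evaluated with Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = tau / steps
    total = weibull_reliability(0.0, shape, scale) + weibull_reliability(tau, shape, scale)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * weibull_reliability(i * h, shape, scale)
    return total * h / 3.0


def demand_rates(tau, shape, scale, fleet_size=1):
    """Return (proactive rate, failure rate, aggregate rate) for the fleet.

    Assumption: each system is a single-item system replaced at age tau
    or at failure, whichever comes first.
    """
    m = mtbr(tau, shape, scale)
    r_tau = weibull_reliability(tau, shape, scale)
    lam_p = fleet_size * r_tau / m          # proactive replacements
    lam_f = fleet_size * (1.0 - r_tau) / m  # failure replacements
    return lam_p, lam_f, lam_p + lam_f
```

Under these assumptions the aggregate rate reduces to the fleet size divided by the MTBR, since the two probabilities sum to one.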
The process formed by the union of fleet replacements is called a superimposed renewal process (SRP). Cox and Smith (1954) have demonstrated that as the fleet size () approaches infinity and the operating time is sufficiently large, the SRP becomes a homogeneous Poisson process, regardless of the lifetime distribution of each system. Wang (2012) further proves that the occurrence times between two successive replacements can be approximated as exponential as long as . The simulation done by Jin et al. (2021) also supports this statement, specifically in the context of age-based replacement. Wu (2019, 2021) has expanded the SRP theory to investigate systems under imperfect repair, incorporating non-exponential failures such as the arithmetic reduction of failure intensity and the arithmetic reduction of age. In our study, since a failed part is replaced with a spare part, the replacement is equivalent to a perfect repair.
3.3 Parts repair queueing model
Since SRP can be approximated as a homogeneous Poisson process, the Erlang-C queueing model can be used to characterize the performance of the repair shop. The Erlang-C queue accommodates a waiting line, which is commonly found in a capacitated repair shop. Let denote the number of repair servers, and denote the arrival rate of failed parts to the repair shop. If a fleet consists of single-item systems, then where is given in Eq. (3). The transition diagram of the queue is provided in Fig.3.
The state in the transition diagram represents the number of failed parts in the repair shop, and is the repair rate per server. Let denote the probability that an incoming part needs to wait in the queue. According to Winston (2004), we have:
where is called the traffic intensity rate. The queue is stable if and only if . The repair turn-around time, denoted as , measures the duration from when the part enters the repair shop to when it is fixed and put back to the spares inventory. If the part transportation time is small or can be ignored, can be obtained as:
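The waiting probability and turn-around time of an Erlang-C (multi-server, exponential service) queue can be computed as sketched below. This uses the textbook Erlang-C formula and the mean time in system (mean service time plus mean queueing delay), with transportation time ignored as in the text. The function and argument names are assumptions made for illustration.

```python
import math


def erlang_c_wait_probability(arrival_rate, service_rate, servers):
    """Probability that an arriving part must wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    rho = offered_load / servers  # traffic intensity
    if rho >= 1.0:
        raise ValueError("queue is unstable: traffic intensity must be below 1")
    below = sum(offered_load ** j / math.factorial(j) for j in range(servers))
    tail = offered_load ** servers / (math.factorial(servers) * (1.0 - rho))
    return tail / (below + tail)


def turn_around_time(arrival_rate, service_rate, servers):
    """Mean time in the shop: service time plus expected queueing delay."""
    p_wait = erlang_c_wait_probability(arrival_rate, service_rate, servers)
    return 1.0 / service_rate + p_wait / (servers * service_rate - arrival_rate)
```

The same two functions apply to the renewing shop by passing its own arrival rate, service rate, and server count.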
3.4 Parts renewing queueing model
A separate Erlang-C queue denoted as is used to characterize the renewing shop. The probability that an incoming part needs to wait before being renewed can be estimated as:
where is called the renewing traffic intensity rate. Note that is the parts arrival rate to the renewing shop with , and is the renewing rate per server. The renewing queue is stable if and only if . The renewing turn-around time, denoted as , can be estimated as:
By combining Eqs. (6) and (8), the average part turn-around time (ATT), denoted as , is obtained as follows:
4 Availability of repairable system
4.1 Availability of single-item system
The availability of a single-item system is frequently used to manage preventive maintenance and spare parts logistics when the system’s utilization remains relatively stable (Louit et al., 2011; de Smidt-Destombes et al., 2009). It can be calculated using the following expression:
where MTBR is given in Eq. (1), and MDT stands for the system mean downtime either due to a planned or a failure replacement. System downtime under a planned replacement comprises the hands-on replacement time and a delay if the inventory is out of stock. Let be a random variable representing the spare part demand of the inventory, and s be the base-stock level with one-for-one replenishment. The downtime under a planned replacement, denoted as , can be expressed as:
where is the hands-on replacement time, and is the stockout probability. Similarly, the downtime of a failure replacement, denoted as , can be expressed as:
By combining both scenarios, the actual MDT of a single-item system is given as
where is the average part turn-around time in Eq. (9). Since is the fleet spare parts demand that follows the Poisson process, the stockout probability can be obtained as:
with
where is the mean spare parts demand during ATT, and is the parts demand rate of the fleet. If a fleet comprises single-item systems, we have as shown in Eq. (4). Now the single-item system availability, denoted as A, is obtained by substituting Eqs. (1), (13) and (14) into (10) as follows:
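The chain from base-stock level to single-item availability can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact Eq. (16): the stockout event is taken as Poisson demand during the turn-around time exceeding the base-stock level, and the delay incurred on a stockout is taken to equal the average turn-around time. All names are hypothetical, and all times must be expressed in the same unit.

```python
import math


def stockout_probability(base_stock, mean_demand):
    """P(O > s) for Poisson-distributed demand O with the given mean."""
    term = math.exp(-mean_demand)  # P(O = 0)
    cdf = term
    for j in range(1, base_stock + 1):
        term *= mean_demand / j    # P(O = j) from P(O = j - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def single_item_availability(mtbr, hands_on_time, att, fleet_demand_rate, base_stock):
    """A = MTBR / (MTBR + MDT) with MDT = hands-on time + expected stockout delay.

    Assumption: the stockout delay equals the average turn-around time (att).
    """
    mean_demand = fleet_demand_rate * att  # mean demand during the turn-around time
    mdt = hands_on_time + att * stockout_probability(base_stock, mean_demand)
    return mtbr / (mtbr + mdt)
```

With a generous base stock the stockout term vanishes and availability approaches MTBR divided by MTBR plus the hands-on time, which is the upper bound that spares alone can deliver.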
Note that Eq. (16) incorporates nine performance drivers. These are the part reliability , the maintenance interval , the base stock level , the fleet size , the hands-on replacement time , the number of renewing and repair servers and , and the parts renewing rate and repair rate that are embedded in through Eqs. (6), (8) and (15).
4.2 Availability of k-out-of-n redundant system
For a k-out-of-n system with active redundancy, the system is functional provided that at least k components are good at any point in time. Hence the system availability, denoted as , is estimated by
where is the number of redundant units with . Note that A is the single-item system availability in Eq. (16). Together with x, there are ten performance drivers in . For a fleet with redundant systems, the spare parts demand rate of the fleet is . Two assumptions are made in Eqs. (16) and (17). First, the system is repairable with random up and down cycles. Second, the utilization of each system may vary, but the average utilization shall remain stable over time.
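The availability of a k-out-of-n active redundant system with identical, independent units follows the standard binomial sum, sketched below. The sketch assumes that the total number of units equals k plus the number of redundant units x; the function name is illustrative.

```python
from math import comb


def k_out_of_n_availability(unit_availability, k, x):
    """Probability that at least k of the n = k + x units are available."""
    n = k + x
    a = unit_availability
    return sum(comb(n, j) * a ** j * (1.0 - a) ** (n - j) for j in range(k, n + 1))
```

For example, with a unit availability of 0.9 and k = 1, adding a single redundant unit raises the result from 0.9 to 0.99, which illustrates why redundancy becomes attractive when very high availability targets cannot be met by spares alone.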
5 Redundancy-maintenance-inventory allocation model
5.1 Minimizing annualized system cost
Based on the integrated product-service supply chain depicted in Fig.2, we propose a redundancy-maintenance-inventory allocation (RMIA) model with the objective of minimizing the annualized system cost of the fleet. RMIA represents a lifetime approach to attaining system reliability and availability goals by integrating design, manufacturing, and maintenance logistics activities. The system cost comprises: 1) initial capital, 2) overhead costs for repairing and renewing parts, 3) inventory expenses for spare parts, and 4) operating costs for repair and renewal shops. Tab.1 lists the decision variables, which include redundancy level, spares stocking level, replacement age, renewing servers, and repair servers. The model is formulated as follows:
Model RMIA
Min:
subject to:
The objective Eq. (18) captures the annualized system cost associated with initial capital, preventive maintenance, spares inventory, and parts repair and renewal activities. Note that , , , , and represent the decision variables. Model RMIA is also applicable to new product introduction phase, when the cost, reliability, repair skillset, and technology maturity differ significantly among different LRU types. To meet the time-to-market goal, the OEM utilizes both in-house and global resources to perform decentralized repairs in different locations with distinct repair crews. For instance, in the high-speed rail industry, maintenance tasks are assigned to different repair crews based on their individual skillsets to enhance accountability and categorize labor skills. Both and are the capital recovery factors for the system and spare parts, respectively. and are the spare parts demand rate for planned and failure replacements of part type . Additionally, and represent the renewal and repair costs of part type , respectively, while signifies the unit annual holding cost. Finally, represents the annual cost per repair server, and denotes the annual cost per renewing server.
Constraint (19) defines the system availability target, where stands for the availability of redundant subsystem , as given in Eq. (17). Constraint (20) defines the reliability criterion for each LRU type. That is, should exceed a certain percentage of the component’s MTBF, and typically . Constraint (21) defines the physical limitations of each subsystem. Constraints (22) and (23) simply stipulate that x, s, p, and q are nonnegative integers.
5.2 Bisection search algorithm
The bisection algorithm is a highly effective method for solving non-convex optimization models that arise in a variety of fields including reliability, inventory, power systems, and space-trajectory problems. For example, Mouatasim (2018) proposes a reduced gradient and bisection method for optimizing a non-convex differentiable objective function, with results confirming the global convergence of the algorithm. Reddy and Bijwe (2018) combine the bisection method with simulation to efficiently solve a large-scale optimal power flow model involving non-convex and discrete variables. Jin et al. (2017) demonstrate the use of the bisection search to address a joint RRA and SPL allocation problem. More recently, Barnett and Gosselin (2021) have developed a bisection algorithm to minimize the time required to follow a path defined in space by dividing the global problem into a series of simpler subproblems. In this paper, we propose the use of bisection search coupled with neighborhood exploration to solve the RMIA model. Specifically, we utilize Algorithms 1 and 2 for solving the case of single k-out-of-n systems (i.e., when ), while Algorithm 3 becomes necessary when .
Algorithm 1: (Minimizing system cost)
Step 1: Initialization: estimate , , , and using Eqs. (B3), (B4), (B7), and (B9), respectively. Set , , , , , , and (an arbitrarily large value).
Step 2: Compute system availability using Eq. (17) based on current .
Step 3: If , let , and go to Step 2. Else, compute using Algorithm 2. If , let , , , , , and .
Step 4: If , let , and , and go to Step 2.
Step 5: If , let , , and , go to Step 2.
Step 6: If , let , , , and , go to Step 2.
Step 7: Output , and .
Algorithm 2: (Bisection search)
Let and be the lower and upper bounds of , and and are the corresponding objective function values for given , , , and . Fig.4 illustrates the working principle of the bisection search. The detailed procedures are given below.
Step 1: Let , and use Algorithm 1 to find .
Step 2: Let , and use Algorithm 1 to find .
Step 3: Let , and use Algorithm 1 to find .
Step 4: If , and , let , , and , go to step 2.
Step 5: If , let , and , or if , let , and , go to step 2.
Step 6: The algorithm terminates if , where and are the previous and the current values, and is a small threshold. Finally, the optimal solution is .
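The idea of narrowing an interval for the replacement time by comparing costs at a lower bound, an upper bound, and a midpoint can be illustrated with a generic interval-halving search. The sketch below is not a transcription of Algorithm 2: it assumes the cost is unimodal in the searched variable with the other decision variables held fixed, and the function name, tolerance, and iteration cap are illustrative choices.

```python
def interval_halving_minimize(cost, lower, upper, tol=1e-6, max_iter=200):
    """Shrink [lower, upper] by half per iteration around the best point found.

    Assumes `cost` is unimodal on the interval. Returns (argmin, minimum cost).
    """
    mid = 0.5 * (lower + upper)
    f_mid = cost(mid)
    for _ in range(max_iter):
        if upper - lower <= tol:
            break
        quarter = 0.25 * (upper - lower)
        left, right = lower + quarter, upper - quarter
        f_left, f_right = cost(left), cost(right)
        if f_left < f_mid:
            # Minimum lies in the left half: keep [lower, mid]
            upper, mid, f_mid = mid, left, f_left
        elif f_right < f_mid:
            # Minimum lies in the right half: keep [mid, upper]
            lower, mid, f_mid = mid, right, f_right
        else:
            # Minimum lies in the middle half: keep [left, right]
            lower, upper = left, right
    return mid, f_mid
```

In a setting like the one described here, `cost` would wrap an evaluation of the annualized cost for a candidate replacement time. Because the full problem is mixed-integer and non-convex, such a search gives no guarantee of global optimality on its own, which is consistent with the paper pairing it with neighborhood exploration.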
Algorithm 3: (Neighborhood exploration)
This algorithm solves Model RMIA for systems comprised of multiple k-out-of-n redundant subsystems for . First, Algorithms 1 and 2 are used to find the optimal solution for each subsystem. Next, a neighborhood search is employed to further reduce the cost by refining all the decision variables. The detailed procedures are as follows:
Step 1: Set where is the subsystem availability for . Find the optimal solution of subsystem using Algorithms 1 and 2. The results are kept as , , and . Note that .
Step 2: For subsystem , perform neighborhood exploration by increasing or decreasing by one step size, i.e., , and compute the new cost and subsystem availability for . The results are kept at and . Note that “+” stands for the increment, and “−” stands for the decrease.
Step 3: Among subsystems, choose the subsystem with the maximum cost saving and the smallest availability reduction, say subsystem . Also choose the subsystem with the minimum cost increase and the largest availability growth, say subsystem .
Step 4: If the cost saving of subsystem is less than the cost increase of subsystem , using the current solutions for subsystems and . Compute the new system availability .
Step 5: If , let , , and for subsystem . Let , , and for subsystem . Also update objective function , and . Go back to Step 2.
Step 6: The algorithm terminates if , where and are the previous and the current costs, and is a small threshold. The final solution is .
6 Applications to systems with single redundant subsystem
6.1 System description
Automated Test Equipment (ATE) is widely used for micro-device testing in the semiconductor manufacturing industry. ATE is a k-out-of-n redundant system, where k is the number of primary working units, and n is the total number of LRU items. Each LRU is made of a printed circuit board that is repairable. Without loss of generality, the lifetime of an LRU follows the Weibull distribution with shape and scale parameters and , respectively. Tab.2 lists the reliability and cost data associated with ATE design, manufacturing, and after-sales support. The second column data are for the benchmark study, and those in the third column are for sensitivity analysis. Both and are estimated assuming a 5% discount rate with a 10-year and 5-year payoff period for systems and LRU, respectively.
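The two capital recovery factors can be reproduced from the stated discount rate and payoff periods. The paper's own expression is not shown in this section, so the sketch below assumes the standard annuity formula, and the function name is illustrative.

```python
def capital_recovery_factor(discount_rate, years):
    """Standard annuity factor: r * (1 + r)^h / ((1 + r)^h - 1)."""
    growth = (1.0 + discount_rate) ** years
    return discount_rate * growth / (growth - 1.0)
```

Under this assumption, a 5% rate gives a factor of about 0.1295 for the 10-year system payoff period and about 0.2310 for the 5-year LRU payoff period.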
6.2 Result and discussion of benchmark study
The benchmark data in Tab.2 are used to solve Model RMIA for a fleet of k-out-of-n redundant systems. Algorithms 1 and 2 are used to search for the optimal solution. For systems, the optimal decisions are = 0, = 15, = 3.706, = 2, and = 2. The annualized system cost is = $126,407.18. The achieved system availability is = 0.9901, larger than = 0.99.
Now we examine how the fleet size influences the system cost, and the results are shown in Fig.5. Initially, the system cost decreases with due to economies of scale. However, it tends to level off as further increases. For instance, the system cost drops to $138,598.13 for , compared to $191,147.79 for , resulting in a decrease of 27.4%. However, the cost tends to remain relatively flat with an average of $120,415.45 for . The achieved system availability fluctuates between 0.99 and 0.992 as increases from 10 to 200.
Fig.6 shows the solutions for as increases from 10 to 200. Two observations can be made. First, it is not cost-effective to employ redundant units in order to achieve a system availability of 0.99. Secondly, the spares stock level does not increase monotonically with . For instance, for , but it decreases to 25 for . While for and 130, the OEM chooses to increase from 3 to 6 in exchange for . This contradicts the intuition that more spare parts are needed as the fleet size increases under the ample repair capacity assumption.
Fig.7 illustrates the relationship between and parts availability for . Firstly, the parts availability remains relatively stable between 0.924 and 0.981 regardless of . Secondly, the parts availability consistently falls below . Lastly, for , increasing proves to be an effective method of meeting . However, when , the inventory levels off or even decreases. Hence, expanding repair and renewing servers becomes more cost-effective in order to achieve . Regardless of , the utilization rate of renewing and repair servers is 0.84 and 0.86, respectively. This result aligns with the study by Sleptchenko et al. (2003), which demonstrates that capacitated repair shops typically have a utilization rate ranging from 0.8 to 0.95.
6.3 Comparison between redundancy and sparing
In this section, a sensitivity analysis is conducted by comparing redundancy allocation and spares stocking. First, five cases corresponding to parameters , , , and are examined. For each parameter, Model RMIA is solved with three different values considering redundancy and non-redundancy, respectively. The results are presented in Tab.3, where the optimal solutions are indicated by underscores.
In Case 1, we analyze the influence of on the decisions regarding . To achieve , both redundancy and a larger spares inventory are required, with and . The cost of the system is $137,348. It should be noted that there is no feasible solution for , indicating that spares inventory alone cannot guarantee an availability of 0.999. As decreases to 0.99, the optimal values are and , resulting in a system cost of $126,407. If further decreases to 0.9, and are sufficient to achieve the target availability with a lower cost of $123,757. Case 1 also demonstrates that as is relaxed, the system cost is reduced, but the values of , and remain relatively stable.
In Case 2, we increase from 0.5 to 1.1 and examine its impact on the decision variables. For = 0.5, the optimal solution is the same as that of = 0.7, indicating that a high replacement frequency is not necessarily optimal. For = 1.1, the optimal = 5.068, which is 1.17 times the MTBF. A larger results in a lower proactive replacement frequency but more corrective maintenance. As a result, the system cost increases to $129,623, compared to $126,407 for = 0.5 and 0.7.
In Case 3, we decrease the LRU reliability by increasing from 0.2 to 1. The system can achieve the target availability = 0.99 by using spare parts alone for = 0.2 and 0.5. However, redundancy with must be adopted for = 1 along with . It is also observed that , , and increase with , which is expected due to the growing number of field returns.
Case 4 examines the impact of the hands-on replacement time on the decision making. It demonstrates that has no direct effect on , and . In addition, a smaller is sufficient to achieve = 0.99 if = 8 or 24 h. However, if = 48 h, must be adopted to attain the desired system availability.
In Case 5, we increase the inventory holding cost and examine its impact on the decision variables. Redundancy is not necessary when the part’s annual holding cost is relatively low, with = 10,000 and 20,000. However, if reaches the item cost, and result in a lower system cost compared to the alternative solution of and .
Next, we compare five additional cases pertaining to , , , , and . Model RMIA is solved by varying one parameter, and the results are summarized in Tab.4. The solutions marked with an underscore represent the optimal decisions. A common observation from Cases 6 to 10 is that spares inventory is more cost-effective than redundancy in achieving = 0.99.
Case 6 examines the influence of the shape parameter on the decision variables and system cost. When increases from 1.5 to 4.5, there is a preference for more proactive replacements as evidenced by the increased value of from 0.21 to 0.86. This is because the life distribution with a higher becomes more concentrated, thereby benefiting proactive replacements. Consequently, increases from 1 to 3, and decreases from 4 to 1. Additionally, it is observed that the system cost decreases with due to the benefit of proactive replacements.
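The statement that a higher shape parameter concentrates the life distribution can be checked with the coefficient of variation of the Weibull distribution, which depends only on the shape parameter. The sketch below is illustrative; the function name is hypothetical, and the shape values 1.5 and 4.5 are the endpoints quoted in Case 6.

```python
import math


def weibull_cv(shape: float) -> float:
    """Coefficient of variation (std / mean) of a Weibull lifetime.

    The scale parameter cancels out, so only the shape parameter matters.
    """
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1**2 - 1.0)


for shape in (1.5, 3.0, 4.5):
    print(f"shape = {shape}: CV = {weibull_cv(shape):.3f}")
```

A smaller coefficient of variation means failures cluster more tightly around the mean life, so a replacement scheduled shortly before that age avoids most failures without discarding much useful life.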
Case 7 explores the effects of on the decision variables and system cost. As increases from 3 to 12 days, increases from 12 to 17, and the cost rises from $115,023 to $139,453. This result is expected, as a slower renewing process requires more spare parts to ensure the system availability. Furthermore, the OEM chooses to extend from 2.5 to 5.09 years to tolerate more failure replacements.
In Case 8, the repair time increases from 6 to 24 days. Similar to Case 7, the increase in leads to an increase in the inventory level from to 32, and the cost from $112,741 to $145,399. Additionally, the OEM opts to adopt more renewing servers for proactive replacements. For instance, when = 6, we have = 1 and = 5.202. If = 24, becomes 3 while drops to 2.679.
In Case 9, the cost of is decreased from $480k to $240k. The value of increases from 2 to 4, while decreases from 2 to 1. Approximately 89% of field returns are proactive replacements. Conversely, if is increased by 50%, the opposite conclusion can be drawn.
Case 10 investigates how influences the decision variables. When is reduced from $640k to $320k, the OEM opts to use more repair servers rather than renewing servers, as expected. Consequently, increases from 3.728 to 5.068, a 36% increase. In fact, only 35.3% of field returns are proactive replacements. Conversely, if is increased by 50% from the benchmark cost, the opposite observation can be made.
6.4 Discussion of heuristic solution quality
Particle swarm optimization (PSO) and non-dominated genetic algorithm (GA) are also employed to solve Model RMIA using the benchmark data. The objective is to compare the solution quality of different heuristic algorithms. Both GA and PSO are frequently used to solve reliability, availability, and maintainability problems (Zaretalab et al., 2022), as well as PM planning (Alaswad and Xiang, 2017), and SPL models (Yan et al., 2023). The PSO and GA algorithms are implemented in Matlab and executed on a PC with an Intel(R) Core (TM) i5-7200U CPU @ 2.5GHz, 4 Core(s), 24 GB memory, and 4 Logical Processors.
Tab.5 summarizes the optimization results obtained from the three algorithms as the fleet size increases from 10 to 200. In comparison to PSO and GA, the BS algorithm yields the lowest cost in 15 out of 20 cases. However, for a fleet size of 60, PSO proves to be the best option, with a cost lower than BS by $38.94. On the other hand, for fleet sizes of 90, 110, 130, and 150, GA outperforms both BS and PSO. Nevertheless, the cost difference between GA and BS is relatively small, ranging from $1.01 to $2.69. It is worth noting that in these cases, the values of , , , and are identical between GA and BS, and the only difference is . Similarly, the values of , , , and are identical between PSO and BS, and the only difference is . Furthermore, Tab.5 demonstrates that both GA and PSO tend to overestimate the system cost under a small fleet, such as fleet sizes of 10, 20, 30, and 40. For , the cost difference among all three algorithms is less than 0.9%, suggesting that BS, GA, and PSO are capable of converging to the lowest cost under a large fleet.
7 Applications to systems with multiple redundant subsystems
In this section we extend the application of Model RMIA to series-parallel systems each comprised of four -out-of- subsystems (i.e., ). Tab.6 provides the parameter values of individual subsystems. The target system availability is set at 0.99, indicating that the availability of each subsystem should be approximately 0.997. First, Algorithms 1 and 2 are used to find the optimal solutions for each subsystem with = 0.997. Then, Algorithm 3 utilizes neighborhood exploration to optimize the overall problem with = 0.99.
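The per-subsystem target follows from the series structure: if the four subsystems are independent and share the target equally, each needs the fourth root of the system target. The snippet below is a minimal sketch under those two assumptions (independence and equal allocation); the function name is hypothetical.

```python
def subsystem_availability_target(system_target: float, n_subsystems: int) -> float:
    """Equal per-subsystem availability so that the series product meets the system target."""
    return system_target ** (1.0 / n_subsystems)


target = subsystem_availability_target(0.99, 4)
print(f"per-subsystem target = {target:.4f}")
print(f"series availability check = {target**4:.4f}")
```

Because 0.997 is a rounded value, solving each subsystem at 0.997 gives only a starting point; the system-level constraint of 0.99 is then enforced in the neighborhood exploration step.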
The results under different fleet sizes are summarized in Tab.7. The following observations can be made. First, as expected, the system cost decreases as the fleet size increases from 10 to 100. Specifically, the cost is $776,664 for , and $610,007 for , down by 21.5%. Second, Subsystem 1 opts for redundancy for and 20. However, as the fleet becomes larger, redundancy is no longer the preferred option. Subsystems 2 and 3, on the other hand, prefer to install one redundant component regardless of the fleet size. For Subsystem 4, redundancy is never the option regardless of the fleet size. This is because the unit cost of the LRU for Subsystem 4 is the highest among the four subsystems, and only 3 working units are required, compared to 5, 7 and 10 for the other subsystems. Consequently, the marginal cost of using one redundant unit is considerably higher for Subsystem 4.
8 Conclusions
This study proposes a joint redundancy-maintenance-inventory allocation model to minimize the annualized system cost while achieving the desired reliability and availability targets during the lifetime period. This model is the first of its kind in bringing together the decisions of reliability-redundancy, preventive maintenance, and spare parts logistics. Two parallel Erlang-C queues are utilized to characterize the decentralized repair and renewal shops, respectively. The demand for fleet spare parts is modeled as a superimposed renewal process, consisting of proactive and failure replacement streams. To solve the redundancy-maintenance-inventory allocation model, a bisection search algorithm combined with neighborhood exploration is developed. The numerical experiments provide several important insights. First, redundancy is preferred over spare parts when the fleet size is small, inventory holding costs are high, replacement time is extended, or extremely high system availability, such as 0.999, is required. Second, there is no monotonic correlation between spares inventory level, parts availability, and system availability in the joint allocation model. Third, both the spares inventory and the system cost decrease as the Weibull shape parameter increases, suggesting that age-based replacement becomes more cost-effective for items with a concentrated lifetime distribution.
In the future, the redundancy-maintenance-inventory model can be expanded in several directions. For example, with the increasing use of prognostics and health management systems, condition-based maintenance can be integrated into the joint allocation model. This will help prevent and reduce random failures, thereby improving spares provisioning efficiency. Additionally, multi-class queues can be employed to model repair and renewal tasks in a centralized facility. However, this may require theoretical advancements as current multi-class queueing models become computationally burdensome when dealing with multiple servers.
9 Appendix A: Notation of model parameters
10 Appendix B: The range of decision variables
The range of the decision variables is analyzed to reduce the search space of Model RMIA. For a given subsystem, the upper limit of is governed by Constraint (21), namely . Hence the effort below is focused on , , and .
B.1. The range of
The maintenance interval is correlated with MTBF denoted as . For the Weibull distribution, . Fig.8 plots the Weibull reliability in three cases: , , and . When increases from 1 to 6, we find that increases from 0.61 to 1, and decreases from 0.14 to 0. If , the chance of making a proactive replacement is only 0.09. Hence should not exceed . Otherwise, over 91% of replacements are due to failures. Thus the range of shall fall in . Note that the reliability curves in Fig.8 are independent of the scale parameter .
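The reliability values quoted above can be reproduced numerically. The sketch below evaluates the Weibull reliability at a replacement age expressed as a multiple of the MTBF, using the standard Weibull mean (scale times the gamma function of 1 + 1/shape). The multiples 0.5 and 2 are assumptions: they are not legible in the text here, but they reproduce the reported 0.61 and 0.14 at a shape parameter of 1. Function and argument names are hypothetical.

```python
import math


def weibull_mtbf(scale: float, shape: float) -> float:
    """Mean life of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def reliability_at_mtbf_multiple(multiple: float, shape: float, scale: float = 1.0) -> float:
    """Weibull reliability at age tau = multiple * MTBF.

    Under age-based replacement this is also the fraction of replacements
    that are proactive, since a unit is replaced proactively only if it
    survives to age tau.
    """
    tau = multiple * weibull_mtbf(scale, shape)
    return math.exp(-((tau / scale) ** shape))


for shape in (1.0, 2.0, 6.0):
    low = reliability_at_mtbf_multiple(0.5, shape)   # assumed multiple
    high = reliability_at_mtbf_multiple(2.0, shape)  # assumed multiple
    print(f"shape = {shape}: R(0.5 MTBF) = {low:.2f}, R(2 MTBF) = {high:.2f}")
```

The scale parameter cancels in the ratio tau / scale, so changing `scale` leaves the output unchanged, consistent with the remark about Fig.8.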
B.2. The range of q and p
The Erlang-C queue is stable if and only if the repair traffic intensity rate . This implies that:
The value of depends on the maintenance interval . The lower limit of occurs at . For systems each with -out-of- configuration, the aggregate failure replacement rate is given as
Substituting Eq. (B2) into Eq. (B1) yields the lower limit for as follows:
where represents the smallest integer greater than . Similarly, the upper limit of is found at . That is:
where
The derivation of lower limit of is similar to , and the results are given below. The fleet generates the smallest proactive replacements when , and its rate is
Hence the lower limit of is obtained by
Similarly, when , the fleet generates the largest proactive replacements, and the rate is
Thus the upper limit of is given as
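The stability condition translates directly into a bound on the server count: the number of servers must be the smallest integer strictly greater than the offered load, that is, the arrival rate multiplied by the mean service time. The following is a minimal sketch of that bound only. The names and the example rates are hypothetical, and the fleet-level replacement rates of the model are not reproduced here.

```python
import math


def min_servers_for_stability(arrival_rate: float, mean_service_time: float) -> int:
    """Smallest integer server count with traffic intensity strictly below 1."""
    offered_load = arrival_rate * mean_service_time
    # floor(x) + 1 is the smallest integer strictly greater than x,
    # which also covers the case where x is already an integer.
    return math.floor(offered_load) + 1


# Hypothetical rates: 0.25 returns per day, 10 days of service per return.
servers = min_servers_for_stability(0.25, 10.0)
rho = 0.25 * 10.0 / servers
print(f"servers = {servers}, traffic intensity = {rho:.3f}")
```

Evaluating the bound at the smallest and the largest replacement rate gives the two ends of the search range for the server count.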
B.3. The range of s
Since the lower limit of is zero, we just need to find its upper limit. Given , , , and , the value of increases with . According to Eq. (17), the redundant system availability must satisfy the following condition
where is the smallest component availability given in Eq. (16). After the re-arrangement, Eq. (16) becomes:
where
Based on Eq. (B11) the upper limit for can be derived using the procedure as follows.
Step 1: For given , estimate the , , and according to Eqs. (B3), (B4), (B7), and (B9), respectively.
Step 2: Compute the values of , , , and based on Eqs. (5) and (7), respectively.
Step 3: Based on Eq. (9), compute for and , respectively.
Step 4: Based on Eq. (B12), compute for and , respectively.
Step 5: Find and that satisfy Eq. (B11) with respect to and .
Step 6: choose as the upper limit of .
The rationale for this 6-step procedure is that the upper limit of occurs when and are at their lower limits either at or . If or is above its lower limit, decreases and becomes smaller. Hence fewer spare parts are needed to meet .
References
[1]
Alaswad S,Xiang Y, (2017). A review on condition-based maintenance optimization models for stochastically deteriorating system. Reliability Engineering & System Safety, 157: 54–63
[2]
Barnett E,Gosselin C, (2021). A bisection algorithm for time-optimal trajectory planning along fully specified paths. IEEE Transactions on Robotics, 37( 1): 131–145
[3]
Basten R J I,Ryan K J, (2019). The value of maintenance delay flexibility for improved spare parts inventory management. European Journal of Operational Research, 278( 2): 646–657
[4]
Basten R J I,van Houtum G J, (2014). System-oriented inventory models for spare parts. Surveys in Operations Research and Management Science, 19( 1): 34–55
[5]
Bei X,Chatwattanasiri N,Coit D W,Zhu X, (2017). Combined redundancy allocation and maintenance planning using a two-stage stochastic programming model for multiple component systems. IEEE Transactions on Reliability, 66( 3): 950–962
[6]
Bei X,Zhu X,Coit D W, (2019). A risk-averse stochastic program for integrated system design and preventive maintenance planning. European Journal of Operational Research, 276( 2): 536–548
[7]
Bjarnason E T S,Taghipour S, (2016). Periodic inspection frequency and inventory policies for a k-out-of-n system. IIE Transactions, 48( 7): 638–650
[8]
Bjarnason E T S,Taghipour S,Banjevic D, (2014). Joint optimal inspection and inventory for a k-out-of-n system. Reliability Engineering & System Safety, 131: 203–215
[9]
Chen L,Ye Z S,Xie M, (2013). Joint maintenance and spare component provisioning policy for k-out-of-n systems. Asia-Pacific Journal of Operational Research, 30( 6): 1350023
[10]
Coit D W,Zio E, (2019). The evolution of system reliability optimization. Reliability Engineering & System Safety, 192: 106259
[11]
Cox D R,Smith W L, (1954). On the superposition of renewal processes. Biometrika, 41( 1–2): 91–99
[12]
de Smidt-Destombes K S,van der Heijden M C,van Harten A, (2009). Joint optimisation of spare part inventory, maintenance frequency and repair capacity for k-out-of-n systems. International Journal of Production Economics, 118( 1): 260–268
[13]
Dekker R,Pinçe Ç,Zuidwijk R,Jalil M N, (2013). On the use of installed base information for spare parts logistics: A review of ideas and industry practice. International Journal of Production Economics, 143( 2): 536–545
[14]
Díaz A,Fu M, (1997). Models for multi-echelon repairable item inventory systems with limited repair capacity. European Journal of Operational Research, 97( 3): 480–492
[15]
El-Ferik S, (2008). Economic production lot-sizing for an unreliable machine under imperfect age-based maintenance policy. European Journal of Operational Research, 186( 1): 150–163
[16]
Hekimoğlu M,van der Laan E,Dekker R, (2018). Markov-modulated analysis of a spare parts system with random lead times and disruption risks. European Journal of Operational Research, 269( 3): 909–922
[17]
Hu Y,Miao X,Si Y,Pan E,Zio E, (2022). Prognostics and health management: A review from the perspectives of design, development and decision. Reliability Engineering & System Safety, 217: 108063
[18]
Huynh K T,Castro I T,Barros A,Bérenguer C, (2012). Modeling age-based maintenance strategies with minimal repairs for systems subject to competing failure modes due to degradation and shocks. European Journal of Operational Research, 218( 1): 140–151
[19]
Jin T, (2023). Bridging reliability and operations management for superior system availability: Challenges and opportunities. Frontiers of Engineering Management, 10( 3): 391–405
[20]
Jin T,Li H,Sun F, (2021). System availability considering redundancy, maintenance and spare parts with dual repair processes. In: Proceedings of Industrial and Systems Engineer Conference, Montreal, Canada, 1–6
[21]
Jin T,Taboada H,Espiritu J,Liao H, (2017). Allocation of reliability-redundancy and spares inventory under Poisson fleet expansion. IISE Transactions, 49( 7): 737–751
[22]
Jin T,Tian Y, (2012). Optimizing reliability and service parts logistics for a time-varying installed base. European Journal of Operational Research, 218( 1): 152–162
[23]
Jin T,Tian Z,Xie M, (2015). A game-theoretical approach for optimizing maintenance, spares and service capacity in performance contracting. International Journal of Production Economics, 161: 31–43
[24]
Kim S H,Cohen M A,Netessine S, (2007). Performance contracting in after-sales service supply chains. Management Science, 53( 12): 1843–1858
[25]
Lee H L, (1987). A multi-echelon inventory model for repairable items with emergency lateral transshipments. Management Science, 33( 10): 1302–1316
[26]
Levitin G,Lisnianski A, (1999). Joint redundancy and maintenance optimization for multistate series–parallel systems. Reliability Engineering & System Safety, 64( 1): 33–42
[27]
Liu Y,Huang H Z,Wang Z,Li Y,Yang Y, (2013). A joint redundancy and imperfect maintenance strategy optimization for multi-state systems. IEEE Transactions on Reliability, 62( 2): 368–378
[28]
Louit D,Pascual R,Banjevic D,Jardine A K S, (2011). Optimization models for critical spare parts inventories—a reliability approach. Journal of the Operational Research Society, 62( 6): 992–1004
[29]
Moghaddass R,Zuo M J,Pandey M, (2012). Optimal design and maintenance of a repairable multi-state system with standby components. Journal of Statistical Planning and Inference, 142( 8): 2409–2420
[30]
Mouatasim A E, (2018). Implementation of reduced gradient with bisection algorithms for non-convex optimization problem via stochastic perturbation. Numerical Algorithms, 78( 1): 41–62
[31]
Nourelfath M,Châtelet E,Nahas N, (2012). Joint redundancy and imperfect preventive maintenance optimization for series–parallel multi-state degraded systems. Reliability Engineering & System Safety, 103: 51–60
[32]
Olde Keizer M C A,Teunter R H,Veldman J, (2017). Joint condition-based maintenance and inventory optimization for systems with multiple components. European Journal of Operational Research, 257( 1): 209–222
[33]
Öner K B,Scheller-Wolf A,van Houtum G J, (2013). Redundancy optimization for critical components in high-availability technical systems. Operations Research, 61( 1): 244–264
[34]
Reddy S S,Bijwe P R, (2018). An efficient optimal power flow using bisection method. Electrical Engineering, 100( 4): 2217–2229
[35]
Selçuk B,Agrali S, (2013). Joint spare parts inventory and reliability decisions under a service constraint. Journal of the Operational Research Society, 64( 3): 446–458
[36]
Selviaridis K,Wynstra F, (2015). Performance-based contracting: a literature review and future research directions. International Journal of Production Research, 53( 12): 3505–3540
[37]
Si S,Zhao J,Cai Z,Dui H, (2020). Recent advancement in system reliability optimization driven by importance measures. Frontiers of Engineering Management, 7( 3): 335–358
[38]
Sleptchenko A,van der Heijden M C, (2016). Joint optimization of redundancy level and spare part inventories. Reliability Engineering & System Safety, 153: 64–74
[39]
Sleptchenko A,van der Heijden M C,van Harten A, (2002). Effects of finite repair capacity in multi-echelon, multi-indenture service part supply systems. International Journal of Production Economics, 79( 3): 209–230
[40]
Sleptchenko A,van der Heijden M C,van Harten A, (2003). Trade-off between inventory and repair capacity in spare part networks. Journal of the Operational Research Society, 54( 3): 263–272
[41]
Van Horenbeek A,Scarf P,Cavalcante C,Pintelon L, (2013). The effect of maintenance quality on spare parts inventory for a fleet of assets. IEEE Transactions on Reliability, 62( 3): 596–607
[42]
Vaughan T S, (2005). Failure replacement and preventive maintenance spare parts ordering policy. European Journal of Operational Research, 161( 1): 183–190
[43]
Wang J,Zhu X, (2021). Joint optimization of condition-based maintenance and inventory control for a k-out-of-n: F system of multi-state degrading components. European Journal of Operational Research, 290( 2): 514–529
[44]
Wang L,Chu J,Mao W, (2009). A condition-based replacement and spare provisioning policy for deteriorating systems with uncertain deterioration to failure. European Journal of Operational Research, 194( 1): 184–205
[45]
Wang W, (2012). A stochastic model for joint spare parts inventory and planned maintenance optimization. European Journal of Operational Research, 216( 1): 127–139
[46]
Wang Z, (2021). Current status and prospects of reliability systems engineering in China. Frontiers of Engineering Management, 8( 4): 492–502
[47]
Winston W, (2004). Operations Research: Applications and Algorithms, 4th ed., Chapter 20, pp. 1051–1131, Brooks/Cole Cengage Learning, Belmont, CA, USA
Wu S, (2021). Two methods to approximate the superposition of imperfect failure processes. Reliability Engineering & System Safety, 207: 107332
[50]
Xie W,Liao H,Jin T, (2014). Maximizing system availability through joint decision on redundancy allocation and spares inventory. European Journal of Operational Research, 237( 1): 164–176
[51]
Yan B,Zhou Y,Zhang M,Li Z, (2023). Reliability-driven multiechelon inventory optimization with applications to service spare parts for wind turbines. IEEE Transactions on Reliability, 72( 2): 748–758
[52]
Zaretalab A,Sharifi M,Guilani P P,Taghipour S,Niaki S T A, (2022). A multi-objective model for optimizing the redundancy allocation, component supplier selection, and reliable activities for multi-state systems. Reliability Engineering & System Safety, 222: 108394
[53]
Zhang J,Zhao X,Song Y,Qiu Q, (2022). Joint optimization of condition-based maintenance and spares inventory for a series–parallel system with two failure modes. Computers & Industrial Engineering, 168: 108094
[54]
Zhang S,Huang K,Yuan Y, (2021). Spare parts inventory management: A literature review. Sustainability, 13( 5): 2460
[55]
Zhao X,Zhang J,Wang X, (2019). Joint optimization of components redundancy, spares inventory and repairmen allocation for a standby series system. Proceedings of the Institution of Mechanical Engineers. Part O, Journal of Risk and Reliability, 233( 4): 623–638
[56]
Zhu S,Jaarsveld W,Dekker R, (2020). Spare parts inventory control based on maintenance planning. Reliability Engineering & System Safety, 193: 106600
[57]
Zhu X,Bei X,Chatwattanasiri N,Coit D W, (2018). Optimal system design and sequential preventive maintenance under uncertain aperiodic-changing stresses. IEEE Transactions on Reliability, 67( 3): 907–919
[58]
Zhu X,Wang J,Coit D W, (2022). Joint optimization of spare part supply and opportunistic condition-based maintenance for onshore wind farms considering maintenance route. IEEE Transactions on Engineering Management, 71: 1086–1102
RIGHTS & PERMISSIONS
Higher Education Press