1 Introduction
To date, reliability has been a worldwide challenge for complex equipment in modern engineering (Kuo, 2015; Yang et al., 2018a; Yu, 2019; Si et al., 2020). For instance, the underlying causes of many accidents, such as the derailment of the ICE-1 high-speed train in Europe in 1998 (Oestern et al., 2000), the crash of the space shuttle Columbia in 2003 (Smith, 2003), and the series of Boeing 737 MAX accidents from 2018 to 2019 (Cusumano, 2021), are related to inadequate design and the misestimation of reliability.
Since the 1960s, different solutions have been proposed around the world, especially in the US, Europe, and Japan, to tackle reliability problems and to design and manufacture high-quality products. For instance, in the US, engineering specialty integration and concurrent engineering (Blanchard and Fabrycky, 1990; Sohlenius, 1992) were proposed to bring the engineering specialties related to reliability into the design process, so that reliability and maintainability can be regarded as design characteristics of products. The design, production, and support processes of products are conducted in parallel and interactively, thereby greatly improving the performance and quality of products and reducing the lifecycle cost. In Europe, ISO 9000-4 (dependability program management) was adopted to integrate the engineering and management specialties related to the inherent reliability of products in order to control reliability over their lifecycles (Kaâniche et al., 2000). In Japan, total quality management and robust design were proposed to place product quality at the core and establish a scientific and efficient quality system (Thornton et al., 2000). On the basis of these advanced technologies and previous experience, Professor Weimin Yang, the pioneer and leader of reliability engineering in China, proposed the concept and theory of reliability systems engineering (RSE) with Chinese characteristics (Yang et al., 1995; Yang, 1995). Compared with the relevant technologies abroad, China’s RSE is an independent discipline system with a unified goal and quantifiable indexes that aims to fight against failures focusing on a common product. After approximately 30 years of development, the standards, procedures, and technologies (related to reliability, maintainability, and safety) of RSE have been gradually developed in line with China’s national conditions, and remarkable application achievements have been made across Chinese industries. This background motivates our study, whose contributions are 1) a systematic review of the advent and development of RSE and 2) an introduction to the latest development of RSE, namely, model-based RSE (MBRSE), and its future perspectives.
This study is based mainly on the author’s understanding of the theory of RSE and more than 30 years of practical experience in more than 10 military and civil fields in China, such as aerospace, shipbuilding, and industrial manufacturing. The rest of this article is organized as follows. Section 2 discusses the development and technological framework of RSE. Section 3 presents the conceptual and operational models of MBRSE. Section 4 explains the crucial technologies for MBRSE operation. Section 5 presents an MBRSE platform and its application. Section 6 provides the conclusions and some representative future directions.
2 Development of RSE
2.1 History of RSE
In the 1990s, customer requirements evolved from product function and performance to effectiveness and the cost–effectiveness ratio. Under these circumstances, effectiveness became the synthesis of user concerns, including product availability, dependability, and capacity. Reliability is an important basis for product effectiveness and a key factor influencing lifecycle costs. Therefore, reliability engineering can be regarded as an important aspect of the high-quality requirements of users in the new era. In China, engineers and scientists have been committed to developing RSE and solving practical reliability problems over the past 27 years. The development trajectory of RSE in China is shown in Fig. 1.
In response to the complexity of reliability engineering and the poor applicability of foreign technology, Professor Weimin Yang first developed the overall concept and fundamental theoretical framework of RSE by adopting effectiveness as the goal and product failures as the core elements. The preliminary definition of RSE is summarized as follows (Yang et al., 1995): RSE is an engineering technique used to study the full lifecycle process of a product and its actions in terms of failure mitigation. Starting from the dialectical relationship between the product as a whole and its surrounding environment, RSE investigates the interrelation among the reliability and lifetime of a product and the surrounding environment, the occurrence and evolution of failures, the laws for preventing, detecting, mitigating, and eliminating these failures, and a series of techniques and management activities to improve reliability, prolong life, and enhance effectiveness on the basis of various approaches, such as experimental research, field investigation, failure analysis, and maintenance. Professor Weimin Yang introduced the theoretical framework of RSE by analogy with the theories of medical engineering. He found that the RSE of products is extremely similar to the medical engineering of humans in many aspects, such as the “prevention and treatment of disease” and “good birth and healthcare conditions” (Yang, 1995). RSE proposes a unified goal for integrating multiple specialties in reliability design. As such, RSE involves both the application of systems engineering theory to reliability and the integration of reliability into the systems engineering process.
In 2005, the comprehensive quality view on three dimensions (CQVTD), which pertains to the overall characteristics, full lifecycle, and total system, and the viewpoint of technology-driven reform of quality engineering from manufacturing to design over the full lifecycle were proposed to guide the development of RSE and to further strengthen management and design (Kang and Wang, 2007). In the CQVTD, the quality characteristics of a given product are divided into special quality characteristics (SQCs), which correspond to its function and performance, and general quality characteristics (GQCs). At present, studies of GQCs in China mainly involve reliability, safety, maintainability, testability, supportability, and environmental adaptability (Yang, 1995; Kang and Wang, 2007). Therefore, GQCs are sometimes referred to as the “six characteristics” in China. The CQVTD systematically expounds the relationship between RSE and modern quality engineering and clarifies that RSE, with effectiveness as its goal and the synthesis of the “six characteristics” as its focus, aims to design comprehensive quality characteristics (Kang and Wang, 2007). Also in 2005, with a focus on the prevention, diagnosis, and treatment of failures, the technical framework of RSE was further constructed in terms of its fundamental theory, basic technology, and applied technology (Kang and Wang, 2005).
Based on the development of RSE, the author of this study put forward a new definition of RSE in 2007: RSE is a synthetic cross-technology and management activity based on systems engineering theory that adopts failures as its core elements and effectiveness as its goal and is designed to investigate the laws of the occurrence and evolution of failures, including the stages of prevention, diagnosis, and repair, throughout the full lifecycle of a complex system. This definition was formally included in the Chinese Military Encyclopedia, General Introduction to Military Technology (Shi, 2007), which suggests that RSE has been officially recognized as a discipline by the engineering community in China. Since it was proposed, the development of RSE has focused on solving the imbalance and inconsistency between the designs of GQCs and SQCs (Kang and Wang, 2007). To solve these problems, synthesis within the GQCs, synthesis between the GQCs and SQCs, and synthesis between the technology and management of comprehensive quality characteristics must be continuously promoted.
In 2015, the core of RSE was further clarified as the combination of effectiveness design and GQC synthesis during the first RSE conference in China. With the development of model-based systems engineering (MBSE), the idea of MBRSE was proposed in 2016 (Ren et al., 2021). MBRSE integrates a large amount of work that is relevant to GQCs to recognize failure laws on the basis of model evolution (Li et al., 2017; Zhao et al., 2018; Wu et al., 2018; Xia et al., 2018) and applies these laws to realize the closed-loop mitigation and control (M&C) of failures by adopting models, such as the product, failure, and environment models (Fan et al., 2016; Ren et al., 2018a; Yang et al., 2018b), as core elements (Ren et al., 2021). Such a process can be integrated into the MBRSE process of products.
2.2 Technological framework of RSE
Three principles are identified to establish the technological framework of RSE: 1) global view, 2) systematic process, and 3) synthetic method. These principles distinguish RSE studies in China from their counterparts abroad. For the “global view”, MBRSE coordinates the functional/performance model groups and the GQC model groups in terms of the product, function, and usage at the global level by adopting effectiveness as the goal. For the “systematic process”, MBRSE is used to plan the model-driven reliability work throughout the full lifecycle of products on the basis of multidimensional failure logics, such as failure prevention before product delivery and failure prognosis and diagnosis during operation. For the “synthetic method”, MBRSE achieves data integration, process integration, and characteristic synthesis among the functional/performance model groups and the GQC model groups to further realize the synthesis of GQC technology and management, driven by failure identification and mitigation.
On the basis of these principles, the current technological framework of RSE (Kang and Wang, 2005) involves three levels, namely, fundamental theory, basic technology, and applied technology, as shown in Fig. 2.
(1) Fundamental theory
RSE was developed on the basis of failure recognition theory, which reveals failure mechanisms and recognizes failure laws to support failure prevention, control, and maintenance technologies (Kang and Wang, 2005). Failure recognition theory integrates the physics of failure (PoF), which describes failures arising from load responses and physical–chemical processes (Qian et al., 2020); the logic of failure, i.e., static, dynamic, and emergent logic (Wang et al., 2009; Yang et al., 2015); and human error, which is categorized under performance influence and ability limitation (Che et al., 2019). Its mathematical and physical fundamentals are closely related to certainty and uncertainty theories and their combination (Ren et al., 2018a).
(2) Basic technology
On the basis of failure laws, a number of basic technologies for failure prevention, diagnosis, and treatment can be developed for RSE. Failure prevention technology covers the prevention of failures over the full lifecycle of a product, including design, production, and use (Yang et al., 2014). Existing redundancy technology, reduction technology, statistical process control technology, and reliability-centered maintenance (RCM) technology are all failure prevention technologies. Failure diagnosis technology refers to the diagnosis and prediction of failures over the lifecycle of a product. It focuses on the timely monitoring and isolation of failures and on the prediction of their development trends and consequences (Tian et al., 2015). Building on failure diagnosis technology, failure treatment technology refers to the timely and effective recovery of product functions once an uncontrollable failure occurs. It aims to repair the product, that is, to restore the product function quickly, economically, and effectively, and it includes the specific repair technology, the repair procedures, and the financing and supply of the spare parts, tools, equipment, and personnel needed for repair.
(3) Applied technology
The application scope of RSE can be described in terms of three dimensions: lifecycle, object, and technology. The lifecycle dimension represents the full process of systems engineering activities, including concept demonstration, research and design, test, production, evaluation, validation, and operation. The object dimension refers to physical items of all scales, including materials, components, assemblies, subsystems, and systems. The technology dimension refers to the GQCs addressed by RSE, including reliability, safety, maintainability, testability, supportability, and environmental adaptability. In particular, RSE can be applied as a set of failure prevention and control technologies constructed via the integration of overall characterization methods, full lifecycle processes, and total system elements. Its core technology involves synthetic GQC integration, including synthetic GQC requirement determination with effectiveness simulations as the core, synthetic GQC design with unified function and failure models as the core, and synthetic operation and maintenance technology with prognostics and health management (PHM) (Wang et al., 2017) as the core.
3 Conceptual and operational model of MBRSE
By integrating MBSE and RSE, the idea of MBRSE was first proposed by the author of this study in 2016. In this section, the conceptual model and the V-model-based operation model of MBRSE are introduced.
3.1 Conceptual model of MBRSE
Figure 3 illustrates the conceptual model of MBRSE. On the basis of the usage demand, a comprehensive design issue is initially constructed. This comprehensive design can be decomposed into function and failure M&C designs. The comprehensive design issue can be analyzed and solved with various engineering methods. During the solution process, the above two types of designs should cooperate to reduce the number of design iterations. Failure M&C design is based on the cognition of failures and their control laws. With the ever-deepening cognition of design, the product design scheme is becoming increasingly in-depth and detailed and ranges from qualitative descriptions to quantitative calculations. The M&C process can be used to recognize failures and associated control laws. This understanding is based on the knowledge of the operation process/environment (load), which becomes increasingly clear with the advancement of the design process. After solving all problems, system synthesis and evaluation are conducted to assess the solution process and verify the solution degree with regard to the comprehensive design issue. The above process may require several iterations during practical product design until a satisfactory solution is reached with regard to the comprehensive design issue.
3.2 V-model-based operation model of MBRSE
The V-model-based MBRSE was proposed on the basis of the V-model of MBSE to organically integrate the function, performance, and GQC models into the MBSE process, as shown in Fig. 4. Driven by effectiveness, a complete model system is established by adopting the identification, mitigation, testing, and verification of failures as core elements to achieve digitalized GQC engineering analysis throughout the entire product lifecycle. Synthetic function-GQC design and verification of multilevel products (up to the system of systems (SOS) level) can be implemented by using approaches such as multidimensional digital model coevolution, multitype failure simulation, and multithread closed-loop process management. By virtue of PHM technology, highly efficient and precise failure prognosis and prediction can be achieved to realize digital and intelligent product maintenance. After the incorporation of GQC digital engineering into the systems engineering process, a forward GQC design process can be established. This process adopts the GQC digital engineering V-model as the core and includes 3D synthesis among performance–failure–health, multilevel product data transmission, and synthetic interaction between design analysis and simulation verification. The crucial technologies required for the execution of the V-model are presented in Section 4.
4 Crucial technology for MBRSE operation
4.1 Synthetic GQC requirement determination based on effectiveness simulation
The top left of the V-model represents the effectiveness- and task-driven simulation verification techniques used to obtain the GQC requirements. The goal is to convert, via simulation methods, the product effectiveness demands originating from the different tasks into GQC requirements, which are then applied as design inputs. First, an agent theory-based dynamic layering and partitioning effectiveness simulation framework is established by considering four types of elements, namely, task, system, service, and environment (Ren et al., 2019). The key issues must then be carefully considered and addressed. These issues include but are not limited to the failure occurrence mechanism and behavior modeling for maintenance purposes on the basis of agent action and state transition (Feng et al., 2019). Subsequently, an effectiveness evaluation technology is developed with Monte Carlo simulation and multidimensional performance criteria (Ren et al., 2018a). Finally, techniques suitable for parameter analysis, balance, comparison, and optimization are developed on the basis of these highly precise effectiveness simulations to support the determination of GQC requirements. The proposed GQC requirement determination technique overcomes the limitations of traditional methods, such as similarity and empirical methods, and provides a prerequisite for the independent development of novel domestic rescue products.
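The step from an effectiveness demand to a GQC requirement can be illustrated with a deliberately small Monte Carlo sketch. It is not the agent-based framework cited above: it assumes a single repairable unit with exponentially distributed times to failure and repair, takes mean availability over a mission horizon as the only effectiveness measure, and searches a list of candidate MTBF values for the smallest one that meets a target. The function names, parameters, and numbers are all hypothetical.

```python
import random


def simulate_availability(mtbf, mttr, horizon, runs=500, seed=0):
    """Monte Carlo estimate of the mean availability of one repairable unit.

    Assumption: times to failure and repair times are exponential with
    means `mtbf` and `mttr` (hours); the unit starts in the up state.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        t, up_time = 0.0, 0.0
        while t < horizon:
            ttf = rng.expovariate(1.0 / mtbf)      # time to next failure
            up_time += min(ttf, horizon - t)       # count only time inside the horizon
            t += ttf
            if t >= horizon:
                break
            t += rng.expovariate(1.0 / mttr)       # repair duration (down time)
        total += up_time / horizon
    return total / runs


def min_mtbf_meeting_target(candidates, mttr, horizon, target):
    """Return the smallest candidate MTBF whose simulated availability
    reaches `target`, or None if no candidate does."""
    for mtbf in sorted(candidates):
        if simulate_availability(mtbf, mttr, horizon) >= target:
            return mtbf
    return None


if __name__ == "__main__":
    # Hypothetical demand: availability of at least 0.94 over 5000 h with MTTR = 10 h.
    print(min_mtbf_meeting_target([50.0, 100.0, 200.0, 400.0], 10.0, 5000.0, 0.94))
```

In a real study, the simulated entity would be a set of interacting task, system, service, and environment agents, and several performance criteria would be evaluated together; the search over candidate values stands in for the parameter analysis and optimization step.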
4.2 Model-driven comprehensive GQC design
The left side of the V-model represents the model-driven comprehensive GQC design approach, which aims to allocate the reliability requirements obtained from effectiveness simulations to the different product levels in a top-down sequence and simultaneously acquire a corresponding digital design plan (Yang et al., 2012; 2015). With the continuous development of the MBSE concept, a GQC design technique has been proposed with unified failure modeling and mitigation control as the core, as shown in Fig. 5.
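As a rough illustration of top-down allocation, the sketch below splits a reliability target among the items of a series system so that the product of the allocated values equals the target. The weighting rule (each item receives a share of the allowed unreliability in proportion to an assumed complexity weight) is a common textbook apportionment chosen here for simplicity; it is not claimed to be the allocation model used in the cited work, and all item names and weights are hypothetical.

```python
import math


def allocate_series(target, weights):
    """Split a series-system reliability target into per-item targets.

    Assumption: items are in series and independent, so the system
    reliability is the product of the item reliabilities. Each item gets
    R_i = target ** (w_i / sum(w)); a larger weight means a larger share
    of the failure budget and therefore a lower allocated reliability.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target reliability must lie in (0, 1]")
    total = sum(weights.values())
    return {name: target ** (w / total) for name, w in weights.items()}


# Hypothetical two-level example: system -> subsystems -> equipment.
system_alloc = allocate_series(
    0.95, {"propulsion": 3.0, "avionics": 2.0, "structure": 1.0}
)
avionics_alloc = allocate_series(
    system_alloc["avionics"], {"radar": 1.0, "flight_computer": 1.0}
)

if __name__ == "__main__":
    print(system_alloc)
    print(math.prod(system_alloc.values()))  # recovers the system target up to rounding
```

Applying the same function again to a subsystem target, as done for the hypothetical avionics subsystem, mirrors the level-by-level decomposition described above.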
Along with the evolutionary process of the unified model, the MBRSE model system has gradually established GQC design requirements by focusing on the identification and mitigation of the following three types of failures.
(1) Identification and mitigation of functional failures
The potential failure modes of functions can be systematically identified along multiple threads, such as malfunction, degradation, discontinuity, and unexpected functionality. A functional failure model can be established by considering the transfer relations among the effects of these functional failures (Li et al., 2015; Ren et al., 2018b). Association sets of the key functional failure modes can be determined by comprehensively considering the occurrence probabilities and consequences of these failure modes and the improvement in the M&C status of related failures brought about by mitigating an individual failure mode. Effective implementation of the corresponding improvement and compensation measures should be ensured with closed-loop mitigation control technology.
(2) M&C of physical failures
On the basis of the above functional failure models and the mapping relationships between the functionalities and physical models, physical failure modes can be systematically identified from the unmitigated functional failures by considering physical and chemical processes, device/raw material/component characteristics, temperature/vibration, and other internal and external loads (Qian et al., 2020). Given a clear understanding of the failure mechanisms of the physical units, such as mechanical devices, electronics, and software, association sets of the key physical failure modes can be determined by comprehensively considering the occurrence probabilities and consequences of these failure modes and the improvement in the M&C status of related failures brought about by mitigating an individual failure mode (Sun et al., 2015). PoF-based mitigation mechanisms are introduced to achieve closed-loop mitigation control, which determines and optimizes the design parameters of the corresponding physical units, mitigates underlying failure causes, avoids failure occurrence, and keeps the physical units from being compromised by these failures.
(3) M&C of coupled failures
In the synthetic process of a system, system-level failures can be identified by comprehensively considering the interface, transmission, error propagation, and potential functionality failures. These system-level failures are usually regarded as coupled failures and must be mitigated and controlled with their relevant failures (Liu et al., 2019).
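The selection of key failure modes from occurrence probability and consequence, described for the three failure types above, can be sketched as a simple ranking. The sketch assumes a criticality score equal to probability times a four-level severity rating and a fixed threshold; it does not model the effect that mitigating one failure mode has on related modes, which the text also lists as a factor. The class, field names, scales, and example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    probability: float      # assumed occurrence probability per mission
    severity: int           # assumed scale: 1 (minor) to 4 (catastrophic)
    mitigated: bool = False  # True once closed-loop mitigation is confirmed

    @property
    def criticality(self):
        # Assumed score: probability weighted by severity.
        return self.probability * self.severity


def key_failure_modes(modes, threshold):
    """Return unmitigated modes with criticality >= threshold, highest first."""
    open_modes = [m for m in modes if not m.mitigated]
    ranked = sorted(open_modes, key=lambda m: m.criticality, reverse=True)
    return [m for m in ranked if m.criticality >= threshold]


# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("loss of output", 0.01, 4),
    FailureMode("degraded output", 0.05, 1),
    FailureMode("intermittent output", 0.001, 2),
    FailureMode("unexpected output", 0.02, 3, mitigated=True),
]

if __name__ == "__main__":
    for mode in key_failure_modes(modes, threshold=0.01):
        print(mode.name, round(mode.criticality, 4))
```

Marking a mode as mitigated and rerunning the ranking gives a minimal picture of the closed loop: the key set shrinks as mitigation measures are confirmed.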
4.3 Multilevel GQC verification combined with practical experiments and virtual simulations
During the integration process of a given product, integrated verification should be conducted via a scaled-up sequence, starting from the assembly level, via the subsystem and system levels, to the SOS level. At each level, certain tests and weak link analysis steps are conducted to verify the identified failures and defects and to determine new failures and defects. The experimental data acquired from these tests can be further implemented to verify the product GQCs and effectiveness at the different levels.
The approach combining physical tests with virtual simulations has been widely applied for verification owing to notable advances in simulation technology. For instance, verification has usually been conducted via multistress synthesis GQC tests and GQC simulations at the assembly level; highly accelerated life tests, highly accelerated stress screening tests, and virtual prototype-based GQC evaluation simulations at the subsystem level; product-level full-scale tests, multisource data-driven virtual GQC evaluation simulations, and availability simulations at the system level; and real-task tests and virtual effectiveness simulations at the SOS level.
4.4 Operation and maintenance by adopting PHM as the core during operation
The top right side of the V-model represents product operation and maintenance with PHM technology as the core. As a notable development of and supplement to the current reliability engineering field, this method focuses on the usage phase of the product via an organic integration of GQCs. Its concrete implementation includes the development stage and the operation and maintenance stage (Li et al., 2020). In the development stage, PHM system design and verification are needed, including construction of the PHM index system, establishment of the system configuration with total elements, breakthroughs in the key failure detection and prognosis techniques in the space, time, and symptom dimensions, and completion of the PHM system on the basis of a variety of tools and methods (Tian et al., 2015; Wang et al., 2017). In this process, the development of the PHM system should be suitably coordinated with the design of the product function, performance, and GQCs.
In the operation and maintenance stage, the health status of a product containing a PHM system can be improved or maintained at a high level by using failure prediction to plan reasonable maintenance tasks (Wu et al., 2021), support resources, and advance scheduling on the basis of the concepts of autonomous assurance, task effects, and health status. As a result, the maintenance cost can be reduced.
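To make the link between failure prediction and maintenance planning concrete, the following sketch estimates remaining useful life (RUL) by fitting a straight line to a monitored health indicator and extrapolating it to a failure threshold. A linear trend is an assumption made for brevity; PHM systems of the kind discussed here typically rely on richer degradation and diagnostic models. The function name, the health-indicator scale, and the data are hypothetical.

```python
def remaining_useful_life(times, health, threshold):
    """Estimate RUL by least-squares linear extrapolation.

    Assumption: the health indicator decreases roughly linearly with time
    and failure is declared when it reaches `threshold`. Returns None when
    no downward trend is present.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    slope = sxy / sxx
    if slope >= 0:
        return None  # indicator is flat or improving: no predicted failure
    intercept = mean_h - slope * mean_t
    t_fail = (threshold - intercept) / slope   # time at which the fit hits the threshold
    return max(0.0, t_fail - times[-1])


if __name__ == "__main__":
    # Hypothetical readings: operating hours and a normalized health indicator.
    hours = [0.0, 100.0, 200.0, 300.0]
    indicator = [1.0, 0.9, 0.8, 0.7]
    print(remaining_useful_life(hours, indicator, threshold=0.5))
```

A maintenance task could then be scheduled before the predicted RUL elapses, with spare parts and personnel arranged in advance.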
4.5 Multithread closed-loop process management
Many GQC tasks involve the interaction of people, data, activities, and resources, which requires effective management. In practical applications, the following three threads should be managed:
(1) The first thread is conducted on the basis of the allocation, prediction, and quantitative evaluation of the GQCs, reflecting the kernel realization of user requirements for developers, where various quantitative requirements are subject to hard design constraints.
(2) The second thread is conducted on the basis of the implementation and conformance inspection of qualitative GQC criteria. This process reflects the accumulation and reuse of the GQC design experience of the developer, which may effectively improve the GQC level of similar products.
(3) The third thread is conducted on the basis of the closed-loop mitigation of failures. This process is mainly applied to identify, eliminate, and control the consequences of new failures caused by new product principles, processes, materials, and system integration methods.
5 MBRSE platform and RSE promotion
5.1 MBRSE platform
On the basis of the fundamental theory and approaches of MBRSE, the Institute of Reliability Engineering at Beihang University developed the 4th generation of the MBRSE platform by innovating key technologies for the visualization of multilayer and multidimensional GQC data, multidimensional failure data analysis, flexible process instruction chains, full-scale failure recognition, closed-loop M&C, and the construction and mining of reliability knowledge maps. This MBRSE platform integrates a digital development environment and unifies technology and management synthesis to reach an internationally leading level. The platform also includes more than 10 model-driven GQC design software tools to greatly reduce the number of reliability work items and improve work efficiency. It integrates a basic GQC knowledge system with storage, mining, and intelligent push functions, as well as a dynamic visual monitoring and decision-making system, to support the realization of GQC requirements within the entire design domain, as shown in Fig. 6.
5.2 RSE promotion
RSE has been practiced in China for more than 25 years. It has undergone a series of processes, including concept change, knowledge improvement, pilot research on typical production, all-round generation, highlighting of key points, rectification, reformation, regulation analysis, rule formulation, operation mechanism, basic foundation construction, platform development and promotion, and capacity building. At present, China has established a complete military standard system for RSE (including 46 standards) and formed a reliability engineering technology system considering a broad production range of design requirements at all stages. It has developed relevant tools and methods, including reliability test and evaluation facilities, detection and screening facilities for components, and software evaluation systems, compiled many engineering databases, produced many RSE engineering practical cases, and trained a group of professional technicians.
RSE has been widely applied in the Chinese military industry, thereby providing a number of typical application scenarios, including prevention-, rectification-, and addition-oriented scenarios. Specific cases include aircraft carriers, fighters, and large-scale cargo airplanes. An illustrative example of using MBRSE in the GQC design of an aircraft carrier is provided as follows. In the GQC acquisition phase, an effectiveness simulation model of the carrier is constructed on the basis of the agent method. The relationships between the carrier’s effectiveness and the GQCs of the carrier-based aircraft and the carrier support system are established. After simulation and optimization, the GQC requirements of the carrier-based aircraft and carrier support system are clarified. In the design stage, a data fusion model is established to ensure the sharing, consistency, and traceability of data among different institutions, units, design stages, and characteristics. In the development stage, the GQC requirements of the carrier-based aircraft and carrier support system are decomposed downward to equipment-level products on the basis of the GQC assignment model. At the same time, a unified model of the function/physical structure and reliability is established to realize the collaborative design and optimization of the function, performance, and reliability of the aircraft carrier at different levels. In the verification stage, the GQCs of the carrier-based aircraft are verified at the device, equipment, and system levels on the basis of the multilevel virtual-real integrated verification model. The verification results are used for weakness detection and design iteration for the aircraft carrier in the design stage. This verification work can reduce the amount of subsequent experimental work, resulting in a great reduction in costs and a shorter development schedule. In the usage phase, PHM technology and intelligent operation and maintenance technology are used to support aircraft carriers throughout the entire lifecycle. The application of RSE has been gradually extended to civil fields, such as large passenger aircraft, high-speed trains, new energy, and intelligent manufacturing.
6 Conclusions
This study reviewed the development history of RSE in China over the past 30 years. The fundamental theories and technologies of RSE have experienced a typical development process from statistics-based methods to PoF-based methods, and they have now reached a new level that emphasizes collaboration between mathematics and physics in addition to integrated optimization. RSE technology has followed the development direction of interdisciplinary and professional integration and has now entered the stage of health engineering. In particular, equal attention has been given to “good birth” and “healthcare conditions”, and failures have been adopted as core elements. Health has been adopted as the goal; prevention, diagnosis, and treatment have been adopted as the approaches; and the synthetic design of performance and GQCs and a PHM-integrated platform have been adopted as the support.
The technological framework, conceptual and operational models, crucial technologies, and methodology of MBRSE were introduced in detail. Combined with PHM, MBRSE has transformed the original concept from RCM to intelligent prognosis and health management by adopting health as the goal. The approach has shifted from the pursuit of a perfect state without failure to the allowance of disease to a certain extent while ensuring health, which is closer to practical situations. Representative future directions include but are not limited to the following.
(1) Cross-scale-based synthetic GQC design focusing on macroscopic effectiveness, microscale failure mechanisms, and intelligent design processes. SOS level: To develop effectiveness simulation analysis and design optimization methods for intelligent SOS. System and subsystem level: To study intelligent GQC design by providing a preliminary model design, intelligent failure identification, and mitigation. Component and part level: To develop synthesis-based design technology considering multiphysics, multiple perspectives, new processes, and new materials.
(2) The reliability digital twin covers the entire process of design, manufacturing, operation, and maintenance, the total lifecycle, and multiple production levels. This process also realizes the synchronous delivery of the reliability digital twin and the physical entity. In the operation and maintenance phase, the system health state can be accurately captured, and dynamic operation and maintenance decision-making can be achieved through the simulation of individual states.
(3) Cognitive computing-based health assessment, diagnosis, and prognosis techniques. These techniques may improve perception, cognition, and failure prediction capability throughout the full lifecycle of equipment.
(4) Government-industry data exchange program in China. A vast amount of raw data can be retrieved by mining accumulated quality data, and greater benefits can be obtained through data exchange. This can be achieved by establishing a quality information exchange platform jointly organized by the military, government, and industry with regular/real-time interactive engineering data, failure experience data, reliability and maintainability data, measurement data, etc.
In the future, RSE technologies and platforms will be continuously innovated and promoted in the military and civil fields. They are expected to improve the standardization, quantification, and optimization of reliability work.