To improve transportation capacity, dual overhead crane systems (DOCSs) are playing an increasingly important role in the transportation of large or heavy cargo and containers. Unfortunately, when dealing with the control problem, current methods fail to fully consider such factors as external disturbances, input dead zones, parameter uncertainties, and other unmodeled dynamics that DOCSs usually suffer from. As a result, control performance degrades dramatically, which severely hinders the practical application of DOCSs. Motivated by this fact, this paper designs a neural network-based adaptive sliding mode control (SMC) method for DOCSs to solve the aforementioned issues, which achieves satisfactory control performance for both actuated and underactuated state variables, even in the presence of matched and mismatched disturbances. The asymptotic stability of the desired equilibrium point is proved by rigorous Lyapunov-based analysis. Finally, extensive hardware experimental results are collected to verify the efficiency and robustness of the proposed method.
Digital technologies are becoming more pervasive, and industrial companies are exploiting them to enhance the potential of Prognostics and Health Management (PHM). Indeed, PHM makes it possible to evaluate the health state of physical assets as well as to predict their future behaviour. For PHM programs to be effective, the most critical assets should be identified so as to direct modelling efforts. Several techniques can be adopted to evaluate asset criticality; in industrial practice, criticality analysis is among the most widely used. Despite the advancement of artificial intelligence for data analysis and prediction, criticality analysis, which is built upon both quantitative and qualitative data, has not improved accordingly. The goal of this work is to propose an ontological formalisation of a multi-attribute criticality analysis in order to i) fix the semantics behind the terms involved in the analysis, ii) standardise and make uniform the way criticality analysis is performed, and iii) take advantage of reasoning capabilities to automatically evaluate asset criticality and associate a suitable maintenance strategy. The developed ontology, called MOCA, is tested in a food company with a global footprint. The application shows that MOCA can accomplish the stated goals; specifically, the high-priority assets towards which PHM programs should be directed are identified. In the long run, ontologies could serve as a unique knowledge base that integrates multiple data and information sources across facilities in a consistent way. As such, they will enable advanced analytics and allow a move towards cognitive Cyber Physical Systems that enhance business performance for companies spread worldwide.
The circular economy makes it possible to restore product value at the end of life, i.e., when a product is no longer used or is damaged. The product life cycle is thus extended, reducing waste growth and resource scarcity. There are several revaluation options (reuse, remanufacturing, recycling, …), so decision makers need to assess these options to determine which one is best. We therefore present a study of End-Of-Life (EoL) decision making that aims to facilitate the industrialisation of the circular economy. For this, it is essential to consider all variables and parameters impacting the decision on the product trajectory. The first part of the work identifies the variables and parameters impacting the decision making. The second part proposes an assessment approach based on modelling with Generalized Colored Stochastic Petri Nets (GCSPN) and on Monte-Carlo simulation. The approach is tested on an industrial example from the literature to analyse the efficiency and effectiveness of the model. This first application showed the feasibility of the approach, as well as the limits of GCSPN modelling.
Vehicle mass is an important parameter for the motion control of intelligent vehicles, but it is hard to measure directly with standard sensors. Accurate estimation of vehicle mass is therefore crucial. In this paper, a vehicle mass estimation method based on the fusion of machine learning and a vehicle dynamics model is introduced. In the machine-learning method, a feedforward neural network (FFNN) is used to learn the relationship between vehicle mass and other state parameters, namely longitudinal speed and acceleration, driving or braking torque, and wheel angular speed. In the dynamics-based method, recursive least squares (RLS) with a forgetting factor, built on the vehicle dynamics model, is used to estimate the vehicle mass. According to the reliability of each method under different conditions, the two methods are fused using fuzzy logic. Simulation tests under the New European Driving Cycle (NEDC) are carried out. The results show that the estimation accuracy of the fusion method is around 97%, and that the fusion method exhibits better stability and robustness than either single method.
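The dynamics-based branch can be sketched as a scalar recursive least squares (RLS) estimator with a forgetting factor. The sketch below assumes a deliberately simplified longitudinal model, force = mass × acceleration, with rolling and aerodynamic resistance lumped into measurement noise; all names, signals, and numbers are illustrative, not the paper's implementation.

```python
import random

def rls_mass_estimate(samples, lam=0.98, m0=1000.0, p0=1e3):
    """Scalar RLS with forgetting factor lam.
    samples: iterable of (acceleration [m/s^2], drive force [N]) pairs;
    simplified model: force = mass * acceleration."""
    m, p = m0, p0
    for a, f in samples:
        if abs(a) < 1e-3:                  # skip near-zero excitation
            continue
        k = p * a / (lam + a * p * a)      # gain
        m += k * (f - a * m)               # innovation update
        p = (p - k * a * p) / lam          # covariance update
    return m

# Synthetic drive cycle: true mass 1520 kg, noisy force measurements.
random.seed(0)
true_mass = 1520.0
data = []
for _ in range(500):
    a = 0.5 + 0.4 * random.random()            # acceleration
    f = true_mass * a + random.gauss(0, 50)    # force + sensor noise
    data.append((a, f))

est = rls_mass_estimate(data)
```

The forgetting factor `lam` discounts old samples, which is what lets the estimator track a mass change (e.g., after loading cargo) at the cost of a slightly noisier estimate.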
Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion, among others. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL) has been widely explored for lane-changing decision making in AVs with encouraging results. However, the majority of those studies focus on a single-vehicle setting, and lane-changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic (MA2C) method is proposed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is designed to incorporate fuel efficiency, driving comfort, and the safety of autonomous driving. A comprehensive experimental study shows that the proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.
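As an illustration of how a multi-objective reward of this kind can be assembled, the following sketch combines a speed-tracking (efficiency) term, a jerk (comfort) penalty, and a collision (safety) penalty. The terms and weights are assumptions chosen for illustration; the paper's actual reward design is not reproduced here.

```python
def lane_change_reward(speed, target_speed, jerk, collision,
                       w_eff=1.0, w_comf=0.2, w_safe=10.0):
    """Illustrative multi-objective reward for a lane-changing agent:
    efficiency (speed tracking), comfort (jerk penalty), and safety
    (collision penalty), combined as a weighted sum."""
    r_eff = -abs(speed - target_speed) / target_speed  # in [-inf, 0]
    r_comf = -abs(jerk)                                # penalize harsh motion
    r_safe = -1.0 if collision else 0.0                # large penalty on crash
    return w_eff * r_eff + w_comf * r_comf + w_safe * r_safe
```

The relative weights encode the designer's priorities; here a collision outweighs any achievable comfort or efficiency gain, which is the usual choice in safety-critical settings.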
Distributed Nash equilibrium seeking for aggregative games is investigated, and a continuous-time algorithm is proposed. The algorithm is designed using projected gradient play dynamics and aggregation tracking dynamics, and is applicable to games with constrained strategy sets and weight-balanced communication graphs. The key feature of our method is that the proposed projected dynamics achieve exponential convergence, whereas such convergence results have only been obtained for non-projected dynamics in existing works on distributed optimization and equilibrium seeking. Numerical examples illustrate the effectiveness of our method.
In this paper, we develop distributed continuous-time algorithms over directed graphs to seek the Nash equilibrium of a noncooperative game. Motivated by recent consensus-based designs, we present a distributed algorithm with a proportional gain for weight-balanced directed graphs. By further embedding a distributed estimator of the left eigenvector associated with the zero eigenvalue of the graph Laplacian, we extend the algorithm to arbitrary strongly connected directed graphs with possibly unbalanced weights. In both cases, the Nash equilibrium is proven to be reached exactly with an exponential convergence rate. An example is given to illustrate the validity of the theoretical results.
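The gradient-play idea underlying such consensus-based schemes can be illustrated with a centralized sketch: each player descends the partial gradient of its own cost, and iterates are projected onto a constraint set. This toy omits the graph and the eigenvector estimator entirely (every player reads the other's strategy directly); the game, step size, and box constraint are all illustrative.

```python
def clip(v, lo, hi):
    return max(lo, min(hi, v))

def gradient_play(steps=20000, dt=1e-3, box=(0.0, 10.0)):
    """Euler integration of projected gradient-play dynamics for a
    two-player quadratic game with costs
      J1 = x1^2 + x1*x2 - 4*x1,   J2 = x2^2 + x1*x2 - 5*x2.
    Each player descends its own partial gradient; iterates are
    projected back onto the box constraint after every step."""
    x1, x2 = 5.0, 5.0                      # arbitrary initial strategies
    for _ in range(steps):
        g1 = 2 * x1 + x2 - 4               # dJ1/dx1
        g2 = 2 * x2 + x1 - 5               # dJ2/dx2
        x1 = clip(x1 - dt * g1, *box)
        x2 = clip(x2 - dt * g2, *box)
    return x1, x2

x1, x2 = gradient_play()
# Unique Nash equilibrium of this game: (x1, x2) = (1, 2).
```

For this game the iterates approach the equilibrium exponentially fast, mirroring the exponential rates proven in the paper for the distributed setting.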
Machine learning, and in particular deep learning, techniques have demonstrated great efficacy in learning from, analyzing, and modelling large, complex, structured and unstructured datasets. These techniques have recently been widely deployed across industries to support robotic and autonomous system (RAS) requirements and applications, ranging from planning and navigation to machine vision and robot manipulation in complex environments. This paper reviews the state of the art in RAS technologies (including unmanned marine robot systems, unmanned ground robot systems, climbing and crawler robots, unmanned aerial vehicles, and space robot systems) and their application to the inspection and monitoring of mechanical systems and civil infrastructure. We explore the types of data provided by such systems and the analytical techniques adopted to process and analyze these data. This paper provides a brief overview of machine learning and deep learning techniques and, more importantly, a classification of the literature that has reported the deployment of such techniques for RAS-based inspection and monitoring of utility pipelines, wind turbines, aircraft, power lines, pressure vessels, bridges, etc. Our research provides documented information on the use of advanced data-driven technologies in the analysis of critical assets and examines the main challenges to applying such technologies in industry.
In this paper, the constrained Nash equilibrium seeking problem of aggregative games is investigated for uncertain nonlinear Euler-Lagrange (EL) systems under unbalanced digraphs, where the cost function of each agent depends on its own decision variable and the aggregate of all other decisions. By embedding a distributed estimator of the left eigenvector associated with the zero eigenvalue of the digraph Laplacian matrix, a dynamic adaptive average consensus protocol is employed to estimate the aggregate function in the unbalanced case. To solve the constrained Nash equilibrium seeking problem, an integrated distributed protocol based on output-constrained nonlinear control and projected dynamics is proposed for uncertain EL players to reach the Nash equilibrium. The convergence analysis is established using variational inequality techniques and Lyapunov stability analysis. Finally, a numerical example in the electricity market is provided to validate the effectiveness of the proposed method.
We propose a two-stage reward allocation method with decay, using an extension of replay memory to adapt this reward scheme to deep reinforcement learning (DRL), in order to generate coordinated behaviors for tasks that are completed by heterogeneous agents executing a few subtasks sequentially. An independent learner in a cooperative multi-agent system needs to learn policies both for effectively executing its own responsible subtask and for coordinated behavior under a given coordination structure. The reward scheme is a central design issue in DRL, and it is difficult to design one that induces both kinds of policies. Our proposed method attempts to generate these different behaviors in multi-agent DRL by dividing the timing of rewards into two stages and varying the ratio between them over time. By introducing a coordinated delivery and execution problem with an expiration time, where a task is executed sequentially by two heterogeneous agents, we experimentally analyze the effect of various reward-division ratios in the two-stage allocation on the generated behaviors. The results demonstrate that the proposed method improves overall performance relative to the conventional one-time or fixed reward and establishes robust coordinated behavior.
An industrial robot is a complex mechatronic system whose failures are hard to diagnose from monitoring data. Previous studies have reported various deep network models to improve the accuracy of fault diagnosis, and these can produce an accurate prediction model when sufficient data samples are available. However, failure data are hard to obtain, leading to a few-shot issue and poor model generalization. Therefore, this paper proposes an attention-enhanced dilated convolutional neural network (D-CNN) approach for cross-axis industrial robot fault diagnosis. Firstly, key feature extraction and a sliding window are adopted to pre-process the monitoring data of industrial robots before the D-CNN is introduced to extract data features, and self-attention is used to enhance the feature attention capability. Finally, the pre-trained model is used for transfer learning, and a small dataset from another axis of the multi-axis industrial robot is used for fine-tuning experiments. The experimental results show that the proposed method reaches satisfactory fault diagnosis accuracy in both the source and target domains.
Fault diagnosis plays a vital role in assessing the health of industrial robots and improving maintenance schedules. In recent decades, artificial intelligence-based data-driven approaches have made significant progress in machine fault diagnosis using monitoring data. However, current methods pay little attention to correlations and internal differences in monitoring data, resulting in limited diagnostic performance. In this paper, a data-driven method, a dual-module attention convolutional neural network (DMA-CNN), is proposed for diagnosing the fault state of industrial robot reducers. It establishes two parallel convolutional neural networks with two different attention mechanisms to capture different fault-related features. Finally, the features are fused to obtain the fault diagnosis result (normal or abnormal). The diagnostic performance of the DMA-CNN method is compared with that of other attention models, and the effectiveness of the method is verified on a dataset of real industrial robots.
Digital Twins are essential in establishing intelligent asset management for an asset or machine. They can be described as a bidirectional communication between a cyber representation and a physical asset. Predictive Maintenance depends on the existence of three data sets: fault history, maintenance/repair history, and machine conditions. Current Digital Twin solutions can fail to simulate the behaviour of a faulty asset, and they also prove difficult to implement when an asset’s fault history is incomplete. This paper presents a novel methodology, LIVE Digital Twin, for developing Digital Twins with a focus on Predictive Maintenance. The four phases, Learn, Identify, Verify, and Extend, are discussed. A case study analyzes the relationship between component stiffness and vibration in detecting the health of various components. The Learn phase is implemented to demonstrate the process of locating a preliminary sensor network and developing the fault history of a Sand Removal Skid assembly. Future studies will relax the simplifying assumptions and expand on the results to implement the subsequent phases.
The key to successful product development is a better understanding of customer requirements and efficient identification of product attributes. In recent years, a growing number of researchers have studied mining customer requirements and preferences from online reviews. However, since customer requirements change dynamically across multi-generation products, most existing studies fail to discover the correlations between customer satisfaction and continuous product improvement. In this work, we propose a novel dynamic customer requirement mining method to analyze the dynamic changes in customer satisfaction with product attributes, based on the sentiment and attention expressed in online reviews, aiming to better meet customer requirements and indicate the direction and content of future product improvement. The method consists of three parts. Firstly, text mining is adopted to collect online review data of multi-generation products and identify product attributes. Secondly, attention and sentiment scores of product attributes are calculated with a natural language processing tool and integrated into corresponding satisfaction scores. Finally, the improvement direction for next-generation products is determined from the changing satisfaction scores of multi-generation product attributes. A case study on multiple generations of phone products based on online reviews illustrates the effectiveness and practicality of the proposed methodology. Our research complements the field of requirements analysis and provides a new dynamic approach for the continuous improvement of multi-generation products, which can help enterprises accurately understand customer requirements and improve the effectiveness and efficiency of continuous product improvement.
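One simple way attention and sentiment might be integrated into a per-attribute satisfaction score is to weight mean sentiment by attention, measured as mention frequency. This weighting is an assumption for illustration; the paper's actual integration formula is not reproduced here.

```python
from collections import defaultdict

def satisfaction_scores(reviews):
    """reviews: list of (attribute, sentiment) pairs with sentiment in
    [-1, 1]. Attention of an attribute = its share of all mentions;
    satisfaction = attention * mean sentiment (illustrative weighting)."""
    counts = defaultdict(int)
    sent_sum = defaultdict(float)
    for attr, s in reviews:
        counts[attr] += 1
        sent_sum[attr] += s
    total = sum(counts.values())
    return {a: (counts[a] / total) * (sent_sum[a] / counts[a])
            for a in counts}

# A heavily mentioned, negatively reviewed attribute ("battery") ends up
# with the most negative score, flagging it for improvement.
scores = satisfaction_scores([
    ("battery", -0.6), ("battery", -0.2), ("camera", 0.8), ("screen", 0.5),
])
```

Tracking such scores across product generations is what would reveal whether an improvement actually moved satisfaction for a given attribute.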
Predictive maintenance (PdM) can not only avoid the economic losses caused by improper maintenance but also maximize the operational reliability of products; it has become the core of operations management. As an important issue in PdM, time-between-failures (TBF) prediction enables early detection and maintenance of products. Reliability information is the main basis for TBF prediction. Therefore, the main purpose of this paper is to establish an intelligent TBF prediction model for complex mechanical products. A reliability-information conversion method is used to address the difficulty of collecting reliability information, the high collection cost, and the small data samples in reliability-based TBF prediction for complex mechanical products. The product reliability information is fully mined and enriched to obtain more reliable and accurate TBF predictions. Firstly, the Fisher algorithm is employed to convert reliability information to expand the sample, and a compatibility test is also discussed. Secondly, a BP neural network is used to make the final TBF prediction, and the PSO algorithm is used to optimize the initial weights and thresholds of the BP neural network to avoid local extrema and improve convergence speed. Thirdly, the mean absolute percentage error and the coefficient of determination are selected to evaluate the performance of the proposed model and method. Finally, a case study of TBF prediction for a remanufactured CNC milling machine tool (XK6032-01) is presented, and the results demonstrate the feasibility and superiority of the proposed TBF prediction method.
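The PSO step can be sketched in isolation. In the paper, the fitness function would score a candidate set of initial BP-network weights and thresholds; the toy below minimizes a 2-D sphere function instead, purely to show the particle update rule. The parameters (inertia `w`, acceleration coefficients `c1`, `c2`) are common defaults, not the paper's settings.

```python
import random

def pso_minimize(f, dim=2, n_particles=30, iters=300,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Plain particle swarm optimization over a box; returns the best
    position found and its fitness value."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(1)
best, best_val = pso_minimize(lambda x: sum(v * v for v in x))
```

Using the swarm's best position to initialize a BP network, as the paper does, sidesteps the sensitivity of gradient training to its starting point.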
Electrical equipment maintenance is of vital importance to management companies. Efficient maintenance can significantly reduce business costs and avoid safety accidents caused by catastrophic equipment failures. Predictive maintenance (PdM) based on machine learning approaches is becoming increasingly popular, but research on electrical equipment such as low-voltage contactors is in its infancy. The failure modes are mainly fusion welding and explosion, with a few units failing to switch on. In this study, a data-driven approach is proposed to predict the remaining useful life (RUL) of low-voltage contactors. Firstly, the three-phase alternating voltage and current are recorded over the life of the equipment, tracking the number of switching operations. Secondly, failure-relevant features are extracted using time-domain, frequency-domain, and wavelet methods. Then, a CNN-LSTM network is designed and used to train an RUL prediction model on the extracted features. An experimental study based on ten datasets collected from low-voltage AC contactors shows that the proposed method outperforms prevailing deep learning algorithms in terms of MAE and RMSE.
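The time-domain part of the feature extraction step can be sketched as follows. This is a generic, illustrative subset of features (RMS, peak, crest factor, kurtosis), not necessarily the exact set used in the paper.

```python
import math

def time_domain_features(x):
    """Standard time-domain condition-monitoring features for a sampled
    signal x (illustrative subset)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)   # root mean square
    peak = max(abs(v) for v in x)
    var = sum((v - mean) ** 2 for v in x) / n
    kurtosis = (sum((v - mean) ** 4 for v in x) / n) / (var ** 2)
    return {"rms": rms, "peak": peak,
            "crest_factor": peak / rms, "kurtosis": kurtosis}

# Unit-amplitude sine sampled over full periods: RMS = 1/sqrt(2) ~ 0.707,
# crest factor = sqrt(2), kurtosis = 1.5.
signal = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
feats = time_domain_features(signal)
```

Such per-window scalars, together with frequency-domain and wavelet features, would form the input vectors fed to the CNN-LSTM.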
Highly Automated Vehicles (HAVs) are expected to improve the performance of terrestrial transportation by providing a safe and efficient travel experience to drivers and passengers. As HAVs will be equipped with different driving automation levels, they should be capable of dynamically adapting their Level of Autonomy (LoA) in order to tackle sudden and recurrent changes in their environment (e.g., inclement weather, complex terrain, unexpected on-road obstacles). HAVs should be able to respond not only to causal reasoning effects, which depend on present and past inputs from the external driving environment, but also to non-causal reasoning situations that depend on future states of the external driving scene. In addition, the driver’s personal preferences and profile characteristics should be assessed and managed properly in order to enhance the travel experience. In light of the above, the present paper tackles the challenge of how cognitive computing can enable HAVs to operate at each moment in the best available LoA by responding quickly to changing environmental situations and driver preferences. To this end, an in-vehicle cognitive functionality is introduced, which collects data from various sources (sensor and driver layers), intelligently processes the data and passes them to the decision-making layer, and finally selects the optimal LoA by integrating previous knowledge and experience. The overall approach includes the identification and use of a hybrid (data-driven and event-driven) algorithmic process for reaching intelligent and proactive decisions. An indicative discrete-event simulation analysis showcases the efficiency of the developed approach in proactively adapting the vehicle’s LoA.
Increasing energy costs and environmental problems push forward research on energy-saving and emission-reduction strategies in the manufacturing industry. Energy assessment of machining, as the basis for energy saving and emission reduction, plays an irreplaceable role in engineering service and maintenance for manufacturing enterprises. Due to the complex energy characteristics of, and relationships between, machine tools, machined parts, and machining processes, practical energy evaluation methods and tools for manufacturing enterprises are still lacking. To fill this gap, a service-oriented energy assessment system is designed and developed in this paper to assist managers in clarifying the energy consumption of machining. Firstly, the operational requirements of the service-oriented energy assessment system are analyzed from the perspective of enterprises. Then, based on the established system architecture, three key technologies, namely data integration, process integration, and energy evaluation, are studied. The energy characteristics of machine tools and the energy relationships are analyzed through the working states of machine tools, the machining features of parts, and the process activities; a relational database, the BPMN 2.0 specification, and machine learning approaches are employed to implement these functions, respectively. Finally, a case study of machine tool center stand base machining in a manufacturing enterprise verifies the effectiveness and practicality of the proposed approach and system.
Against the background of the fourth industrial revolution, driven by new-generation information technology and artificial intelligence, human–robot collaboration has become an important part of smart manufacturing. The new “human–robot–environment” relationship requires industrial robots to collaborate with workers and adapt harmoniously to environmental changes. Determining a reasonable allocation strategy for human–robot interaction operations, comprehensively considering workers’ flexibility and industrial robots’ automation, is the primary problem. In this paper, a human–robot collaborative operation framework based on a CNC (Computer Numerical Control) machine tool is proposed, divided into three stages: pre-machining, machining, and post-machining. An action-based granularity decomposition method is then used to construct a hierarchical model of human–robot interaction. Further, a collaboration-effectiveness-based operations allocation function is established by normalizing the time, cost, efficiency, accuracy, and complexity of human–robot interaction. Finally, a simulated annealing algorithm is adopted to solve for a preferable collaboration scheme, and a case is used to verify the feasibility and effectiveness of the proposed method. This study is expected to provide useful guidance for allocating human–robot interaction operations on CNC machine tools.
Tower cranes find wide use in construction work, in ports, and in several loading and unloading procedures met in industry. A nonlinear optimal control approach is proposed for the dynamic model of the 4-DOF underactuated tower crane. The dynamic model of the robotic crane undergoes approximate linearization around a temporary operating point that is recomputed at each time step of the control method. The linearization relies on Taylor series expansion and the associated Jacobian matrices. For the linearized state-space model of the system, a stabilizing optimal (H-infinity) feedback controller is designed. To compute the controller’s feedback gains, an algebraic Riccati equation is solved repetitively at each iteration of the control algorithm. The stability properties of the control method are proven through Lyapunov analysis. The proposed control approach is advantageous because: (i) unlike the popular computed torque method for robotic manipulators, the new control approach is characterized by optimality and is also applicable when the number of control inputs is not equal to the robot’s number of DOFs, (ii) it achieves fast and accurate tracking of reference setpoints with minimal energy consumption by the robot’s actuators, and (iii) unlike the popular Nonlinear Model Predictive Control method, the article’s nonlinear optimal control scheme has proven global stability and convergence to the optimum.
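The role of the Riccati equation can be illustrated in the scalar case, where the continuous-time algebraic Riccati equation has a closed-form positive root. This is a minimal sketch of the idea, not the article's H-infinity design, which involves matrix-valued Riccati equations re-solved at every time step of the relinearized model.

```python
import math

def scalar_care(a, b, q, r):
    """Closed-form positive root of the scalar continuous-time algebraic
    Riccati equation  2*a*P - (b**2/r)*P**2 + q = 0  for the system
    x' = a*x + b*u with cost weights q (state) and r (input), together
    with the resulting state-feedback gain K = P*b/r."""
    p = (a + math.sqrt(a * a + q * b * b / r)) * r / (b * b)
    return p, p * b / r

# Scalar example: a=0, b=1, q=r=1 gives P=K=1, so the closed loop
# a - b*K = -1 is stable.
P, K = scalar_care(0.0, 1.0, 1.0, 1.0)
```

In the matrix case the same object is solved numerically at each control iteration, and the gain it yields is what stabilizes the linearized crane model around the current operating point.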
Roads are the most commonly used means of transportation and serve as a country’s arteries, so it is extremely important to keep them in good condition. Potholes that appear in the road must be repaired to keep it in good condition. Spotting potholes is difficult, especially in a country like India where roads stretch millions of kilometres across the country. There is therefore a need to automate pothole identification with high speed and real-time precision. YOLOX is an object detection algorithm, and the main goal of this article is to train and analyse the YOLOX model for pothole detection. The YOLOX model is trained on a pothole dataset, and the results are analysed by calculating the accuracy, recall, and size of the model, which are then compared with those of other YOLO algorithms. The experimental results show that the YOLOX-Nano model predicts potholes with higher accuracy than other models while having low computational cost. We achieved an Average Precision (AP) of 85.6% when training the model, and the total size of the model is 7.22 MB. The pothole detection capabilities of the recently developed YOLOX algorithm have not been tested before, and this paper is one of the first to detect potholes using the YOLOX object detection algorithm. The research conducted in this paper will help reduce the cost and increase the speed of pothole identification, and will be of great help in road maintenance.
This article develops a novel approach to multi-objective optimization on the basis of ratio analysis plus the full multiplicative form (MULTIMOORA) using spherical fuzzy sets (SFSs) to obtain proper evaluations. SFSs surpass Pythagorean and intuitionistic fuzzy sets in modeling human cognition since the degree of hesitation is expressed explicitly in a three-dimensional space. In the spherical fuzzy environment, the implementation of MULTIMOORA encounters two major problems, in the aggregation operators and in the distance measures, that might lead to erroneous results. The existing aggregation operators can in some cases produce a biased evaluation. Therefore, two aggregation functions for SFSs are proposed; these functions guarantee a balanced evaluation and avoid false ranking. In the reference point technique, when comparing SFSs, being closer to the ideal solution does not necessarily imply an SFS with a better score. To compensate for this drawback, two reference points are employed instead of one, and the distance is expressed not as a crisp value but as an SFS. To overcome the disadvantages of dominance theory in large-scale applications, the results of the three techniques are aggregated to obtain the overall utility on which the ranking is based. The proposed spherical fuzzy MULTIMOORA is illustrated and validated through two applications: personnel selection and energy storage technology selection. The results are compared with those of other methods to demonstrate the adequacy of the proposed method and validate the results.
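For reference, a spherical fuzzy set assigns each element a membership, a non-membership, and a hesitancy degree, all expressed explicitly in a three-dimensional space constrained by the unit sphere; this is the property the abstract contrasts with Pythagorean and intuitionistic fuzzy sets:

```latex
\tilde{A}_S = \left\{ \left\langle x,\; \mu_{\tilde{A}_S}(x),\; \nu_{\tilde{A}_S}(x),\; \pi_{\tilde{A}_S}(x) \right\rangle \,\middle|\, x \in X \right\},
\qquad
0 \le \mu_{\tilde{A}_S}^2(x) + \nu_{\tilde{A}_S}^2(x) + \pi_{\tilde{A}_S}^2(x) \le 1,
```

with all three degrees taking values in $[0,1]$. In intuitionistic and Pythagorean fuzzy sets, by contrast, the hesitancy degree is derived from the other two rather than specified independently.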
Wafer yield prediction, as the basis of quality control, is dedicated to predicting quality indices of the wafer manufacturing process. In recent years, data-driven machine learning methods have received much attention for predicting quality indices due to their accuracy, robustness, and convenience. However, existing studies focus mainly on the model level to improve prediction accuracy and do not consider the impact of data characteristics on yield prediction. To tackle these issues, a novel wafer yield prediction method is proposed. An improved genetic algorithm (IGA) is used as an under-sampling method to address the uneven distribution of the data: the overlap between finished-product and defective-product data caused by the similarity of their manufacturing processes, and the class imbalance caused by the scarcity of defective samples. In addition, a high-dimensional alternating feature selection method (HAFS) is used to select the key influencing processes, i.e., the key parameters, to avoid overfitting of the prediction model caused by too many input parameters. Finally, an SVM is used to predict the yield. Experiments are conducted on a public wafer yield prediction dataset collected from an actual wafer manufacturing system. IGA-HAFS-SVM achieves state-of-the-art results on this dataset, confirming its effectiveness. On this dataset, the proposed method improves the AUC score, G-mean, and F1-score by 21.6%, 34.6%, and 0.6%, respectively, compared with the conventional method. Moreover, the experimental results demonstrate the influence of data characteristics on wafer yield prediction.
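The balancing step can be illustrated with a plain random under-sampler. The paper's improved genetic algorithm (IGA) chooses which majority-class samples to drop far more selectively (e.g., to also reduce class overlap); the baseline below only shows what under-sampling does to the class distribution.

```python
import random

def undersample(features, labels, seed=0):
    """Random under-sampling: keep all minority-class samples and an
    equal-sized random subset of every other class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(v) for v in by_class.values())
    bal_x, bal_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):   # draw without replacement
            bal_x.append(x)
            bal_y.append(y)
    return bal_x, bal_y

# 95 "good" wafers vs 5 "defective": after balancing, 5 of each.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5
Xb, yb = undersample(X, y)
```

The balanced set `(Xb, yb)` is what would then go through feature selection and into the SVM; without this step the classifier can reach high accuracy by always predicting the majority class.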
This paper introduces the worldwide history of fully automatic operation (FAO) systems in urban rail transit, followed by their development status in China. The architecture and characteristics of the FAO system are then described, and an analysis method for system design requirements based on human factors engineering is proposed. The key technologies are introduced in terms of the signaling system, vehicle system, communication system, integrated traffic automation system, and reliability, availability, maintainability, and safety (RAMS) assurance. Furthermore, based on independent practical experience with the FAO system, this paper summarizes management methods for the construction and operation of FAO lines and discusses future development trends toward a more intelligent urban rail transit system.
Affected by parameter drift and coupling structure, nonlinear dynamical systems can exhibit suppressed oscillations, a phenomenon called amplitude death. In various complex systems, amplitude death is a typical critical phenomenon that may lead to the functional collapse of the system. An important issue, therefore, is how to effectively predict such critical phenomena from data collected while the system is still oscillating. This paper proposes an enhanced Informer model to predict amplitude death. The model employs an attention mechanism to capture long-range associations in the system’s time series and tracks the effect of parameter drift on the system dynamics through an accompanying parameter input channel. Experimental results based on coupled Rössler and Lorenz systems show that the enhanced Informer has higher prediction accuracy and a longer effective prediction horizon than the original algorithm and can predict the amplitude death of a system.
In the near future, autonomous vehicles (AVs) may cohabit with human drivers in mixed traffic. This cohabitation raises serious challenges, in terms of both traffic flow and individual mobility, as well as from a road safety point of view. Mixed traffic may fail to fulfill expected safety requirements due to the heterogeneity and unpredictability of human drivers, and autonomous cars could then monopolize the traffic. Using multi-agent reinforcement learning (MARL) algorithms, researchers have attempted to design autonomous vehicles for both scenarios, and this paper surveys their recent advances. We focus on articles tackling decision-making problems and identify four paradigms. Some authors address mixed-traffic problems with or without socially-desirable AVs, while others tackle the case of fully autonomous traffic. While the latter case is essentially a communication problem, most authors addressing mixed traffic admit some limitations: the human driver models currently found in the literature are too simplistic, since they do not cover the heterogeneity of driver behaviors, and as a result they fail to generalize over the wide range of possible behaviors. For each paper investigated, we analyze how the authors formulated the MARL problem in terms of observations, actions, and rewards to match the paradigm they apply.