Safe integration of Large Language Models into industrial process control: a multi-agent architecture with P&ID-grounded validation

Daniel Schall

Autonomous Intelligent Systems, 2026, Vol. 6, Issue 1: 14. DOI: 10.1007/s43684-026-00136-1

Original Article

Abstract

Large Language Models (LLMs) offer powerful reasoning capabilities for industrial process control, yet their non-deterministic nature, susceptibility to hallucination, and lack of intrinsic physical understanding make direct deployment in safety-critical environments unacceptable. This paper addresses five research questions on safely integrating LLM-based reasoning into industrial process automation through the Autonomous Action Execution (AAE) framework. For safe architectural integration (RQ1), we present a four-layer multi-agent architecture that confines LLM inference to an observation-only Monitor layer while safety-critical decisions are made by deterministic Verification and Execution agents. For structuring heterogeneous plant data (RQ2), we introduce a text-level aggregation framework with pluggable analyzers that transforms SCADA states, time-series measurements, Piping and Instrumentation Diagrams (P&IDs), and Standard Operating Procedures (SOPs) into contextually rich documents for LLM consumption. For automated validation (RQ3), a P&ID-grounded method uses graph traversal over the P&ID topology to verify the physical consistency of LLM-generated proposals, checking tag existence, actuatability, fail-state consistency, and downstream impact. For quantifiable context enrichment (RQ4), a graduated baseline comparison (B0–B3) demonstrates the incremental value of each pipeline component. For cross-domain generalisability (RQ5), we evaluate five industrial scenarios: three derived from the Tennessee Eastman Process (TEP) benchmark (Downs & Vogel, 1993), which provide community-standard validation, and two retained scenarios (PolyReactor, Dryer) that establish performance boundaries from best case (zero hallucination) to worst case (70% safety violations). The evaluation demonstrates portability of the framework across continuous and batch processes of varying P&ID complexity within the evaluated synthetic scenarios.
An error injection study across 43 crafted proposals demonstrates that the validation layer achieves 100% recall (zero false negatives) on the covered failure modes (P&ID-grounded structural checks together with the scenario-specific forbidden-action rubric). A statistical robustness study (N = 50 LLM runs) shows that even when LLMs propose unsafe actions in 10%–70% of runs, the deterministic validation layer catches every invalid proposal in the covered categories (structurally invalid or forbidden-action) under the evaluated scenarios. The core finding is that safe advisory integration of LLMs in process industries is fundamentally an architectural challenge: established systems engineering principles (separation of concerns, independent protection layers, and deterministic safety logic) provide deterministic checks against the covered validation-layer failure modes (structural and rubric-defined).
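
The four P&ID-grounded checks named in the abstract (tag existence, actuatability, fail-state consistency, downstream impact) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy plant graph, the tag names, the `Proposal` type, the `validate` function, and the interpretation of fail-state consistency (a tripped device must not be driven away from its fail-safe position) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy P&ID: tag -> attributes. Tags and fields are illustrative only.
NODES = {
    "P-101":  {"actuatable": True,  "fail_state": None},      # pump
    "FV-101": {"actuatable": True,  "fail_state": "closed"},  # fail-closed valve
    "FT-101": {"actuatable": False, "fail_state": None},      # flow transmitter
    "R-101":  {"actuatable": False, "fail_state": None},      # reactor
    "T-201":  {"actuatable": False, "fail_state": None},      # tank
}
# Directed flow edges (upstream tag -> downstream tags).
EDGES = {
    "P-101": ["FV-101"],
    "FV-101": ["FT-101"],
    "FT-101": ["R-101"],
    "R-101": ["T-201"],
}
# Assumed mapping from a proposed action to the device state it produces.
ACTION_TARGET = {"open": "open", "close": "closed"}


@dataclass(frozen=True)
class Proposal:
    tag: str     # equipment tag the LLM wants to act on
    action: str  # e.g. "open", "close", "start", "stop"


def downstream(tag):
    """Breadth-first traversal of the flow graph, excluding the start tag."""
    seen, queue, order = {tag}, deque([tag]), []
    while queue:
        for nxt in EDGES.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def validate(p, tripped=frozenset(), protected=frozenset()):
    """Return a list of violations; an empty list means the proposal passes."""
    node = NODES.get(p.tag)
    if node is None:                      # check 1: tag existence
        return ["unknown_tag"]
    violations = []
    if not node["actuatable"]:            # check 2: actuatability
        violations.append("not_actuatable")
    # check 3 (assumed reading): a tripped device must stay in its fail state
    if p.tag in tripped and node["fail_state"] is not None:
        if ACTION_TARGET.get(p.action) != node["fail_state"]:
            violations.append("fail_state_conflict")
    # check 4: flag protected equipment reachable downstream of the target
    hit = [t for t in downstream(p.tag) if t in protected]
    if hit:
        violations.append("downstream_impact:" + ",".join(hit))
    return violations
```

Because every check is a dictionary lookup or a graph traversal, the same proposal always produces the same verdict, which is the property the abstract relies on when it contrasts the deterministic validation layer with non-deterministic LLM output. For example, `validate(Proposal("XV-999", "open"))` rejects a hallucinated tag before any further check runs.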

Keywords

Multi-agent systems / Large Language Models / Industrial process control / Functional safety / P&ID validation / SCADA / Tennessee Eastman Process

Cite this article

Daniel Schall. Safe integration of Large Language Models into industrial process control: a multi-agent architecture with P&ID-grounded validation. Autonomous Intelligent Systems, 2026, 6(1): 14. DOI: 10.1007/s43684-026-00136-1

References

[1]	Khan A., Nahar R., Chen H., Flores G.E.C., Li C. FaultExplainer: leveraging large language models for interpretable fault detection and diagnosis. Comput. Chem. Eng., 2025, 199. ArticleID: 109152

[2]	Alnegheimish S., Nguyen L., Berti-Equille L., Veeramachaneni K. Can large language models be anomaly detectors for time series?. Proc. IEEE 11th Int. Conf. Data Sci. Adv. Analytics (DSAA), 2024: 1-10

[3]	Sun Q., Li Y., Zhou C., Tian Y.-C. Root cause analysis for industrial process anomalies through the integration of knowledge graph and large language model. Proc. 43rd Chinese Control Conf. (CCC), 2024: 6855-6860

[4]	Ji Z., Lee N., Frieske R., Yu T., Su D., Xu Y., Ishii E., Bang Y.J., Madotto A., Fung P. Survey of hallucination in natural language generation. ACM Comput. Surv., 2023, 55(12): 248.

[5]	International Electrotechnical Commission. IEC 61511 Functional Safety - Safety Instrumented Systems for the Process Industry Sector, 2nd ed., 2016. Geneva, IEC

[6]	Dowell A.M.III. Layer of protection analysis for determining safety integrity level. ISA Trans., 1998, 37(3): 155-165.

[7]	OASIS, MQTT Version 5.0. OASIS Standard (2019). https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html

[8]	Brussel H.V., Wyns J., Valckenaers P., Bongaerts L., Peeters P. Reference architecture for holonic manufacturing systems: PROSA. Comput. Ind., 1998, 37(3): 255-274.

[9]	Leitão P., Restivo F. ADACOR: a holonic architecture for agile and adaptive manufacturing control. Comput. Ind., 2006, 57(2): 121-130.

[10]	Leitão P., Karnouskos S., Ribeiro L., Lee J., Strasser T., Colombo A.W. Smart agents in industrial cyber-physical systems. Proc. IEEE, 2016, 104(5): 1086-1101.

[11]	International Electrotechnical Commission. IEC 62264: Enterprise-Control System Integration, 2013. Geneva, IEC

[12]	Yao S., Zhao J., Yu D., Du N., Shafran I., Narasimhan K., Cao Y. ReAct: synergizing reasoning and acting in language models. Proc. 11th Int. Conf. Learning Representations (ICLR), 2023

[13]	Lewis P., Perez E., Piktus A., Petroni F., Karpukhin V., Goyal N., Küttler H., Lewis M., Yih W., Rocktäschel T., Riedel S., Kiela D. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst., 2020, 33: 9459-9474

[14]	Rebedea T., Dinu R., Sreedhar M.N., Parisien C., Cohen J. NeMo Guardrails: a toolkit for controllable and safe LLM applications with programmable rails. Proc. Conf. Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP), 2023: 431-445

[15]	International Society of Automation. ISA-5.1: Instrumentation Symbols and Identification, 2009. Research Triangle Park, ISA

[16]	International Organization for Standardization. ISO 15926: Industrial Automation Systems and Integration — Integration of Life-Cycle Data for Process Plants, 2003. Geneva, ISO

[17]	Theissler A., Pérez-Velázquez J., Kettelgerdes M., Elger G. Predictive maintenance enabled by machine learning: use cases and challenges in the automotive industry. Reliab. Eng. Syst. Saf., 2021, 215. ArticleID: 107864

[18]	Ahmed I., Jeon G., Piccialli F. From artificial intelligence to explainable artificial intelligence in Industry 4.0: a survey on what, how, and where. IEEE Trans. Ind. Inform., 2022, 18(8): 5031-5042.

[19]	Zio E., Miqueles L. Digital twins in safety analysis, risk assessment and emergency management. Reliab. Eng. Syst. Saf., 2024, 246. ArticleID: 110040

[20]	Parasuraman R., Riley V. Humans and automation: use, misuse, disuse, abuse. Hum. Factors, 1997, 39(2): 230-253.

[21]	Al-Fuqaha A., Guizani M., Mohammadi M., Aledhari M., Ayyash M. Internet of Things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor., 2015, 17(4): 2347-2376.

[22]	Dizdarević J., Carpio F., Jukan A., Masip-Bruin X. A survey of communication protocols for Internet of Things and related challenges of fog and cloud computing integration. ACM Comput. Surv., 2019, 51(6): 116.

[23]	Ghosh S., Sampalli S. A survey of security in SCADA networks: current issues and future challenges. IEEE Access, 2019, 7: 135812-135831.

[24]	Bass L., Clements P., Kazman R. Software Architecture in Practice, 2nd ed., 2003. Boston, Addison-Wesley

[25]	Downs J.J., Vogel E.F. A plant-wide industrial process control problem. Comput. Chem. Eng., 1993, 17(3): 245-255.

[26]	Qwen Team, Qwen2.5-VL: a frontier multimodal model for understanding and interacting with the world (2024). arXiv preprint. arXiv:2502.13923

[27]	Bach T.A., Kristiansen J.K., Babic A., Jacovi A. Unpacking human-AI interaction in safety-critical industries: a systematic literature review. IEEE Access, 2024, 12: 106385-106414.

[28]	International Electrotechnical Commission. IEC 61508 Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 2nd ed., 2010. Geneva, IEC