Empowering intelligent quality control with large models: A comprehensive survey of methods, challenges, and perspectives

Xinyu PAN, Di WANG, Fugee TSUNG

Eng. Manag, 2026, Vol. 13, Issue 1: 42-64. DOI: 10.1007/s42524-026-5273-5
Industrial Engineering and Intelligent Manufacturing
REVIEW ARTICLE

Abstract

Quality control (QC) serves as a cornerstone of modern manufacturing, exerting a decisive influence on production efficiency, product reliability and customer satisfaction. However, traditional QC systems, which largely rely on rule-based frameworks and narrowly defined statistical methods, face increasing limitations in handling the scale, diversity and complexity of contemporary industrial data. This limitation provides a strong motivation to explore the potential of large models (LMs) for advancing QC. Distinguished by their powerful capabilities in knowledge integration, contextual understanding and adaptive reasoning, LMs offer transformative opportunities to modernize QC. This review begins by analyzing why LMs are particularly well positioned to enhance QC, focusing on three crucial dimensions: input alignment, which enables seamless integration of heterogeneous data sources; task adaptability, which supports associative learning across multiple QC tasks and allows knowledge transfer; and augmented intelligence, which supports human experts in complex decision-making. Recent advances in industrial applications are summarized, with particular attention to methodological innovations, deployment practices and integration pathways into manufacturing workflows. To systematically structure the current landscape, the key challenges are categorized into three interrelated dimensions, i.e., data, model and evaluation, which correspond to the core requirements for model training, practical implementation and sustainable adaptability in real-world scenarios. Building on this foundation, the review further outlines future research directions, highlighting secure data collaboration, system-level integration and continual learning under dynamic environments as critical priorities for the next stage of development. Collectively, these insights underscore the promise of LMs in reshaping QC into an intelligent, resilient and future-ready paradigm.

Keywords

quality control / large models / foundation models / industrial artificial intelligence survey / manufacturing intelligence

Cite this article

Xinyu PAN, Di WANG, Fugee TSUNG. Empowering intelligent quality control with large models: A comprehensive survey of methods, challenges, and perspectives. Eng. Manag, 2026, 13(1): 42-64. DOI: 10.1007/s42524-026-5273-5

RIGHTS & PERMISSIONS

Higher Education Press