Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering

Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou

AI in Civil Engineering ›› 2025, Vol. 4 ›› Issue (1)

DOI: 10.1007/s43503-025-00063-9
Original Article

Abstract

This paper investigates the use of pre-trained Vision-Language Models (VLMs) to describe images of civil engineering materials and construction sites, with a focus on construction components, structural elements, and materials. The novelty of the study lies in applying VLMs to this specialized domain, which has not been previously addressed. As a case study, the paper evaluates the ability of ChatGPT-4v to serve as a description tool by comparing its output with descriptions written by three humans (a civil engineer and two engineering interns). The contributions include applying a pre-trained VLM to civil engineering images without additional fine-tuning and benchmarking its performance using both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Using two datasets (one from a publicly available online repository and one collected manually by the authors), the study employs whole-text and sentence pair-wise similarity analyses to assess how closely the model’s descriptions align with the human ones. The best-performing model achieved an average similarity of 76% (standard deviation 4%) with the human-generated descriptions, and performance was better on the publicly available dataset.
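
The difference between whole-text and sentence pair-wise comparison can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the lexical metric used, so the sketch assumes cosine similarity over word counts, a naive punctuation-based sentence splitter, and a best-match average for the pair-wise score. All function names and the sample descriptions are hypothetical. A semantic variant would replace the word-count vectors with sentence embeddings (for example from the SentenceTransformers library) while keeping the same two comparison modes.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def whole_text_similarity(model_text, human_text):
    """Compare the two descriptions as single blocks of text."""
    return cosine_similarity(model_text, human_text)


def pairwise_similarity(model_text, human_text):
    """Average, over model sentences, of the best-matching human sentence."""
    model_sents = split_sentences(model_text)
    human_sents = split_sentences(human_text)
    if not model_sents or not human_sents:
        return 0.0
    best = [max(cosine_similarity(m, h) for h in human_sents)
            for m in model_sents]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Hypothetical descriptions, invented for illustration only.
    model = "A steel beam rests on a concrete column. Workers wear helmets."
    humans = [
        "Steel beam supported by a concrete column. Two workers with helmets.",
        "The image shows a beam on a column. Workers are on site.",
        "Concrete column and steel beam. Safety helmets are visible.",
    ]
    scores = [whole_text_similarity(model, h) for h in humans]
    print("whole-text mean:", round(statistics.mean(scores), 3))
    print("whole-text std: ", round(statistics.pstdev(scores), 3))
    print("pair-wise:", [round(pairwise_similarity(model, h), 3) for h in humans])
```

Averaging the scores over the three human descriptions, as in the last lines, mirrors how a mean and standard deviation of similarity could be reported, though the exact aggregation used in the paper is not stated in the abstract.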

Keywords

Vision language models / Artificial intelligence / Image description / Pre-Trained Transformers / Civil engineering / Digital twin

Cite this article

Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou. Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering. AI in Civil Engineering, 2025, 4(1). DOI: 10.1007/s43503-025-00063-9

RIGHTS & PERMISSIONS

The Author(s)
