Utilizing large language models for semantic enrichment of infrastructure condition data: a comparative study of GPT and Llama models

Lea Höltgen; Sven Zentgraf; Philipp Hagedorn; Markus König

AI in Civil Engineering, 2025, Vol. 4, Issue 1: 14. DOI: 10.1007/s43503-025-00055-9

Original Article

Abstract

Relational databases containing construction-related data are widely used in the Architecture, Engineering, and Construction (AEC) industry to manage diverse datasets, including project management and building-specific information. This study explores the use of large language models (LLMs) to convert construction data from relational databases into formal semantic representations, such as the Resource Description Framework (RDF). Transforming this data into RDF-encoded knowledge graphs enhances interoperability and enables advanced querying capabilities. However, existing methods such as R2RML and Direct Mapping face significant challenges, including the need for domain expertise and limited scalability. LLMs, with their advanced natural language processing capabilities, offer a promising solution by automating the conversion process, reducing the reliance on expert knowledge, and semantically enriching data through appropriate ontologies. This paper evaluates the potential of four LLMs (two versions each of GPT and Llama) to enhance data enrichment workflows in the construction industry and examines the limitations of applying these models to large-scale datasets.

Keywords

Architecture, Engineering, and Construction (AEC) / Large language models / Relational databases / Semantic enrichment / R2RML / Llama / GPT

Cite this article

Lea Höltgen, Sven Zentgraf, Philipp Hagedorn, Markus König. Utilizing large language models for semantic enrichment of infrastructure condition data: a comparative study of GPT and Llama models. AI in Civil Engineering, 2025, 4(1): 14. DOI: 10.1007/s43503-025-00055-9
