Utilizing large language models for semantic enrichment of infrastructure condition data: a comparative study of GPT and Llama models
Lea Höltgen, Sven Zentgraf, Philipp Hagedorn, Markus König
AI in Civil Engineering, 2025, Vol. 4, Issue 1: 14
Relational databases are widely used in the Architecture, Engineering, and Construction (AEC) industry to manage diverse construction-related datasets, including project management and building-specific information. This study explores the use of large language models (LLMs) to convert construction data from relational databases into formal semantic representations such as the Resource Description Framework (RDF). Transforming this data into RDF-encoded knowledge graphs improves interoperability and enables advanced querying. However, existing methods such as R2RML and Direct Mapping face significant challenges, including the need for domain expertise and limited scalability. LLMs, with their advanced natural language processing capabilities, offer a promising alternative: they can automate the conversion process, reduce reliance on expert knowledge, and semantically enrich the data by mapping it to appropriate ontologies. This paper evaluates the potential of four LLMs (two versions each of GPT and Llama) to enhance data enrichment workflows in the construction industry and examines the limitations of applying these models to large-scale datasets.
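To make the Direct Mapping baseline concrete, the following is a minimal Python sketch of the W3C Direct Mapping rules (one type triple per row, one literal triple per non-NULL column, one reference triple per foreign key), serialized as N-Triples. It is not the paper's pipeline. The inspection schema (`Structure`, `Damage`, `crack_width_mm`), the base IRI, and the simplified Python-to-XSD type mapping are hypothetical. The sketch assumes every table has a primary key and every foreign key references its target table's primary key, and it approximates the specification's percent-encoding with `urllib.parse.quote`. The output illustrates the limitation named above: predicates such as `Damage#crack_width_mm` are derived mechanically from the schema rather than from a domain ontology.

```python
from urllib.parse import quote

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def enc(name):
    """Percent-encode a table name, column name, or key value.

    Approximation (assumption): quote(..., safe="") keeps only unreserved ASCII.
    """
    return quote(str(name), safe="")


def literal(value):
    """Render a non-NULL value as an N-Triples literal (simplified type mapping)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'  # strings become plain literals


def row_iri(base, table, key_cols, row):
    """Row node IRI: <base><table>/<col>=<value>[;<col>=<value>...]."""
    key = ";".join(f"{enc(c)}={enc(row[c])}" for c in key_cols)
    return f"{base}{enc(table)}/{key}"


def direct_mapping(base, schema, data):
    """Return N-Triples lines for every row in `data`.

    Assumes each table has a primary key and each foreign key references
    the primary key of its target table (column order preserved).
    """
    triples = []
    for table, rows in data.items():
        pk = schema[table]["pk"]
        fks = schema[table].get("fks", {})
        for row in rows:
            s = f"<{row_iri(base, table, pk, row)}>"
            # Type triple: the row is an instance of the table's class IRI.
            triples.append(f"{s} <{RDF_TYPE}> <{base}{enc(table)}> .")
            # Literal triples: one per column with a non-NULL value.
            for col, value in row.items():
                if value is not None:
                    p = f"<{base}{enc(table)}#{enc(col)}>"
                    triples.append(f"{s} {p} {literal(value)} .")
            # Reference triples: one per foreign key whose columns are all non-NULL.
            for cols, (ref_table, ref_cols) in fks.items():
                if all(row[c] is not None for c in cols):
                    target = {rc: row[c] for c, rc in zip(cols, ref_cols)}
                    o = f"<{row_iri(base, ref_table, ref_cols, target)}>"
                    p = f"<{base}{enc(table)}#ref-{';'.join(enc(c) for c in cols)}>"
                    triples.append(f"{s} {p} {o} .")
    return triples


# Hypothetical inspection database: structures and the damages recorded on them.
BASE = "http://example.org/inspection/"
SCHEMA = {
    "Structure": {"pk": ["id"]},
    "Damage": {"pk": ["id"], "fks": {("structure_id",): ("Structure", ("id",))}},
}
DATA = {
    "Structure": [{"id": 1, "name": "Bridge A"}],
    "Damage": [{"id": 10, "structure_id": 1, "damage_type": "crack",
                "crack_width_mm": 0.3}],
}

if __name__ == "__main__":
    for triple in direct_mapping(BASE, SCHEMA, DATA):
        print(triple)
```

Running the sketch prints nine triples, including `<http://example.org/inspection/Damage/id=10> <http://example.org/inspection/Damage#ref-structure_id> <http://example.org/inspection/Structure/id=1> .`. An R2RML mapping can replace these schema-derived IRIs with hand-chosen ontology terms, but writing that mapping is where the domain expertise mentioned above is required.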
Architecture, Engineering, and Construction (AEC) / Large language models / Relational databases / Semantic enrichment / R2RML / Llama / GPT